Show HN: Luminal – Open-source, search-based GPU compiler

https://github.com/luminal-ai/luminal
39•jafioti•1h ago
Hi HN, I’m Joe. My friends Matthew, Jake and I are building Luminal (https://luminalai.com/), a GPU compiler for automatically generating fast GPU kernels for AI models. It uses search-based compilation to achieve high performance.

We take high level model code, like you'd have in PyTorch, and generate very fast GPU code. We do that without using LLMs or AI - rather, we pose it as a search problem. Our compiler builds a search space, generates millions of possible kernels, and then searches through it to minimize runtime.
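As a toy illustration of this idea (not Luminal's actual API or IR), the "generate logically-equivalent candidates, then pick the winner by measured runtime" loop looks roughly like this, here searching over tile sizes for a pure-Python matmul:

```python
import timeit

# Toy illustration (not Luminal's API): treat kernel variants as points in
# a search space, check that they are logically equivalent, then pick the
# variant with the lowest measured runtime.

N = 64
A = [[float(i * N + j) for j in range(N)] for i in range(N)]
B = [[float((i + j) % 7) for j in range(N)] for i in range(N)]

def matmul_naive(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def make_tiled(tile):
    # Each tile size is a different candidate "kernel".
    def matmul_tiled(a, b):
        n = len(a)
        c = [[0.0] * n for _ in range(n)]
        for ii in range(0, n, tile):
            for kk in range(0, n, tile):
                for jj in range(0, n, tile):
                    for i in range(ii, min(ii + tile, n)):
                        for k in range(kk, min(kk + tile, n)):
                            aik, row_b, row_c = a[i][k], b[k], c[i]
                            for j in range(jj, min(jj + tile, n)):
                                row_c[j] += aik * row_b[j]
        return c
    return matmul_tiled

candidates = {"naive": matmul_naive}
for tile in (8, 16, 32):
    candidates[f"tiled_{tile}"] = make_tiled(tile)

# All candidates must be logically equivalent...
reference = matmul_naive(A, B)
assert all(fn(A, B) == reference for fn in candidates.values())

# ...and the winner is chosen purely by measured runtime.
timings = {name: timeit.timeit(lambda f=fn: f(A, B), number=3)
           for name, fn in candidates.items()}
best = min(timings, key=timings.get)
print(f"fastest of {len(candidates)} candidates: {best}")
```

Luminal works at GPU-kernel granularity over a vastly larger space, but the shape is the same: equivalence is guaranteed by construction, and the selection criterion is measured runtime rather than a heuristic.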

You can try out a demo in `demos/matmul` on Mac to see how Luminal takes a naive operation, represented in our IR of 12 simple operations, and compiles it to an optimized, tensor-core-enabled Metal kernel. Here’s a video showing how: https://youtu.be/P2oNR8zxSAA

Our approach differs significantly from traditional ML libraries in that we ahead-of-time compile everything, generate a large search space of logically-equivalent kernels, and search through it to find the fastest kernels. This allows us to leverage the Bitter Lesson to discover complex optimizations like Flash Attention entirely automatically without needing manual heuristics. The best rule is no rule, the best heuristic is no heuristic, just search everything.

We’re working on bringing CUDA support up to parity with Metal, adding more flexibility to the search space, adding full-model examples (like Llama), and adding very exotic hardware backends.

We aim to radically simplify the ML ecosystem while improving performance and hardware utilization. Please check out our repo: https://github.com/luminal-ai/luminal and I’d love to hear your thoughts!

Comments

AkashKarnatak•58m ago
Very cool project. Earlier tinygrad used to have ~25 ops, but it has now grown to 86, and I believe that's primarily to support hardware features like tensor cores and TMA. I don't think Luminal supports tensor cores as of now; how do you think the ops will evolve as the library matures?
jafioti•54m ago
We do support tensor cores, but the ops are only part of the search space, so there's virtually no overhead for them. The frontend and main IR are only 12 ops; we can add hardware-specific ops into the search space and only need a bit of code in the codegen pass to support them.
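To make the small-IR idea concrete, here is a hypothetical sketch (the primitive set and op names are illustrative, not Luminal's exact 12 ops) of how a handful of primitives compose into higher-level operations, so the compiler and its search space only ever reason about the primitives:

```python
import math

# Hypothetical tiny IR in this spirit; names are illustrative only.
PRIMITIVES = {
    "recip": lambda x: 1.0 / x,
    "add":   lambda x, y: x + y,
    "mul":   lambda x, y: x * y,
    "exp2":  lambda x: 2.0 ** x,
    "log2":  lambda x: math.log2(x),
}

# Higher-level ops lower to compositions of primitives:
def sub(x, y):   # x - y  ==  add(x, mul(y, -1))
    return PRIMITIVES["add"](x, PRIMITIVES["mul"](y, -1.0))

def div(x, y):   # x / y  ==  mul(x, recip(y))
    return PRIMITIVES["mul"](x, PRIMITIVES["recip"](y))

def exp(x):      # e**x  ==  exp2(x * log2(e))
    return PRIMITIVES["exp2"](PRIMITIVES["mul"](x, math.log2(math.e)))

print(sub(5.0, 3.0), div(6.0, 2.0), exp(1.0))
```

Keeping the primitive set small keeps the search space uniform; a hardware-specific op (like a tensor-core matmul) can then be one extra node the search is free to swap in where it is profitable.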
diggan•53m ago
> Luminal can run Q8 Llama 3 8B on M-series Macbooks at 15-25 tokens per second. The goal is to become the fastest ML framework for any model on any device.

Great that some numbers are provided, but in isolation I'm not sure what they tell us. It would be helpful to also share what tok/s you'd get with llama.cpp or something else on the same hardware, so we can actually tell whether it's faster or not :) Including prompt-processing speed would be a bonus!

jafioti•17m ago
A lot of the search is still being optimized, so we don't yet match super hand-optimized kernels like llama.cpp's, and we definitely don't match their tok/s. I want to build a perf-tracking page to see improvements over time and prevent regressions.
Alifatisk•48m ago
So wait, am I understanding this correctly?

Instead of applying just predetermined optimization rules or patterns, the compiler formulates the problem as searching through many possible configurations or versions of the code. Each possible version can have different arrangements, tiling sizes, thread block configurations, memory access patterns, and instruction sequences, right?

And from my understanding, the “search space” is just a collection of all potential versions of the code (kernels) that the compiler can generate from the original input. So for example, the space might include

- Different ways to partition workloads among GPU threads and blocks

- Varying memory access strategies (using shared memory, global memory)

- Various instruction-level optimizations or reordering

- Alternative loop unroll factors or vectorization strategies

The compiler then programmatically produces a large number of candidate kernels by combining different optimizations and configurations. Among these millions of candidates, the compiler tries to find the one that performs best.
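The enumeration described above amounts to a Cartesian product over configuration axes. A minimal sketch (the axis names and values here are made up for illustration, not Luminal's):

```python
import itertools

# Hypothetical configuration axes; names and values are illustrative only.
axes = {
    "block_dim":     [(32, 1), (64, 1), (128, 1), (16, 16), (32, 8)],
    "tile_mn":       [(16, 16), (32, 32), (64, 64), (128, 64)],
    "shared_memory": [False, True],
    "unroll":        [1, 2, 4, 8],
    "vector_width":  [1, 2, 4],
}

# Every combination of choices is one candidate kernel configuration.
candidates = [dict(zip(axes, combo))
              for combo in itertools.product(*axes.values())]
print(len(candidates))  # 5 * 4 * 2 * 4 * 3 = 480
```

Even five small axes give 480 candidates; add more axes (loop orders, instruction schedules, fusion decisions) and the product quickly reaches the millions of kernels mentioned in the post.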

In that case, can the compiler print out which GPU configuration works best for that computer? And will that configuration be applicable to all computers with the same setup?

This is such an interesting technique.

jakestevens2•22m ago
Your description is exactly right. We create a search space of all possible kernels and find the best ones based on runtime. The best heuristic is no heuristic.

This obviously creates a combinatorial problem that we mitigate with smarter search.

The kernels are run on the computer the compiler is running on. Since runtime is our gold standard, it will search for the best configuration for your hardware target. As long as the setup is mostly the same, the optimizations should carry over, yes.
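One common way to "mitigate with smarter search" is to avoid exhaustively measuring the full product, e.g., a beam search that fixes one axis at a time and keeps only the top few partial configurations. A sketch under that assumption (the cost function is a synthetic stand-in for "compile the kernel and measure its runtime", and Luminal's actual search strategy may differ):

```python
# Beam search over configuration axes; cost() is a synthetic stand-in
# for compiling a candidate kernel and measuring its runtime.
AXES = [("tile", [8, 16, 32, 64]),
        ("unroll", [1, 2, 4, 8]),
        ("vec", [1, 2, 4])]

BEST_KNOWN = {"tile": 32, "unroll": 4, "vec": 4}  # pretend optimum

def cost(cfg):
    # Synthetic runtime model: distance from the pretend optimum,
    # counted only over the axes assigned so far.
    return sum(abs(v - BEST_KNOWN[k]) for k, v in cfg.items())

def beam_search(axes, beam_width=2):
    beam = [{}]
    for name, values in axes:
        # Extend every surviving partial config along the next axis,
        # then keep only the beam_width cheapest.
        expanded = [{**cfg, name: v} for cfg in beam for v in values]
        beam = sorted(expanded, key=cost)[:beam_width]
    return beam[0]

best = beam_search(AXES)
print(best)
```

Here exhaustive search would measure 4 * 4 * 3 = 48 configurations, while the beam measures only 4 + 8 + 6 = 18; on real spaces with millions of points, that kind of pruning is what makes runtime-driven search tractable.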

aleinin•31m ago
Cool project! How do you think about targeting hardware-specific ISAs directly? There’s an interesting paper from Citadel (https://arxiv.org/pdf/1804.06826) that highlights inefficiencies in nvcc for the Volta architecture. Do you see Luminal’s search-based paradigm eventually extending beyond outperforming handwritten kernels, towards actually competing with NVIDIA’s compiler optimizations at the PTX level?
jafioti•15m ago
Yep! Currently we're emitting CUDA / Metal, but once the search is better, I want to directly emit PTX / low-level assembly on other hardware.
dvdplm•13m ago
This is very cool. Do you have any advice on papers to read to understand the details of search based compilation a bit more?
jafioti•7m ago
A lot of the ideas Luminal is built on are here: https://arxiv.org/abs/2304.04332

A Climate of Unparalleled Malevolence

https://www.theguardian.com/environment/2025/aug/19/a-climate-of-unparalleled-malevolence-are-we-on-our-way-to-the-sixth-major-mass-extinction
1•anigbrowl•1m ago•0 comments

The theory and practice of selling the Aga cooker (1935) [pdf]

https://comeadwithus.wordpress.com/wp-content/uploads/2012/08/the-theory-and-practice-of-selling-the-aga-cooker.pdf
1•phpnode•2m ago•0 comments

Fine-Tuned Open Sourced Models vs. System Tuned SOTA Models for Customization?

1•Ihmzf•2m ago•0 comments

GSA launches AI sandbox, says it won't be around for long

https://www.theregister.com/2025/08/20/brandnew_government_ai_sandbox_only/
1•rntn•2m ago•0 comments

Show HN: Nestable.dev – local whiteboard app with nestable canvases, deep links

https://nestable.dev/about
1•anorak27•3m ago•0 comments

Show HN: We beat Google DeepMind but got killed by Zhipu AI

https://github.com/minitap-ai/mobile-use
2•orangepomodoro•5m ago•0 comments

We keep fixing symptoms, not root causes

https://oneuptime.com/blog/post/2025-08-21-logs-traces-metrics-before-and-after/view
1•ndhandala•6m ago•0 comments

A refresher on end-to-end API Security

https://wso2.com/library/blogs/securing-apis-with-wso2-api-manager-a-guide-to-end-to-end-api-security/
1•langur•6m ago•0 comments

Modal's custom container runtime, filesystems, and GPU solver

https://www.amplifypartners.com/blog-posts/how-modal-built-a-data-cloud-from-the-ground-up
3•itunpredictable•8m ago•0 comments

Ultra Ethernet's Design Principles and Architectural Innovations

https://arxiv.org/abs/2508.08906
1•giuliomagnifico•10m ago•0 comments

Russian drone fell in eastern Poland

https://www.reuters.com/world/russian-drone-fell-eastern-poland-warsaw-says-2025-08-20/
3•danielam•12m ago•0 comments

Proxy 4: The Next Leap in C++ Polymorphism

https://devblogs.microsoft.com/cppblog/announcing-proxy-4-the-next-leap-in-c-polymorphism/
2•janjones•14m ago•0 comments

I gave Claude Code a folder of tax documents and used it as a tax agent

https://martinalderson.com/posts/building-a-tax-agent-with-claude-code/
1•martinald•15m ago•1 comments

The Dangerous Legal Strategy Coming for Our Books

https://www.theatlantic.com/ideas/archive/2025/08/book-bans-public-schools/683921/
2•littlexsparkee•16m ago•2 comments

Privacy Washing Is a Dirty Business

https://www.privacyguides.org/articles/2025/08/20/privacy-washing-is-a-dirty-business/
3•samuel246•19m ago•0 comments

Can Peanut Allergies Be Cured?

https://www.scientificamerican.com/article/new-treatments-can-free-kids-from-the-deadly-threat-of-peanut-allergy/
1•stevenjgarner•20m ago•0 comments

Show HN: Llmswap v3.0 – CLI and SDK for OpenAI, Claude, Gemini, Watsonx

https://pypi.org/project/llmswap/
2•sreenathmenon•21m ago•0 comments

Show HN: Turn any study material into practice questions with one photo

https://www.lexielearn.com/en
2•e_patjas•22m ago•0 comments

Show HN: I built an app to track expense temptation

https://app.skipwise.org
1•0xshadow•23m ago•0 comments

FictusVNC – Fake VNC server to serve your images easily

https://github.com/ayebrian/fictusvnc
1•LorenDB•23m ago•0 comments

Economics of RL

https://www.mechanize.work/blog/cheap-rl-tasks-will-waste-compute/
2•Tamaybes•23m ago•0 comments

Google announces Tennessee as site for small modular nuclear reactor

https://www.reuters.com/sustainability/boards-policy-regulation/google-announces-tennessee-site-small-modular-nuclear-reactor-2025-08-18/
2•rbanffy•25m ago•0 comments

Researchers build first 'microwave brain' on a chip – Cornell Chronicle

https://news.cornell.edu/stories/2025/08/researchers-build-first-microwave-brain-chip
1•rbanffy•25m ago•0 comments

Made by Google '25 launch event [video]

https://www.youtube.com/watch?v=JXCXTQIIvM0
2•ChrisArchitect•26m ago•0 comments

Awesome-ricing, tools to help with ricing on Linux

https://github.com/fosslife/awesome-ricing
2•dxs•27m ago•0 comments

DOM-Based Extension Clickjacking

https://marektoth.com/blog/dom-based-extension-clickjacking/
1•_xgw•27m ago•0 comments

The Content Trap

https://9to5tofounder.substack.com/p/no-1-starting-from-scratch
1•dimitrit•29m ago•0 comments

Pixel 10 Phones

https://blog.google/products/pixel/google-pixel-10-pro-xl/
50•gotmedium•30m ago•50 comments

Misinformation Rises, Climate Fades; Global Risk Is Now a Popularity Contest

https://www.pewresearch.org/global/2025/08/19/international-opinion-on-global-threats/
5•bdev12345•31m ago•1 comments

NSA's Acting Director Tried to Save Top Scientist from Purge

https://www.nytimes.com/2025/08/20/us/politics/security-clearances-scientist-fired.html
10•_tk_•35m ago•3 comments