
AI-powered text correction for macOS

https://taipo.app/
1•neuling•58s ago•1 comments

AppSecMaster – Learn Application Security with hands on challenges

https://www.appsecmaster.net/en
1•aqeisi•1m ago•1 comments

Fibonacci Number Certificates

https://www.johndcook.com/blog/2026/02/05/fibonacci-certificate/
1•y1n0•3m ago•0 comments

AI Overviews are killing the web search, and there's nothing we can do about it

https://www.neowin.net/editorials/ai-overviews-are-killing-the-web-search-and-theres-nothing-we-c...
2•bundie•8m ago•0 comments

City skylines need an upgrade in the face of climate stress

https://theconversation.com/city-skylines-need-an-upgrade-in-the-face-of-climate-stress-267763
3•gnabgib•9m ago•0 comments

1979: The Model World of Robert Symes [video]

https://www.youtube.com/watch?v=HmDxmxhrGDc
1•xqcgrek2•13m ago•0 comments

Satellites Have a Lot of Room

https://www.johndcook.com/blog/2026/02/02/satellites-have-a-lot-of-room/
2•y1n0•14m ago•0 comments

1980s Farm Crisis

https://en.wikipedia.org/wiki/1980s_farm_crisis
3•calebhwin•14m ago•1 comments

Show HN: FSID - Identifier for files and directories (like ISBN for Books)

https://github.com/skorotkiewicz/fsid
1•modinfo•19m ago•0 comments

Show HN: Holy Grail: Open-Source Autonomous Development Agent

https://github.com/dakotalock/holygrailopensource
1•Moriarty2026•27m ago•1 comments

Show HN: Minecraft Creeper meets 90s Tamagotchi

https://github.com/danielbrendel/krepagotchi-game
1•foxiel•34m ago•1 comments

Show HN: Termiteam – Control center for multiple AI agent terminals

https://github.com/NetanelBaruch/termiteam
1•Netanelbaruch•34m ago•0 comments

The only U.S. particle collider shuts down

https://www.sciencenews.org/article/particle-collider-shuts-down-brookhaven
2•rolph•37m ago•1 comments

Ask HN: Why do purchased B2B email lists still have such poor deliverability?

1•solarisos•37m ago•2 comments

Show HN: Remotion directory (videos and prompts)

https://www.remotion.directory/
1•rokbenko•39m ago•0 comments

Portable C Compiler

https://en.wikipedia.org/wiki/Portable_C_Compiler
2•guerrilla•41m ago•0 comments

Show HN: Kokki – A "Dual-Core" System Prompt to Reduce LLM Hallucinations

1•Ginsabo•42m ago•0 comments

Software Engineering Transformation 2026

https://mfranc.com/blog/ai-2026/
1•michal-franc•43m ago•0 comments

Microsoft purges Win11 printer drivers, devices on borrowed time

https://www.tomshardware.com/peripherals/printers/microsoft-stops-distrubitng-legacy-v3-and-v4-pr...
3•rolph•43m ago•1 comments

Lunch with the FT: Tarek Mansour

https://www.ft.com/content/a4cebf4c-c26c-48bb-82c8-5701d8256282
2•hhs•47m ago•0 comments

Old Mexico and her lost provinces (1883)

https://www.gutenberg.org/cache/epub/77881/pg77881-images.html
1•petethomas•50m ago•0 comments

'AI' is a dick move, redux

https://www.baldurbjarnason.com/notes/2026/note-on-debating-llm-fans/
5•cratermoon•51m ago•0 comments

The source code was the moat. But not anymore

https://philipotoole.com/the-source-code-was-the-moat-no-longer/
1•otoolep•51m ago•0 comments

Does anyone else feel like their inbox has become their job?

1•cfata•51m ago•1 comments

An AI model that can read and diagnose a brain MRI in seconds

https://www.michiganmedicine.org/health-lab/ai-model-can-read-and-diagnose-brain-mri-seconds
2•hhs•55m ago•0 comments

Dev with 5 years of experience switched to Rails, what should I be careful about?

2•vampiregrey•57m ago•0 comments

AlphaFace: High Fidelity and Real-Time Face Swapper Robust to Facial Pose

https://arxiv.org/abs/2601.16429
1•PaulHoule•58m ago•0 comments

Scientists discover “levitating” time crystals that you can hold in your hand

https://www.nyu.edu/about/news-publications/news/2026/february/scientists-discover--levitating--t...
3•hhs•1h ago•0 comments

Rammstein – Deutschland (C64 Cover, Real SID, 8-bit – 2019) [video]

https://www.youtube.com/watch?v=3VReIuv1GFo
1•erickhill•1h ago•0 comments

Tell HN: Yet Another Round of Zendesk Spam

6•Philpax•1h ago•1 comments

Compiling models to megakernels

https://blog.luminal.com/p/compiling-models-to-megakernels
35•jafioti•1w ago

Comments

measurablefunc•1w ago
There are only 4 optimizations in computer science: inlining, partial evaluation, dead code elimination, & caching. It looks like AI researchers just discovered inlining & they already knew about caching so eventually they'll get to partial evaluation & dead code elimination.
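Of the four categories named here, caching is the easiest to show in a few lines. A minimal memoization sketch (the `fib` example is mine, not from the thread):

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def fib(n):
    """Naive recursion, but each distinct n is computed only once."""
    global calls
    calls += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040
print(calls)    # 31: one call per distinct subproblem, not ~2^30
```

Without the cache the same recursion makes over a million calls; with it, one per distinct argument.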
fragmede•1w ago
Dead code elimination is already a technique in AI when someone takes an MoE model and removes an unused "E" from it.
mxkopy•1w ago
AI actually has some optimizations unique to the field. You can in fact optimize a model to make it work; not a lot of other disciplines put as much emphasis on this as AI does.
tossandthrow•1w ago
Can you list these optimizations?
mxkopy•1w ago
RLHF is one that comes to mind
tossandthrow•1w ago
Well, this is an entirely other category of optimizations - not program performance but model performance.
lucrbvi•1w ago
Yes, in "runtime optimization" the model is just a computation graph, so we can use a lot of well-known tricks from compilation like dead code elimination and co.
tossandthrow•1w ago
We are getting closer!

What other optimizations are there that can be used than what explicitly falls into the 4 categories that the top commenter here listed out?

mirekrusin•1w ago
For inference assorted categories may include vectorization, register allocation, scheduling, lock elision, better algos, changing complexity, better data structures, profile guided specialization, layout/alignment changes, compression, quantization/mixed precision, fused kernels (goes beyond inlining), low rank adapters, sparsity, speculative decoding, parallel/multi token decoding, better sampling, prefill/decode separation, analog computation (why not) etc etc.

There is more to it; the 4 categories mentioned are not the only ones, and they are not even broad categories.

If somebody likes broad categories, here is a good one: "1s and 0s", and you can compute anything you want. There you go: a single category for everything. Is it meaningful? Not really.
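To make one item from the list above concrete: a minimal sketch of symmetric per-tensor int8 quantization. The function names are illustrative, not from any particular framework:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: store int8 values + one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# Rounding to the nearest int8 step bounds the error by half a step.
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

The payoff in practice is 4x smaller weights than float32 and integer arithmetic on hardware that supports it, at the cost of the bounded rounding error checked above.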

tossandthrow•1w ago
Thanks!
johndough•1w ago
Which categories do algorithmic optimizations fall under? For example:

Strassen algorithm for matrix multiplication https://en.wikipedia.org/wiki/Strassen_algorithm

FFT convolution https://dsp.stackexchange.com/a/63211

Winograd convolution https://www.cv-foundation.org/openaccess/content_cvpr_2016/p...

And of course optimization algorithms themselves.

j-pb•1w ago
Partial evaluation on the symbolic structure of the problem.
torginus•1w ago
Don't know about the others, but FFT is the classic case of common subexpression elimination (it's mathematically equivalent), which I think by OP's definition would fall under caching.
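The FFT/direct-convolution equivalence is easy to check numerically; a small sketch (array sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
h = rng.standard_normal(16)

# Direct linear convolution: O(n*m).
direct = np.convolve(x, h)

# FFT-based: O(n log n). Zero-pad to the full output length,
# multiply the spectra, and invert.
n = len(x) + len(h) - 1
fft_conv = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

assert np.allclose(direct, fft_conv)
```

Both paths produce the same length-79 output here; the FFT route wins once the kernel is large enough to amortize the transforms.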
imtringued•1w ago
Your list is so short it doesn't even include the basics such as reordering operations.

It also feels incredibly snarky to say "they knew about caching" and that they will get to partial evaluation and dead code elimination, when those seem to be particularly useless (beyond what the CUDA compiler itself does) when it comes to writing GPU kernels or doing machine learning in general.

You can't do much partial evaluation of a neural network because the activation functions sit between the tensor multiplications. If you remove the activation function, you end up with two linear layers that are equivalent to one linear layer, defeating the point of the idea. You could have trained a network with a single layer instead and achieved the same accuracy with correspondingly shorter training/inference time.
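The collapse described here, two linear layers with no activation in between folding into one (the essence of partially evaluating the weights), can be checked directly; shapes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((4, 8))
x = rng.standard_normal(16)

# Two stacked linear layers with no activation in between...
two_layers = W2 @ (W1 @ x)

# ...are exactly one linear layer whose weight matrix is pre-multiplied
# ("partially evaluated") ahead of time.
W_fused = W2 @ W1
one_layer = W_fused @ x

assert np.allclose(two_layers, one_layer)
```

With a nonlinearity between `W1` and `W2` the fusion is no longer valid, which is exactly the obstruction the comment points out.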

Dead code elimination is even more useless since most kernels are special purpose to begin with and you can't remove tensors without altering the architecture. Instead of adding useless tensors only to remove them, you could have simply used a better architecture.

torginus•1w ago
I think you can. If you have a neuron whose input weights are 100, -1, 2, with threshold 0, you know the output of the neuron as soon as the first input is enabled, since the other two don't matter, so you can skip evaluating those.

I'm not enough of an expert to tell whether there's any actual merit to this idea, or whether skipping the evaluation of huge parts of the network (while keeping track of such skips) is actually worth it, but it intuitively makes sense to me that making an omelette has nothing to do with the Battle of Hastings, so when making a query about the former, the neurons encoding the latter might not affect the output.

Afaik, there's already research into finding which network weights encode which concepts.

MoE is a somewhat cruder version of this technique.
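The 100, -1, 2 example above can be verified exhaustively, assuming binary inputs and a step activation at threshold 0 (those assumptions are mine):

```python
import itertools
import numpy as np

w = np.array([100.0, -1.0, 2.0])
threshold = 0.0

# With the first input on, the worst case over the remaining binary
# inputs is 100 - 1 + 0 = 99 > 0, so the neuron fires no matter what;
# the other two inputs never need to be evaluated.
for rest in itertools.product([0.0, 1.0], repeat=2):
    x = np.array([1.0, *rest])
    assert w @ x > threshold
```

This is the short-circuit intuition: once one input dominates the reachable range of the rest, the remaining evaluation is skippable.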

direwolf20•1w ago
Model pruning is dead code elimination
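A minimal sketch of what "pruning as dead code elimination" looks like in practice: magnitude pruning, which zeroes the smallest weights (the sparsity level here is arbitrary):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the `sparsity` fraction of smallest-magnitude weights."""
    k = int(w.size * sparsity)
    cutoff = np.sort(np.abs(w), axis=None)[k]
    return np.where(np.abs(w) < cutoff, 0.0, w)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)).astype(np.float32)
pruned = magnitude_prune(w, 0.5)

# Half the weights are now exactly zero: "dead" connections a sparse
# kernel can skip entirely at inference time.
assert np.count_nonzero(pruned) == w.size // 2
```

The analogy to dead code elimination is that the zeroed connections contribute nothing to the output, so a sparse-aware kernel never executes them.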
jafioti•1w ago
That's a bit trite tbh. We all know of these techniques, but actually implementing them on GPUs in a low-overhead manner that maintains the model's fidelity is challenging. It's much more than just breaking out the old CS book and picking the next idea from there.
geremiiah•1w ago
So if I'm understanding correctly, you decompose kernels into their per_sm_workload, then you figure out per_sm_data_dependency, and then you can schedule sm_workloads from the next kernel to start running as soon as the data dependency is satisfied, without needing to wait for the other SMs from the previous kernel to finish.

In this case, are you strictly fusing pre-defined kernels or are you also optimizing them? Is this complementary to your earlier work on search-based compilers?

jafioti•1w ago
That's reasonably accurate: we're fusing both pre-defined operations and codegenned operations. Block-level operations live inside the search space, as do kernel-, warp-, and thread-level operations. Since it's a unified search space, we can look through tons of combinations of kernel, block, warp, and thread level ops. When we go to compile them to runnable code, thread ops get compiled to warp ops, warp ops get compiled to block ops, and block ops get compiled to kernel ops (megakernels live here!), so at the end of the day everything that gets run is a kernel.

In other words, very complementary to our search-based approach.
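The dependency-driven scheduling described in this exchange can be sketched as a toy ready-queue scheduler; all task names and dependency sets below are hypothetical, not from Luminal's implementation:

```python
from collections import deque

# Kernels A and B are each split into per-SM workloads. A workload of B
# becomes runnable as soon as the specific A workloads it depends on are
# done, not when all of kernel A has finished.
deps = {
    ("A", 0): set(),
    ("A", 1): set(),
    ("B", 0): {("A", 0)},  # B's SM-0 chunk only needs A's SM-0 chunk
    ("B", 1): {("A", 1)},
}

done, order = set(), []
ready = deque(t for t, d in deps.items() if not d)
while ready:
    task = ready.popleft()
    done.add(task)
    order.append(task)
    # ("B", 0) is enqueued here right after ("A", 0) completes, even
    # though ("A", 1) may still be running on another SM.
    for t, d in deps.items():
        if t not in done and t not in ready and d <= done:
            ready.append(t)

assert set(order) == set(deps)
assert order.index(("B", 0)) > order.index(("A", 0))
```

A real megakernel bakes this handoff into the kernel itself (per-SM flags or barriers instead of a host-side queue), but the ordering constraint being exploited is the same.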