frontpage.

The mysterious black fungus from Chernobyl that may eat radiation

https://www.bbc.com/future/article/20251125-the-mysterious-black-fungus-from-chernobyl-that-appea...
161•bookmtn•3h ago•57 comments

Petition to formally recognize open source work as civic service in Germany

https://www.openpetition.de/petition/online/anerkennung-von-open-source-arbeit-als-ehrenamt-in-de...
188•PhilippGille•1h ago•43 comments

Show HN: Glasses to detect smart-glasses that have cameras

https://github.com/NullPxl/banrays
358•nullpxl•9h ago•132 comments

Tech Titans Amass Multimillion-Dollar War Chests to Fight AI Regulation

https://www.wsj.com/tech/ai/tech-titans-amass-multimillion-dollar-war-chests-to-fight-ai-regulati...
93•thm•6h ago•104 comments

A Tale of Four Fuzzers

https://tigerbeetle.com/blog/2025-11-28-tale-of-four-fuzzers/
39•jorangreef•3h ago•6 comments

Pocketbase – open-source realtime back end in 1 file

https://pocketbase.io/
471•modinfo•11h ago•137 comments

Moss: a Rust Linux-compatible kernel in 26,000 lines of code

https://github.com/hexagonal-sun/moss
245•hexagonal-sun•6d ago•46 comments

A Remarkable Assertion from A16Z

https://nealstephenson.substack.com/p/a-remarkable-assertion-from-a16z
146•boplicity•2h ago•65 comments

A Repository with 44 Years of Unix Evolution

https://www.spinellis.gr/pubs/conf/2015-MSR-Unix-History/html/Spi15c.html
56•lioeters•6h ago•11 comments

Show HN: Spikelog – A simple metrics service for scripts, cron jobs, and MVPs

https://spikelog.com
10•dsmurrell•1d ago•3 comments

Atuin’s New Runbook Execution Engine

https://blog.atuin.sh/introducing-the-new-runbook-execution-engine/
18•emschwartz•3d ago•2 comments

EU Council Approves New "Chat Control" Mandate Pushing Mass Surveillance

https://reclaimthenet.org/eu-council-approves-new-chat-control-mandate-pushing-mass-surveillance
444•fragebogen•4h ago•275 comments

Louvre to hike ticket prices for most non-EU tourists by 45%

https://www.bbc.com/news/articles/clyd4llgrego
16•geox•53m ago•8 comments

SQLite as an Application File Format

https://sqlite.org/appfileformat.html
40•gjvc•7h ago•14 comments

A trillion dollars (potentially) wasted on gen-AI

https://garymarcus.substack.com/p/a-trillion-dollars-is-a-terrible
70•flail•2h ago•51 comments

Open (Apache 2.0) TTS model for streaming conversational audio in realtime

https://github.com/nari-labs/dia2
35•SweetSoftPillow•4d ago•4 comments

How to make precise sheet metal parts (photochemical machining) [video]

https://www.youtube.com/watch?v=bR9EN3kUlfg
61•surprisetalk•5d ago•7 comments

Switzerland: Data Protection Officers Impose Broad Cloud Ban for Authorities

https://www.heise.de/en/news/Switzerland-Data-Protection-Officers-Impose-Broad-Cloud-Ban-for-Auth...
66•TechTechTech•3h ago•28 comments

Tiger Style: Coding philosophy (2024)

https://tigerstyle.dev/
96•nateb2022•10h ago•95 comments

Same-day upstream Linux support for Snapdragon 8 Elite Gen 5

https://www.qualcomm.com/developer/blog/2025/10/same-day-snapdragon-8-elite-gen-5-upstream-linux-...
438•mfilion•23h ago•213 comments

OS Malevich – how we made a system that embodies the idea of simplicity (2017)

https://www.ajax-systems.uz/blog/hub-os-malevich-story/
15•frxx•4d ago•1 comment

Vsora Jotunn-8 5nm European inference chip

https://vsora.com/products/jotunn-8/
153•rdg42•16h ago•59 comments

A fast EDN (Extensible Data Notation) reader written in C11 with SIMD boost

https://github.com/DotFox/edn.c
101•delaguardo•4d ago•34 comments

The three thousand year journey of colchicine

https://www.worksinprogress.news/p/the-three-thousand-year-journey-of
25•quadrin•1w ago•3 comments

How to use Linux vsock for fast VM communication

https://popovicu.com/posts/how-to-use-linux-vsock-for-fast-vm-communication/
64•mfrw•10h ago•13 comments

GitLab discovers widespread NPM supply chain attack

https://about.gitlab.com/blog/gitlab-discovers-widespread-npm-supply-chain-attack/
309•OuterVale•23h ago•169 comments

How Charles M Schulz created Charlie Brown and Snoopy (2024)

https://www.bbc.com/culture/article/20241205-how-charles-m-schulz-created-charlie-brown-and-snoopy
157•1659447091•15h ago•78 comments

Implementing Bluetooth LE Audio and Auracast on Linux Systems

https://www.collabora.com/news-and-blog/blog/2025/11/24/implementing-bluetooth-le-audio-and-aurac...
109•losgehts•3d ago•4 comments

250MWh 'Sand Battery' to start construction in Finland

https://www.energy-storage.news/250mwh-sand-battery-to-start-construction-in-finland-for-both-hea...
302•doener•16h ago•226 comments

A programmer-friendly I/O abstraction over io_uring and kqueue (2022)

https://tigerbeetle.com/blog/2022-11-23-a-friendly-abstraction-over-iouring-and-kqueue/
107•enz•16h ago•32 comments

Attention Wasn't All We Needed

https://www.stephendiehl.com/posts/post_transformers/
130•mooreds•6mo ago

Comments

andrewmcwatters•6mo ago
I know this probably seems like such a small detail to a lot of people, but I really love that the author adds comments.

I can't stand reading PyTorch or other neural network code and asking myself, "What architecture am I looking at here?" or "What the hell are these operations for?"

It's always like a mash-up of published paper code with deep effort behind it and all the worst programming practices of complete unreadability.

imranq•6mo ago
Could you pop your code into an LLM and ask it to write comments for you? I'm not sure how accurate it would be, though.
andrewmcwatters•6mo ago
I've noticed that leading models also fail to understand what's happening in undocumented neural network code, so not yet, it seems.
CamperBob2•6mo ago
It may be a reasonable approach if you give the model a lot of clues to start with. Basically tell it everything you do know about the code.

I wouldn't expect miracles from just uploading a big .py file and asking it to add comments.

flebron•6mo ago
This is an excellent summary of these techniques :) I like that every single one comes with an example implementation, with shape comments on the tensors. Thanks Stephen!
kouteiheika•6mo ago
> Let's look at some of the most important ones that have been developed over the years and try to implement the basic ideas as succinctly as possible.

One big architectural tweak that comes to mind and isn't in the article is QK norm: https://arxiv.org/pdf/2010.04245
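
For context, QK norm just normalizes the query and key vectors (per head) before the attention dot product so the logits stay bounded. A minimal sketch, with the shapes and the choice of norm as illustrative assumptions rather than anything from the paper or the article:

    import torch

    def qk_norm_attention(q, k, v, q_norm, k_norm, mask=None):
        # q, k, v: (batch, heads, seq, head_dim); q_norm/k_norm: e.g. nn.RMSNorm(head_dim)
        q = q_norm(q)                                  # normalize queries over head_dim
        k = k_norm(k)                                  # normalize keys over head_dim
        scores = q @ k.transpose(-2, -1) * q.shape[-1] ** -0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        return scores.softmax(dim=-1) @ v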

> Cosine Schedule

A lot (most?) of new training runs actually don't use a cosine schedule anymore; instead they keep the learning rate constant and only decay it at the very end, which gives equivalent or better results. See:

https://arxiv.org/pdf/2405.18392 https://arxiv.org/pdf/2404.06395
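
A minimal sketch of that kind of constant-then-decay ("warmup-stable-decay") schedule, as a rough illustration rather than the exact recipe from either paper:

    def wsd_lr(step, total_steps, peak_lr, warmup_steps=1000, decay_frac=0.1):
        # linear warmup, long constant phase, short linear decay at the very end
        decay_start = int(total_steps * (1 - decay_frac))
        if step < warmup_steps:
            return peak_lr * step / warmup_steps
        if step < decay_start:
            return peak_lr
        return peak_lr * (total_steps - step) / (total_steps - decay_start)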

> There is a highly optimized implementation of AdamW in PyTorch.

A fun tidbit: in my experience it's actually not that highly optimized. Imagine my surprise when I reimplemented it in Triton (because I needed to tweak a few things) and got better performance than the built-in PyTorch implementation.
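
For reference, the decoupled AdamW update being reimplemented is roughly the following (a sketch of the standard update rule, not the Triton kernel mentioned above; hyperparameter names are illustrative):

    import torch

    @torch.no_grad()
    def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
        # p: parameter, g: gradient, m/v: moment buffers, t: 1-based step count
        m.mul_(b1).add_(g, alpha=1 - b1)             # first-moment EMA
        v.mul_(b2).addcmul_(g, g, value=1 - b2)      # second-moment EMA
        m_hat = m / (1 - b1 ** t)                    # bias correction
        v_hat = v / (1 - b2 ** t)
        p.mul_(1 - lr * wd)                          # decoupled weight decay
        p.addcdiv_(m_hat, v_hat.sqrt_().add_(eps), value=-lr)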

Scene_Cast2•6mo ago
RE: optimizer performance - any thoughts on heavyball?
kouteiheika•6mo ago
...oh, I didn't know about this library, thanks!

I still probably wouldn't be able to use it because I need a bunch of custom functionality for my optimizers (like for example custom quantization support and incremental gradient accumulation directly in optimizers' state), but I might borrow some of their techniques if they make things even faster.

yorwba•6mo ago
The explanation for Multi-head Latent Attention https://www.stephendiehl.com/posts/post_transformers/#multi-... does not match the definition in the DeepSeek-V2 paper https://arxiv.org/pdf/2405.04434#subsection.2.1

MLA as developed by DeepSeek is a technique to reduce the memory footprint of the KV cache by storing only two vectors of size latent_dim and rope_dim per token and layer, instead of 2 * num_heads vectors of size head_dim. (DeepSeek-V3 has num_heads = 128 and head_dim = 128 vs latent_dim = 512 and rope_dim = 64, so a significant reduction https://arxiv.org/pdf/2412.19437#subsection.4.2 )
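
To make the size difference concrete, here is the arithmetic with the DeepSeek-V3 numbers cited above (an illustrative calculation, counting cached values per token per layer):

    num_heads, head_dim = 128, 128
    latent_dim, rope_dim = 512, 64

    standard_kv = 2 * num_heads * head_dim   # full keys and values for every head
    mla_cache = latent_dim + rope_dim        # compressed latent + shared RoPE key part

    print(standard_kv, mla_cache, standard_kv / mla_cache)  # 32768 576 ~56.9x smaller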

What this article describes instead is some kind of two-step attention scheme I haven't seen before, and one that I think wouldn't work with causal masking (despite a mask appearing in the example code): either you allow an earlier token to attend to a latent that attended to a later token (creating backwards information flow), or the latents can only attend to a limited prefix of the sequence, after which they're frozen and useless. I wonder whether the author dreamed it up himself or whether someone else is actually using this somewhere.

jdeaton•6mo ago
The first four things on the list are attention.
alanbernstein•6mo ago
The title is a cute shortening of "Attention Is All You Need wasn't all we needed"
empiko•6mo ago
Nice writeup, but regarding the title -- I find it fascinating how powerful attention really is. There were some tweaks developed, sure, but if I open the Llama 4 code on Hugging Face, it is more or less the same code I saw there 5 years ago. Despite all the AI hype, we are still just exploiting tech developed in 2015-2020. And despite NeurIPS brandishing 25k papers this year, the innovation rate in deep learning seems to be stagnating.
kjkjadksj•6mo ago
Too many horseriders, not enough horse breeders.
teleforce•5mo ago
Nice analogy, most probably going to borrow it.
kouteiheika•6mo ago
> There were some tweaks developed, sure, but if I open the Llama 4 code on Hugging Face, it is more or less the same code I saw there 5 years ago.

This is very much true. It's essentially the very same architecture, just tweaked slightly.

I can take the code I've written that implements the original GPT-2, tweak it very minimally (I don't know, maybe 30-40 lines of code changed?), and get Qwen3, which is a state-of-the-art model released ~3 weeks ago.

Contrary to what you might see when looking at e.g. HuggingFace code, where every new architecture needs a new multi-thousand-line file - that's just the result of an insane amount of copy-pasting and technical debt (although they've started to clean it up a little lately). I have my own custom implementation that can load weights for ~19 different architectures straight off HuggingFace in ~2k lines of code. They aren't really all that different.
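
For a sense of what those few dozen changed lines might cover, this is an illustrative guess at the kind of GPT-2 -> Qwen3-style deltas involved (not the commenter's actual diff):

    # Rough summary of the architectural tweaks between GPT-2 and a Qwen3-style model
    GPT2_TO_QWEN3 = {
        "positions": "learned absolute embeddings -> rotary embeddings (RoPE)",
        "norms":     "LayerNorm -> RMSNorm",
        "mlp":       "GELU MLP -> SwiGLU (gated) MLP",
        "attention": "full multi-head -> grouped-query attention with QK norm",
        "biases":    "linear biases -> removed",
    }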

danpalmer•6mo ago
The Llama models are substantially behind the state of the art, particularly when it comes to efficiency, so they're probably not the best example for adoption of these sorts of techniques.
johnsmith1840•6mo ago
One interesting thought I've had around these topics is that it's not just attention: all DL methods suffer similar problems.

I truly believe the last step to AGI is solving continual learning. Efficiency will always inch up, but the "jump" is honestly not in sight.

Maybe attention + (unknown thing) really is all we need.

The thought is interesting because, if you extrapolate that all DL models suffer from the same class of problems (CL), the solution implies two possibilities.

1. In the future, AGI-level models will be entirely new categories, sharing little to nothing with methods like attention. (Every part is different, as the article suggests.)

2. Or (maybe more likely) we will simply build on what we have. If that's true, then next-generation models in the AGI realm will be the same models we have now, with one unifying change applied to all of them.

I previously made a unique transformer model in which every single neuron acted like a decision gate. Every neuron would choose a "computation neuron" before going on. Backprop was modified so that only computation neurons contributed to the backprop of the next layer.

It had some interesting properties, the largest being that every token's loop through the model essentially saw a completely different model. I was/am under the belief that scaling dimensionality == solving CL.

I bring it up because technically this architecture was identical to the transformer. I could drop my special neuron into literally any DL model out there and train.

I believe this kind of advancement is what the next generation of models will be: not a change to the transformer or attention, but to the fundamental building blocks of all DL models.

It honestly does feel like attention gets us part of the AGI equation well enough. It seems to have solved, or will soon solve, most short-term hard problems. Again, this is why CL is key: it's the time component that no AI method across the board has ever solved.

rusuereboutdat•6mo ago
For the same reason Yann LeCun and everyone else says language won’t lead to AGI, nothing will lead to AGI.

Yann says language models need to be updated with new language to describe new observations.

But that’s not just with language. That’s physics. We cannot solve going to Mars or anything without the process.

But spacetime is endless, and eventually some configuration of it will come along that the continual learning machine has no ability to adapt to before it's destroyed.

We've lost the information of the past and merely store a simulation of it. We cannot see all of the future, just reduce it to simulation.

Eventually any autonomous thing hits a snag it cannot solve before its destruction, because in any reference frame it cannot know all the next best steps, nor which past options to eliminate to simplify.

Energy-based models will streamline away nonessential state for generating media and making a robot lift a box, much like Linux and the software we know, but without 100% accurate data about the past and future (the generation of which is impossible), any autonomous thing will eventually encounter a problem it never had time to solve and be smashed by the immutable churn of physics.

BriggyDwiggs42•6mo ago
I just… why can’t it adapt over time?
johnsmith1840•6mo ago
Nobody knows.

It's one of those seemingly simple problems where the proposed solutions imply contradictory answers.

BriggyDwiggs42•6mo ago
I just don't get why we're talking about cosmic scales but modern AI tech, and not a hypothetical ASI a thousand years out with an IQ of 2 million that would actually encounter these limits.
johnsmith1840•6mo ago
Yeah, that's not what I was talking about; I was only talking about continual learning.

Just hijacked the comment because my focus is CL on current systems, not the hypothetical.

BriggyDwiggs42•6mo ago
Gotcha sorry. Got the wrong impression.
achierius•6mo ago
Everything you're saying applies to humans too, though. We evolved, learned over time, and are now "AGI".