
The Illustrated Transformer

https://jalammar.github.io/illustrated-transformer/
121•auraham•2h ago•26 comments

Ultrasound Cancer Treatment: Sound Waves Fight Tumors

https://spectrum.ieee.org/ultrasound-cancer-treatment
83•rbanffy•1h ago•23 comments

GLM-4.7: Advancing the Coding Capability

https://z.ai/blog/glm-4.7
125•pretext•2h ago•31 comments

The Garbage Collection Handbook

https://gchandbook.org/index.html
70•andsoitis•2h ago•2 comments

Claude Code gets native LSP support

https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md
205•JamesSwift•5h ago•116 comments

NIST was 5 μs off UTC after last week's power cut

https://www.jeffgeerling.com/blog/2025/nist-was-5-μs-utc-after-last-weeks-power-cut
96•jtokoph•4h ago•51 comments

Scaling LLMs to Larger Codebases

https://blog.kierangill.xyz/oversight-and-guidance
174•kierangill•5h ago•76 comments

The Rise of SQL: the second programming language everyone needs to know

https://spectrum.ieee.org/the-rise-of-sql
49•b-man•4d ago•35 comments

Let's write a toy UI library

https://nakst.gitlab.io/tutorial/ui-part-1.html
95•birdculture•6d ago•14 comments

Things I learnt about passkeys when building passkeybot

https://enzom.dev/b/passkeys/
27•emadda•2h ago•6 comments

Feds demand compromise on Colorado River while states flounder

https://nevadacurrent.com/2025/12/22/feds-demand-compromise-on-colorado-river-states-flounder-des...
5•mooreds•30m ago•0 comments

US blocks all offshore wind construction, says reason is classified

https://arstechnica.com/science/2025/12/us-government-finds-new-excuse-to-stop-construction-of-of...
222•rbanffy•2h ago•171 comments

Your Supabase Is Public

https://skilldeliver.com/your-supabase-is-public
86•skilldeliver•5h ago•38 comments

How do SSDs change database design?

https://brooker.co.za:443/blog/2025/12/15/database-for-ssd.html
13•arn3n•6d ago•1 comment

Uplane (YC F25) Is Hiring Founding Engineers (Full-Stack and AI)

https://www.useparallel.com/uplane1/careers
1•MarvinStarter•4h ago

Vince Zampella, developer of Call of Duty and Battlefield, has died

https://comicbook.com/gaming/news/vince-zampella-developer-of-call-of-duty-and-battlefield-dead-a...
55•superpupervlad•1h ago•32 comments

Hybrid Aerial Underwater Drone – Bachelor Project [video]

https://www.youtube.com/watch?v=g7vmPFZrYAk
16•nhma•12h ago•4 comments

Jimmy Lai Is a Martyr for Freedom

https://reason.com/2025/12/19/jimmy-lai-is-a-martyr-for-freedom/
230•mooreds•4h ago•105 comments

The biggest CRT ever made: Sony's PVM-4300

https://dfarq.homeip.net/the-biggest-crt-ever-made-sonys-pvm-4300/
196•giuliomagnifico•8h ago•128 comments

Henge Finder

https://hengefinder.rcdis.co/#learn
30•recursecenter•4h ago•7 comments

Microsoft will finally kill obsolete cipher that has wreaked decades of havoc

https://arstechnica.com/security/2025/12/microsoft-will-finally-kill-obsolete-cipher-that-has-wre...
125•signa11•6d ago•78 comments

Debian's Git Transition

https://diziet.dreamwidth.org/20436.html
162•all-along•13h ago•53 comments

The ancient monuments saluting the winter solstice

https://www.bbc.com/culture/article/20251219-the-ancient-monuments-saluting-the-winter-solstice
155•1659447091•12h ago•84 comments

Programming languages used for music

https://timthompson.com/plum/cgi/showlist.cgi?sort=name&concise=yes
211•ofalkaed•2d ago•82 comments

Universal Reasoning Model (53.8% pass@1 on ARC-1 and 16.0% on ARC-2)

https://arxiv.org/abs/2512.14693
8•marojejian•2h ago•1 comment

State regulators vote to keep utility profits high, angering customers across CA

https://www.latimes.com/environment/story/2025-12-18/state-regulators-vote-to-keep-utility-profit...
28•connor11528•2h ago•9 comments

There's no such thing as a fake feather [video]

https://www.youtube.com/watch?v=N5yV1Q9O6r4
61•surprisetalk•4d ago•22 comments

Show HN: Netrinos – A keep-it-simple Mesh VPN for small teams

https://netrinos.com
73•pcarroll•2d ago•40 comments

Show HN: An easy way of broadcasting radio around you (looking for feedback)

https://github.com/dpipstudio/botwave
24•douxx•5d ago•5 comments

Deliberate Internet Shutdowns

https://www.schneier.com/blog/archives/2025/12/deliberate-internet-shutdowns.html
298•WaitWaitWha•4d ago•155 comments

The Illustrated Transformer

https://jalammar.github.io/illustrated-transformer/
121•auraham•2h ago

Comments

profsummergig•2h ago
Haven't watched it yet...

...but, if you have favorite resources on understanding Q & K, please drop them in comments below...

(I've watched the Grant Sanderson/3blue1brown videos [including his excellent talk at TNG Big Tech Day '24], but Q & K still escape me).

Thank you in advance.

red2awn•1h ago
Implement transformers yourself (i.e., in NumPy). You'll never truly understand it by just watching videos.
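
(For concreteness, here is a minimal NumPy sketch of the kind of from-scratch implementation being suggested: single-head scaled dot-product self-attention. The shapes and weight initialization are illustrative, not taken from the post.)

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v        # three projections of the same input
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # pairwise token similarities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V                         # weighted average of the value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                    # 5 tokens, embedding dimension 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (5, 8)
```
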
D-Machine•1h ago
Seconding this. The terms "Query" and "Value" are largely arbitrary and meaningless in practice. Look at how to implement this in PyTorch and you'll see these are just weight matrices that implement a projection of sorts, and self-attention is always just self_attention(x, x, x), or self_attention(x, x, y) in some cases, where x and y are outputs from previous layers.

Plus, with different forms of attention (e.g. merged attention) and the research into why and how attention mechanisms might actually be working, the whole "they are motivated by key-value stores" framing starts to look really bogus. Really, the attention layer allows for modeling correlations and/or multiplicative interactions among a dimension-reduced representation.
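
(A small PyTorch illustration of the point above: the "query", "key", and "value" are just the same tensor fed through learned projections, and self-attention is literally attention(x, x, x). Shapes here are arbitrary.)

```python
import torch
import torch.nn as nn

x = torch.randn(2, 16, 64)  # (batch, sequence length, embedding dim)

# Q, K, V are produced by learned linear projections inside the module;
# "self-attention" just means all three come from the same input.
mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
out, attn_weights = mha(x, x, x)   # query = key = value = x

print(out.shape)            # torch.Size([2, 16, 64])
print(attn_weights.shape)   # torch.Size([2, 16, 16]), averaged over heads by default
```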

profsummergig•1h ago
Do you think the dimension reduction is necessary? Or is it just practical (due to current hardware scarcity)?
krat0sprakhar•1h ago
Do you have a tutorial that I can follow?
roadside_picnic•6m ago
The most valuable tutorial will be translating from the paper itself. The more hand holding you have in the process, the less you'll be learning conceptually. The pure manipulation of matrices is rather boring and uninformative without some context.

I also think the implementation is more helpful for understanding the engineering work needed to run these models than for getting a deeper mathematical understanding of what the model is doing.

roadside_picnic•11m ago
I personally don't think implementation is as enlightening, as far as really understanding what the model is doing, as this statement implies. I had done it many times, but it wasn't until reading about the relationship to kernel methods that it clicked for me what is really happening under the hood.

Don't get me wrong, implementing attention is still great (and necessary), but even with something as simple as linear regression, implementing it doesn't really give you the entire conceptual model. I do think implementation helps to understand the engineering of these models, but it still requires reflection and study to start to understand conceptually why they are working and what they're really doing (I would, of course, argue I'm still learning about linear models in that regard!)

leopd•1h ago
I think this video does a pretty good job of explaining it, starting at about 10:30: https://www.youtube.com/watch?v=S27pHKBEp30
oofbey•1h ago
As the first comment says, "This aged like fine wine." Six years old, but the fundamentals haven't changed.
andoando•1h ago
This wasn't any better than other explanations I've seen.
throw310822•58m ago
Have you tried asking e.g. Claude to explain it to you? None of the usual resources worked for me, until I had a discussion with Claude where I could ask questions about everything that I didn't get.
bobbyschmidd•51m ago
tldr: recursively aggregating packing/unpacking 'if else if (functions)/statements' as keyword arguments that (call)/take them themselves as arguments, with their own position shifting according to the number "(weights)" of else if (functions)/statements needed to get all the other arguments into (one of) THE adequate orders. the order changes based on the language, input prompt and context.

if I understand it all correctly.

implemented it in html a while ago and might do it in htmx sometime soon.

transformers are just slutty dictionaries that Papa Roach and kage bunshin no jutsu right away again and again, spawning clones and variations based on requirements, which is why they tend to repeat themselves rather quickly and often. it's got almost nothing to do with languages themselves and requirements and weights amount to playbooks and DEFCON levels

roadside_picnic•20m ago
It's just a re-invention of kernel smoothing. Cosma Shalizi has an excellent write up on this [0].

Once you recognize this, it's a wonderful re-framing of what a transformer is doing under the hood: you're effectively learning a bunch of sophisticated kernels (through the FF part) and then applying kernel smoothing in different ways through the attention layers. It makes you realize that Transformers are philosophically much closer to things like Gaussian Processes (which are also just a bunch of kernel manipulation).

0. http://bactra.org/notebooks/nn-attention-and-transformers.ht...
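
(A toy sketch of the correspondence described in that write-up: Nadaraya-Watson kernel smoothing and dot-product attention both compute a kernel-weighted average of stored values. The Gaussian kernel and random data below are just for illustration.)

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kernel_smooth(q, keys, values, bandwidth=1.0):
    """Nadaraya-Watson: weight each value by a Gaussian kernel between q and its key."""
    w = softmax(-np.sum((keys - q) ** 2, axis=1) / (2 * bandwidth ** 2))
    return w @ values

def attention_smooth(q, keys, values):
    """Same structure, but with the scaled dot-product 'kernel' used by attention."""
    w = softmax(keys @ q / np.sqrt(q.shape[0]))
    return w @ values

rng = np.random.default_rng(1)
keys, values, q = rng.normal(size=(10, 4)), rng.normal(size=(10, 3)), rng.normal(size=4)
print(kernel_smooth(q, keys, values))     # both return a weighted average of `values`
print(attention_smooth(q, keys, values))
```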

laser9•1h ago
Here's the comment from the author himself (jayalammar) talking about other good resources on learning Transformers:

https://news.ycombinator.com/item?id=35990118

boltzmann_•1h ago
Kudos also to the Transformer Explainer team for putting together some amazing visualizations: https://poloclub.github.io/transformer-explainer/ It really clicked for me after reading these two and watching the 3blue1brown videos.
gustavoaca1997•1h ago
I have this book. It was really a lifesaver in helping me catch up a few months ago when my team decided to use LLMs in our systems.
qoez•1h ago
Don't really see why you'd need to understand how the transformer works to do LLM work. An LLM is just a synthetic human performing reasoning, with failure modes that in-depth knowledge of the transformer internals won't help you predict (you just have to use experience with the output to get a sense, or other people's experiments).
roadside_picnic•41m ago
In my experience there is a substantial difference in the ability to really get performance out of LLM-related engineering work between people who really understand how LLMs work and people who think it's a magic box.

If your mental model of an LLM is:

> a synthetic human performing reasoning

You are severely overestimating the capabilities of these models and not recognizing potential areas of failure (even if your prompt works for now in the happy case). Understanding how transformers work absolutely can help debug problems (or avoid them in the first place). People without a deep understanding of LLMs also tend to get fooled by them more frequently. When you have internalized the fact that LLMs are literally optimized to trick you, you tend to be much more skeptical of the initial results (which results in better eval suites, etc.).

Then there are people who actually do AI engineering. If you're working with local/open-weights models or on the inference end of things, you're not limited to playing around with an API; you have a lot more control and observability into the model, and you should be making use of it.

I still hold that the best test of an AI Engineer, at any level of the "AI" stack, is how well they understand speculative decoding. It involves understanding quite a bit about how LLMs work and can still be implemented on a cheap laptop.
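
(For readers unfamiliar with it, here is a deliberately simplified, greedy sketch of the draft-and-verify idea behind speculative decoding. The toy "models" are plain functions over integer tokens; a real implementation verifies all drafted positions in one batched forward pass of the target model and uses acceptance/rejection sampling rather than exact greedy matching.)

```python
from typing import Callable, List

Token = int

def speculative_decode_greedy(
    draft_next: Callable[[List[Token]], Token],   # cheap model: next token for a prefix
    target_next: Callable[[List[Token]], Token],  # expensive model: next token for a prefix
    prompt: List[Token],
    n_new: int,
    k: int = 4,
) -> List[Token]:
    """Draft k tokens with the cheap model, then let the target model verify them;
    accept matches, and replace the first mismatch with the target's own token."""
    out = list(prompt)
    while len(out) < len(prompt) + n_new:
        # 1. The draft model proposes k tokens.
        ctx, proposal = list(out), []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. The target model checks each proposed token in order.
        for t in proposal:
            expected = target_next(out)
            if expected == t:
                out.append(t)           # accepted: the draft guessed correctly
            else:
                out.append(expected)    # rejected: fall back to the target's token
                break
            if len(out) >= len(prompt) + n_new:
                break
    return out[: len(prompt) + n_new]

# Toy models: both mostly continue an arithmetic pattern, so most drafts are accepted.
draft = lambda ctx: ctx[-1] + 1
target = lambda ctx: ctx[-1] + 1 if ctx[-1] % 7 else ctx[-1] + 2
print(speculative_decode_greedy(draft, target, [0], n_new=12))
```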

amelius•5m ago
But that AI engineer who is implementing speculative decoding is still just doing basic plumbing that has little to do with the actual reasoning. Yes, he/she might make the process faster, but they will know just as little about why/how the reasoning works as when they implemented a naive, slow version of the inference.
Koshkin•42m ago
(Going on a tangent.) The number of transformer explanations/tutorials is becoming overwhelming. Reminds me of monads (or maybe calculus). Someone feels a spark of enlightenment at some point (while, often, in fact, remaining deeply confused), and an urge to share their newly acquired (mis)understanding with a wide audience.
nospice•21m ago
So?

There's no rule that the internet is limited to a single explanation. Find the one that clicks for you, ignore the rest. Whenever I'm trying to learn about concepts in mathematics, computer science, physics, or electronics, I often find that the first or the "canonical" explanation is hard for me to parse. I'm thankful for having options 2 through 10.

kadushka•17m ago
Maybe so, but this particular blog post was the first and is still the best explanation of how transformers work.
ActorNightly•37m ago
People need to get away from this idea of Key/Query/Value as being special.

Whereas a standard deep layer in a network is matrix * input, where each row of the matrix is the weights of the particular neuron in the next layer, a transformer is basically input * MatrixA, input * MatrixB, input * MatrixC (where vector * matrix is a matrix), and then the output is C * MatrixA * MatrixB * MatrixC. Just more dimensions in a layer.

And consequently, you can represent the entire transformer architecture with a set of deep layers as you unroll the matrices, with a lot of zeros for the multiplication pieces that are not needed.

This is a fairly complex blog post, but it shows that it's all just matrix multiplication all the way down: https://pytorch.org/blog/inside-the-matrix/
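
(A quick NumPy check of the "just more dimensions in a layer" claim above: three separate Q/K/V projections are equivalent to a single wider layer whose weight matrix is the three concatenated. Shapes are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                        # 5 tokens, model dimension 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

# Three separate projections...
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# ...equal one wider layer whose weights concatenate the three matrices.
W_qkv = np.concatenate([W_q, W_k, W_v], axis=1)    # shape (8, 24)
Q2, K2, V2 = np.split(X @ W_qkv, 3, axis=1)

print(np.allclose(Q, Q2), np.allclose(K, K2), np.allclose(V, V2))  # True True True
```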

throw310822•22m ago
I might be completely off track, but I can't help thinking of convolutions as my mental model for the Q/K/V mechanism. Attention has the same property as a convolution kernel of being trained independently of position; it learns how to translate a large, rolling portion of an input into a new "digested" value; and you can train multiple ones in parallel so that they learn to focus on different aspects of the input ("kernels" in the case of convolution, "heads" in the case of attention).
wrsh07•34m ago
Meta: should this have 2018 in the title?
zkmon•1m ago
I think the internals of transformers will become less relevant, like the internals of compilers, as programmers will only care about how to "use" them rather than how to develop them.