
Setting Up the AWS SDK for Rust

https://rup12.net/posts/learning-rust-configuring-the-aws-sdk/
1•ruptwelve•4m ago•0 comments

List of Programmers

https://en.wikipedia.org/wiki/List_of_programmers
1•andsoitis•7m ago•0 comments

How to Submit a ChatGPT App

https://www.adspirer.com/blog/how-to-submit-chatgpt-app
1•amekala•9m ago•0 comments

AI Feynman: A physics-inspired method for symbolic regression (2020)

https://www.science.org/doi/pdf/10.1126/sciadv.aay2631
1•lisper•10m ago•0 comments

The Comprehensive Cognition Blog

https://mateolafalce.github.io/
2•lafalce•11m ago•0 comments

Blasts from the past: The Soviet ape-man scandal (2008)

https://www.newscientist.com/article/mg19926701-000-blasts-from-the-past-the-soviet-ape-man-scandal/
1•cwwc•15m ago•0 comments

Call of Duty Co-Creator and EA Executive Vince Zampella Killed in Car Accident

https://www.ign.com/articles/call-of-duty-co-creator-respawn-co-founder-and-ea-executive-vince-za...
3•andsoitis•16m ago•1 comment

Qwen-Image-Layered: Layered Decomposition for Inherent Editability

https://github.com/QwenLM/Qwen-Image-Layered
1•_____k•16m ago•0 comments

It's Always TCP_NODELAY

https://brooker.co.za/blog/2024/05/09/nagle.html
2•eieio•16m ago•0 comments

Write code that you can understand when you get paged at 2am (2024)

https://www.pcloadletter.dev/blog/clever-code/
1•birdculture•17m ago•0 comments

The Solar System Loses an Ocean World

https://www.universetoday.com/articles/the-solar-system-loses-an-ocean-world
1•rbanffy•17m ago•0 comments

Anatomy of a Coding Agent: A step-by-step illustration

https://marginlab.ai/blog/anatomy-of-coding-agent/
1•qwesr123•18m ago•0 comments

When Were Things the Best?

https://thezvi.substack.com/p/when-were-things-the-best
1•paulpauper•18m ago•0 comments

Cultural Variety Is Crazy Hard to Fix

https://www.overcomingbias.com/p/cultural-variety-is-crazy-hard-to
1•paulpauper•19m ago•0 comments

2025 – Immich's Year in Review

https://immich.app/blog/2025-year-in-review
3•altran1502•19m ago•0 comments

Is the golden age of Indie software over?

https://successfulsoftware.net/2025/12/22/is-the-golden-age-of-indie-software-over/
1•hermitcrab•20m ago•0 comments

Reolink RLC-410-5MP IP camera reverse engineered technical details

https://github.com/hn/reolink-camera
1•hn___•21m ago•0 comments

Starlink in the crosshairs: How Russia could attack Elon Musk's conquest of space

https://www.adn.com/nation-world/2025/12/22/starlink-in-the-crosshairs-how-russia-could-attack-el...
2•rolph•22m ago•0 comments

Production Materials: Gotham City Streets Set from "Batman Returns" (2014)

http://www.1989batman.com/2014/12/production-materials-warner-brothers.html
2•exvi•23m ago•0 comments

Feds demand compromise on Colorado River while states flounder

https://nevadacurrent.com/2025/12/22/feds-demand-compromise-on-colorado-river-states-flounder-des...
2•mooreds•23m ago•0 comments

Biohybrid Tendons Enhance the Power-to-Weight Ratio and Modularity of Robots

https://advanced.onlinelibrary.wiley.com/doi/10.1002/advs.202512680
2•PaulHoule•24m ago•0 comments

Snowflake Postgres Is Now Available in Public Preview

https://www.snowflake.com/en/engineering-blog/postgres-public-preview/
1•craigkerstiens•24m ago•0 comments

My 2025 in Production in Review

https://newsletter.vickiboykis.com/archive/my-2025-in-production-in-review/
1•mooreds•25m ago•0 comments

IFRRO member Kopinor signs agreement on newspaper content for AI in Norway

https://ifrro.org/page/article-detail/ifrro-member-kopinor-signs-historic-agreement-on-newspaper-...
1•amarble•26m ago•0 comments

Production Materials: The Pinewood Studios Gotham City Set from "Batman" (2013)

http://www.1989batman.com/2013/04/production-materials-pinewood-studios.html
1•exvi•27m ago•0 comments

AI and Labor Markets: What We Know and Don't Know – Stanford Digital Economy Lab

https://digitaleconomy.stanford.edu/news/ai-and-labor-markets-what-we-know-and-dont-know/
2•sizzle•27m ago•0 comments

Recent discoveries on the acquisition of the highest levels of human performance

https://www.science.org/doi/10.1126/science.adt7790
1•DebugDruid•28m ago•0 comments

VibeLang is a programming language for making music

https://vibelang.org/
1•hmokiguess•28m ago•2 comments

The Architecture of Batman '89 (2019)

https://crookedmarquee.com/the-architecture-of-batman-89/
2•exvi•31m ago•0 comments

Show HN: Corli – an RPG-style productivity app for building real habits

https://corli.app
1•zipqt•31m ago•0 comments

The Illustrated Transformer

https://jalammar.github.io/illustrated-transformer/
109•auraham•2h ago

Comments

profsummergig•1h ago
Haven't watched it yet...

...but if you have favorite resources on understanding Q & K, please drop them in the comments below...

(I've watched the Grant Sanderson/3blue1brown videos [including his excellent talk at TNG Big Tech Day '24], but Q & K still escape me).

Thank you in advance.

red2awn•1h ago
Implement a transformer yourself (i.e., in NumPy). You'll never truly understand it just by watching videos.
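
A minimal sketch of that exercise in plain NumPy (the names, shapes, and random data here are purely illustrative):

    import numpy as np

    def softmax(x, axis=-1):
        # subtract the row max for numerical stability
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
        weights = softmax(scores, axis=-1)        # each row sums to 1
        return weights @ V                        # weighted mix of the value rows

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 16))                  # 5 tokens, d_model = 16
    Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)           # shape (5, 8)
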
D-Machine•1h ago
Seconding this. The terms "Query" and "Value" are largely arbitrary and meaningless in practice. Look at how this is implemented in PyTorch and you'll see these are just weight matrices that implement a projection of sorts, and self-attention is always just self_attention(x, x, x), or self_attention(x, x, y) in some cases, where x and y are outputs from previous layers.

Plus, with different forms of attention (e.g. merged attention) and the research into why and how attention mechanisms might actually be working, the whole "they are motivated by key-value stores" framing starts to look really bogus. Really, the attention layer allows for modeling correlations and/or multiplicative interactions among a dimension-reduced representation.
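
For reference, the call pattern described above, sketched with PyTorch's nn.MultiheadAttention (the tensor shapes and sizes are assumptions, not from the comment):

    import torch
    import torch.nn as nn

    attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

    x = torch.randn(2, 10, 64)    # (batch, seq, embed): output of a previous layer
    y = torch.randn(2, 7, 64)     # e.g. an encoder output in an encoder-decoder model

    self_out, _ = attn(x, x, x)   # self-attention: query, key, value all come from x
    cross_out, _ = attn(x, y, y)  # cross-attention: queries from x, keys/values from y

Under the hood the module just applies three learned projection matrices to its three arguments before the dot-product step, which is the point being made above.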

profsummergig•1h ago
Do you think the dimension reduction is necessary? Or is it just practical (due to current hardware scarcity)?
krat0sprakhar•1h ago
Do you have a tutorial that I can follow?
leopd•1h ago
I think this video does a pretty good job explaining it, starting about 10:30 minutes in: https://www.youtube.com/watch?v=S27pHKBEp30
oofbey•1h ago
As the first comment says, "This aged like fine wine." Six years old, but the fundamentals haven't changed.
andoando•1h ago
This wasn't any better than other explanations I've seen.
throw310822•51m ago
Have you tried asking e.g. Claude to explain it to you? None of the usual resources worked for me, until I had a discussion with Claude where I could ask questions about everything that I didn't get.
bobbyschmidd•44m ago
tl;dr: recursively aggregating packing/unpacking 'if else if (functions)/statements' as keyword arguments that (call)/take them themselves as arguments, with their own position shifting according to the number "(weights)" of else-if (functions)/statements needed to get all the other arguments into (one of) the adequate orders. The order changes based on the language, input prompt, and context.

If I understand it all correctly.

I implemented it in HTML a while ago and might do it in HTMX sometime soon.

Transformers are just slutty dictionaries that Papa Roach and kage bunshin no jutsu right away again and again, spawning clones and variations based on requirements, which is why they tend to repeat themselves rather quickly and often. It's got almost nothing to do with languages themselves; requirements and weights amount to playbooks and DEFCON levels.

roadside_picnic•13m ago
It's just a re-invention of kernel smoothing. Cosma Shalizi has an excellent write-up on this [0].

Once you recognize this, it's a wonderful re-framing of what a transformer is doing under the hood: you're effectively learning a bunch of sophisticated kernels (through the FF part) and then applying kernel smoothing in different ways through the attention layers. It makes you realize that Transformers are philosophically much closer to things like Gaussian Processes (which are also just a bunch of kernel manipulation).

0. http://bactra.org/notebooks/nn-attention-and-transformers.ht...
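
A sketch of that framing in NumPy, assuming the Nadaraya-Watson form of kernel smoothing with a softmax over dot products as the kernel:

    import numpy as np

    def kernel_smooth(queries, keys, values, temperature=1.0):
        # Nadaraya-Watson: each output is a weighted average of `values`,
        # with weights given by a kernel comparing one query to every key.
        sims = queries @ keys.T / temperature      # dot-product "kernel"
        w = np.exp(sims - sims.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)         # normalize: weights sum to 1
        return w @ values

Scaled dot-product attention is this with queries, keys, and values produced by learned projections of the token embeddings, and the temperature set to sqrt(d_k).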

laser9•1h ago
Here's the comment from the author himself (jayalammar) talking about other good resources on learning Transformers:

https://news.ycombinator.com/item?id=35990118

boltzmann_•1h ago
Kudos also to the Transformer Explainer team for putting together some amazing visualizations: https://poloclub.github.io/transformer-explainer/ It really clicked for me after reading these two and watching the 3blue1brown videos.
gustavoaca1997•1h ago
I have this book. It was a real lifesaver that helped me catch up a few months ago when my team decided to use LLMs in our systems.
qoez•1h ago
I don't really see why you'd need to understand how the transformer works to do LLM work. An LLM is just a synthetic human performing reasoning, with failure modes that in-depth knowledge of the transformer internals won't help you predict (you just have to build a sense from experience with the output, or from other people's experiments).
roadside_picnic•34m ago
In my experience there is a substantial difference in the performance you get out of LLM-related engineering work between people who really understand how LLMs work and people who think it's a magic box.

If your mental model of an LLM is:

> a synthetic human performing reasoning

You are severely overestimating the capabilities of these models and not recognizing potential areas of failure (even if your prompt works for now in the happy case). Understanding how transformers work absolutely can help you debug problems (or avoid them in the first place). People without a deep understanding of LLMs also tend to get fooled by them more frequently. Once you have internalized the fact that LLMs are literally optimized to trick you, you tend to be much more skeptical of initial results (which leads to better eval suites, etc.).

Then there are people who actually do AI engineering. If you're working with local/open-weights models or on the inference end of things, you can't just play around with an API: you have a lot more control and observability into the model, and you should be making use of it.

I still hold that the best test of an AI Engineer, at any level of the "AI" stack, is how well they understand speculative decoding. It involves understanding quite a bit about how LLMs work and can still be implemented on a cheap laptop.
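
For the curious, a rough sketch of the greedy variant of that loop. draft_model, target_model, and their next-token interfaces are hypothetical stand-ins, and the production algorithm accepts or rejects draft tokens probabilistically rather than by exact match:

    def speculative_decode(target_model, draft_model, prompt, k=4, max_new=64):
        """Draft k tokens with a cheap model, verify them in one pass of the
        expensive model, and keep the longest agreeing prefix (greedy variant)."""
        tokens = list(prompt)
        while len(tokens) < len(prompt) + max_new:
            # 1. the small draft model proposes k tokens autoregressively
            draft, ctx = [], list(tokens)
            for _ in range(k):
                t = draft_model.next_token(ctx)
                draft.append(t)
                ctx.append(t)
            # 2. the large model scores all drafted positions in one forward pass;
            #    verified[i] is its greedy choice given tokens + draft[:i]
            verified = target_model.next_tokens(tokens, proposed=draft)
            # 3. accept the agreeing prefix; on the first mismatch, take the
            #    target model's token instead and start a new round
            n_ok = 0
            for d, v in zip(draft, verified):
                if d != v:
                    break
                n_ok += 1
            tokens.extend(draft[:n_ok])
            if n_ok < len(draft):
                tokens.append(verified[n_ok])
        return tokens[:len(prompt) + max_new]

The payoff is that the expensive model runs once per batch of k drafted tokens instead of once per token, while the output distribution stays that of the target model.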

Koshkin•35m ago
(Going on a tangent.) The number of transformer explanations/tutorials is becoming overwhelming. Reminds me of monads (or maybe calculus). Someone feels a spark of enlightenment at some point (while, often, in fact, remaining deeply confused), and an urge to share their newly acquired (mis)understanding with a wide audience.
nospice•14m ago
So?

There's no rule that the internet is limited to a single explanation. Find the one that clicks for you, ignore the rest. Whenever I'm trying to learn about concepts in mathematics, computer science, physics, or electronics, I often find that the first or the "canonical" explanation is hard for me to parse. I'm thankful for having options 2 through 10.

kadushka•11m ago
Maybe so, but this particular blog post was the first and is still the best explanation of how transformers work.
ActorNightly•30m ago
People need to get away from this idea of Key/Query/Value as being special.

Whereas a standard deep layer in a network is matrix * input, where each row of the matrix holds the weights of a particular neuron in the next layer, a transformer is basically input * MatrixA, input * MatrixB, input * MatrixC (each of which is again a matrix), and then the output combines those three products. It's simply more dimensions in a layer.

And consequently, you can represent the entire transformer architecture with a set of deep layers as you unroll the matrices, with a lot of zeros for the multiplication pieces that are not needed.

This is a fairly complex blog post, but it shows that it's just matrix multiplication all the way down: https://pytorch.org/blog/inside-the-matrix/
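
In that spirit, a single attention layer written as nothing but matrix products over the token matrix, with the row-wise softmax as the only non-matmul step (a NumPy sketch; the names are illustrative):

    import numpy as np

    def attention_layer(X, Wq, Wk, Wv, Wo):
        Q = X @ Wq                             # (tokens, d_head)
        K = X @ Wk
        V = X @ Wv
        A = Q @ K.T / np.sqrt(Wk.shape[1])     # token-token interaction matrix
        A = np.exp(A - A.max(axis=-1, keepdims=True))
        A = A / A.sum(axis=-1, keepdims=True)  # row-wise softmax
        return (A @ V) @ Wo                    # mix value rows, project back out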

throw310822•15m ago
I might be completely off track, but I can't help thinking of convolutions as my mental model for the K/Q/V mechanism. Attention shares with a convolution kernel the property of being trained independently of position; it learns how to translate a large, rolling portion of the input into a new "digested" value; and you can train multiple ones in parallel so that they learn to focus on different aspects of the input ("kernels" in the case of convolution, "heads" in the case of attention).
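
A toy contrast of the two mechanisms the analogy leans on, sketched in NumPy with the learned projections omitted for brevity: the convolution's weights are fixed and slide over a window, while attention's mixing weights are recomputed from the content of the whole input:

    import numpy as np

    x = np.random.randn(10, 8)              # 10 tokens, 8 features

    # 1D convolution: one output channel, fixed position-independent weights
    kernel = np.random.randn(3, 8)          # window of 3 tokens
    conv_out = np.array([(kernel * x[i:i+3]).sum() for i in range(10 - 3 + 1)])

    # attention: mixing weights derived from the input itself, over all positions
    scores = x @ x.T / np.sqrt(8)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    attn_out = w @ x                         # every token attends to every token
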
wrsh07•27m ago
Meta: should this have 2018 in the title?