

Understanding Transformers Using a Minimal Example

https://rti.github.io/gptvis/
291•rttti•3d ago

Comments

aabdel0181•3d ago
very cool!
busymom0•3d ago
I'd also recommend another article on this topic of LLMs discussed a few days ago. I read it to the finish line and understood everything fully:

> How can AI ID a cat?

https://news.ycombinator.com/item?id=44964800

xwowsersx•2d ago
So glad you shared this. Super accessible without diluting. Thank you!
CGMthrowaway•3d ago
Honest feedback: I was really excited when I read the opening. However, I did not come away from this with a greater understanding than I already had.

For reference, my initial understanding was somewhat low. Basically I know a) what an embedding is, b) that transformers work by matrix multiplication, and c) that it's something like a multi-threaded Markov chain generator with the benefit of pre-trained embeddings.

onename•2d ago
Have you checked out this video from 3Blue1Brown that talks a bit about transformers?

https://youtu.be/wjZofJX0v4M

CGMthrowaway•2d ago
I've seen it, but I don't believe I've watched it all the way through. I will now.
imtringued•2d ago
I would personally rather recommend that people just look at these architectural diagrams [0] and try to understand them. The caveat is that they do not show how attention works. For that you need to understand softmax(QK^T/√d_k)V, and that multi-head attention is this computation repeated several times in parallel. GQA, MQA, etc. just mess around with sharing K and V across heads in clever ways.

[0] https://huggingface.co/blog/vtabbott/mixtral
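For readers who want to see that formula in code, here is a minimal pure-Python sketch of single-head attention, softmax(QK^T/√d_k)V with the usual 1/√d_k scaling, plus a helper showing that MHA, GQA and MQA differ only in how many K/V heads the query heads share. The toy matrices are made up for illustration, and there is no causal mask or learned projection here.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using plain lists of rows
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for a single head (no causal mask)."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(qs, ks, vs):
    """One attention call per query head.

    len(ks) == len(qs): ordinary multi-head attention (MHA).
    1 < len(ks) < len(qs): each group of query heads shares a K/V head (GQA).
    len(ks) == 1: every query head shares a single K/V head (MQA).
    """
    group = len(qs) // len(ks)
    return [attention(q, ks[h // group], vs[h // group]) for h, q in enumerate(qs)]


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
print(attention(q, k, v))
```

Each output row is a weighted average of the rows of V, with weights that sum to 1; the query mostly "attends" to the key it is most similar to.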

rhdunn•2d ago
There are also various videos by Welch Labs that are very good: https://www.youtube.com/@WelchLabsVideo/videos
nikki93•2d ago
Pasting a comment I posted elsewhere:

Resources I’ve liked:

Sebastian Raschka book on building them from scratch

Deep Learning a Visual Approach

These videos / playlists:

https://youtube.com/playlist?list=PLoROMvodv4rOY23Y0BoGoBGgQ... https://youtube.com/playlist?list=PLoROMvodv4rOwvldxftJTmoR3... https://youtube.com/playlist?list=PL7m7hLIqA0hoIUPhC26ASCVs_... https://www.youtube.com/live/uIsej_SIIQU?si=RHBetDNa7JXKjziD

here’s a basic impl that i trained on tinystories to decent effect: https://gist.github.com/nikki93/f7eae83095f30374d7a3006fd5af... (i used claude code a lot to help with the above bc it’s a new field for me) (i did this with C and mlx before but ultimately gave in to python lol)

but overall it boils down to:

- tokenize the text

- embed tokens (map each to a vector) with a simple NN

- apply positional info so each token also encodes where it is

- do the attention. this bit is key and also very interesting to me. there are three small neural networks (really just linear layers), Q, K and V, that are applied to each token. you then generate a new sequence of embeddings where each position gets the Vs of all (earlier) tokens added up, weighted by the softmax of the Q of that position dot’d with the K of each other position. the new embeddings are /added/ to the previous layer (adding like this is called ‘residual’)

- also do another NN pass without attention, again adding the output (residual). there are actually multiple ‘heads’ in the attention step, each with a different Q, K, V; their outputs are combined (concatenated and projected back to the embedding size) before that second NN pass

there’s some normalization at each stage to keep the numbers in a reasonable range and stop them from blowing up

you repeat the attention + feed-forward blocks many times, then the last position’s embedding in the final layer’s output is projected to one score (logit) per vocabulary token, and you sample the next token from those
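The recipe above can be sketched end to end in plain Python. This is a minimal, untrained illustration, not the gist linked above: the character-level tokenizer, toy sizes, random weights, positional table, pre-norm placement of the normalization, and ReLU feed-forward layer are all assumptions chosen for brevity. It follows the steps in order: tokenize, embed, add position, then repeated attention blocks (several causal heads, concatenated and projected, plus a residual) and feed-forward blocks (plus a residual), and finally it projects the last position to one logit per vocabulary entry and samples from the softmax.

```python
import math
import random

random.seed(0)

# Toy sizes and a character-level "tokenizer" (assumptions; real models use subword tokenizers).
TEXT = "hello world"
VOCAB = sorted(set(TEXT))
STOI = {ch: i for i, ch in enumerate(VOCAB)}
D, HEADS, LAYERS, MAX_LEN = 8, 2, 2, 16
HD = D // HEADS  # per-head dimension


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def linear(x, w):
    # x has len(w) entries; w is a len(x) x out weight matrix.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def layer_norm(x, eps=1e-5):
    # Normalization without learned gain/bias, to keep activations in a sane range.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer():
    return {"wq": rand_matrix(D, D), "wk": rand_matrix(D, D), "wv": rand_matrix(D, D),
            "wo": rand_matrix(D, D), "w1": rand_matrix(D, 4 * D), "w2": rand_matrix(4 * D, D)}


# Untrained random parameters, so the prediction below is arbitrary.
tok_emb = rand_matrix(len(VOCAB), D)  # token embedding table
pos_emb = rand_matrix(MAX_LEN, D)     # positional table (would be learned in training)
layers = [make_layer() for _ in range(LAYERS)]
unembed = rand_matrix(D, len(VOCAB))  # final projection to one logit per token


def attention_block(xs, p):
    normed = [layer_norm(x) for x in xs]
    q = [linear(x, p["wq"]) for x in normed]
    k = [linear(x, p["wk"]) for x in normed]
    v = [linear(x, p["wv"]) for x in normed]
    out = []
    for i in range(len(xs)):
        heads = []
        for h in range(HEADS):
            lo, hi = h * HD, (h + 1) * HD
            # Causal: position i only looks at positions 0..i.
            scores = [sum(a * b for a, b in zip(q[i][lo:hi], k[j][lo:hi])) / math.sqrt(HD)
                      for j in range(i + 1)]
            w = softmax(scores)
            heads += [sum(w[j] * v[j][d] for j in range(i + 1)) for d in range(lo, hi)]
        out.append(add(xs[i], linear(heads, p["wo"])))  # concat heads, project, add residual
    return out


def mlp_block(xs, p):
    out = []
    for x in xs:
        hidden = [max(0.0, h) for h in linear(layer_norm(x), p["w1"])]  # ReLU
        out.append(add(x, linear(hidden, p["w2"])))  # residual
    return out


def next_token_logits(text):
    ids = [STOI[ch] for ch in text]  # tokenize
    xs = [add(tok_emb[t], pos_emb[i]) for i, t in enumerate(ids)]  # embed + position
    for p in layers:  # repeated attention + feed-forward blocks
        xs = mlp_block(attention_block(xs, p), p)
    return linear(layer_norm(xs[-1]), unembed)  # last position -> logits


logits = next_token_logits("hello")
probs = softmax(logits)
next_char = random.choices(VOCAB, weights=probs)[0]
print(repr(next_char))
```

Because the weights are random, the sampled character is arbitrary; training would adjust every matrix above by gradient descent on next-token prediction.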

i was surprised by how quickly this just starts to generate coherent grammar etc. having the training loop also do a generation step to show example output at each stage of training was helpful to see how the output qualitatively improves over time, and it’s kind of cool to “watch” it learn.

this doesn’t cover MoE, sparse vs. dense attention, or the whole thing about RL on top of these (whether for human feedback or for doing “search with backtracking and sparse reward”). i haven’t coded those up yet, just kinda read about them…

now the thing is, this is a setup for it to learn some processes, spread among the weights, that do what it does, but what those processes are still seems very unknown. “mechanistic interpretability” is the field that’s meant to work on that; i’ve been looking into it lately.

hunter2_•2d ago
Similarly, I was really excited when I read the headline here on HN and thought this would be about the electrical device. I wonder if the LLM meaning has eclipsed the electrical meaning at this point, as a default in the absence of other qualifiers, in communities like this.
zxexz•2d ago
It does seem to. I’ve been working on some personal projects where I’ve needed to look up and research transformers quite a bit (the kind that often has a ferrite core), and it has been frustrating. Frustrating not just when trying to search for the wire datasheets, etc., but also because I often have to use the other kind of transformer, via an LLM service, to find what I’m looking for, because search is so enshittified by the newer definition.
quitit•2d ago
I had a similar feeling. I think a little magic was lost by the author trying to be as concise as possible, which is no real fault of theirs, as the topic can go down the rabbit hole very quickly.

Instead, I believe this might work better as a guided exercise that a person can work through over a few hours, rather than being spoon-fed it over the 10-minute reading time. Or the steps could be broken up into "interactive" sections that more clearly demarcate the stages.

Regardless, I'm very supportive of people making efforts to simplify this topic; each attempt always gives me something that I had either forgotten or neglected.

rttti•1d ago
Thanks a lot for your feedback. I like your idea. It matches the observation that you learn best what you try and experience yourself.
anshumankmr•2d ago
It might be meant for the folks who are not well versed in transformers.
photon_lines•2d ago
If you want my 'intuitive' explanation of how transformers work, you can find it here (if you're a visual learner, I think you'll like this one), although it is a bit long: https://photonlines.substack.com/p/intuitive-and-visual-guid...
CGMthrowaway•1d ago
This I've read before and it was very helpful. It's probably where most of my understanding comes from.

If I'm interpreting it correctly, it sort of validates my intuition that attention heads are "multi-threaded Markov chain models". In other words, if autocomplete just looks at level 1, a transformer looks at level 1 for every word in the input, plus many layers deeper for every word (or token) in the input, while bringing a huge pre-training dataset to bear.

If that's more or less correct, something that surprises me is how attention is often treated as some kind of "breakthrough". It seems obvious to me that improving a Markov chain recommendation would involve going deeper and dimensionalizing the context in a deeper way; the technique appears the same, just with more analysis. I'm not sure what I'm missing here. Perhaps adding those extra layers was a hard problem that we hadn't figured out how to do efficiently yet (?)
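To make the comparison concrete, here is what "level 1" autocomplete looks like as a first-order (bigram) Markov chain: a minimal sketch on a made-up toy corpus. The prediction depends only on the last word, so the earlier "dog" cannot influence it. A deeper Markov chain would need a table entry for every exact context it has seen, whereas attention lets each position mix information from all earlier tokens with weights computed from their content.

```python
from collections import Counter, defaultdict


def train_bigram(words):
    """First-order Markov chain: counts of which word follows which."""
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table


def predict_next(table, context):
    # Only the last word of the context is used; everything earlier is ignored.
    candidates = table.get(context[-1])
    return candidates.most_common(1)[0][0] if candidates else None


corpus = "the cat sat on the mat . the dog sat on the rug .".split()
model = train_bigram(corpus)
# Prints 'cat': the model only sees the final "the", so "dog" earlier cannot help.
print(predict_next(model, ["the", "dog", "sat", "on", "the"]))
```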

photon_lines•4h ago
So I posted this conversation between Ilya Sutskever (one of the creators of ChatGPT) and Lex Fridman in that blog post, and I'll provide it again below because I think it does a good job of summarizing what exactly 'makes transformers work':

  Ilya Sutskever: Yeah, so the thing is the transformer is a combination of multiple ideas simultaneously of which attention is one.

  Lex Fridman: Do you think attention is the key?

  Ilya Sutskever: No, it's a key, but it's not the key. The transformer is successful because it is the simultaneous combination of multiple ideas. And if you were to remove either idea, it would be much less successful. So the transformer uses a lot of attention, but attention existed for a few years. So that can't be the main innovation. The transformer is designed in such a way that it runs really fast on the GPU. And that makes a huge amount of difference. This is one thing. The second thing is that transformer is not recurrent. And that is really important too, because it is more shallow and therefore much easier to optimize. So in other words, it uses attention, it is a really great fit to the GPU and it is not recurrent, so therefore less deep and easier to optimize. And the combination of those factors make it successful.

I'm not sure if the above answers your question, but I tend to think of transformers more as 'associative' networks (similar to humans). They miss many of the components that actually make humans human (like imitation learning and consciousness, and we still don't know what consciousness actually is), but for the most part I believe the general architecture and the way they 'learn' mimic a process similar to how regular humans learn: neurons that fire together wire together (i.e. associative learning). This is what a huge large-language model is to me: a giant auto-associative network that can comprehend and organize information.
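The "not recurrent" point can be made concrete with a toy contrast. Scalar inputs, fixed RNN weights, and q = k = v = x for attention are simplifying assumptions for illustration only. An RNN's hidden state at step t can only be computed after step t-1, while each attention output reads the inputs directly, so all positions can be computed independently (and therefore in parallel on a GPU).

```python
import math


def rnn_states(xs, w_h=0.5, w_x=1.0):
    """Scalar toy RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t). Step t must wait for step t-1."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states


def attention_output(xs, i):
    """Toy causal self-attention at position i with q = k = v = x (an assumption).

    It reads only the inputs, never another position's output, so every
    position can be computed independently of the others.
    """
    scores = [xs[i] * xs[j] for j in range(i + 1)]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return sum(wj * xs[j] for j, wj in enumerate(w)) / total


xs = [0.5, -1.0, 2.0, 0.25]
forward = [attention_output(xs, i) for i in range(len(xs))]
backward = [attention_output(xs, i) for i in reversed(range(len(xs)))][::-1]
print(rnn_states(xs))
print(forward == backward)  # computation order does not matter for attention
```

The RNN's last state also sits behind a chain of len(xs) sequential tanh steps from the first input, which is the extra depth the quote says makes recurrent models harder to optimize.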
rttti•1d ago
Author here. Thanks a lot for the honest feedback. It makes me realize that the title might have been overselling. While this project was a milestone on my personal learning journey, the article does not offer the same experience to the reader. Reading-experience design is probably what I should focus on more in my next piece of writing.
neuroelectron•2d ago
I feel like this could use a bit more explanation. It's brief, and the grammar or cadence is very clunky.
rttti•1d ago
Thanks a lot for the feedback! Highly appreciated.
meindnoch•2d ago
I'd be surprised if anyone understood transformers from this.
runamuck•2d ago
I love how you represent each token in the form of five stacked boxes, with height, width, etc. depicting different values. Where did you get this amazing idea? I will "steal" it for plotting high-dimensional data.
rttti•1d ago
Great. Would love to learn what you apply it for and how it works out for you.

I think it does not scale well beyond 5 boxes (20 numbers) because the stacks become too complex to remember and to identify patterns in. That's just me, though; it could also be quite individual.
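For anyone who wants to try the stacked-box encoding on their own data, 5 boxes for 20 numbers means 4 numbers per box, and the vector can simply be chunked. This is a minimal sketch, not the article's code: assigning the four numbers to height, width, depth and hue is a hypothetical choice, the min-max scaling is an added assumption so sizes stay non-negative and comparable, and the actual drawing is left to whatever plotting library you use.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical mapping of four numbers to visual properties of one box.
    height: float
    width: float
    depth: float
    hue: float


NUMBERS_PER_BOX = 4


def to_stack(vector):
    """Split a flat vector into a bottom-to-top stack of boxes."""
    if len(vector) % NUMBERS_PER_BOX:
        raise ValueError("vector length must be a multiple of 4")
    return [Box(*vector[i:i + NUMBERS_PER_BOX])
            for i in range(0, len(vector), NUMBERS_PER_BOX)]


def normalize(vectors):
    """Min-max scale each dimension to [0, 1] across all vectors so box sizes are comparable."""
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(x - lo) / s for x, lo, s in zip(vec, lows, spans)] for vec in vectors]


embeddings = [[float((i * 7 + j) % 11) for j in range(20)] for i in range(3)]
stacks = [to_stack(v) for v in normalize(embeddings)]
print(stacks[0][0])
```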

dpflan•2d ago
Here is another take on visualizing transformers from Georgia Tech researchers: https://poloclub.github.io/transformer-explainer/

Also, the Illustrated Transformer: https://jalammar.github.io/illustrated-transformer/

Also, this HN comment has numerous resources: https://news.ycombinator.com/item?id=35712334

credit_guy•1d ago
Here's the best video I have seen about transformers [1]. It is made by Welch Labs. It talks about DeepSeek and what their main innovation was, but it covers transformers in general too, and I haven't found a better description of transformers anywhere else.

Also, here's an interactive "transformer explainer" that is absolutely mind-blowing [2].

[1] https://www.youtube.com/watch?v=0VLAoVGf_74

[2] https://poloclub.github.io/transformer-explainer/