frontpage.

Microgpt

http://karpathy.github.io/2026/02/12/microgpt/
225•tambourine_man•2h ago

Comments

tithos•1h ago
What is the prime use case?
aaronblohowiak•1h ago
“Art project”
pixelatedindex•1h ago
If writing is art, then I’ve been amazed at the source code written by this legend
geerlingguy•1h ago
Looks like it's to learn how a GPT operates, with a real example.
foodevl•1h ago
Yeah, everyone learns differently, but for me this is a perfect way to better understand how GPTs work.
antonvs•1h ago
To confuse people who only think in terms of use cases.

Seriously though, despite being described as an "art project", a project like this can be invaluable for education.

keyle•1h ago
it's a great learning tool and it shows it can be done concisely.
jackblemming•1h ago
A case study for whenever a new edition of Programming Pearls is released.
inerte•1h ago
Karpathy is here to tell you that things you thought were hard in fact fit on a screen.
colonCapitalDee•1h ago
Beautiful work
ViktorRay•1h ago
Which license is being used for this?
dilap•1h ago
MIT (https://gist.github.com/karpathy/8627fe009c40f57531cb1836010...)
ViktorRay•43m ago
Thank you
fulafel•1h ago
This could make an interesting language shootout benchmark.
profsummergig•59m ago
If anyone knows of a way to use this code on a consumer grade laptop to train on a small corpus (in less than a week), and then demonstrate inference (hallucinations are okay), please share how.
ThrowawayTestr•57m ago
This is like those websites that implement an entire retro console in the browser.
Paddyz•55m ago
The most interesting thing about this project isn't the GPT itself - it's what it reveals about the gap between "understanding the math" and "understanding the engineering."

I've read the attention paper, watched Karpathy's own YouTube lectures, implemented toy transformers in notebooks. But reading through this C implementation I finally grokked why KV-cache matters at the systems level - because when you see the memory layout spelled out in raw mallocs instead of hidden behind PyTorch abstractions, the O(n^2) vs O(n) tradeoff for cached inference becomes visceral rather than theoretical.

This is the same reason people learn more about operating systems from xv6 than from reading the Linux source. Minimal implementations aren't just educational toys - they're the only way to build intuition about what the abstractions are actually doing. The fact that a full GPT-2 fits in ~1000 lines of C should make every ML engineer uncomfortable about how much complexity their frameworks are hiding from them.

misiti3780•45m ago
agreed - no one else is saying this.
janis1234•40m ago
I found reading the Linux source more useful than learning about xv6, because I run Linux and reading through the source felt immediately useful, i.e., tracing exactly how a real process I work with every day gets created.

Can you explain this O(n^2) vs O(n) significance better?

Paddyz•37m ago
Sure - so without KV cache, every time the model generates a new token it has to recompute attention over the entire sequence from scratch. Token 1 looks at token 1. Token 2 looks at tokens 1,2. Token 3 looks at 1,2,3. That's 1+2+3+...+n = O(n^2) total work to generate n tokens.

With a KV cache you store the key/value vectors from previous tokens, so when generating token n you only compute the new token's query against the cached keys. Each step is O(n) instead of recomputing everything; total attention work across all steps is still O(n^2), but with far better constants, because you're not redoing matrix multiplies you already did.

The thing that clicked for me reading the C code was seeing exactly where those cached vectors get stored in memory and how the pointer arithmetic works. In PyTorch it's just `past_key_values` getting passed around and you never think about it. In C you see the actual buffer layout and it becomes obvious why GPU memory is the bottleneck for long sequences.
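The recompute-vs-cache tradeoff described above can be sketched in a few lines of numpy. This is an illustrative toy for a single attention head, not code from the post; all names and shapes here are made up for the example:

```python
import numpy as np

# Toy single-head attention (illustrative only, not Karpathy's code).
d = 4                                        # head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # q: (d,), K/V: (t, d) -> softmax-weighted sum of value rows
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def generate_no_cache(xs):
    # Without a KV cache: every step re-projects K and V for the
    # whole prefix from scratch -> O(n^2) projection work overall.
    outs = []
    for t in range(1, len(xs) + 1):
        prefix = np.stack(xs[:t])
        K, V = prefix @ Wk, prefix @ Wv      # recomputed each step
        outs.append(attend(xs[t - 1] @ Wq, K, V))
    return outs

def generate_with_cache(xs):
    # With a KV cache: project only the newest token and append it;
    # each step does one new projection plus O(t) attention.
    K, V = np.empty((0, d)), np.empty((0, d))
    outs = []
    for x in xs:
        K = np.vstack([K, x @ Wk])           # cache grows one row per token
        V = np.vstack([V, x @ Wv])
        outs.append(attend(x @ Wq, K, V))
    return outs

xs = list(rng.standard_normal((6, d)))
a, b = generate_no_cache(xs), generate_with_cache(xs)
print(all(np.allclose(u, v) for u, v in zip(a, b)))  # True: same outputs, less recompute
```

The two paths produce identical outputs; the cache only changes how much work is redone per step, which is the point about the buffer layout being visible in a low-level implementation.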

tadfisher•9m ago
Are you hallucinating or am I? This implementation is 200 lines of Python. Did you mean to link to a C version?
nnoremap•6m ago
It's slop.
lynxbot2026•30m ago
The real question this raises for me: if GPT-2 fits in ~1000 lines of C, what exactly are the other 999,000 lines in production frameworks doing? I know the answer is "training infrastructure, distributed computing, mixed precision, etc" but this kind of minimal implementation makes you wonder how much of the ML stack is essential complexity vs accumulated complexity. Karpathy has a talent for making you feel slightly embarrassed about your own abstractions.
sdwr•17m ago
If you know your exact use case, have prior work to build on, think deeply and extensively about the problem domain, and don't need competitive results, you can save a lot of lines of code!
dhruv3006•21m ago
Karpathy with another gem!

Microgpt

http://karpathy.github.io/2026/02/12/microgpt/
233•tambourine_man•2h ago•27 comments

We do not think Anthropic should be designated as a supply chain risk

https://twitter.com/OpenAI/status/2027846016423321831
341•golfer•6h ago•155 comments

The Windows 95 user interface: A case study in usability engineering (1996)

https://dl.acm.org/doi/fullHtml/10.1145/238386.238611
167•ksec•5h ago•103 comments

Obsidian Sync now has a headless client

https://help.obsidian.md/sync/headless
415•adilmoujahid•11h ago•146 comments

The happiest I've ever been

https://ben-mini.com/2026/the-happiest-ive-ever-been
364•bewal416•2d ago•176 comments

Show HN: Xmloxide – an agent-made Rust replacement for libxml2

https://github.com/jonwiggins/xmloxide
39•jawiggins•4h ago•25 comments

H-Bomb: A Frank Lloyd Wright Typographic Mystery

https://www.inconspicuous.info/p/h-bomb-a-frank-lloyd-wright-typographic
32•mrngm•2d ago•9 comments

Block the “Upgrade to Tahoe” Alerts

https://robservatory.com/block-the-upgrade-to-tahoe-alerts-and-system-settings-indicator/
161•todsacerdoti•8h ago•73 comments

Woxi: Wolfram Mathematica Reimplementation in Rust

https://github.com/ad-si/Woxi
261•adamnemecek•3d ago•108 comments

Addressing Antigravity Bans and Reinstating Access

https://github.com/google-gemini/gemini-cli/discussions/20632
213•RyanShook•14h ago•175 comments

SpacetimeDB ThreeJS Support

https://discourse.threejs.org/t/spacetimedb-threejs-support-and-free-tier/90052
7•ryker2000•3d ago•3 comments

Verified Spec-Driven Development (VSDD)

https://gist.github.com/dollspace-gay/d8d3bc3ecf4188df049d7a4726bb2a00
156•todsacerdoti•11h ago•81 comments

Deterministic Programming with LLMs

https://www.mcherm.com/deterministic-programming-with-llms.html
29•todsacerdoti•3d ago•13 comments

Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers

https://venturebeat.com/technology/alibabas-new-open-source-qwen3-5-medium-models-offer-sonnet-4-...
261•lostmsu•7h ago•172 comments

Building a Minimal Transformer for 10-digit Addition

https://alexlitzenberger.com/blog/post.html?post=/building_a_minimal_transformer_for_10_digit_add...
42•kelseyfrog•5h ago•7 comments

Show HN: Now I Get It – Translate scientific papers into interactive webpages

https://nowigetit.us
196•jbdamask•14h ago•99 comments

Werner Herzog Between Fact and Fiction

https://www.thenation.com/article/culture/werner-herzog-future-truth/
70•Hooke•1d ago•14 comments

MCP server that reduces Claude Code context consumption by 98%

https://mksg.lu/blog/context-mode
263•mksglu•18h ago•62 comments

New evidence that Cantor plagiarized Dedekind?

https://www.quantamagazine.org/the-man-who-stole-infinity-20260225/
113•rbanffy•3d ago•70 comments

Microsoft announces new "mini PCs" for Windows 365

https://www.neowin.net/news/microsoft-announces-new-mini-pcs-for-windows-365/
11•mikece•2d ago•6 comments

Our Agreement with the Department of War

https://openai.com/index/our-agreement-with-the-department-of-war
242•surprisetalk•7h ago•199 comments

Running a One Trillion-Parameter LLM Locally on AMD Ryzen AI Max+ Cluster

https://www.amd.com/en/developer/resources/technical-articles/2026/how-to-run-a-one-trillion-para...
29•mindcrime•2h ago•5 comments

The whole thing was a scam

https://garymarcus.substack.com/p/the-whole-thing-was-scam
661•guilamu•11h ago•192 comments

747s and Coding Agents

https://carlkolon.com/2026/02/27/engineering-747-coding-agents/
136•cckolon•1d ago•60 comments

The archivist preserving decaying floppy disks

https://www.popsci.com/technology/floppy-disk-archivist-project/
54•Brajeshwar•3d ago•5 comments

Ghosts'n Goblins – “Worse danger is ahead”

https://superchartisland.com/ghostsn-goblins/
68•elvis70•3d ago•25 comments

The Eternal Promise: A History of Attempts to Eliminate Programmers

https://www.ivanturkovic.com/2026/01/22/history-software-simplification-cobol-ai-hype/
248•dinvlad•3d ago•167 comments

Samsung Galaxy update removes Android recovery menu tools, including sideloading

https://9to5google.com/2026/02/27/samsung-galaxy-update-android-recovery-menu-removed/
46•pabs3•2h ago•5 comments

Unsloth Dynamic 2.0 GGUFs

https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs
207•tosh•19h ago•54 comments

From Noise to Image – interactive guide to diffusion

https://lighthousesoftware.co.uk/projects/from-noise-to-image/
115•simedw•2d ago•15 comments