
Microgpt

http://karpathy.github.io/2026/02/12/microgpt/
181•tambourine_man•2h ago

Comments

tithos•1h ago
What is the prime use case?
aaronblohowiak•1h ago
“Art project”
pixelatedindex•45m ago
If writing is art, then I’ve been amazed at the source code written by this legend
geerlingguy•1h ago
Looks like it's meant to teach how a GPT operates, with a real example.
foodevl•45m ago
Yeah, everyone learns differently, but for me this is a perfect way to better understand how GPTs work.
antonvs•1h ago
To confuse people who only think in terms of use cases.

Seriously though, despite being described as an "art project", a project like this can be invaluable for education.

keyle•1h ago
it's a great learning tool and it shows it can be done concisely.
jackblemming•58m ago
A case study for whenever a new edition of Programming Pearls is released.
inerte•45m ago
Karpathy is here to tell you that things you thought were hard in fact fit on a screen.
colonCapitalDee•1h ago
Beautiful work
ViktorRay•1h ago
Which license is being used for this?
dilap•43m ago
MIT (https://gist.github.com/karpathy/8627fe009c40f57531cb1836010...)
ViktorRay•19m ago
Thank you
fulafel•39m ago
This could make an interesting language shootout benchmark.
profsummergig•36m ago
If anyone knows of a way to use this code on a consumer-grade laptop to train on a small corpus (in less than a week), and then demonstrate inference (hallucinations are okay), please share how.
ThrowawayTestr•33m ago
This is like those websites that implement an entire retro console in the browser.
Paddyz•31m ago
The most interesting thing about this project isn't the GPT itself - it's what it reveals about the gap between "understanding the math" and "understanding the engineering."

I've read the attention paper, watched Karpathy's own YouTube lectures, implemented toy transformers in notebooks. But reading through this C implementation I finally grokked why KV-cache matters at the systems level - because when you see the memory layout spelled out in raw mallocs instead of hidden behind PyTorch abstractions, the O(n^2) vs O(n) tradeoff for cached inference becomes visceral rather than theoretical.

This is the same reason people learn more about operating systems from xv6 than from reading the Linux source. Minimal implementations aren't just educational toys - they're the only way to build intuition about what the abstractions are actually doing. The fact that a full GPT-2 fits in ~1000 lines of C should make every ML engineer uncomfortable about how much complexity their frameworks are hiding from them.

misiti3780•22m ago
agreed - no one else is saying this.
janis1234•17m ago
I found reading the Linux source more useful than learning about xv6, because I run Linux and reading through the source felt immediately useful - i.e., tracing exactly how a real process I work with every day gets created.

Can you explain the O(n^2) vs O(n) significance better?

Paddyz•13m ago
Sure - so without a KV cache, every time the model generates a new token it reruns the forward pass and recomputes attention over the entire sequence from scratch. At step t that means redoing attention for all t tokens: token 1 looks at token 1, token 2 looks at tokens 1,2, and so on, which is 1+2+...+t = O(t^2) work per step, and O(n^3) total to generate n tokens.

With a KV cache you store the key/value vectors from previous tokens, so when generating token t you only compute the new token's query against the t cached keys. Each step drops from O(t^2) to O(t), and total work across all steps drops from O(n^3) to O(n^2) - and on top of that you skip redoing the projection and MLP matrix multiplies for tokens you've already processed.

The thing that clicked for me reading the C code was seeing exactly where those cached vectors get stored in memory and how the pointer arithmetic works. In PyTorch it's just `past_key_values` getting passed around and you never think about it. In C you see the actual buffer layout and it becomes obvious why GPU memory is the bottleneck for long sequences.

lynxbot2026•6m ago
The real question this raises for me: if GPT-2 fits in ~1000 lines of C, what exactly are the other 999,000 lines in production frameworks doing? I know the answer is "training infrastructure, distributed computing, mixed precision, etc" but this kind of minimal implementation makes you wonder how much of the ML stack is essential complexity vs accumulated complexity. Karpathy has a talent for making you feel slightly embarrassed about your own abstractions.

Show HN: ClaudeTerminal – A tabbed terminal manager for Claude Code

https://github.com/Mr8BitHK/claude-terminal
1•mr8bit•53s ago•0 comments

NeurIPS 2021 Papers (2021)

https://tanelp.github.io/neurips2021/
1•vinhnx•4m ago•0 comments

Office of Technology Assessment

https://en.wikipedia.org/wiki/Office_of_Technology_Assessment
1•softwaredoug•5m ago•0 comments

MidnightBSD Excludes Calif. From Desktop Use Due to Digital Age Assurance Act

https://ostechnix.com/midnightbsd-excludes-california-digital-age-assurance-act/
3•WaitWaitWha•8m ago•1 comment

OpenSandbox

https://github.com/alibaba/OpenSandbox
1•nileshtrivedi•9m ago•0 comments

Why Is Your Operating System Debugging Hackers for Free?

1•agarmte•9m ago•0 comments

Polymarket Iran Bets Hit $529M as New Wallets Draw Notice

https://www.bloomberg.com/news/articles/2026-02-28/polymarket-iran-bets-hit-529-million-as-new-wa...
1•petethomas•11m ago•0 comments

Show HN: Computer Agents – Agents that work while you sleep

https://computer-agents.com
2•janlucasandmann•11m ago•0 comments

Uplift Privileges on FreeBSD

https://vermaden.wordpress.com/2026/03/01/uplift-privileges-on-freebsd/
1•vermaden•11m ago•0 comments

Artichoke induces sweet taste (PubMed)

https://pubmed.ncbi.nlm.nih.gov/5084667/
1•valzevul•11m ago•0 comments

Edge – Generate structured evaluation criteria for any domain using a local LLM

https://github.com/EviAmarates/fresta-edge
1•TiagoSantos•22m ago•0 comments

Have you used Terragrunt in the past? Keen to hear your thoughts

https://techroom101.substack.com/p/terragrunt-what-it-solves-what-it
1•ahaydar•23m ago•0 comments

Two-way Discord bridge - autonomous Claude Code sessions (WebSocket + local queue)

https://github.com/AetherWave-Studio/autonomous-claude-code
1•Drew-Aetherwave•23m ago•1 comment

Token Anxiety

https://writing.nikunjk.com/p/token-anxiety
1•vinhnx•24m ago•0 comments

A State Government Tried to Regulate Linux; It Went How You'd Expect

https://www.youtube.com/watch?v=mQLdDR-hJpc
1•cable2600•29m ago•0 comments

I built AI agents that do the grunt work solo founders hate

2•Seleci•36m ago•0 comments

TorchLean: Formalizing Neural Networks in Lean

https://leandojo.org/torchlean.html
2•matt_d•36m ago•0 comments

Hackers Expose the Surveillance Stack Hiding Inside "Age Verification"

https://www.techdirt.com/2026/02/25/hackers-expose-the-massive-surveillance-stack-hiding-inside-y...
2•nobody9999•37m ago•1 comment

Japanese firm Space One plans to launch Kairos No.3 rocket on Sunday

https://www3.nhk.or.jp/nhkworld/en/news/20260301_01/
2•HardwareLust•40m ago•1 comment

Show HN: Sailor.ai – source-backed personalized outbound emails

https://trysailor.ai/
1•bill_waybird•40m ago•1 comment

Show HN: Brand Analytics for AI Search Engines (Beta)

https://explore.somantra.ai/dashboard/141d19d6-1ee7-4a25-81cf-411e6792e286/Australia
1•prasaar•42m ago•0 comments

Show HN: Parallax – Ansible Without Python

https://parallax.digitalxero.dev/
1•DjGilcrease•43m ago•0 comments

Skills.sh Ecosystem Dashboard

https://skills-dashboard.olshansky.info/
1•Olshansky•45m ago•0 comments

Show HN: A visual sitemap editor that forces you to design structure before UI

3•epic_ai•51m ago•2 comments

Show HN: Memctl v0.1.0 Open source shared persistent memory for AI coding agents

https://memctl.com
3•meszmate•52m ago•0 comments

HeadElf-Mvidia: Executive Intelligence Template

https://github.com/pauljbernard/HeadElf-MVIDIA
3•paulbernard•54m ago•2 comments

Agents are not thinking: Science of agent behavior

https://technoyoda.github.io/agent-science.html
3•chse_cake•58m ago•0 comments

Sam Altman Answers Questions on X.com About Pentagon Deal, Threats to Anthropic

https://news.slashdot.org/story/26/03/01/0233230/sam-altman-answers-questions-on-xcom-about-penta...
1•MilnerRoute•59m ago•0 comments

Church of the SubGenius

https://en.wikipedia.org/wiki/Church_of_the_SubGenius
2•thomassmith65•59m ago•0 comments

Show HN: MCP server that strips injection vectors from LLM input

https://github.com/timstarkk/mcp-safe-fetch
1•timstark•1h ago•0 comments