Ask HN: Build Your Own LLM?

15•retube•4mo ago

The best way to really understand how something works is to build it yourself. So I am wondering if there are any good tutorials on building your own LLM from scratch, i.e. implementing tokenisation, embeddings, attention and so on. I am not suggesting one could replicate ChatGPT, but rather a toy model that implements the core features and is trained on a much smaller corpus.
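For a sense of how small the first two pieces can be, here is a minimal sketch of a character-level tokeniser plus an embedding lookup, using only the Python standard library. The function names (`build_vocab`, `encode`, `decode`, `init_embeddings`, `embed`) are hypothetical, and the embedding table is random here; in a real model it would be learned during training. Production LLMs use subword tokenisers (e.g. BPE) rather than characters.

```python
import random

def build_vocab(text):
    """Map each distinct character in the corpus to an integer id (and back)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    """Turn a string into a list of token ids."""
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    """Turn a list of token ids back into a string."""
    return "".join(itos[i] for i in ids)

def init_embeddings(vocab_size, dim, seed=0):
    """Random embedding table: one dim-sized vector per token id (learned in a real model)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]

def embed(ids, table):
    """An embedding lookup is just row indexing into the table."""
    return [table[i] for i in ids]

corpus = "hello world"
stoi, itos = build_vocab(corpus)
ids = encode("hello", stoi)
vectors = embed(ids, init_embeddings(len(stoi), dim=4))
print(ids)                # → [3, 2, 4, 4, 5]
print(decode(ids, itos))  # → hello
```

Identical tokens (the two `l`s) map to identical vectors; it is the attention layers further up that make each position's representation depend on its context.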

Comments

2ro•4mo ago

How about this?

https://mathstodon.xyz/@empty/115088095028020763

retube•4mo ago

thanks

pm2222•4mo ago

https://www.amazon.com/Build-Large-Language-Model-Scratch/dp...

retube•4mo ago

thanks. looks promising

ryanchants•4mo ago

I'd get it straight from Manning, save a few bucks, and cut out the middleman: https://www.manning.com/books/build-a-large-language-model-f...

sfmz•4mo ago

Andrej Karpathy: Let's build GPT: from scratch, in code, spelled out. https://www.youtube.com/watch?v=kCc8FmEb1nY
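The video starts from a simple bigram baseline (predict the next character from the current one only) before adding attention. The sketch below is a stdlib-only, count-based version of that idea, not the code from the video, which learns the table with PyTorch. `train_bigram` and `next_char_probs` are hypothetical names.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def next_char_probs(counts, prev):
    """Normalise the counts for one context character into probabilities."""
    row = counts[prev]
    total = sum(row.values())
    return {ch: n / total for ch, n in row.items()}

counts = train_bigram("abab abba")
print(next_char_probs(counts, "b"))
```

The obvious weakness (the model only ever sees one character of context) is exactly what self-attention is introduced to fix.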

beardyw•4mo ago

Andrej Karpathy's Nano GPT is reasonably accessible and easy to run.

https://github.com/karpathy/nanoGPT

runjake•4mo ago

Since you're posting here, you're looking for the shortcut.

The shortcut is Karpathy's "Let's Build GPT: from scratch, in code, spelled out" video:

https://www.youtube.com/watch?v=kCc8FmEb1nY

Then there's a good, quite approachable video that dives into LLMs and how they work:

https://www.youtube.com/watch?v=7xTGNNLPyMI

From there, flesh out your knowledge with his other videos, which range from extremely light to extremely deep:

https://www.youtube.com/@AndrejKarpathy/videos

Anyway, I really like Karpathy's videos because he's very good at explaining LLMs at every level.

khamidou•3mo ago

Sorry to self-promote, but I did exactly that a few months back: https://khamidou.com/gpt2/

Generally, I think the Karpathy tutorials are a good starting point, but they're very mathy (despite people telling you that you only need high-school math to understand LLMs, a lot of the abstractions and concepts he uses are a bit foreign to programmers).
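For programmers who find the notation heavy, the core of a single attention head can be written with plain loops. This is a minimal sketch of single-head causal scaled dot-product attention using Python lists; the helper names are hypothetical, and real implementations use batched tensor operations, multiple heads, and learned projection matrices instead of the identity matrices in the usage lines.

```python
import math

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_self_attention(x, wq, wk, wv):
    """x: (T x d) token vectors; wq, wk, wv: (d x h) projection matrices."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    h = len(wq[0])
    out = []
    for i in range(len(x)):
        # Causal mask: position i only scores positions 0..i, never the future.
        scores = [sum(qa * kb for qa, kb in zip(q[i], k[j])) / math.sqrt(h)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Output is a weighted average of the value vectors it was allowed to see.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(h)])
    return out

identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0]]
out = causal_self_attention(x, identity, identity, identity)
print(out)
```

The first position can only attend to itself, so its output equals its own value vector; the second position mixes both value vectors, weighting the more similar one (itself) more heavily.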

I found that rebuilding inference for a known model taught me a lot more than passively sitting through the videos and maybe retyping his code. You should try it with something simple, like a model from a few years back!
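Whatever model you reimplement, inference ends in the same outer loop: run the model on the tokens so far, pick a next token, append it, repeat. Below is a minimal sketch of greedy decoding; `next_token_logits` is a hypothetical stand-in for your model's forward pass (for example, a GPT-2 forward pass you wrote yourself), and `toy_model` is a made-up placeholder so the loop runs on its own. Greedy argmax is only one choice; sampling with a temperature is the common alternative.

```python
def greedy_generate(next_token_logits, prompt_ids, max_new_tokens, eos_id=None):
    """Repeatedly append the highest-scoring next token to the sequence."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # one score per vocabulary entry
        best = max(range(len(logits)), key=logits.__getitem__)
        ids.append(best)
        if best == eos_id:  # stop early if the model emits end-of-sequence
            break
    return ids

# Hypothetical stand-in for a real forward pass: always prefers (last token + 1) mod 5.
def toy_model(ids):
    vocab_size = 5
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if t == target else 0.0 for t in range(vocab_size)]

print(greedy_generate(toy_model, [0], max_new_tokens=4))  # → [0, 1, 2, 3, 4]
```

This naive loop reruns the model over the whole sequence at every step; real inference code usually adds a key/value cache so each step only processes the newest token.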

liqilin1567•3mo ago

There is a new repo from Karpathy: https://github.com/karpathy/nanochat. It's a full-stack implementation of a ChatGPT-like LLM in a single, clean, minimal, hackable, dependency-lite codebase.
