Show HN: I built a tiny LLM to demystify how language models work

121•armanified•3h ago

Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food.

Fork it and swap the personality for your own character.
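For a rough sense of where a figure like ~9M parameters comes from, here is a minimal sketch that counts the weights of a vanilla decoder-only transformer. The post does not state the actual hyperparameters, so the vocabulary size, width, depth, and context length below are hypothetical values picked only because they land near that size. The function name `transformer_param_count` is also made up for illustration.

```python
def transformer_param_count(vocab_size, d_model, n_layers, context_len,
                            d_ff=None, tie_embeddings=True):
    """Count parameters of a vanilla decoder-only transformer.

    Assumes learned positional embeddings, biases on every linear layer,
    two LayerNorms per block, and one final LayerNorm. These are common
    choices, not necessarily the ones used in the project above.
    """
    if d_ff is None:
        d_ff = 4 * d_model  # conventional feed-forward width

    token_emb = vocab_size * d_model
    pos_emb = context_len * d_model

    # Q, K, V and output projections, each d_model x d_model plus a bias.
    attn = 4 * (d_model * d_model + d_model)
    # Two-layer MLP: d_model -> d_ff -> d_model, with biases.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a bias vector.
    norms = 2 * 2 * d_model

    per_layer = attn + mlp + norms
    final_norm = 2 * d_model
    # With tied embeddings the output head reuses the token embedding matrix.
    lm_head = 0 if tie_embeddings else vocab_size * d_model

    return token_emb + pos_emb + n_layers * per_layer + final_norm + lm_head


# Hypothetical configuration (NOT taken from the post).
n_params = transformer_param_count(
    vocab_size=4096, d_model=320, n_layers=6, context_len=128
)
print(f"{n_params:,} parameters")  # prints "8,750,080 parameters"
```

Most of the count comes from the `n_layers * per_layer` term, which grows with the square of `d_model`, so at this scale width and depth matter far more than context length.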

Comments

AndrewKemendo•1h ago

I love these kinds of educational implementations.

I want to really praise the (unintentional?) nod to Nagel: by limiting capabilities to those of a fish, it lets the user immediately understand the constraints. It can only talk like a fish because it's very simple.

Especially compared to public models, that's a really simple correspondence to grok intuitively (small LLM > only as verbose as a fish, larger LLM > more verbose), so kudos to the author for making that simple and fun.

dvt•1h ago

> the user is immediately able to understand the constraints

Nagel's point was quite literally the opposite[1] of this, though. We can't understand what it must "be like to be a bat" because their mental model is so fundamentally different than ours. So using all the human language tokens in the world can't get us to truly understand what it's like to be a bat, or a guppy, or whatever. In fact, Nagel's point is arguably even stronger: there's no possible mental mapping between the experience of a bat and the experience of a human.

[1] https://www.sas.upenn.edu/~cavitch/pdf-library/Nagel_Bat.pdf

AndrewKemendo•58m ago

Different argument

I’m not going to argue other than to say that you need to view the point from a third-party perspective evaluating “fish” vs “more verbose thing,” such that the composition is the determinant of the complexity of interaction (which has unique qualia, per Nagel).

Hence why it’s an “unintentional nod,” not an instantiation.

nullbyte808•1h ago

Adorable! Maybe a personality that speaks in emojis?

SilentM68•1h ago

Would have been funny if it were called "DORY", given the fish's memory recall issues and LLMs' similar recall issues :)

ordinarily•39m ago

It's genuinely a great introduction to LLMs. I built my own a while ago based on Milton's Paradise Lost: https://www.wvrk.org/works/milton

cbdevidal•23m ago

> you're my favorite big shape. my mouth are happy when you're here.

Laughed loudly :-D

xantronix•19m ago

I fucking hate LLMs as a matter of principle.

However.

I love this. It's so tiny. And cute. It's just a little guy.

gnarlouse•4m ago

I... wow, you made an LLM that can actually tell jokes?

martmulx•3m ago

How much training data did you end up needing for the fish personality to feel coherent? Curious what the minimum viable dataset looks like for something like this.
