
Markov chains are the original language models

https://elijahpotter.dev/articles/markov_chains_are_the_original_language_models
76•chilipepperhott•4d ago

Comments

allthatineed•1h ago
I remember playing with megahal eggdrop bots.

This was one of my first forays into modifying C code, trying to figure out why 350 MB seemed to be the biggest brain size (32-bit memory limits, plus requiring a contiguous block for the entire brain).

I miss the innocence of those days. Just being a teen, tinkering with things I didn't understand.

foobarian•1h ago
I'm old now, but thanks to LLMs I can now again tinker with things I don't understand :-)
codr7•1h ago
Are you though? Or is the LLM the target of your tinkering and lack of understanding? Honest question.
jcynix•17m ago
The nice thing about LLMs is that they can explain stuff so you can learn to understand. And they are very patient.

For example I'm currently relearning various ImageMagick details and thanks to their explanations now understand things that I cut/copy/pasted a long time ago without always understanding why things worked the way they did.

vunderba•59m ago
I remember reading the source of the original MegaHAL program when I was younger - one of the tricks that made it stand out (particularly in the Loebner competitions [1]) was that it used both a backwards and forwards Markov chain to generate responses.

[1] https://en.wikipedia.org/wiki/Loebner_Prize

glouwbug•1h ago
The Practice of Programming by Kernighan and Pike had a really elegant Markov:

https://github.com/Heatwave/the-practice-of-programming/blob...
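The program in that chapter is an order-2 word chain: map each two-word prefix to the list of words that followed it, then walk the map. The book's version is in C; here is a hedged Python sketch of the same idea (names are mine, not the book's):

```python
import random
from collections import defaultdict

def build_chain(words, order=2):
    # prefix tuple of `order` words -> list of observed suffixes
    chain = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        chain[prefix].append(words[i + order])
    return chain

def generate(chain, length=30):
    prefix = random.choice(list(chain))  # random starting prefix
    out = list(prefix)
    for _ in range(length):
        suffixes = chain.get(tuple(out[-len(prefix):]))
        if not suffixes:
            break  # dead end: this prefix only appeared at the end of the corpus
        out.append(random.choice(suffixes))
    return " ".join(out)
```

Because duplicate suffixes are kept in the list, the random choice is automatically weighted by frequency, which is the same elegant shortcut the book uses.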

jcynix•23m ago
And Mark V. Shaney was designed by Rob Pike and posted on Usenet, but that happened a long time ago:

https://en.wikipedia.org/wiki/Mark_V._Shaney

chankstein38•1h ago
Once, probably 4-6 years ago, I used exports from Slack conversations to train a Markov chain to recreate a user who had been around a lot and then left for a while. I wrote the whole thing in Python; I wasn't especially well-versed in the statistics and math side, but I understood the principle. I made a bot, had it join the Slack instance that I administrate, and it would interact if you tagged it or if you said things that person always responded to (hardcoded).

Well, the responses were pretty messed up and not accurate but we all got a good chuckle watching the bot sometimes actually sound like the person amidst a mass of random other things that person always said jumbled together :D

vunderba•1h ago
I had a similar program designed as my "AWAY" bot that was trained on transcripts of my previous conversations and connected to Skype. At the time (2009) I was living in Taiwan so I would activate it when I went to bed to chat with my friends back in the States given the ~12 hour time difference. Reading back some of the transcripts made it sound like I was on the verge of a psychotic break though.
ahmedhawas123•46m ago
Random tidbit: 15+ years ago, Markov chains were the go-to for auto-generating text. Google was not as advanced as it is today at flagging spam, so search engine results pages for affiliate-marketing-dense topics (e.g., certain medications and products) were swamped with Markov-chain-created websites injected with certain keywords.
AnotherGoodName•37m ago
The problem is the linear nature of Markov chains. Sure, they can branch, but after an observation you are absolutely at a new state: A goes to B goes to C, etc. A classic problem for understanding why this is an issue is feeding in a 2D bitmap where the patterns are vertical but you're passing in data left to right, which Markov chains can't handle, since they navigate exclusively on the current left-to-right input. They miss the patterns completely. Similar things happen with language: language is not linear, and context from a few sentences ago should change probabilities in the current sequence of characters. The attention mechanism is the best we have for this, and Markov chains struggle beyond stringing together a few syllables.

I have played with Markov chains a lot. I tried having skip states and such but ultimately you’re always pushed towards doing something similar to the attention mechanism to handle context.
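The bitmap failure can be made concrete with a toy experiment (my illustration, not from the comment): fill each column with a random but constant value, then compare transition counts along rows versus along columns. The left-to-right chain sees noise; the vertical structure is perfectly predictable, but only if you scan in the right direction.

```python
import random
from collections import Counter

random.seed(0)
n_cols, n_rows = 1000, 20
col_vals = [random.choice([0, 1]) for _ in range(n_cols)]  # each column all-0 or all-1
bitmap = [[col_vals[c] for c in range(n_cols)] for _ in range(n_rows)]

# Left-to-right transitions within rows: roughly 50/50, uninformative
row_trans = Counter()
for row in bitmap:
    for a, b in zip(row, row[1:]):
        row_trans[(a, b)] += 1

# Top-to-bottom transitions within columns: fully deterministic
col_trans = Counter()
for c in range(n_cols):
    for r in range(n_rows - 1):
        col_trans[(bitmap[r][c], bitmap[r + 1][c])] += 1
```

An order-1 chain trained on the row scan learns nothing useful, while the column scan yields a perfect predictor; the chain itself has no way to discover which scan order carries the pattern.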

cuttothechase•4m ago
Would having a Markov chain of Markov chains help in this situation? One chain for when the 2D bitmap patterns are vertical, and another for left to right?
6r17•3m ago
Would you say they're interesting to explore, having spent so much time on them? Do you feel one could make pragmatic use of them in certain contexts, or are they too much of a toy, where most of the time getting a coherent LLM service would ease the work?
taolson•36m ago
If you program a Markov chain to generate based on a fairly short sequence length (4-5 characters), it can create some neat portmanteaus. I remember back in the early '90s I trained one on some typical tech literature and it came up with the word "marketecture".
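A character-level chain with a 4-character state does exactly this kind of blending: wherever two words share a 4-letter window, generation can hop from one to the other. A small sketch (the corpus and names here are illustrative, not the commenter's setup):

```python
import random
from collections import defaultdict

def char_chain(text, order=4):
    # 4-character window -> list of observed next characters
    chain = defaultdict(list)
    for i in range(len(text) - order):
        chain[text[i:i + order]].append(text[i + order])
    return chain

def coin_word(chain, order=4, length=12, seed_key=None):
    # grow a string character by character from a starting window
    key = seed_key or random.choice(list(chain))
    out = key
    while len(out) < length and out[-order:] in chain:
        out += random.choice(chain[out[-order:]])
    return out
```

Train it on "marketing" and "architecture" side by side and the shared windows are where portmanteau-style jumps like "marketecture" become possible.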
cestith•30m ago
I've been telling people for years that a reasonably workable initial, simplified mental model of a large language model is a Markov chain generator trained on an unlimited, weighted corpus. Very few people who know LLMs have critiqued that thought beyond saying it's a coarse description that downplays the details. Since "simplified" is in the initial statement, and it's not meant to capture detail, I say: if it walks like a really big duck and it honks instead of quacking, then maybe it's a goose or a swan, which are both pretty duck-like birds.
nerdponx•24m ago
It's not a Markov chain because it doesn't obey the Markov property.

What it is, and what I assume you mean, is a next-word prediction model based solely on the previous sequence of words, up to some limit. It literally is that, because it was designed to be that.

jama211•21m ago
Sure, but arguably by that definition so are we ;)
jcynix•26m ago
Ah, yes, Markov chains. A long time ago, Mark V. Shaney (https://en.wikipedia.org/wiki/Mark_V._Shaney) was designed by Rob Pike and posted on Usenet.

And Veritasium's video "The Strange Math That Predicts (Almost) Anything" talks in detail about the history of Markov chains: https://youtu.be/KZeIEiBrT_w

benob•19m ago
Like it or not, LLMs are effectively high-order Markov chains.
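The sense of "high-order" here can be stated precisely (my gloss, not from the thread): a model with a fixed context window of k tokens assumes

```latex
P(w_t \mid w_1, \dots, w_{t-1}) = P(w_t \mid w_{t-k}, \dots, w_{t-1})
```

which is the order-k Markov property over token sequences. The disagreement upthread is then really about representation: an LLM estimates these conditional probabilities parametrically rather than storing an explicit transition table.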
guluarte•16m ago
Markov chains with limited self-correction.
BenoitEssiambre•7m ago
Exactly. I think of them as Markov Chains in grammar space or in Abstract Syntax Tree space instead of n-gram chain-of-words space. The attention mechanism likely plays a role in identifying the parent in the grammar tree or identifying other types of back references like pronouns or if it's for programming languages, variable back references.
fair_enough•2m ago
On a humorous note, OP, you seem like exactly the kind of guy who would get a kick out of this postmodern essay generator that a STEM professor wrote using a Recursive Transition Network in 1996:

https://www.elsewhere.org/journal/pomo/

Every now and again, I come back to it for a good chuckle. Here's what I got this time (link to the full essay below the excerpt):

"If one examines subcultural capitalist theory, one is faced with a choice: either reject subdialectic cultural theory or conclude that the significance of the artist is significant form, but only if culture is interchangeable with art. It could be said that many desituationisms concerning the capitalist paradigm of context exist. The subject is contextualised into a presemanticist deappropriation that includes truth as a totality."

https://www.elsewhere.org/journal/pomo/?1298795365

jedberg•2m ago
For funsies, one of the early Reddit engineers wrote a Markov comment generator trained on all the existing comments.

It worked surprisingly well. :)

Libghostty is coming

https://mitchellh.com/writing/libghostty-is-coming
302•kingori•5h ago•71 comments

Find SF parking cops

https://walzr.com/sf-parking/
192•alazsengul•1h ago•111 comments

Android users can now use conversational editing in Google Photos

https://blog.google/products/photos/android-conversational-editing-google-photos/
79•meetpateltech•2h ago•60 comments

How to draw construction equipment for kids

https://alyssarosenberg.substack.com/p/how-to-draw-construction-equipment
16•holotrope•33m ago•2 comments

Launch HN: Strata (YC X25) – One MCP server for AI to handle thousands of tools

90•wirehack•4h ago•48 comments

Go has added Valgrind support

https://go-review.googlesource.com/c/go/+/674077
397•cirelli94•10h ago•100 comments

From MCP to shell: MCP auth flaws enable RCE in Claude Code, Gemini CLI and more

https://verialabs.com/blog/from-mcp-to-shell/
75•stuxf•4h ago•24 comments

Always Invite Anna

https://sharif.io/anna-alexei
321•walterbell•4h ago•23 comments

Mesh: I tried Htmx, then ditched it

https://ajmoon.com/posts/mesh-i-tried-htmx-then-ditched-it
114•alex-moon•7h ago•78 comments

Nine things I learned in ninety years

http://edwardpackard.com/wp-content/uploads/2025/09/Nine-Things-I-Learned-in-Ninety-Years.pdf
826•coderintherye•16h ago•316 comments

x402 — An open protocol for internet-native payments

https://www.x402.org/
167•thm•5h ago•87 comments

Getting AI to work in complex codebases

https://github.com/humanlayer/advanced-context-engineering-for-coding-agents/blob/main/ace-fca.md
103•dhorthy•5h ago•103 comments

Getting More Strategic

https://cate.blog/2025/09/23/getting-more-strategic/
126•gpi•7h ago•18 comments

Restrictions on house sharing by unrelated roommates

https://marginalrevolution.com/marginalrevolution/2025/08/the-war-on-roommates-why-is-sharing-a-h...
247•surprisetalk•5h ago•287 comments

Thundering herd problem: Preventing the stampede

https://distributed-computing-musings.com/2025/08/thundering-herd-problem-preventing-the-stampede/
17•pbardea•19h ago•6 comments

Structured Outputs in LLMs

https://parthsareen.com/blog.html#sampling.md
173•SamLeBarbare•9h ago•80 comments

OpenDataLoader-PDF: An open source tool for structured PDF parsing

https://github.com/opendataloader-project/opendataloader-pdf
64•phobos44•5h ago•17 comments

Agents turn simple keyword search into compelling search experiences

https://softwaredoug.com/blog/2025/09/22/reasoning-agents-need-bad-search
48•softwaredoug•5h ago•19 comments

Zinc (YC W14) Is Hiring a Senior Back End Engineer (NYC)

https://app.dover.com/apply/Zinc/4d32fdb9-c3e6-4f84-a4a2-12c80018fe8f/?rs=76643084
1•FriedPickles•7h ago

Zoxide: A Better CD Command

https://github.com/ajeetdsouza/zoxide
277•gasull•14h ago•174 comments

Denmark wants to push through Chat Control

https://netzpolitik.org/2025/internes-protokoll-daenemark-will-chatkontrolle-durchdruecken/
12•Improvement•33m ago•1 comment

Shopify, pulling strings at Ruby Central, forces Bundler and RubyGems takeover

https://joel.drapper.me/p/rubygems-takeover/
273•bradgessler•4h ago•151 comments

YAML document from hell (2023)

https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell
169•agvxov•10h ago•111 comments

Show HN: Run Qwen3-Next-80B on 8GB GPU at 1tok/2s throughput

https://github.com/Mega4alik/ollm
86•anuarsh•4d ago•8 comments

The Great American Travel Book: The book that helped revive a genre

https://theamericanscholar.org/the-great-american-travel-book/
6•Thevet•2d ago•0 comments

Smooth weighted round-robin balancing

https://github.com/nginx/nginx/commit/52327e0627f49dbda1e8db695e63a4b0af4448b1
17•grep_it•4d ago•2 comments

Processing Strings 109x Faster Than Nvidia on H100

https://ashvardanian.com/posts/stringwars-on-gpus/
158•ashvardanian•4d ago•23 comments

Show HN: Kekkai – a simple, fast file integrity monitoring tool in Go

https://github.com/catatsuy/kekkai
40•catatsuy•5h ago•9 comments

Permeable materials in homes act as sponges for harmful chemicals: study

https://news.uci.edu/2025/09/22/indoor-surfaces-act-as-massive-sponges-for-harmful-chemicals-uc-i...
93•XzetaU8•10h ago•82 comments