Experts Have World Models. LLMs Have Word Models

https://www.latent.space/p/adversarial-reasoning

10•aaronng91•1h ago

Comments

D-Machine•1h ago

Fun play on words. But yes, LLMs are Large Language Models, not Large World Models. This matters because (1) the world cannot be modeled anywhere close to completely with language alone, and (2) language only somewhat models the world (much of language is convention, is simply wrong, or is concerned not with modeling the world but with other things like persuasion, evoking emotions, or fantasy / imagination).

It is somewhat complicated by the fact that LLMs (and VLMs) are in some cases also trained on more than the plain language found on the internet (e.g. code, math, images / videos), but the same insight remains true. The interesting question is just how far we can get with (2) anyway.

swyx•26m ago

editor here! all questions welcome - this is a topic i've been pursuing in the podcast for much of the past year... links inside.

cracell•19m ago

I found it to be an interesting angle but thought it was odd that a key point is "LLMs dominate chess-like domains" while LLMs are not great at chess https://dev.to/maximsaplin/can-llms-play-chess-ive-tested-13...

naasking•13m ago

I think it's correct to say that LLMs have word models, and given that words are correlated with the world, they also have degenerate world models, just with lots of inconsistencies and holes. Tokenization issues aside, LLMs will likely also have some limitations due to this. Multimodality should address many of these holes.

D-Machine•47s ago

It's also important to handle cases where the word patterns (or token patterns, rather) have a negative correlation with the patterns in reality. There are some domains where the majority of content on the internet is actually just wrong, or where different approaches lead to contradictory conclusions.

E.g. syllogistic arguments based on linguistic semantics can lead you deeply astray if those arguments don't properly measure and quantify at each step.

I ran into this in a somewhat trivial case recently, trying to get ChatGPT to tell me whether washing mushrooms ever actually matters in practice when cooking (anyone who cooks and has tested it knows that a quick wash has basically no impact for any conceivable cooking method, except if you e.g. wash them after cutting and immediately serve them raw).

Until I forced it to cite respectable sources, it just repeated the usual (false) advice about not washing (i.e. most of the training data is wrong and repeats a myth). It even pushed back with absolute nonsense arguments about water percentages and the thermal energy required to evaporate even small amounts of surface water (i.e. using theory that just isn't relevant once you actually quantify it properly). Only after a lot of prompts demanding that it make only claims supported by reputable sources did it finally find McGee's and Kenji López-Alt's actual empirical tests showing that it just doesn't matter in practice.
