Experts Have World Models. LLMs Have Word Models

https://www.latent.space/p/adversarial-reasoning
10•aaronng91•1h ago

Comments

D-Machine•1h ago
Fun play on words. But yes, LLMs are Large Language Models, not Large World Models. This matters because (1) the world cannot be modeled anywhere close to completely with language alone, and (2) language only somewhat models the world (much of language is convention, is wrong, or is concerned not with modeling the world but with other things such as persuasion, evoking emotion, or fantasy / imagination).

It is somewhat complicated by the fact that LLMs (and VLMs) are in some cases also trained on more than the simple language found on the internet (e.g. code, math, images / videos), but the same insight remains true. The interesting question is just how far we can get with (2) anyway.

swyx•26m ago
editor here! all questions welcome - this is a topic i've been pursuing in the podcast for much of the past year... links inside.
cracell•19m ago
I found it to be an interesting angle, but thought it was odd that a key point is "LLMs dominate chess-like domains" while LLMs are not great at chess: https://dev.to/maximsaplin/can-llms-play-chess-ive-tested-13...
naasking•13m ago
I think it's correct to say that LLMs have word models, and given that words are correlated with the world, they also have degenerate world models, just with lots of inconsistencies and holes. Tokenization issues aside, LLMs will likely also have some limitations due to this. Multimodality should address many of these holes.
D-Machine•47s ago
It's also important to handle cases where the word patterns (or token patterns, rather) have a negative correlation with the patterns in reality. There are some domains where the majority of content on the internet is actually just wrong, or where different approaches lead to contradictory conclusions.

E.g. syllogistic arguments based on linguistic semantics can lead you deeply astray if those arguments don't properly measure and quantify at each step.

I ran into this in a somewhat trivial case recently, trying to get ChatGPT to tell me whether washing mushrooms ever actually matters in practice when cooking (anyone who cooks and has tested knows that a quick wash has basically no impact for any conceivable cooking method, except if you wash e.g. after cutting and immediately serve them raw).

Until I forced it to cite respectable sources, it just repeated the usual (false) advice about not washing (i.e. most of the training data is wrong and repeats a myth). As pushback, it even gave absolute nonsense arguments about water percentages and the thermal energy required to evaporate even small amounts of surface water (i.e. using theory that just isn't relevant when you actually properly quantify). Only after a lot of prompts and demands to make only claims supported by reputable sources did it finally find McGee's and Kenji López-Alt's actual empirical tests showing that it just doesn't matter practically.

Show HN: Calculator for UK student loan repayment strategies

https://mystudentloan.uk
1•farham•21s ago•0 comments

Context Fence Design Pattern for Claude Code Skills

https://github.com/jimmc414/claude-context-fence
1•Jimmc414•1m ago•0 comments

Intel Recently Shelved Numerous Open-Source Projects

https://www.phoronix.com/news/Intel-OSS-Projects-Ended-2025
1•pjmlp•2m ago•0 comments

Catching Fire: How Cooking Made Us Human (2009) [pdf]

https://dn790008.ca.archive.org/0/items/pdfy-DDoNCJJ_Wt0qOH7e/Catching%20Fire%20%5BHow%20Cooking%...
1•bookofjoe•2m ago•0 comments

A Newbie's First Contribution to (Rust for) Linux

https://blog.buenzli.dev/rust-for-linux-first-contrib/
1•goranmoomin•7m ago•0 comments

Ask HN: How are you enabling your company to vibe-code?

1•tornato7•8m ago•0 comments

Multi-Layered Counter-UAS Defense: Portable, Mobile, and Fixed

https://dzyne.com/counter-uas/
1•rolph•8m ago•0 comments

Is artificial general intelligence here?

https://www.universityofcalifornia.edu/news/artificial-general-intelligence-here
1•geox•9m ago•0 comments

Show HN: Sofia Core – Open-source AI infrastructure with biological computing

https://github.com/emeraldorbit/sofia-core-backend
1•emeraldorbit•9m ago•0 comments

Ask HN: How do you maintain integrations once they're in production?

1•ksvmkoundinya•10m ago•0 comments

"Infinite Jest" Has Turned Thirty. Have We Forgotten How to Read It?

https://www.newyorker.com/magazine/2026/02/02/infinite-jest-david-foster-wallace-anniversary-book...
1•B1FF_PSUVM•10m ago•0 comments

Show HN: SubAnalyzer subdomain discovery and external attack surface map tool

https://subanalyzer.com
1•TallSession9532•10m ago•0 comments

31-year old VT220 terminfo curses bug

https://lists.gnu.org/archive/html/bug-ncurses/2026-02/msg00004.html
1•mprovost•11m ago•1 comments

Interlock (Engineering)

https://en.wikipedia.org/wiki/Interlock_(engineering)
1•downboots•13m ago•0 comments

Master of Science in Applied Ontology (Fully Online)

http://ontology.buffalo.edu/
1•hackandthink•13m ago•0 comments

Creating a Programming Language Using Coding Agents on GitHub

https://dsyme.net/2026/02/08/july-2025-creating-a-compiler-with-a-swarm/
1•laurentlb•17m ago•0 comments

Hollywood Is Losing Audiences to AI Fatigue

https://www.wired.com/story/hollywood-is-losing-audiences-to-ai-fatigue/
1•saikatsg•19m ago•3 comments

SOK: On the Analysis of Web Browser Security (2021)

https://arxiv.org/abs/2112.15561
1•walterbell•20m ago•0 comments

An Analysis of Poptropica's Mancala

https://farlow.dev/2026/02/08/an-analysis-of-poptropicas-mancala
2•farlow•23m ago•0 comments

Why Improving VO₂ Max Increases Confidence Outside of Workouts

https://www.vo2maxpro.com/blog/vo2-max-confidence-beyond-workouts
1•GoodluckH•23m ago•0 comments

Show HN: Nick the Groq – AI Poker Coach- Open Source

https://poker-coacher.vercel.app/
1•hotrod46•26m ago•0 comments

DSA Interview Preparation Guide: Complete 90-Day Roadmap

https://www.dsaprep.dev/blog/dsa-interview-preparation-guide-90-day-roadmap
1•anjandutta•27m ago•0 comments

Ask HN: What Are You Working On? (February 2026)

4•david927•28m ago•8 comments

Ask HN: What made VLIW a good fit for DSPs compared to GPUs?

3•rishabhaiover•28m ago•0 comments

Living hell of North Korea's paradise on Earth scheme back in spotlight in Japan

https://www.theguardian.com/world/2026/feb/01/living-hell-of-north-koreas-paradise-on-earth-schem...
1•PaulHoule•29m ago•0 comments

The Future of Software Engineering

https://www.poberezkin.com/posts/2026-02-07-the-future-of-software-engineering.html
1•ssummoner001•31m ago•1 comments

BBC's Stopmotion 2026 Olympic Winter Games Trailer behind-the-scenes [video]

https://www.youtube.com/watch?v=iF_BJNrt1I4
4•ChrisArchitect•32m ago•1 comments

The next frontier in weight-loss drugs: one-time gene therapy

https://www.washingtonpost.com/health/2026/01/24/fractyl-glp1-gene-therapy/
1•bookofjoe•34m ago•1 comments

Turn any REST API with an OpenAPI spec into queryable Apache Spark tables

https://github.com/Neutrinic/apilytics
1•ZenithR9•35m ago•1 comments

Tegratop – A Comprehensive TUI monitoring tool for Nvidia jetson boards

https://github.com/pythops/tegratop
1•pythops•36m ago•0 comments