Why are large language models so terrible at video games?

https://spectrum.ieee.org/ai-video-games-llms-togelius
17•sxx0•1h ago

Comments

danaris•1h ago
> This brings us to what seems like a contradiction. LLMs are bad at playing games. Yet at the same time, they’re improving rapidly at coding, a skill set that can be used to create a game. How do these facts fit together?

> Togelius: It’s super weird.

...No, it's really not.

They're language models. Code is a language. "Playing a game well" is not. One can, hypothetically, encode game inputs in such a way that it seems kinda-sorta like a language, but it has none of the same kinds of structures that languages—both human and programming—do.

The only way one can think this is strange is if one thinks of LLMs' ability to code rudimentary games as being due to a deeper understanding of games, rather than due to game code being well-represented in their training data.

roxolotl•1h ago
Yeah, it's wild watching so many smart people convince themselves that LLMs are general-purpose AIs. Don't get me wrong, they are incredibly powerful tools. However, being surprised that text models cannot play video games particularly well is like being surprised that weather models cannot.
pingou•51m ago
Yet LLMs can play chess and have a "mental" representation of the chessboard.

If LLMs keep getting better but do not improve at games they were not specifically trained on, that seems to point to a generalisation failure, a limitation that would prevent LLMs from ever achieving AGI. I do not know if that is weird, but for now nobody really knows whether they can achieve AGI or not. Perhaps some emergent behavior will arise after more scaling.

To me it's only totally unsurprising if you are 100% certain that LLMs will never reach AGI (like LeCun thinks for example).

IX-103•23m ago
Chess games are in their training set, other games are not.
jiehong•1h ago
Video games are made to entertain humans, so does it really matter whether LLMs are good at playing them?
pixel_popping•1h ago
It matters a lot because it's a real solution for external bots that play more "fairly", especially in older games. It also allows games to be tested autonomously, which is huge if we are talking about automated programming.

Imagine if you can bring those AI players to CS 1.6.

vaylian•1h ago
LLMs are the wrong tool for video games. There have been plenty of successful non-LLM AIs that have been trained with reinforcement learning to play games.

If you want to implement actual bots inside the game, then you want to use explicit logic instead of inferred logic. It's much more efficient and easier to debug.

If you want to create bots for an existing game that doesn't have its own pre-programmed bots, then you should look at other types of AI. See https://www.geeksforgeeks.org/deep-learning/reinforcement-le...

nubinetwork•46m ago
The headshot/spin bots didn't need AI; all they had to do was ask the server where you were standing and teleport to your location.
cultofmetatron•1h ago
cough JEPA cough
voidUpdate•1h ago
It's almost like the Large Language Model has trouble with things that aren't Language, such as realtime controller input and video output from a game
nubinetwork•48m ago
I know someone who tried the "aibot plays pokemon" thing...

From what I saw, even if you frame advance every single frame, they still don't seem to grasp the concept of "I need to hold down this button for a few frames until x happens"...

There's no concept of time, just a never-ending state machine that's constantly changing state.
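
A minimal sketch of the "hold this button for a few frames until x happens" idea, written as explicit control logic rather than per-frame model decisions. Everything here is an assumption for illustration: `hold_until`, the `step` hook, and `ToyGame` are hypothetical stand-ins, not part of any real emulator API.

```python
from typing import Callable


def hold_until(step: Callable[[str], int], button: str,
               done: Callable[[int], bool], max_frames: int = 600) -> int:
    """Press `button` on every frame until `done(state)` is true.

    `step` is a hypothetical emulator hook: it advances one frame with
    the given button pressed and returns the new game state.
    Returns the number of frames the button was held.
    """
    for frame in range(1, max_frames + 1):
        state = step(button)
        if done(state):
            return frame
    raise TimeoutError(f"{button} held for {max_frames} frames without effect")


class ToyGame:
    """Stand-in for an emulator: the player moves 1 pixel per frame while RIGHT is held."""

    def __init__(self) -> None:
        self.x = 0

    def step(self, button: str) -> int:
        if button == "RIGHT":
            self.x += 1
        return self.x


game = ToyGame()
# Hold RIGHT until the player has crossed one 16-pixel tile.
frames = hold_until(game.step, "RIGHT", lambda x: x >= 16)
```

The loop carries the notion of time that a frame-by-frame prompt lacks: the decision "keep holding" is made once, and the exit condition is checked against the state on every frame.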

panarchy•1h ago
I actually really miss all the research being done on having (reinforcement learning) AIs beat Atari games and the like. Or the one that stopped at a TV playing random images instead of continuing through the level. Has there been any progress in that field? It seems like LLMs came around and all the projects stopped completely.
nottorp•1h ago
There was good progress in training neural networks to play video games.

Unfortunately it doesn't seem to fit in some people's context because it was a few years ago.

Kind reminder: there is "AI" beyond LLMs.

kingstnap•26m ago
OpenAI's Dota 2 adventures were super hype back in the day.
deyiao•13m ago
OpenAI Five doesn’t really know how to play games in general — it only knows how to play Dota.
kypro•9m ago
Several years ago I built a simple snake game and wrote a DQN from scratch to learn how to play it.

I was really proud of it at the time because I had to do a decent amount of reading and research since I wrote all of the NN code from scratch and wanted to add some more advanced algorithm optimisations which I hadn't done in previous projects.

I suspect a coding agent could spit the entire project out in 20 minutes now, but it was very cool at the time to build a game then watch my computer learn how to play it in real time.
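
For readers who haven't seen the idea, here is a rough sketch of the update rule underneath a DQN. It is not the project described above: it swaps the neural network for a plain lookup table (tabular Q-learning) and swaps snake for a hypothetical 5-cell corridor, so it only illustrates how a program can learn to play from reward alone. All names and constants are assumptions.

```python
import random
from typing import List, Tuple

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
MOVES = (-1, +1)      # action 0 = step left, action 1 = step right
ALPHA, GAMMA = 0.5, 0.9


def env_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy environment: reward 1.0 on reaching the last cell, else 0.0."""
    nxt = min(max(state + MOVES[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes: int = 500, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a uniformly random behaviour
            # policy is enough to explore this tiny state space.
            action = rng.randrange(2)
            nxt, reward, done = env_step(state, action)
            # Same bootstrapped target a DQN regresses towards.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q


q_table = train()
# Greedy action per non-terminal cell after training.
greedy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, the greedy policy steps right in every cell. A DQN replaces `q_table` with a network so the same rule can scale to states too numerous to enumerate, such as snake boards.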

andunie•59m ago
I wonder if they would be good at text-based games.
ThunderSizzle•52m ago
I wonder if you paired a few different types of AI together, an LLM agent might be good at strategizing, e.g. building a strategy on how to handle a scenario. But it would basically need to know the entire game manual. Then it would pass the strategy to a better AI in some way. But it might not be needed if the better gaming AI can just do that part too already.

I admit I know nothing about this though.

deadbabe•39m ago
GOAP is a better tool.
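
For context, GOAP (goal-oriented action planning) describes each action by its preconditions and effects and lets a planner search for a sequence that reaches a goal. A minimal sketch follows, assuming a made-up three-action bot and using breadth-first search for brevity (GOAP implementations commonly use A* with per-action costs). All names are hypothetical.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]   # facts that must hold before the action
    add: FrozenSet[str]   # facts the action makes true
    rem: FrozenSet[str]   # facts the action makes false


def plan(start: FrozenSet[str], goal: FrozenSet[str],
         actions: List[Action]) -> Optional[List[str]]:
    """Return the shortest list of action names that reaches `goal`, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:                 # every goal fact is satisfied
            return steps
        for act in actions:
            if act.pre <= state:          # preconditions are met
                nxt = (state - act.rem) | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [act.name]))
    return None


none: FrozenSet[str] = frozenset()
BOT_ACTIONS = [
    Action("pick_up_weapon", none, frozenset({"armed"}), none),
    Action("go_to_enemy", none, frozenset({"near_enemy"}), none),
    Action("attack", frozenset({"armed", "near_enemy"}), frozenset({"enemy_down"}), none),
]

result = plan(frozenset(), frozenset({"enemy_down"}), BOT_ACTIONS)
# result == ["pick_up_weapon", "go_to_enemy", "attack"]
```

The plan is derived from declared rules, so it can be inspected and debugged step by step.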
ceheaaf•43m ago
It feels like they're really focusing on overstating how confusing and weird it is that an LLM can write code but not play games very well, rather than just explaining it.

Code is text. LLMs are text input/output machines.

Game input/output is not at all text.

LLMs can certainly reason about games with a simple/explicit enough domain (try a Risk tournament where models can talk to each other between turns!)

dsabanin•40m ago
Why is a language model bad at video games? I think the answer is stated in the question itself.
jagged-chisel•26m ago
Because they’re large language models. Language doesn’t map onto gameplay.

Choose another “AI” technology and give it a go.

Zobat•20m ago
As others have hinted at, LLMs aren't really made in a way that makes them likely to play video games (CS/Halo and such) well. I wonder how they'd fare "against" text-based adventures like Zork (which they'll no doubt have ample knowledge about) and newer text-based adventure games (which they'll know less about).
deyiao•8m ago
I guess the author’s point is that LLMs can’t really learn in real time yet, whereas playing games is basically all about real-time learning. So an LLM can be very good at writing code, but still be terrible at actually playing games.

Personally, I think this is a really hard problem, and it may turn out to be one of the first big walls we hit on the road to AGI.
