Why are large language models so terrible at video games?

https://spectrum.ieee.org/ai-video-games-llms-togelius
15•sxx0•1h ago

Comments

danaris•1h ago
> This brings us to what seems like a contradiction. LLMs are bad at playing games. Yet at the same time, they’re improving rapidly at coding, a skill set that can be used to create a game. How do these facts fit together?

> Togelius: It’s super weird.

...No, it's really not.

They're language models. Code is a language. "Playing a game well" is not. One can, hypothetically, encode game inputs in such a way that it seems kinda-sorta like a language, but it has none of the same kinds of structures that languages—both human and programming—do.

The only way one can think this is strange is if one thinks of LLMs' ability to code rudimentary games as being due to a deeper understanding of games, rather than due to game code being well-represented in their training data.

roxolotl•34m ago
Yeah, it's wild watching so many smart people convince themselves that LLMs are general-purpose AIs. Don't get me wrong, they are incredibly powerful tools. However, being surprised that text models cannot play video games particularly well is like being surprised that weather models cannot.
pingou•23m ago
Yet LLMs can play chess and have a "mental" representation of the chessboard.

If LLMs get better but do not progress at playing games when not specifically trained on them, that seems to point to a generalisation failure, a limitation that would prevent LLMs from ever achieving AGI. I do not know if that is weird, but it seems that for now nobody really knows whether they can achieve AGI or not. Perhaps some emergent behavior will arise after more scaling.

To me it's only totally unsurprising if you are 100% certain that LLMs will never reach AGI (as LeCun thinks, for example).

jiehong•1h ago
Video games are made to entertain humans, so does it really matter whether LLMs are good at playing them?
pixel_popping•58m ago
It matters a lot because it's a real solution for external bots that play more "fairly", especially in older games. It also allows games to be tested autonomously, which is huge if we are talking about automated programming.

Imagine if you could bring those AI players to CS 1.6.

vaylian•53m ago
LLMs are the wrong tool for video games. There have been plenty of successful non-LLM AIs that have been trained with reinforcement learning to play games.

If you want to implement actual bots inside the game, then you want to use explicit logic instead of inferred logic. It's much more efficient and easier to debug.

If you want to create bots for an existing game that doesn't have its own pre-programmed bots, then you should look at other types of AI. See https://www.geeksforgeeks.org/deep-learning/reinforcement-le...

nubinetwork•18m ago
The headshot/spin bots didn't need AI; all they had to do was ask the server where you were standing and teleport to your location.
cultofmetatron•1h ago
cough JEPA cough
voidUpdate•1h ago
It's almost like the Large Language Model has trouble with things that aren't language, such as realtime controller input and video output from a game.
nubinetwork•20m ago
I know someone who tried the "aibot plays pokemon" thing...

From what I saw, even if you frame advance every single frame, they still don't seem to grasp the concept of "I need to hold down this button for a few frames until x happens"...

There's no concept of time, just a never-ending state machine that's constantly changing state.
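
To make the "hold this button for a few frames until x happens" idea concrete, here is a minimal sketch. Everything in it is hypothetical and for illustration only: `ToyEmulator` is a stand-in for a real emulator API, and `hold_until` is an invented helper, not something taken from the Pokémon experiment mentioned above. The point is that the hold duration lives in an explicit loop over frames instead of being re-decided from scratch on every frame.

```python
from dataclasses import dataclass


@dataclass
class ToyEmulator:
    """Hypothetical stand-in for an emulator: the character moves right while RIGHT is held."""
    x: int = 0
    frame: int = 0

    def step(self, buttons: frozenset) -> None:
        # Advance exactly one frame with the given set of buttons held down.
        if "RIGHT" in buttons:
            self.x += 1
        self.frame += 1


def hold_until(emu, button, done, max_frames=600):
    """Hold `button` on every frame until done(emu) is true or the frame budget runs out.

    Returns the number of frames the button was held.
    """
    held = 0
    while not done(emu) and held < max_frames:
        emu.step(frozenset({button}))
        held += 1
    return held


emu = ToyEmulator()
frames = hold_until(emu, "RIGHT", lambda e: e.x >= 5)
print(frames, emu.x, emu.frame)  # prints: 5 5 5
```

The `max_frames` budget is there so a condition that never becomes true cannot hang the loop.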

panarchy•48m ago
I actually really miss all the research being done on having (reinforcement learning) AIs beat Atari games and the like. Or the one that stopped at a TV playing random images instead of continuing through the level. Has there been any progress in that field? It seems like LLMs came around and all the projects stopped completely.
nottorp•44m ago
There was good progress in training neural networks to play video games.

Unfortunately it doesn't seem to fit in some people's context because it was a few years ago.

Kind reminder: there is "AI" beyond LLMs.

andunie•31m ago
I wonder if they would be good at text-based games.
ThunderSizzle•23m ago
I wonder if you paired a few different types of AI together, an LLM agent might be good at strategizing, e.g. building a strategy for how to handle a scenario. But it would basically need to know the entire game manual. Then it would pass the strategy to a better AI in some way. But it might not be needed if the better gaming AI can already do that part too.

I admit I know nothing about this though.

deadbabe•11m ago
GOAP is a better tool.
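
For readers unfamiliar with the term: GOAP (Goal-Oriented Action Planning) describes each action by its preconditions and effects on a set of world facts, and a planner searches for a sequence of actions that reaches the goal. Below is a minimal sketch under stated assumptions. The `Action` type, the `plan` function, and the key/door facts are all made up for illustration, and the search is a plain breadth-first search with unit costs, whereas GOAP implementations typically use A* with per-action costs.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    preconditions: frozenset  # facts that must hold before the action can run
    adds: frozenset           # facts the action makes true
    removes: frozenset        # facts the action makes false


def plan(start, goal, actions):
    """Breadth-first search over world states.

    Returns the shortest list of action names that reaches the goal, or None.
    """
    start = frozenset(start)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:  # every goal fact holds in this state
            return steps
        for action in actions:
            if action.preconditions <= state:
                nxt = (state - action.removes) | action.adds
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [action.name]))
    return None


# Hypothetical toy domain: get into a locked room.
ACTIONS = [
    Action("pick_up_key", frozenset(), frozenset({"has_key"}), frozenset()),
    Action("unlock_door", frozenset({"has_key"}), frozenset({"door_open"}), frozenset()),
    Action("enter_room", frozenset({"door_open"}), frozenset({"in_room"}), frozenset()),
]

result = plan(set(), {"in_room"}, ACTIONS)
print(result)  # prints: ['pick_up_key', 'unlock_door', 'enter_room']
```

Because the logic is explicit, a failed plan can be debugged by inspecting which precondition was never satisfied.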
ceheaaf•14m ago
It feels like they're really focusing on overstating how confusing and weird it is that an LLM can write code but not play games very well, rather than just explaining it.

Code is text. LLMs are text input/output machines.

Game input/output is not at all text.

LLMs can certainly reason about games with a simple/explicit enough domain (try a Risk tournament where models can talk to each other between turns!).

dsabanin•11m ago
Why is a language model bad at video games? I think the answer is stated in the question itself.

Efficiently Cooling Satellite Components in Space

https://www.fraunhofer.de/en/press/research-news/2026/june-2026/efficiently-cooling-satellite-com...
2•giuliomagnifico•3m ago•0 comments

Nvidia CEO Jensen Huang speaks at Computex 2026 (2:06:24 Video)

https://www.youtube.com/watch?v=9CJ_MZOOj_E
2•Nevaeh•5m ago•0 comments

A JavaScript PoC FROST side channel a browser tab that senses your SSD activity

https://github.com/brammittendorff/opfs-ssd-timing
1•botw44•6m ago•0 comments

The Spam Economy Comes to Work

https://fffej.substack.com/p/the-spam-economy-comes-to-work
1•PretzelFisch•6m ago•0 comments

The Axis That Made the Chips

https://hoeijmakers.net/the-axis-that-made-the-chips/
1•janvdberg•6m ago•0 comments

Termux: X11 is a fledged X server built with Android NDK

https://github.com/termux/termux-x11
1•ivo8n52•7m ago•0 comments

Undigested fructose linked to anxiety and brain inflammation

https://www.psypost.org/undigested-fructose-linked-to-anxiety-and-brain-inflammation/
2•amichail•7m ago•0 comments

When Background AI Agents Become a Security Boundary Problem

https://www.originhq.com/research/background-c2-agent
1•lucasluitjes•8m ago•0 comments

Nvidia announces new AI chip for personal computers

https://www.bbc.com/news/articles/crmp9mppvzro
1•rishikeshs•11m ago•0 comments

Google's top result is 16yo when searching for "Ubuntu 24.04 install fonts"

https://www.google.com/search?q=ubuntu24.04installfonts
1•felooboolooomba•14m ago•1 comments

Thermal conductivity modulation as a mechanism of thermotolerance in tardigrades

https://royalsocietypublishing.org/rsif/article/23/238/20251033/481636/Thermal-conductivity-modul...
1•bookofjoe•17m ago•0 comments

Why Melanoma Spreads More in Middle Age

https://brieflycurious.com/why-melanoma-spreads-more-in-middle-age-a-mouse-study-points-to-the-im...
1•matkoone•17m ago•0 comments

Sixteen Kids and a Hit Man (2024)

https://nymag.com/intelligencer/article/christopher-pence-corderos-fbi-dark-web-hit-man.html
1•Michelangelo11•20m ago•0 comments

Lynching Postcard

https://en.wikipedia.org/wiki/Lynching_postcard
2•doener•25m ago•0 comments

Show HN: Postbase – 100% open source Alternative to Firebase and Supabase [video]

https://www.youtube.com/watch?v=St_kJZXZ_nE
4•harshalone•27m ago•1 comments

Rebuilding the Access Edge: Why We Replaced PPPoE with a Custom DHCP Server

https://medium.com/@mustafa.n.gaid/rebuilding-the-access-edge-why-we-replaced-pppoe-with-a-custom...
1•musnas•27m ago•0 comments

Show HN: Curtab – Each command in its own interactive terminal tab

https://github.com/rashidmya/curtab
1•rashidmya•31m ago•0 comments

China Aims A.I. At Predicting Who Could Pose a Political Risk

https://www.nytimes.com/2026/06/01/us/politics/china-ai-predicting-dissent.html
5•uxhacker•35m ago•0 comments

Show HN: Open-source sync-engine for managing websites at scale

https://github.com/gospecter/specter
1•aabergkvist•35m ago•0 comments

We entered Fixathon as hackers. We left as winners

https://layerx.xyz/blog/fixathon-win
1•supermalvo•40m ago•0 comments

Am I too pessimistic about Python's future?

2•noon-raccoon•42m ago•2 comments

Code Review Assumes an Author

https://blog.raed.dev/posts/ai-code-review/
1•Raed667•43m ago•0 comments

Show HN: Mochi – a performance-first SveltKit alternative

https://mochi.fast/
1•khromov•44m ago•0 comments

The making of iconic Clint Eastwood posters

https://twitter.com/i/status/2061246916378185742
1•Michelangelo11•45m ago•0 comments

The Rsync thing was inevitable and it's happening everywhere

https://robertjwebb.substack.com/p/the-rsync-thing-was-inevitable-and
2•haburka•45m ago•0 comments

Netflix Wiz creates app to slash AI bills, then open sources it

https://www.theregister.com/ai-ml/2026/05/31/netflix-wiz-creates-app-to-slash-ai-bills-then-open-...
2•pseudolus•52m ago•0 comments

What building payment products taught me about scalable financial infrastructure

https://www.solvimon.com/blog/five-lessons-on-building-scalable-financial-infrastructure
1•arnon•52m ago•0 comments

Australia's far-right party leads in national poll for first time

https://www.reuters.com/world/asia-pacific/australias-far-right-party-leads-national-poll-first-t...
2•KnuthIsGod•53m ago•2 comments

Are API keys too much friction for AI tools in teams?

https://intrascope.app/
1•Intrascopeapp•54m ago•0 comments

Fine-tuning an LLM to write docs like it's 1995

https://passo.uno/fine-tuning-docs-llm/
1•theletterf•56m ago•0 comments