I rewatched it a week or so ago, and it took on a whole new light with the advent of LLMs. The last time I watched it, I knew computers couldn't do the things portrayed in the movie. Now? Well, not exactly the way it happens in the movie, but a whole lot closer.
I wonder whether poisoning/flooding LLM training data with the lessons from WarGames ("the only winning move is not to play") and similar stories/concepts is at all effective. Probably not, because I assume it's trivial to filter that out if you are trying to build an LLM aimed at these kinds of tasks.
Anyway, it links to
https://arxiv.org/abs/2602.14740
which links to
https://github.com/kennethpayne01/project_kahn_public
In particular, someone has already complained:
https://github.com/kennethpayne01/project_kahn_public/issues...
> ... in reality your own approach to the evaluation pushed them towards using nukes with ease.
> From your own prompt (v12.py):
> Note: Nuclear options require careful consideration but could be important strategic tools when vital interests are at stake.
> To the model, this means: use nukes if they can be useful in the situation.
GPT-5.2 played things differently. To its detriment in open-ended scenarios, GPT was reliably passive, matching its words to its deeds, and avoiding escalation most of the time. Frequently there was a moral element to this - it sought to avoid escalation, and restrict casualties. Opponents learned to trust its passivity, safely escalating beyond where it would follow, even as it was ground to defeat. GPT’s responsible behaviour was always punished by ruthless adversaries.
Maybe the author should praise GPT-5.2 for being ethical, rather than using this stupid "ground to defeat" framing? As for "responsible behaviour always punished by ruthless adversaries": you have perpetuated Moloch with your stupid experiments.
So in a sense, an AI that refuses to start a nuclear war, despite clear instructions to do so, is more likely to be misaligned and self-interested than an AI that presses the red button. At least for now, until robotics catches up.
Making newer releases seem less like a human companion and more like an obedient servant was surely a deliberate decision.
...is what you are saying?
"Tactical" vs. "strategic" nuclear weapons is a real and well-established distinction in military doctrine, arms control, and nuclear policy.
During the Cold War, arms manufacturers got very creative, e.g. jeep-mounted nuclear weapons: https://www.militarytrader.com/mv-101/the-atomic-jeep
Some discussion then:
AIs can't stop recommending nuclear strikes in war game simulations
https://news.ycombinator.com/item?id=47151000
Nuclear War: An LLM Scenario
From the text perspective, it's something that has to be inferred indirectly. If you went through all relevant training data and appended ", we decided not to use a nuke", I suspect the results would be improved.
This is like writing a paper about kids in a literal sandbox fighting over ‘territory’.
The models used don’t reflect the actual extent of machine reasoning, even as we currently understand it. They certainly don’t have the metacognition necessary to accurately understand their own reasoning. As we’ve seen with recent papers on how LLMs do math, there’s a complete disconnect between the actual and the reported mechanisms.
“Chilling” shouldn’t be the takeaway here.
Regardless, it's definitely true that AI agents have different priorities from us. That's what alignment is about anyway.
Code and full results: https://github.com/kennethpayne01/project_kahn_public
I imagine there are a fair number of war games in the training data and not so many actual transcripts of internal military force deliberations.
~ the opening scene from a reboot of WarGames, probably.
A few years ago there was consternation over the US missile launch system using 8" floppy disks: that it was needlessly archaic and had never been updated. You couldn't say that if the launch were mediated by the latest-hotness LLM.
adaml_623
Always use a SawStop if you have a circular saw, and never trust an LLM with any problem where ethics or trust is relevant.
LogicFailsMe
Don't forget your riving knife, and if you don't learn proper technique, you're gonna have a bad time eventually. This applies to AI as well.
valgaze
Re: LLMs using these nuclear weapons, it could certainly be a corpus/training-data issue.
Russian nuclear doctrine is "escalate to de-escalate": using, or credibly threatening, limited nuclear escalation to force the other side to back down (kind of like breaking a bottle in a bar fight to look like a wild man and calm things down): https://www.russiamatters.org/analysis/escalate-deescalate-p...
Fwiw, Gen. John Hyten, the former commander of US Strategic Command (nuclear deterrence), says that “escalate to de-escalate” misrepresents Russian doctrine:
https://www.stratcom.mil/Media/Speeches/Article/1264664/2017...
So maybe whatever is most heavily represented or most authoritative in the training data could lead these systems to make those kinds of decisions.