* create a basic text adventure (or MUD) with a very spartan, API-like representation (a rough sketch is below the list)
* use an LLM to embellish the descriptions served to the user. With recent history in context, the LLM might even loosely reference things the user asked about previously.
* have NPCs implemented as their own LLMs that try to 'play the game'. These might use the spartan API directly, as if they were agents.
It's a fun thought experiment!
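A minimal sketch of what that spartan, API-like representation could look like (the names and JSON shape here are made up, just to make the idea concrete); the same raw dict that the embellishing LLM dresses up could be handed to NPC agents as their observation:

    import json

    # Hypothetical world state: rooms keyed by id, each with exits, items, NPCs.
    WORLD = {
        "cellar": {"exits": {"north": "hallway"}, "items": ["rusty key"], "npcs": ["rat"]},
        "hallway": {"exits": {"south": "cellar", "east": "courtyard"}, "items": [], "npcs": []},
    }

    def observe(room_id: str) -> str:
        """Raw, deterministic state that any LLM (player-facing or NPC) receives."""
        return json.dumps({"room": room_id, **WORLD[room_id]})

    def act(room_id: str, command: str) -> str:
        """Strict parser-style commands only; anything fancier is the LLM's job."""
        verb, _, arg = command.partition(" ")
        room = WORLD[room_id]
        if verb == "go" and arg in room["exits"]:
            return observe(room["exits"][arg])
        if verb == "take" and arg in room["items"]:
            room["items"].remove(arg)
            return json.dumps({"ok": True, "took": arg})
        return json.dumps({"ok": False, "error": "unknown command"})

    print(act("cellar", "take rusty key"))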
(An aside: I found that the graphical text adventure I made for Ludum Dare 23 is still online! Although it doesn't render quite right in modern browsers... things shouldn't have broken! But anyway: https://williame.github.io/ludum_dare_23_tiny_world/)
The challenge for me was consistency in translating free text from dialogs into classic, deterministic game state changes. But what's satisfying is that the conversations aren't just window dressing, they're part of the game mechanic.
I found this to be the genuinely strenuous work in LLM-based development. While it can look like AI has made everything easy and free, consistently getting deterministic outputs takes serious programming effort; it feels like an entirely new job role. In other words, I wouldn't do this for free, it takes too much effort.
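A sketch of one such pattern (the schema, field names, and the choice of pydantic are purely illustrative): have the model emit nothing but a JSON state change, validate it, and treat anything that fails validation as a no-op or a retry.

    from typing import Literal
    from pydantic import BaseModel, ValidationError

    class StateChange(BaseModel):
        # The only ways a conversation is allowed to touch game state.
        action: Literal["give_item", "open_door", "none"]
        target: str
        item: str | None = None

    def parse_dialog_outcome(llm_output: str) -> StateChange:
        """Turn the model's reply into a deterministic game event, or a safe no-op."""
        try:
            return StateChange.model_validate_json(llm_output)
        except ValidationError:
            # Re-prompt or fall back; never let free text mutate game state directly.
            return StateChange(action="none", target="")

    print(parse_dialog_outcome('{"action": "open_door", "target": "iron door"}'))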
ChatGPT 3.5
Yes, if you are wondering why they don't clarify the model: it's because all this was done back in early 2023 (the chat logs are dated). Back then it was only 3.5, and 4 had just been released.
Advancement in this space has been so rapid that this is almost like releasing a paper today titled "Video Streaming on Mobile Devices" while only using a 2013-era 3G connection.
The authors should have held back a few more months and turned the paper into an analysis of the improvement from 3.5 to o3 (or any other 2025 SOTA model).
I made some AI tools (https://github.com/DougHaber/lair) and added in a tmux tool so that LLMs could interact with terminals. First, I tried Nethack. As expected, it's not good at understanding text "screenshots" and failed miserably.
https://x.com/LeshyLabs/status/1895842345376944454
After that I tried a bunch of the "bsdgames" text games.
Here is a video of it playing a few minutes of Colossal Cave Adventure:
https://www.youtube.com/watch?v=7BMxkWUON70
With this, it could play, but not very well; it got confused a lot. I was using gpt-4o-mini. Smaller models I could run at home did much worse. It would be interesting to try one of the bigger state-of-the-art models to see how much it helps.
To give it an easier one I also had it hunt the Wumpus:
https://x.com/LeshyLabs/status/1896443294005317701
I didn't try improving this much, so there might be some low hanging fruit even in providing better instructions and tuning what is sent to the LLM. For these, I was hoping I could just hand it a terminal with a game in it and have it play decently. We'll probably get there, but so far it's not that simple.
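The basic shape is a capture/act loop over tmux's capture-pane and send-keys; a simplified sketch (not the exact code in lair, and the model call is left as a placeholder):

    import subprocess, time

    SESSION = "adventure"  # assumes: tmux new-session -d -s adventure 'adventure'

    def read_screen() -> str:
        """Grab the current text 'screenshot' of the game's terminal."""
        return subprocess.run(
            ["tmux", "capture-pane", "-p", "-t", SESSION],
            capture_output=True, text=True, check=True,
        ).stdout

    def send_command(cmd: str) -> None:
        subprocess.run(["tmux", "send-keys", "-t", SESSION, cmd, "Enter"], check=True)

    def ask_llm(screen: str) -> str:
        raise NotImplementedError("call your model here with the screen plus history")

    while True:
        send_command(ask_llm(read_screen()))
        time.sleep(1)  # let the game redraw before the next capture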
https://slashdot.org/story/25/07/03/2028252/microsoft-copilo...
https://github.com/derekburgess/dungen
There are some interesting ideas in this paper, but even just role playing with ChatGPT demonstrates how poorly it does at world building and narrative... I was impressed by the Wayfarer model, and I imagine there are other models out there on civit or something that could be used together in some group chat orchestration to create a more dynamic "party" atmosphere.
On page 5, Figure 1, the authors include a hand-drawn diagram representing the relationships between objects as a graph, with directed edges in 3D space. To me, this implies that you could supply your LLM with a set of tools like getObjectsInGraph, updateGraphRelatingObjectPair, findObjectsRelativeToObject, describePathBetweenObjectsByName... and allow it to maintain that diagram as a structured DAG, and continually ask the game engine questions that let it update that graph in an agentic way. My prediction would be that they'd recreate that diagram, and enable goal seeking, with high fidelity.
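A toy version of what I mean (the tool names are as above; the networkx-backed representation is just one way to hold the graph):

    import networkx as nx

    G = nx.DiGraph()  # directed edges labeled with spatial relations

    def updateGraphRelatingObjectPair(a: str, b: str, relation: str) -> None:
        G.add_edge(a, b, relation=relation)

    def getObjectsInGraph() -> list[str]:
        return list(G.nodes)

    def findObjectsRelativeToObject(obj: str) -> dict[str, str]:
        """Objects directly related to `obj`, with the relation label."""
        return {nbr: G.edges[obj, nbr]["relation"] for nbr in G.successors(obj)}

    def describePathBetweenObjectsByName(a: str, b: str) -> str:
        return " -> ".join(nx.shortest_path(G.to_undirected(as_view=True), a, b))

    updateGraphRelatingObjectPair("key", "mat", "under")
    updateGraphRelatingObjectPair("mat", "front door", "in front of")
    print(describePathBetweenObjectsByName("key", "front door"))  # key -> mat -> front door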
Asking an LLM to work without being able to "visualize" and "touch" its environment in its "mind's eye" is tying a hand behind its back. But I'm bullish that we'll find increasingly better ways of adapting 3D/4D world models into textual tools in a way that rapidly changes the possibilities of what LLMs can do.
To use a debugger, you need (sketched as a data structure after the list):
* Some memory of where you've already explored in the code (vs rooms in a dungeon)
* Some wider idea of your current goal / destination (vs a current quest or a treasure)
* A plan for how to get there - but the flexibility to adapt (vs expected path and potential monsters / dead ends)
* A way to manage information you've learned / state you've viewed (vs inventory)
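Restated as the state such an agent would carry between steps (field names made up for illustration):

    from dataclasses import dataclass, field

    @dataclass
    class DebugAgentState:
        visited_frames: set[str] = field(default_factory=set)  # "rooms" already explored
        goal: str = ""                                          # the "quest", e.g. "find where total goes negative"
        plan: list[str] = field(default_factory=list)           # expected path, replanned when a "monster" appears
        findings: dict[str, str] = field(default_factory=dict)  # "inventory" of values / facts learned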
Given text adventures are quite well-documented and there are many of them out there, I'd also like to take time out to experiment (at some point!) with whether presenting a command-line tool as a text adventure might be a useful "API".
e.g. an MCP server that exposes a tool but also provides a mapping of the tool's concepts into dungeon-adventure concepts (and back). If nothing else, the LLM's reasoning should be pretty entertaining. Maybe playing "make believe" will even make it better at some things - that would be very cool.
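A rough sketch of what I mean, assuming the official Python MCP SDK's FastMCP helper; the filesystem-as-dungeon mapping (directories as rooms, files as items) is just for illustration:

    import os
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("filesystem-dungeon")
    current_room = os.getcwd()  # the "room" the LLM is standing in

    @mcp.tool()
    def look() -> str:
        """Describe the current room: sub-directories are doors, files are items."""
        doors, items = [], []
        for name in sorted(os.listdir(current_room)):
            (doors if os.path.isdir(os.path.join(current_room, name)) else items).append(name)
        return f"You are in '{current_room}'. Doors: {doors}. Items on the floor: {items}."

    @mcp.tool()
    def go(door: str) -> str:
        """Walk through a door (i.e. cd into a sub-directory, or '..' to go back)."""
        global current_room
        target = os.path.join(current_room, door)
        if not os.path.isdir(target):
            return "You walk into a wall."
        current_room = os.path.normpath(target)
        return look()

    if __name__ == "__main__":
        mcp.run()  # serves the tools over stdio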
When I land on your page, I know nothing except that you're offering a way to learn vim "the fun way". I would not have guessed what you described.
Don't put everything behind a wall. At least try to convince people that they want to be on the other side.
s-macke•4h ago
For a more in-depth analysis of chatbots playing text adventures, take a look at my project [0]. I haven’t updated it in a while due to time constraints.
[0] https://github.com/s-macke/AdventureAI
s-macke•3h ago
The challenge with benchmarking text adventures lies in their trial-and-error nature. It’s easy to get stuck for hundreds of moves on a minor detail before eventually giving up and trying a different approach.
[0] https://www.twitch.tv/gpt_plays_pokemon
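One crude guard against that in a harness (illustrative only): treat the run as stuck when the visible game text stops changing for many moves, then force a different strategy or end the episode.

    def is_stuck(recent_screens: list[str], window: int = 30) -> bool:
        """True if the last `window` turns produced essentially no new output."""
        if len(recent_screens) < window:
            return False
        return len(set(recent_screens[-window:])) <= 2  # the same one or two screens repeating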
glimshe•3h ago