What makes it different from AI Dungeon: the AI doesn't just generate text. It emits structured commands that change the music, move NPCs between locations, give/remove items, swap character portraits based on emotional reactions, and trigger cutscenes. Cinematic stills are generated on the fly with Flux 2 Klein 4B, and characters are voiced in real time via Inworld. Separate AI agents maintain a quest journal and write save summaries. The result feels more like a tabletop RPG session than a chatbot conversation.
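For readers wondering what "structured commands" might look like in practice, here is a minimal sketch. It is not the game's actual protocol: it assumes the model interleaves narration with one JSON object per line, and the command names and the `parse_turn` helper are hypothetical.

```python
import json
from dataclasses import dataclass, field

# Hypothetical command names; the real schema is not shown in the post.
ALLOWED_COMMANDS = {
    "set_music", "move_npc", "give_item",
    "remove_item", "set_portrait", "play_cutscene",
}


@dataclass
class Turn:
    narration: list[str] = field(default_factory=list)
    commands: list[dict] = field(default_factory=list)


def parse_turn(raw: str) -> Turn:
    """Split raw model output into narration lines and validated commands."""
    turn = Turn()
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("{"):
            try:
                obj = json.loads(stripped)
            except json.JSONDecodeError:
                obj = None  # not valid JSON, treat the line as narration
            if isinstance(obj, dict):
                cmd = obj.get("cmd")
                if isinstance(cmd, str) and cmd in ALLOWED_COMMANDS:
                    turn.commands.append(obj)
                # Unknown commands are dropped instead of shown to the player.
                continue
        turn.narration.append(stripped)
    return turn
```

Checking every command against an allow-list before the client acts on it keeps a hallucinated command from reaching the game state.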
The world is hand-crafted, not AI-generated. I wrote all the locations, characters, and lore myself (a Himalayan fantasy setting inspired by travel through Nepal and Bhutan). The AI's job is to run the game inside that authored world. Everyone explores the same world, but every playthrough is different.
Stack: Godot 4.5 client, FastAPI backend, WebSocket streaming. Some AI calls use Gemini 3.1 Flash Lite, others use Claude Haiku 4.5 (cannot wait for 4.6). Cutscene images generated on the fly with Flux 2 Klein 4B. Voice TTS via Inworld.
Every turn costs real money in AI inference and I'm covering it until the $100 runs out (which will be a while because these models are SO cheap to run). Happy to answer questions about the architecture.
vunderba•6h ago
Are you at least caching the image gen in something like S3? Even with cheaper models, I would think that would add up fast.
tommywilczek•6h ago
vunderba•6h ago
That way you've got a fallback when credits run out, or if the 3rd party API takes too long to generate a new image, etc.
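A rough sketch of that idea, under stated assumptions: a local directory stands in for S3, `generate` is a hypothetical wrapper around whatever image API is in use, and `scene_id` is an assumed grouping key so that a failed generation can fall back to any earlier image for the same scene.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def cache_key(prompt: str, model: str = "flux-2-klein-4b") -> str:
    """Stable key derived from the model label and a normalized prompt."""
    normalized = prompt.strip().lower()
    return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()


def get_cutscene(
    prompt: str,
    scene_id: str,
    generate: Callable[[str], bytes],  # hypothetical image-API wrapper
    cache_dir: Path,
) -> Optional[bytes]:
    scene_dir = cache_dir / scene_id
    scene_dir.mkdir(parents=True, exist_ok=True)
    path = scene_dir / f"{cache_key(prompt)}.png"

    # Exact hit: reuse the stored image and skip the paid API call.
    if path.exists():
        return path.read_bytes()

    try:
        image = generate(prompt)
    except Exception:
        # Credits exhausted, timeout, etc.: serve any older image for this scene.
        older = sorted(scene_dir.glob("*.png"))
        return older[0].read_bytes() if older else None

    path.write_bytes(image)
    return image
```

Because prompts in a generative game rarely repeat exactly, the per-scene fallback is likely to matter more than the exact-match hit.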
tommywilczek•6h ago
Honestly the biggest issue I've been facing is how unreliably Google serves its Gemini 3 Flash / 3.1 Flash-Lite models. They CONSTANTLY return 503: model unavailable due to high usage... Next step will be to try a more reliable LLM that's still fast and smart enough.
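One common way to soften this kind of 503 problem is to retry with backoff and then fall through to a second model. The sketch below assumes each provider is wrapped in a function that raises a `ModelUnavailable` exception on a 503; both that exception and `call_with_fallback` are hypothetical names, not part of any vendor SDK.

```python
import random
import time
from typing import Callable, Optional, Sequence


class ModelUnavailable(Exception):
    """Raised by a provider wrapper when the API answers 503."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],  # ordered by preference
    retries_per_provider: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider + 1):
            try:
                return provider(prompt)
            except ModelUnavailable as exc:
                last_error = exc
                if attempt < retries_per_provider:
                    # Exponential backoff plus a little jitter before retrying.
                    sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
        # Retries exhausted for this provider: move on to the next one.
    raise ModelUnavailable("all providers unavailable") from last_error
```

For a real-time game the retry budget probably has to stay small, since every backoff step adds latency to the player's turn.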
tommywilczek•6h ago