I love local LLMs for interactive fiction, but I got tired of the "Naked Model" paradox. If you tell a standard RLHF'd model "I drink the health potion," it will gleefully describe you drinking it, even if your inventory is completely empty. These models have no object permanence and are biased toward being sycophantic assistants.
To fix this, I built BoneAmanita, an architecture that puts a fine-tuned LLM inside a Python-simulated body.
How it works: the system has two parts:
The Brain (GGUF): A custom 3B model (fine-tuned from Llama 3.2 via Unsloth). I scrubbed out the "helpful assistant" RLHF behavior and trained it strictly on atmospheric, sensory, and philosophical prose.
The Body (Python Engine): A local terminal hypervisor that runs a physical simulation. It tracks variables like "ATP" (stamina), "ROS" (trauma), "Voltage," and "Cortisol" (stress).
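A minimal sketch of what such a state tracker might look like. The variable names come from the post; the update rules, magnitudes, and thresholds are illustrative assumptions, not BoneAmanita's actual values:

```python
from dataclasses import dataclass

@dataclass
class BodyState:
    """Toy metabolic state; field names from the post, numbers invented."""
    atp: float = 100.0       # stamina reserve
    ros: float = 0.0         # accumulated trauma
    voltage: float = 50.0    # activation / arousal
    cortisol: float = 0.0    # stress hormone

    def apply_action(self, entropy: float) -> None:
        """High-entropy (chaotic) actions burn stamina and spike stress."""
        self.atp = max(0.0, self.atp - 5.0 * entropy)
        self.cortisol = min(100.0, self.cortisol + 10.0 * entropy)
        if self.atp < 20.0:
            # Running on empty causes lasting trauma
            self.ros = min(100.0, self.ros + 1.0)

    @property
    def stressed(self) -> bool:
        return self.cortisol > 60.0
```

Once the engine holds state like this, every LLM turn becomes a pure function of the body rather than of the chat history alone.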
The Feedback Loop:
The Python engine intercepts every turn and dynamically rewrites the LLM's system prompt based on its metabolic state. If you stress the engine out with high-entropy actions, its simulated Cortisol spikes, and the engine injects a strict prompt override forcing the LLM to output short, fragmented, defensive sentences. It literally gets exhausted.

Solving the Hallucination Problem (The Gordon Shock):
To enforce hard physics, the Python engine manages a strict inventory state. If you attempt an impossible action (e.g., washing a car in a forest), an internal interceptor ("Gordon") catches the premise violation before the LLM can "Yes, and..." you. Gordon violently injects a CRITICAL OVERRIDE into the context window, forcing the LLM to coldly reject the action and ground you in reality.
It boots into 4 modes (Adventure, Conversation, Creative, Technical) depending on how strict you want the physics engine to be.
You can pull the brain straight through Ollama: `ollama pull hf.co/aedmark/vsl-cryosomatic-hypervisor`
And run the Python hypervisor here: https://github.com/aedmark/BoneAmanita
It’s completely free, local, and released under The Unlicense.
Come play!