
Show HN: Poddley.com – Follow people, not podcasts

https://poddley.com/guests/ana-kasparian/episodes
1•onesandofgrain•1m ago•0 comments

Layoffs Surge 118% in January – The Highest Since 2009

https://www.cnbc.com/2026/02/05/layoff-and-hiring-announcements-hit-their-worst-january-levels-si...
2•karakoram•1m ago•0 comments

Papyrus 114: Homer's Iliad

https://p114.homemade.systems/
1•mwenge•1m ago•1 comments

DicePit – Real-time multiplayer Knucklebones in the browser

https://dicepit.pages.dev/
1•r1z4•1m ago•1 comments

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

https://arxiv.org/abs/2601.14340
2•PaulHoule•3m ago•0 comments

Show HN: AI Agent Tool That Keeps You in the Loop

https://github.com/dshearer/misatay
2•dshearer•4m ago•0 comments

Why Every R Package Wrapping External Tools Needs a Sitrep() Function

https://drmowinckels.io/blog/2026/sitrep-functions/
1•todsacerdoti•5m ago•0 comments

Achieving Ultra-Fast AI Chat Widgets

https://www.cjroth.com/blog/2026-02-06-chat-widgets
1•thoughtfulchris•6m ago•0 comments

Show HN: Runtime Fence – Kill switch for AI agents

https://github.com/RunTimeAdmin/ai-agent-killswitch
1•ccie14019•9m ago•1 comments

Researchers surprised by the brain benefits of cannabis usage in adults over 40

https://nypost.com/2026/02/07/health/cannabis-may-benefit-aging-brains-study-finds/
1•SirLJ•11m ago•0 comments

Peter Thiel warns the Antichrist, apocalypse linked to the 'end of modernity'

https://fortune.com/2026/02/04/peter-thiel-antichrist-greta-thunberg-end-of-modernity-billionaires/
1•randycupertino•11m ago•2 comments

USS Preble Used Helios Laser to Zap Four Drones in Expanding Testing

https://www.twz.com/sea/uss-preble-used-helios-laser-to-zap-four-drones-in-expanding-testing
2•breve•17m ago•0 comments

Show HN: Animated beach scene, made with CSS

https://ahmed-machine.github.io/beach-scene/
1•ahmedoo•18m ago•0 comments

An update on unredacting select Epstein files – DBC12.pdf liberated

https://neosmart.net/blog/efta00400459-has-been-cracked-dbc12-pdf-liberated/
1•ks2048•18m ago•0 comments

Was going to share my work

1•hiddenarchitect•21m ago•0 comments

Pitchfork: A devilishly good process manager for developers

https://pitchfork.jdx.dev/
1•ahamez•21m ago•0 comments

You Are Here

https://brooker.co.za/blog/2026/02/07/you-are-here.html
3•mltvc•25m ago•1 comments

Why social apps need to become proactive, not reactive

https://www.heyflare.app/blog/from-reactive-to-proactive-how-ai-agents-will-reshape-social-apps
1•JoanMDuarte•26m ago•1 comments

How patient are AI scrapers, anyway? – Random Thoughts

https://lars.ingebrigtsen.no/2026/02/07/how-patient-are-ai-scrapers-anyway/
1•samtrack2019•26m ago•0 comments

Vouch: A contributor trust management system

https://github.com/mitchellh/vouch
2•SchwKatze•27m ago•0 comments

I built a terminal monitoring app and custom firmware for a clock with Claude

https://duggan.ie/posts/i-built-a-terminal-monitoring-app-and-custom-firmware-for-a-desktop-clock...
1•duggan•28m ago•0 comments

Tiny C Compiler

https://bellard.org/tcc/
2•guerrilla•29m ago•0 comments

Y Combinator Founder Organizes 'March for Billionaires'

https://mlq.ai/news/ai-startup-founder-organizes-march-for-billionaires-protest-against-californi...
1•hidden80•29m ago•2 comments

Ask HN: Need feedback on the idea I'm working on

1•Yogender78•30m ago•0 comments

OpenClaw Addresses Security Risks

https://thebiggish.com/news/openclaw-s-security-flaws-expose-enterprise-risk-22-of-deployments-un...
2•vedantnair•30m ago•0 comments

Apple finalizes Gemini / Siri deal

https://www.engadget.com/ai/apple-reportedly-plans-to-reveal-its-gemini-powered-siri-in-february-...
1•vedantnair•31m ago•0 comments

Italy Railways Sabotaged

https://www.bbc.co.uk/news/articles/czr4rx04xjpo
10•vedantnair•31m ago•2 comments

Emacs-tramp-RPC: high-performance TRAMP back end using MsgPack-RPC

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•fanf2•33m ago•0 comments

Nintendo Wii Themed Portfolio

https://akiraux.vercel.app/
2•s4074433•37m ago•2 comments

"There must be something like the opposite of suicide"

https://post.substack.com/p/there-must-be-something-like-the
1•rbanffy•39m ago•1 comments

Why do so many "agentic AI" systems collapse without persistent state?

3•JohannesGlaser•1mo ago
I’ve been thinking a lot about what’s currently called “agentic AI”. Many systems try to achieve agent-like behavior through planning, tool use, orchestration layers, or increasingly careful prompting. In practice, what I keep running into is that these systems don’t fail because models can’t reason or plan, but because they lack stable state. Without persistent state, coherence has to be re-established every turn. The result is longer prompts, retrieval pipelines, guardrails, and corrective instructions — all of which help access information, but don’t really solve continuity over time.

I’ve been experimenting with a different approach: making state explicit and persistent outside the model, but directly attached to the assistant’s working environment. Append-only logs, rules, inventories, histories — readable files that the model initializes from every run. Not queried opportunistically like a vector DB, just present as working context. Once state is stable, a lot of “agentic” behavior seems to emerge naturally. The system stops reacting moment by moment and starts behaving coherently across longer timescales.
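A minimal sketch of the approach described above, assuming a hypothetical file layout (the names `rules.md`, `inventory.json`, and `log.jsonl` are illustrative, not from any framework): state lives in plain, inspectable files outside the model, is only ever appended to, and is read in full at the start of every run rather than queried opportunistically.

```python
import json
from pathlib import Path

# Hypothetical layout: a directory of plain, inspectable artifacts.
STATE_DIR = Path("agent_state")

def append_event(event: dict) -> None:
    """Record a state change by appending to the log; nothing is overwritten."""
    with (STATE_DIR / "log.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

def load_working_context() -> str:
    """Rebuild the assistant's working context from the persisted artifacts.

    Everything present is loaded as-is -- working context, not a retrieval
    query against a vector DB.
    """
    parts = []
    for name in ("rules.md", "inventory.json", "log.jsonl"):
        path = STATE_DIR / name
        if path.exists():
            parts.append(f"=== {name} ===\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)

# Every run starts by reconstructing state from disk:
STATE_DIR.mkdir(exist_ok=True)
append_event({"turn": 1, "note": "user prefers metric units"})
context = load_working_context()
```

The `context` string would then be placed at the front of the model's input on every run, so continuity comes from the files rather than from prompt history.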

I’m curious how others here see this: Is persistent state under-discussed compared to planning and tooling? For those building agents with RAG / LangChain / similar stacks: how do you handle continuity across days or weeks? Am I underestimating what current agent frameworks already solve here?

Would love technical perspectives or counterexamples.

Comments

verdverm•1mo ago
> readable files that the model initializes from every run

This is how AGENTS.md et al. are supposed to work; you can include many more things, like ...

> Append-only logs, rules, inventories, histories

I include open terminals and files, for example; these may make it into the system prompt. The same problem arises here: how much, and when. Same story for tools, MCP, skills.

> Without persistent state

There are a lot of different ways people are approaching this. In the end, you are just prewarming a cache (the system prompt). The next step is to give the agent control over that system prompt, moving toward self-controlled / dynamic context engineering.

You, increasingly in collaboration with an agent, are doing context engineering. One can extend the analogy toward a memory or knowledge hierarchy. You're also going to want a table of contents or a librarian (a context-collecting subagent or phase, search; lots of open design space here).
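A rough illustration of the "prewarming a cache" framing (the file names and character budget here are made up): assembling the system prompt from always-loaded artifacts immediately raises the "how much and when" question as a context-budget problem.

```python
from pathlib import Path

def build_system_prompt(artifact_files: list[str], budget_chars: int = 8000) -> str:
    """Prewarm the cache (system prompt) from always-loaded files,
    truncating once the context budget is spent -- the 'how much' problem."""
    parts: list[str] = []
    used = 0
    for name in artifact_files:
        path = Path(name)
        if not path.exists():
            continue  # missing artifacts are simply skipped
        remaining = budget_chars - used
        if remaining <= 0:
            break  # budget exhausted: later artifacts never make it in
        snippet = path.read_text(encoding="utf-8")[:remaining]
        parts.append(f"# {name}\n{snippet}")
        used += len(snippet)
    return "\n\n".join(parts)
```

Note that the ordering of `artifact_files` silently becomes a priority ranking once the budget binds, which is exactly where a "librarian" step would earn its keep.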

JohannesGlaser•1mo ago
I think there’s an important distinction here.

What you describe (AGENTS.md, open files, terminals, system prompts) is still context shaping inside the prompt space. It’s about what to load and how much, and yes, that quickly turns into dynamic context engineering.

What I’m experimenting with is one step earlier: treating state as an external artifact, not as an emergent property of the prompt. The files aren’t hints or instructions that compete for relevance, but the assistant’s working state itself. On initialization, the model doesn’t decide what to pull in; it reconstructs orientation from a fixed set of artifacts.

In that sense it’s not prewarming a cache so much as rebuilding a process from disk. Forgetting, correction, and continuity are handled by explicitly changing those artifacts, not by prompt evolution.

I agree there’s a lot of open design space here. My main point was that persistent state tends to be discussed as a prompt or retrieval problem, whereas treating it as first-class state changes the failure modes quite a bit.

Curious how far current agent frameworks really go in that direction in practice.

verdverm•1mo ago
What you are describing is context construction. When working with agents and LLMs, the only way you get anything beyond their training is through the system prompt and message history. There is nothing else.

You can dress it up in whatever fancy notions and anthropomorphic concepts you want, but in the end it is just context engineering, regardless of how and when you create, store, retrieve, and inject artifacts. A good framework will give you building blocks and flexibility in how you use these and when that happens. That's why I use ADK, anyway.

Maybe you are talking about giving the agent tools for working with this state or cache? I have that in my ADK-based setup.

If I have a root AGENTS.md, or a user level file of similar nature, and these are always loaded for every conversation, how is what you are talking about different?

JohannesGlaser•1mo ago
At the lowest level, you’re right: everything the model ever sees is context. I’m not claiming a channel beyond tokens. The distinction I’m trying to draw isn’t where state ends up, but how it is governed.

AGENTS.md (and similar conventions) are a good step toward making agent context explicit. But they are still instructional artifacts: static guidance that gets loaded into the prompt. They don’t define a state lifecycle. They don’t encode history, correction, or invalidation over time. And they don’t change unless a human edits them. In most agent setups I’ve worked with, “state” is assembled per turn. An agent or orchestration layer decides what to include, summarize, drop, or rewrite. That makes continuity an emergent property of context engineering. It works locally, but over time you see drift, silent overwrites, and loss of accountability.

What I’m experimenting with is treating state as a process artifact, not an input artifact. The assistant doesn’t curate its own context. On startup, it reconstructs orientation from a fixed, inspectable set of external files — logs, rules, inventories — with explicit lifecycle rules. State changes happen deliberately (append, correct, invalidate), not implicitly via prompt evolution.
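One way to read "explicit lifecycle rules" as code (the operation names and log schema here are illustrative, not from any framework): every change is an appended event, and startup reconstruction replays the log so that corrections supersede earlier values and invalidations remove them, while the full history stays intact for inspection.

```python
import json
from pathlib import Path

LOG = Path("state_log.jsonl")  # hypothetical single-file state log

def record(op: str, key: str, value=None) -> None:
    """Append a lifecycle event: 'append', 'correct', or 'invalidate'.
    History is never rewritten, so every change stays accountable."""
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"op": op, "key": key, "value": value}) + "\n")

def reconstruct() -> dict:
    """Replay the log on startup: later corrections win, invalidated keys vanish."""
    state: dict = {}
    for line in LOG.read_text(encoding="utf-8").splitlines():
        event = json.loads(line)
        if event["op"] in ("append", "correct"):
            state[event["key"]] = event["value"]
        elif event["op"] == "invalidate":
            state.pop(event["key"], None)
    return state

record("append", "timezone", "UTC")
record("correct", "timezone", "Europe/Berlin")  # deliberate correction
record("append", "project", "demo")
record("invalidate", "project")                 # deliberate forgetting
```

Under this reading, "forgetting" is an `invalidate` event in the log rather than an omission from the prompt, so it is visible and reversible instead of silent.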

So yes, the model ultimately reads tokens. But forgetting, correction, and continuity are handled outside the prompt logic. The prompt becomes closer to a bootloader than a workspace.

If you always load a root AGENTS.md plus a stable artifact set, the surface can look similar. In practice, the difference shows up in failure modes: how systems degrade over weeks instead of minutes.

I’m not arguing current frameworks can’t approximate this — just that persistent state is usually framed as a context problem, rather than as first-class state with explicit lifecycle semantics. That shift changes what “agentic” failure even looks like.