I run 11 agents in 100K+ token sessions. Same problem everyone has: agents follow rules perfectly at first, then drift until the system prompt might as well not exist.
The SCAN fix: put questions at the end of each prompt section. Before each task the agent answers them, spending ~300 tokens that actively link the instructions to the current task. It's not re-reading the prompt passively; it's generating connections to it.
Months of daily use across Claude and Kimi. No benchmarks — can't measure attention weights from outside. But the difference is obvious: without SCAN agents lose rules by mid-session, with SCAN they don't.
No dependencies, any model, open method. Full writeup in the gist.
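For anyone who wants the shape of it before reading the gist, here's a minimal sketch of how the question-answering step could be wired up. The section names, rules, and function names are my own illustration of the pattern, not the gist's actual code:

```python
# SCAN sketch: each system-prompt section ends with a short question
# that the agent must answer before starting a task. The rules and
# questions below are illustrative placeholders.

SECTIONS = [
    {
        "rules": "Always write tests before implementation code.",
        "question": "Q1: What will you test first for this task?",
    },
    {
        "rules": "Never touch files outside the task's stated scope.",
        "question": "Q2: Which files are in scope for this task?",
    },
]

def build_system_prompt() -> str:
    # Each section carries its rules plus a trailing activation question.
    parts = []
    for i, s in enumerate(SECTIONS, 1):
        parts.append(f"## Section {i}\n{s['rules']}\n{s['question']}")
    return "\n\n".join(parts)

def pre_task_check(task: str) -> str:
    # Injected as a user turn before each task: the model has to
    # generate answers that tie the rules to the concrete task,
    # instead of passively re-reading the prompt.
    questions = "\n".join(s["question"] for s in SECTIONS)
    return (
        "Before starting, answer these briefly for the task below:\n"
        f"{questions}\n\nTask: {task}"
    )
```

The pre-task message is the whole trick: the ~300 tokens the agent spends answering are what keep the rules active, and the questions themselves stay cheap because they live in the prompt once.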
soletta•1h ago
I was a bit dubious until I read the gist. I've used a similar technique before to 'tame' GPT-3.5 and keep it following instructions and it worked well (though I had to ask the model to essentially repeat instructions after every 10 or so turns). I'm surprised you see that much drift though; older models were pretty bad with long contexts but I feel like the problem has mostly gone away with Claude Opus 4.6.
nikolasi•1h ago
Glad it resonated! Yeah, repeating instructions every N turns was the old approach. SCAN basically does the same thing, but with ~20 tokens instead of the full prompt each time.
On drift being "mostly gone" — depends on prompt complexity. With a simple system prompt, sure, modern models hold up fine. But with a large instruction set (mine is ~4000 tokens, 25+ rules across 7
sections) the drift is very much still there, even on Opus. The more rules you have, the more they compete for attention, and the easier it is for specific ones to drop off mid-session.
Also worth noting: this isn't limited to coding agents. Any long-running LLM workflow with complex instructions has the same problem. Customer support bots forget their tone policy, medical assistants stop citing sources, content moderation gets lenient over time. If you have a system prompt with rules and a session longer than 20 minutes, the rules will decay.
soletta•1h ago
Interesting. I've been coping by being very conservative about how many rules I introduce into the context, but if what you're saying is true, then something like SCAN actually helps the models break past the "total rule count" barrier by giving them something like "cognitive scaffolding". I'm eager to try this out. Thanks again for sharing!
nikolasi•1h ago
That's a great way to put it — "cognitive scaffolding" is exactly what it is. And yeah, keeping rules minimal is smart, but at some point the project just needs 25 rules and you can't cut them down without
losing something important. SCAN lets you have a large instruction set without paying the full attention cost. Let me know how it goes!