I made a costly mistake with Claude Code: I let context accumulate across long sessions and relied on auto-compaction. What surprised me was that the output quality stayed good. I was happy with the results, and there was no obvious “something is wrong” signal. The only thing that went wrong was the bill.
What happened

- I kept iterating in the same long-lived context.
- Auto-compaction kicked in when needed.
- Sonnet was used by default most of the time.
- From a UX perspective, everything felt fine.
Why this is dangerous

- There’s no clear quality degradation to warn you.
- Token usage grows invisibly in the background (see the back-of-envelope sketch after this list).
- Auto-compaction itself consumes a lot of tokens, because the model has to re-read and summarize the whole history.
- You only realize something is wrong when you look at the invoice.
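To make “grows invisibly” concrete, here is a rough back-of-envelope sketch with purely hypothetical numbers: when every turn resends the entire history as input, billed input tokens grow roughly quadratically with the number of turns, even though the visible context only grows linearly.

```python
# Back-of-envelope: cumulative input tokens when every turn resends the whole
# conversation so far. All numbers are hypothetical, for illustration only.

TURNS = 50                # iterations in one long-lived session
TOKENS_PER_TURN = 2_000   # new prompt + response appended each turn (assumed)

context = 0               # tokens currently sitting in the conversation
total_input = 0           # tokens billed as input across the whole session

for _ in range(TURNS):
    total_input += context + TOKENS_PER_TURN  # the full history is re-read every turn
    context += TOKENS_PER_TURN

print(f"context at end:        {context:>9,} tokens")      # 100,000
print(f"cumulative input sent: {total_input:>9,} tokens")  # 2,550,000
```

Fifty turns of a modest 2k tokens each ends with a 100k-token context but over 2.5M input tokens billed, and none of that shows up in the chat UI.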
Root causes

- Long-lived context feels convenient, so you don’t reset it.
- Tooling doesn’t surface cost as a first-class signal.
- The mental model (“LLMs can remember my project”) is seductive but expensive.
- Defaulting to a large model makes the problem much worse.
What I changed

- One task = one fresh context.
- Externalized memory: project state lives in `context.md` / `decisions.md`, not in prompt history.
- Default to a smaller model; large models only for design/architecture.
- Diff-only outputs: no full file re-dumps.
- Disabled auto-compaction; summaries now live in docs.
- Added cost visibility: token counters and budget caps (a concrete sketch follows this list).
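To show what “one task = one fresh context” plus “cost visibility” looks like in practice, here is a minimal sketch assuming the Anthropic Python SDK. Claude Code manages all of this internally; the model name, file name, and budget numbers below are placeholders, not my exact setup.

```python
# Minimal sketch: fresh context per task, externalized memory, budget cap.
# Assumes the Anthropic Python SDK; model name, file name, and numbers are placeholders.
import anthropic

BUDGET_TOKENS = 200_000              # hard cap for a work session (assumed)
MODEL = "claude-3-5-haiku-latest"    # default to a smaller model (placeholder alias)

client = anthropic.Anthropic()       # reads ANTHROPIC_API_KEY from the environment
spent = {"input": 0, "output": 0}    # running token counters: the cost-visibility part

def run_task(task: str) -> str:
    """One task = one fresh context: only context.md is carried over, never chat history."""
    if spent["input"] + spent["output"] >= BUDGET_TOKENS:
        raise RuntimeError(f"Token budget exhausted: {spent}")

    with open("context.md") as f:    # externalized project memory instead of prompt history
        project_state = f.read()

    msg = client.messages.create(
        model=MODEL,
        max_tokens=2_000,
        system=f"Project state:\n{project_state}\n\nReply with diffs only, never full files.",
        messages=[{"role": "user", "content": task}],
    )

    # count every token the API reports, so cost shows up per call, not per invoice
    spent["input"] += msg.usage.input_tokens
    spent["output"] += msg.usage.output_tokens
    print(f"tokens so far: {spent}")
    return msg.content[0].text
```

The exact wrapper doesn’t matter; what matters is that the budget check runs before every call and the counters print after every call, so cost is a signal inside the loop rather than a surprise at the end of the month.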
Takeaway

This isn’t about output quality degrading. It’s about cost scaling quietly without feedback. LLMs make it too easy to accumulate invisible technical debt in tokens.
If tooling doesn’t make cost visible, people will keep doing this.