They’re powerful, but on real codebases they:
- read too much irrelevant code
- edit outside the intended scope
- get stuck in loops (fix → test → fail)
- drift away from the task
- introduce architectural issues that linters don’t catch
The root issue isn’t the model. It’s:
- poor context selection
- a lack of execution guardrails
- no visibility at the team/org level
---
What CodeLedger does:
It sits between the developer and the agent and:
1) Gives the agent the right files first
2) Keeps the agent inside the task scope
3) Validates output against architecture and constraints
It works deterministically (no embeddings, no cloud, fully local).
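To make the idea concrete, here is a minimal sketch of what deterministic, embedding-free context selection can look like: rank files by lexical overlap between the task description and each file’s path and identifiers. This is a hypothetical illustration, not CodeLedger’s actual algorithm, and all names in it are made up.

```python
import re

def tokenize(text):
    # Split camelCase, then extract words and lowercase them,
    # so "getUserById" and "user_service.py" both yield comparable tokens.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return {w.lower() for w in re.findall(r"[A-Za-z]+", spaced)}

def rank_files(task, files, top_n=3):
    """files: dict of path -> source text. Returns up to top_n relevant paths."""
    task_tokens = tokenize(task)
    scored = []
    for path, source in files.items():
        overlap = task_tokens & (tokenize(path) | tokenize(source))
        scored.append((len(overlap), path))
    # Sort by score, then path, so results are fully deterministic.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [path for score, path in scored[:top_n] if score > 0]

files = {
    "src/user_service.py": "def get_user(id): return db.find(id)",
    "src/billing.py": "def charge(invoice): ...",
    "src/utils/null_checks.py": "def require_not_none(value): ...",
}
print(rank_files("Fix null handling in user service", files))
# Surfaces the user service and the null-handling helper, skips billing.
```

Even a crude scorer like this cuts the candidate set sharply before the agent reads anything; a real tool would layer on import graphs and architectural rules.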
---
Example:
Instead of an agent scanning 100–500 files, CodeLedger narrows the context to ~10–25 relevant files before the first edit.
---
What we’re seeing so far:
- ~40% faster task completion
- ~50% fewer iterations
- a significant reduction in token usage
---
Works with: Claude Code, Cursor, Codex, Gemini CLI
---
Repo + setup: https://github.com/codeledgerECF/codeledger
Quick start:
```shell
npm install -g @codeledger/cli
cd your-project
codeledger init
codeledger activate --task "Fix null handling in user service"
```
---
Would love feedback from folks using AI coding tools on larger codebases.
Especially curious:
- where agents break down for you today
- whether context selection or guardrails is the bigger issue
- what other issues you’re seeing