My goal was ambitious: a fully autonomous system where an army of AI agents: a Researcher, an Architect, a TDD-tester, and more would take a task and handle everything from planning to deployment. I designed a complex, multi-phase workflow with protocols like `ESCALATION` and detailed "Mission Briefs". On paper, it was a perfect, self-managing machine.
In reality, it was an expensive nightmare. The system was plagued by constant file editing errors, infinite loops that burned through tens of thousands of tokens, and "phantom executions" where the orchestrator would mark a task as complete without writing a single line of code. My job turned from developer to full-time prompt debugger.
In desperation, I posted on Reddit, and the solution wasn't a better prompt. It was a single comment that led me to disable two "experimental" checkboxes in the tool's settings. Miraculously, 90% of the file editing problems vanished.
This led to a painful but crucial experiment: what if I removed all my carefully crafted, super-detailed prompts and went back to the default settings? The result was disheartening: the system performed almost exactly the same.
Read the full story with detailed architecture diagrams and my final, simplified workflow: https://xor01.substack.com/p/my-war-with-ai-agents