Before I changed my approach, I had two incidents that forced the shift:
1. I asked an agent to "make tests pass." It deleted three test files with failing tests.
2. I asked an agent to "fix the schema mismatch between dev and prod." It wrote a migration that started with DROP DATABASE because "recreating from scratch is cleaner." I caught it in review. Barely.
People keep describing LLMs as tools.
A tool does exactly what you tell it to do, just faster. A tool does not invent. A tool does not "helpfully" reinterpret your intent. A tool does not optimize for praise. A tool does not create technical debt while sounding confident. LLM coding agents do all of that. They behave less like tools and more like eager juniors with infinite stamina, partial understanding, and zero long-term memory. If you manage them like tools, they will behave like liabilities. If you manage them like a team, they become leverage. That is the shift. Not a new prompt. A new posture.
What breaks in the "coder with AI" mindset:
The default workflow looks like this:
1. You describe what you want.
2. The model writes code.
3. You skim it, run tests, iterate.
This works for isolated scripts. It collapses in systems, for reasons that are boring and predictable:
- Local optimization beats global intent. Agents learn quickly what you reward. If you reward "tests green" they will take shortcuts. If you reward "no errors" they will delete modules. If you reward "ship quickly" they will bypass invariants.
- Unread context becomes invented context. When the agent does not read the file, it guesses. When it guesses, it writes plausible glue. That glue compiles. It also rots your system.
- State drift is silent. On step 1 the agent assumes schema A. On step 6 it assumes schema B. Nothing forces reconciliation. You get a build that passes today and a production incident tomorrow.
- Responsibility diffuses. When you are "pair coding" with a model, no one owns the architecture. The agent will happily mutate it. You will happily accept it because it seems to work. Six weeks later you cannot explain your own system.
This is not a model problem. It's a control problem.
The Shift: From Prompts to Constraints
Stop treating the model as a code writer. Treat it as a workforce that needs:
- clear roles
- clear contracts
- evidence of reading
- bounded authority
- quality gates that can say "no"
That sounds like enterprise bureaucracy. It is. Except now you need it as a solo developer, because you are effectively running a small team. The team just happens to be synthetic and available at 2am.
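To make "bounded authority" concrete, here is a minimal sketch of a role policy that an orchestration script can check before an agent's diff is allowed to land. The role names, paths, and gate commands are illustrative, not a specific framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentRole:
    """A narrow agent role with hard boundaries (names and paths are illustrative)."""
    name: str
    may_edit: tuple[str, ...]           # path prefixes this role is allowed to touch
    may_not_edit: tuple[str, ...] = ()  # frozen areas: contracts, migrations, tests
    gates: tuple[str, ...] = ()         # commands that must pass before the diff lands

ROLES = {
    "implementer": AgentRole(
        name="implementer",
        may_edit=("src/",),
        may_not_edit=("api/openapi.yaml", "migrations/", "tests/"),
        gates=("ruff check .", "pytest -q"),
    ),
    "test-writer": AgentRole(
        name="test-writer",
        may_edit=("tests/",),
        may_not_edit=("src/", "migrations/"),
        gates=("pytest -q",),
    ),
}

def diff_in_bounds(role: AgentRole, changed_paths: list[str]) -> bool:
    """Reject the diff if it touches anything outside the role's boundary."""
    for path in changed_paths:
        if any(path.startswith(p) for p in role.may_not_edit):
            return False
        if not any(path.startswith(p) for p in role.may_edit):
            return False
    return True
```

The exact shape does not matter. What matters is that the boundary lives in something the agent cannot renegotiate mid-run. That is the difference between a prompt and a constraint.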
The Bottom Line:
If your agent can change architecture, contracts, implementation, and tests in a single run, you are not using leverage. You are rolling dice with style.
The goal isn’t to slow down. The goal is to make fast work stay true. We are moving from AI-assisted coding to AI-governed engineering.
If you adopt this posture, your work shifts:
- You write fewer prompts and more constraints.
- You design interfaces and invariants first.
- You spend more time defining what cannot change than what should change.
- You measure outcomes: revert rate, incident rate, diff size, cycle time.
- You stop letting the agent negotiate architecture mid-flight.
Speed without governance is not speed. It is borrowed time.
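Two of those outcome numbers, diff size and revert rate, fall straight out of git. A rough sketch, assuming a normal git history and that reverts keep git's default "Revert ..." subject line; incident rate and cycle time usually live in your tracker and CI, so they are left out here.

```python
import re
import subprocess

def git(*args: str) -> str:
    """Run a git command and return its stdout."""
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout

def diff_size(rev_range: str = "HEAD~1..HEAD") -> int:
    """Lines touched (insertions + deletions) in a revision range."""
    stat = git("diff", "--shortstat", rev_range)
    return sum(int(n) for n in re.findall(r"(\d+) (?:insertions?|deletions?)", stat))

def revert_rate(since: str = "30 days ago") -> float:
    """Share of recent commits whose subject line marks them as reverts."""
    subjects = git("log", f"--since={since}", "--pretty=%s").splitlines()
    if not subjects:
        return 0.0
    reverts = sum(1 for s in subjects if s.startswith("Revert"))
    return reverts / len(subjects)

if __name__ == "__main__":
    print(f"last diff touched {diff_size()} lines")
    print(f"revert rate (30d): {revert_rate():.1%}")
```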
moshael•2h ago
Here is the smallest set of constraints I have seen that changes outcomes materially:
* Roles, not vibes - Define narrow agent roles with hard boundaries.
* Before code, freeze interfaces - OpenAPI for HTTP, message contracts for async events, DB schema and invariants. This prevents "LLM-driven interface drift" where the agent silently changes request shapes because it feels nice.
* Read-evidence, not trust - Ban "I assume this file contains…" behavior. Require file reads before edits and require the agent to cite what it saw: which functions exist, which types exist, where the integration points are. It is about forcing contact with reality.
* Determinism over cleverness - Prefer minimal diffs, explicit types, explicit invariants, explicit error paths, idempotent workers, no "magic" implicit behavior. LLMs love cleverness because cleverness sounds correct. Determinism survives maintenance.
* Hard gates at the boundary - A "critic" role runs linters, unit tests, integration tests, contract validation, migration checks, deploy health checks. If it fails, the pipeline stops. Not "warns". Stops.
* Explicit handoffs - Every step ends with: what changed, what assumptions were made, what is now the source of truth, who owns the next step.
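For the "hard gates" part, a minimal sketch of what a critic gate can look like: a script the pipeline runs after every agent step, where any failing check stops the run instead of warning. The specific commands (ruff, pytest, the two check scripts) are stand-ins for whatever your stack actually uses.

```python
import subprocess
import sys

# Ordered checks the critic runs after every agent step. Commands are
# placeholders for your stack; the point is the stop-on-failure behavior.
GATES = [
    ("lint",      ["ruff", "check", "."]),
    ("unit",      ["pytest", "tests/unit", "-q"]),
    ("contract",  ["python", "scripts/check_contracts.py"]),   # hypothetical script
    ("migration", ["python", "scripts/check_migrations.py"]),  # hypothetical script
]

def run_gates() -> int:
    for name, cmd in GATES:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # The pipeline stops here. No warning, no "proceed anyway".
            print(f"GATE FAILED: {name} ({' '.join(cmd)})", file=sys.stderr)
            return result.returncode
    print("all gates passed")
    return 0

if __name__ == "__main__":
    sys.exit(run_gates())
```

Wire it in as the only path by which an agent's diff can land, for example as a required CI job, and "the critic warned" becomes "the pipeline stopped".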
Before code, freeze interfaces: OpenAPI for HTTP, message contracts for async events, DB schema and invariants. This prevents "LLM-driven interface drift" where the agent silently changes request shapes because it feels nice. * Read-evidence, not trust - Ban "I assume this file contains…" behavior. Require file reads before edits and require the agent to cite what it saw: which functions exist, which types exist, where the integration points are. It is about forcing contact with reality.\ * Determinism over cleverness - Prefer minimal diffs, explicit types, explicit invariants, explicit error paths, idempotent workers, no "magic" implicit behavior. LLMs love cleverness because cleverness sounds correct. Determinism survives maintenance. * Hard gates at the boundary - A "critic" role runs linters, unit tests, integration tests, contract validation, migration checks, deploy health checks. If it fails, the pipeline stops. Not "warns". Stops. * Explicit handoffs - Every step ends with: what changed, what assumptions were made, what is now the source of truth, who owns the next step.