I spent $250 on my first day doing what felt like harmless testing.
Nothing production. No customers. Just me trying things like:
“Summarize this Slack thread”
“Give me a morning digest”
“Explain this error log”
“Pull action items from the last N messages”
A couple Telegram alerts
At first I blamed OpenClaw. The real issue was simpler: I had Claude set as the default for basically everything, and I accidentally created a workflow where every run got more expensive than the last.
Here’s what actually happened.
“Simple tasks” weren’t simple because the context kept growing

I started with “summarize the last 30–50 messages.” Then I kept adding “just one more thing”:
- include prior decisions
- keep continuity across runs
- include relevant earlier context
- make it more detailed
That makes results feel smarter, but it turns every request into a bigger prompt. The tricky part is it still feels like the same task, so you don’t notice the cost drift until the number is already big.
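The fix for that drift is a hard cap on what gets included, no matter how much history exists. A minimal sketch of the idea (function and parameter names here are mine, not anything OpenClaw ships):

```python
def build_summary_input(messages, max_messages=30, max_chars=4000):
    """Keep the prompt bounded regardless of how much history exists.

    Takes only the newest `max_messages`, then drops oldest-first
    until the combined text fits within `max_chars`.
    """
    window = messages[-max_messages:]
    while window and sum(len(m) for m in window) > max_chars:
        window.pop(0)  # drop the oldest message first
    return "\n".join(window)

# Even with 10,000 messages of history, the prompt stays capped.
history = [f"message {i}" for i in range(10_000)]
prompt = build_summary_input(history)
```

The point isn’t the exact numbers; it’s that the cap is enforced in code instead of remembered by a human.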
Tool output bloat snowballed

I let tool outputs flow straight into the next step:
- long logs
- giant diffs
- full API responses
- “for debugging” screenshots
Even if one run is tolerable, the next run inherits the baggage. This is how testing quietly becomes a token furnace: output becomes input becomes output again.
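Breaking the output-becomes-input loop mostly means trimming payloads before they reach the next step. A hedged sketch, assuming the common case that logs and diffs carry their signal at the top and bottom:

```python
def trim_tool_output(text, head_lines=20, tail_lines=20, max_chars=2000):
    """Keep only the start and end of a long tool payload.

    Logs usually put the signal at the top (command, first error)
    and bottom (final state), so the middle can be elided.
    """
    lines = text.splitlines()
    if len(lines) > head_lines + tail_lines:
        omitted = len(lines) - head_lines - tail_lines
        lines = (lines[:head_lines]
                 + [f"... [{omitted} lines omitted] ..."]
                 + lines[-tail_lines:])
    return "\n".join(lines)[:max_chars]

log = "\n".join(f"line {i}" for i in range(500))
trimmed = trim_tool_output(log)
```

A 500-line log becomes ~40 lines plus an elision marker, and the next step inherits kilobytes instead of the whole furnace.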
Scheduled jobs created an “idle → warm-up tax” loop

I had cron-ish jobs that ran, went idle, then ran again.
If your setup effectively re-establishes a big prompt footprint on each run, you keep paying the setup cost repeatedly. It’s not one catastrophic request. It’s lots of medium ones with repeated overhead.
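The fix that worked for me was forcing a fresh-session boundary: each scheduled run builds its prompt from scratch instead of carrying a conversation forward. A toy sketch (the `fetch_messages` and `call_model` hooks are hypothetical stand-ins, not a real API):

```python
def run_scheduled_job(fetch_messages, call_model, system_prompt):
    """Each scheduled run builds its prompt from scratch.

    No conversation object survives between runs, so there is no
    accumulated history to re-send (and re-pay for) every time.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": fetch_messages()},  # fresh data only
    ]
    return call_model(messages), len(messages)

# Fake hooks, just to show the prompt never grows across runs.
sizes = []
for _ in range(3):
    _, n = run_scheduled_job(
        fetch_messages=lambda: "latest 30 messages",
        call_model=lambda msgs: "digest",
        system_prompt="Summarize.",
    )
    sizes.append(n)
```

Run it hourly for a week and the per-run footprint is identical on day seven to what it was on day one.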
Duplicates from retries/triggers

A couple of times I saw behavior consistent with “the same expensive work executed twice”:
- transient slowdowns causing retries
- duplicated triggers from chat integrations
One duplicated summarization run isn’t a rounding error when the prompt is already bloated.
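The standard guard here is an idempotency key: hash the task plus its payload and skip anything you’ve already accepted within a short window. A minimal sketch of that pattern (names and the 5-minute window are my choices, not from any particular framework):

```python
import hashlib
import time

class TriggerDeduper:
    """Drop triggers that repeat the same work within a short window."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.seen = {}  # idempotency key -> timestamp of last accepted run

    def should_run(self, task_name, payload, now=None):
        now = time.time() if now is None else now
        key = hashlib.sha256(f"{task_name}:{payload}".encode()).hexdigest()
        last = self.seen.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate within the window: skip it
        self.seen[key] = now
        return True

dedup = TriggerDeduper()
first = dedup.should_run("summarize", "thread-42", now=100.0)
dupe = dedup.should_run("summarize", "thread-42", now=130.0)   # 30s later
later = dedup.should_run("summarize", "thread-42", now=500.0)  # window expired
```

This also makes retries safe by construction: a retry that fires inside the window is treated as the duplicate it is.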
So why did it hit $250 so fast? Because Claude was my default hammer for every nail, and I unintentionally designed the system to feed itself bigger and bigger inputs.
What fixed it (the boring, effective stuff)
- Hard caps on what gets summarized (smaller windows, tighter selection)
- Aggressive trimming of tool output (only keep what the next step truly needs)
- Removed screenshots unless strictly required
- Forced “fresh session” boundaries for scheduled jobs so context can’t grow forever
- Output length ceilings so digests can’t become essays
- De-duped triggers and made retries safer to avoid re-running the same job twice
- And the biggest one: stop using the most expensive model by default for routine steps
The part that pushed me into building something

After that first-day bill, the pattern was obvious: relying on discipline (“I’ll remember to switch models later”) doesn’t scale.
Claude was the immediate cost driver, so I took the routing model I’d built for Agentlify and adapted it into a custom routing layer specifically for OpenClaw: cheap/fast models for routine steps, only escalate to Claude when the task actually needs it. That became https://clawpane.co
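The core of that routing idea fits in a few lines. This is a sketch of the concept only, not ClawPane’s actual implementation; the task names, model names, and threshold are illustrative placeholders:

```python
def route_model(task, prompt):
    """Pick the cheapest model that can plausibly handle the step.

    Routine, well-bounded steps go to a cheap/fast model; escalate
    to the expensive one only when the task actually needs it.
    """
    ROUTINE = {"digest", "summarize", "extract_action_items", "alert"}
    if task in ROUTINE and len(prompt) < 8_000:
        return "cheap-fast-model"
    return "expensive-model"

# Routine steps stay cheap; big or unusual work escalates.
choice_a = route_model("summarize", "short thread text")
choice_b = route_model("refactor_codebase", "large diff text")
```

The defaults are inverted: the expensive model has to be earned by the task, instead of being the hammer for every nail.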
I’m not trying to sell anything here. The point isn’t “buy my thing.” The point is that routing stops being an optimization and becomes a seatbelt once you’ve had one day like this.
Takeaway

If you’re trialing agent workflows and your bill is spiking, it’s usually not one big request. It’s:
- context creep
- tool payloads piling up
- scheduled runs repeatedly paying warm-up overhead
- occasional duplicates
…all handled by an expensive default model doing work that doesn’t require it.
If you want, reply with what tasks you’re running and what your defaults look like. I’ll tell you where the spend usually hides.