The goal was simple: measure how token usage grows as you introduce more tools and more conversation turns.
THE SETUP
- 6 tools (metrics, alerts, topology, neighbors, etc.)
- gpt-4o-mini
- Token instrumentation across four phases
- No caching, no prompt tricks, no compression
THE FOUR PHASES
Phase 1: Single tool. One LLM call, one tool schema. Baseline.
Phase 2: Six tools. Same query, but the agent exposes six tools. Token growth comes entirely from the additional tool definitions.
Phase 3: Chained calls. Three sequential tool calls, each feeding into the next. No conversation history yet.
Phase 4: Multi-turn conversation. Three turns with full replay of every prior message, tool request, and tool response.
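The measurement itself can be sketched as a tiny harness. Everything here is illustrative: `count_tokens` is a crude ~4-characters-per-token stand-in (a real run would read the `usage` field from the API response or use a proper tokenizer), and the schema and message sizes are placeholder assumptions, not the post's measured values.

```python
# Crude token estimate: roughly 4 characters per token.
def count_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def prompt_tokens(tool_schemas: list[str], messages: list[str]) -> int:
    # Every call replays all tool definitions plus the full message history.
    return sum(count_tokens(s) for s in tool_schemas) + sum(
        count_tokens(m) for m in messages
    )

SCHEMA = "x" * 400  # one placeholder tool definition (~100 tokens)
MSG = "y" * 200     # one placeholder message (~50 tokens)

phase1 = prompt_tokens([SCHEMA], [MSG])      # one tool, one message
phase2 = prompt_tokens([SCHEMA] * 6, [MSG])  # six tools, same single query
```

The phase 2 cost exceeds phase 1 by exactly five extra schema payloads, which is the pattern the experiment isolates: same query, more definitions.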
RESULTS
Phase 1: 590 tokens (baseline)
Phase 2: 1,250 tokens (2.1x baseline)
Phase 3: 4,500 tokens (7.6x baseline)
Phase 4: 7,166 tokens (12.1x baseline)
Two non-obvious findings stood out. First, adding five more tools roughly doubled token cost. Second, adding two more conversation turns tripled it. Conversation depth drove more token growth than tool count.
WHY THIS HAPPENS
LLMs are stateless. Every call must replay the full context: tool definitions, conversation history, and previous tool outputs. Adding tools grows each prompt by a fixed amount, so that cost is linear. Adding conversation turns is worse: each turn resends everything that came before it, so cumulative token cost grows roughly quadratically with turn count.
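That replay cost is easy to model with back-of-the-envelope numbers. The per-schema and per-turn figures below are illustrative assumptions, not the measured results above; the point is the shape of the curve, not the values.

```python
TOOL_DEFS = 6 * 110  # six tool schemas, resent on every single call
PER_TURN = 400       # user msg + assistant reply + tool traffic per turn

def cumulative_prompt_tokens(turns: int) -> int:
    """Total prompt tokens sent over a conversation of `turns` turns,
    when every call replays all tool definitions plus the full history."""
    total = 0
    history = 0
    for _ in range(turns):
        total += TOOL_DEFS + history  # this call's prompt
        history += PER_TURN           # the next call replays this turn too
    return total
```

Doubling the turn count more than doubles cumulative cost, because the history term sums 0 + 1 + 2 + ... per-turn payloads: linear in tools, quadratic in turns.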
IMPLICATIONS Real systems often have dozens of tools across domains, multi-turn conversations during incidents, and power users issuing many queries per day. Token costs don’t scale linearly. They compound. This isn’t a prompt-engineering issue. It’s an architectural issue. If you get the architecture wrong, you pay for it on every query.
NEXT STEPS I’m measuring the effects of parallel tool execution, conversation history truncation, semantic routing, structured output constraints, and OpenAI’s new prompt caching (which claims large cost reductions on cache hits). Each of these targets a different part of the token-growth pattern.
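Of those mitigations, history truncation is the simplest to sketch. The message shape below follows the common chat-completions format, and the keep-last-N policy is an assumed strategy for illustration, not one of the measured results.

```python
def truncate_history(messages: list[dict], max_messages: int = 6) -> list[dict]:
    """Keep the system prompt plus only the most recent messages.
    max_messages is an assumed cutoff, not a tuned value."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]
```

One caveat worth testing: naive truncation like this can orphan a tool response from the assistant message that requested it, which some APIs reject, so a real implementation would need to cut on turn boundaries rather than raw message counts.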
Happy to share those results as I gather them. Curious how others are managing token expansion in multi-turn, multi-tool agents.