Most agents still force the LLM to rewrite large JSON tool outputs (sometimes 10k+ rows) on every turn, just to pass data to the next tool.
That means you're paying again for thousands of tokens the model has already seen.
We fixed it by treating tool outputs as variables ($cohort, $weekly_visits, etc.).
The model passes references, and the orchestrator injects the real data.
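Rough sketch of the idea (VariableStore, run_turn, get_cohort and the other names here are illustrative, not our actual code): the orchestrator stores every tool output under a $variable, hands the model only the handle, and swaps the real payload back in when a later tool call references it.

    class VariableStore:
        """Holds tool outputs; the model only ever sees $handles."""
        def __init__(self):
            self._vars = {}

        def put(self, name, value):
            self._vars[name] = value
            # Return a lightweight handle instead of the payload.
            return f"${name}"

        def resolve(self, args):
            # Replace any "$name" string argument with the stored payload.
            def sub(v):
                if isinstance(v, str) and v.startswith("$"):
                    return self._vars.get(v[1:], v)
                return v
            return {k: sub(v) for k, v in args.items()}

    def run_turn(store, tools, tool_call):
        """Execute one model-issued tool call, injecting real data for references."""
        args = store.resolve(tool_call["arguments"])
        result = tools[tool_call["name"]](**args)
        # Store the (possibly huge) result; the model gets back only the handle.
        handle = store.put(tool_call["result_var"], result)
        return {"role": "tool", "content": f"stored as {handle} ({len(str(result))} chars)"}

    # Usage: get_cohort -> $cohort, then the model passes "$cohort" (a few tokens)
    # to weekly_visits instead of rewriting 10k rows into its output.
    store = VariableStore()
    tools = {
        "get_cohort": lambda segment: [{"user_id": i} for i in range(10_000)],
        "weekly_visits": lambda cohort: {"rows": len(cohort), "weeks": 52},
    }
    print(run_turn(store, tools, {"name": "get_cohort",
                                  "arguments": {"segment": "active"},
                                  "result_var": "cohort"}))
    print(run_turn(store, tools, {"name": "weekly_visits",
                                  "arguments": {"cohort": "$cohort"},
                                  "result_var": "weekly_visits"}))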
Same behavior, but:
~82% fewer tokens
~93% lower latency
~87% cheaper
No prompt tricks, no custom memory — just removing LLMs from the data pipe.
We made this the default in our agent architecture, the same way tool descriptions are passed by default.