LLMs are far better at writing a small program than coordinating multiple tool calls.
So instead of giving the model 10+ tools, I exposed one: a TypeScript sandbox with access to the same interfaces.
The model writes a script → it runs once → done.
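To make the idea concrete, here's a minimal sketch of the pattern (not the actual repo code; the interface names are illustrative). Instead of registering many tools, the host exposes a single "run script" tool, and the model's output is a program that calls the same interfaces directly:

```typescript
// Assumed interfaces the sandbox exposes -- names are hypothetical.
const api = {
  searchIssues: (q: string): string[] => [`issue matching "${q}"`],
  postComment: (id: string, body: string): string =>
    `commented on ${id}: ${body}`,
};

// The single tool: evaluate a model-written script with `api` in scope.
// (A real sandbox would isolate this; new Function is just for the sketch.)
function runScript(source: string): unknown {
  const fn = new Function("api", source);
  return fn(api);
}

// Example model output: one script replaces a multi-step tool-call chain.
const script = `
  const issues = api.searchIssues("sandbox");
  return issues.map(i => api.postComment("42", i));
`;
const result = runScript(script);
console.log(result);
```

The point of the sketch: intermediate values (`issues`) stay inside the script instead of round-tripping through the model's context, which is where the token savings and the reduced drift come from.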
What changed:
- ~68% reduction in token use
- No multi-step drift or retries
- Local models (Llama 3.1 8B / Phi-3) became much more reliable
Repo: https://github.com/universal-tool-calling-protocol/code-mode
Curious whether others have seen the same thing: Is code execution a better abstraction for agents than tool calls?