As a control freak, the differences between these two tool-calling approaches got me thinking:
How will open source enable standardized tool-calling for agents so we do not have to build and support custom tool-calling harnesses on our own?
I wanted to share an architecture design pattern we're using to cut down on custom tool-calling code across many components/subsystems. We open sourced our local OATs coding agent on GitHub: https://github.com/district-solutions/open-agent-tools-coder. I run coder with a large local model that delegates tool calling to smaller local models. The repo includes vLLM deployments in the stack dir https://github.com/district-solutions/open-agent-tools-coder/tree/main/stack for running Qwen36 27B and 35B with tool-calling delegation to functiongemma.
On startup, coder looks for a large, preprocessed JSON index of supported tools. We open sourced the OATs Tool-Calling Prompt Index for >141K Tools on GitHub https://github.com/district-solutions/open-agent-tools#openagent-tools-oats to help everyone use the same patterns (hopefully!). I think of OATs as a "thinking cap". Once that cap is on, the smaller models only process a reduced set of tools. This tool-call guidance lets a large local model delegate "a list of instructions" to one or more smaller models, which can be running on remote devices (I have functiongemma running on laptops with old GPUs too, e.g. a mobile NVIDIA 3060). That way laptops can run local commands with a set of local models: one for the db, one for the api, one for the frontend, one for coding...
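Here is a minimal sketch of the "thinking cap" idea: cut a big index down to a handful of candidate tools before handing anything to a smaller model. Everything in it is an assumption for illustration: the index layout (tool name mapped to a list of `prompts`), the tool names, the `shortlist_tools` helper, and the keyword-overlap scoring. The real OATs Prompt Index schema and coder's matching logic may look quite different.

```python
import json
import re

# Hypothetical index layout (an assumption, not the real OATs schema):
# tool name -> {"prompts": [example phrases that should trigger the tool]}
SAMPLE_INDEX_JSON = """
{
  "db_list_tables": {"prompts": ["list tables in the database", "show db tables"]},
  "api_health_check": {"prompts": ["check api health", "ping the api"]},
  "git_status": {"prompts": ["show git status", "what changed in the repo"]}
}
"""

# Tiny stopword list so filler words do not count as matches.
STOPWORDS = {"a", "in", "me", "the"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def shortlist_tools(index: dict, request: str, limit: int = 2) -> list[str]:
    """Return up to `limit` tool names whose prompts share the most words with the request."""
    words = tokenize(request)
    scored = []
    for name, entry in index.items():
        # Score a tool by its best-matching prompt.
        best = max((len(words & tokenize(p)) for p in entry.get("prompts", [])), default=0)
        if best > 0:
            scored.append((best, name))
    # Highest score first, then alphabetical for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


index = json.loads(SAMPLE_INDEX_JSON)
candidates = shortlist_tools(index, "show me the tables in the db")
print(candidates)
```

Only the shortlisted names (and their definitions) would then go into the smaller model's context, instead of the whole index.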
Here's the demo video with coder calling functiongemma:
https://asciinema.org/a/3ZhMCyUKjr2dmIH1
What else can we reuse?
- We published the OATs Prompt Index JSON to GitHub and the dataset to HuggingFace https://huggingface.co/datasets/open-agent-tools/open-tools as parquet files, which should enable local training and usage with tools that are faster than JSON parsers.
Fundamental Trust Issues - Who watches the agent?
Once coder was running 200+ local commands overnight from a single prompt, we started seeing negative side effects around these use cases:
Change Management
- What did coder change?
- What did it run?
- Why did it choose this tool or that one among a sequence of 200+ calls?
Code Reviews
- How do we keep up with changes at this speed?
Things got sketchy fast
- About 6-7 weeks ago, coder dropped the tables in a non-prod db. I can't prove it, but I'm 99% confident.
Shit. How do I stop this? How many other people are going to get wrecked by this?
I hope OATs can help you prevent unexpected tool calls from doing unexpected things in your environment.
- Monitoring - Coder tracks all tool calls for auditing and review. I run many mattermost instances where agents post tool-call audit logs to specific channels for review by humans/agents. This lets me spot stuck agents and watch what they are doing, and I can archive all chats into parquet files for training later.
- Human curated approved tools - I open sourced the huge prompt index to make a point: with >141,000 tools, which ones are approved by your team and by security? OATs coder uses 1 JSON dictionary Prompt Index file to map prompts to local source code. Whatever you change in that JSON Prompt Index file, coder will support. If you want to link "superhappy" as a prompt to call your already-working local code for "reading an open-webui note" or "reading an open-webui knowledge collection", just edit the file and save.
- Here's a 3 part blog series on how coder works: https://districtsolutions.ai/blog
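To make the two guardrails above concrete, here is a rough sketch of an allowlist-gated dispatcher that records every attempt. It is not how coder is implemented. The `PROMPT_INDEX` layout, the `read_note` stand-in, the `call_tool` helper, and the audit record fields are all hypothetical names made up for this example.

```python
import json
import time

# Hypothetical Prompt Index: prompt phrase -> name of a local function.
# The real OATs file layout may differ; this only illustrates the idea.
PROMPT_INDEX = {
    "superhappy": "read_note",
}


def read_note(note_id: str) -> str:
    """Stand-in for already-working local code (for example, reading a note)."""
    return f"contents of note {note_id}"


# Human-curated allowlist: only functions registered here can ever be called.
APPROVED_TOOLS = {"read_note": read_note}

# In-memory audit trail; one record per attempted call, allowed or not.
AUDIT_LOG: list[dict] = []


def call_tool(prompt: str, **kwargs):
    """Resolve a prompt through the index, refuse anything unapproved, and log every attempt."""
    tool_name = PROMPT_INDEX.get(prompt)
    allowed = tool_name in APPROVED_TOOLS
    AUDIT_LOG.append({
        "ts": time.time(),
        "prompt": prompt,
        "tool": tool_name,
        "args": kwargs,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"no approved tool for prompt {prompt!r}")
    return APPROVED_TOOLS[tool_name](**kwargs)


result = call_tool("superhappy", note_id="42")

try:
    call_tool("drop all tables")
    blocked = False
except PermissionError:
    blocked = True

# Each record serializes to one JSON line, ready to post or archive.
audit_lines = [json.dumps(entry) for entry in AUDIT_LOG]
print(result, blocked, len(audit_lines))
```

The refused call still lands in the audit trail, so a reviewer can see what the agent tried to do as well as what it was allowed to do. The JSON lines could be posted to a chat channel or archived for later analysis.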
Thanks for your time!