I've been working on an open source implementation of Programmatic Tool Calling for Agents, based on Cloudflare's Code Mode and a few Anthropic articles. I think it can be very powerful in certain use cases, but there are some challenges I'd love to get your thoughts on.
Instead of a traditional agent that burns tens of thousands of tokens loading all tool definitions upfront and compounds its context with sequential calls, this approach lets the agent discover only the tools it needs from a file tree of TypeScript SDKs, then write code that one-shots the task in a single pass.
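To make that concrete, here's roughly what one of these generated scripts looks like (the "tasks" SDK and its function names are illustrative, not a real server):

  // Hypothetical "tasks" SDK discovered from the file tree, e.g. ./servers/tasks/index.ts
  import { listTasks, closeTask } from "./servers/tasks";

  // One pass: fetch everything, filter locally, and act in a loop,
  // instead of one model turn (and one context round-trip) per tool call.
  async function closeStaleTasks(): Promise<string[]> {
    const tasks = await listTasks({ status: "open" });
    const thirtyDays = 30 * 24 * 60 * 60 * 1000;
    const stale = tasks.filter(
      (t) => Date.now() - new Date(t.updatedAt).getTime() > thirtyDays
    );
    await Promise.all(stale.map((t) => closeTask({ id: t.id })));
    return stale.map((t) => t.id);
  }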
Although having an agent execute code seems ideal, since LLMs are great at writing code, there are a few big challenges I've run into, described below.
The main challenges w/ Programmatic Tool Calling:
- Output Schemas from the Tools
Most MCP servers and tool definitions almost never define output schemas, and without knowing what a tool returns, the model hallucinates property names (think 'task.title' vs 'task.name') and the script fails at runtime because it has to guess the shape of a tool's output. I'm working around this by classifying tools and actually calling them to infer schemas, but it's really hacky: a single sample misses optional fields, and testing write/destructive tools means creating or destroying real data, which is an approach I really dislike and don't think is viable.
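My current workaround is essentially this kind of sampling-based inference (heavily simplified here, and all the names are illustrative), which also shows why a single sample isn't enough:

  // Very rough shape inference from sampled tool outputs. A field that only
  // appears in some responses (an optional field) is silently missed if you
  // only ever sample once.
  type InferredSchema = Record<string, { type: string; optional: boolean }>;

  function inferSchema(samples: Array<Record<string, unknown>>): InferredSchema {
    const schema: InferredSchema = {};
    for (const sample of samples) {
      for (const [key, value] of Object.entries(sample)) {
        schema[key] = { type: typeof value, optional: false };
      }
    }
    // A field absent from any sample gets marked optional, but that only
    // works if you happened to sample a response where it was missing.
    for (const key of Object.keys(schema)) {
      schema[key].optional = samples.some((s) => !(key in s));
    }
    return schema;
  }

  // inferSchema([{ name: "Fix login" }]) claims every task has only `name`,
  // until a later response comes back with `dueDate` and the generated
  // script that never handled it breaks.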
- Tool Outputs Are Often Plain Strings (unstructured data)
Even with perfect schemas and defined shapes, most MCP tools return markdown blobs or plain strings meant for LLM inference: no JSON, no fields to index into, just text. If the majority of your tools return plain strings (even when listing data), the main value of codecall is lost, because you can't write deterministic code against unstructured text. You're forced back into traditional agent behavior where the LLM interprets the text. And if you don't control the server or the tool definitions, there's no fix I can really think of.
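To illustrate the difference with a made-up listTasks tool (both versions are hypothetical, but the second is what most servers I've tried actually look like):

  // What you want: structured output you can compute over deterministically.
  async function openTaskIds(
    listTasks: () => Promise<Array<{ id: string; status: string }>>
  ) {
    const tasks = await listTasks();
    return tasks.filter((t) => t.status === "open").map((t) => t.id);
  }

  // What you usually get: one markdown blob meant for the model to read, e.g.
  //   "## Open tasks\n- Fix login (assigned to sam)\n- Update docs"
  // There's no `status` field to filter on, so the options are a brittle regex
  // or handing the text straight back to the model, i.e. back to traditional
  // tool calling.
  async function openTasksFromBlob(listTasks: () => Promise<string>) {
    const blob = await listTasks();
    return blob; // nothing deterministic to do here except pass the text along
  }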
- Input/Output examples for each Tool (Amplified w/ Programmatic Tool Calling)
The final challenge is that JSON Schema defines structure but not usage patterns. Take the support ticket API example from one of the Anthropic articles: the schema tells you due_date is a string, but not whether it wants "2024-11-06" or "Nov 6, 2024". It says reporter.id is a string, but is that a UUID or "USR-12345"? When should reporter.contact be populated? How do escalation.level and priority interact?
In traditional tool calling, the model can learn these patterns through trial and error across multiple turns: it tries something, gets an error or an unexpected result, and adjusts for the rest of the conversation. But with programmatic tool calling, the model writes a script that might call create_ticket 50 times in a loop for different users. If it misinterprets the date format or ID convention in the first call, all 50 calls fail.
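Here's roughly the failure mode I mean (create_ticket, due_date, and reporter.id come from the Anthropic example; the wrapper and everything else is illustrative):

  // `createTicket` stands in for a generated SDK wrapper around the
  // create_ticket tool from the Anthropic support-ticket example.
  declare function createTicket(input: {
    title: string;
    due_date: string;
    reporter: { id: string };
  }): Promise<{ ticketId: string }>;

  // The schema only says due_date and reporter.id are strings. If the model
  // guesses the wrong conventions once, every call in the batch inherits the guess.
  async function fileTickets(users: Array<{ id: string; issue: string }>) {
    const results: Array<{ ticketId: string }> = [];
    for (const user of users) {
      results.push(
        await createTicket({
          title: user.issue,
          due_date: "Nov 6, 2024",    // API actually wants "2024-11-06"
          reporter: { id: user.id },  // UUID? "USR-12345"? The schema doesn't say.
        })
      );
    }
    return results; // 50 users in, 50 identical validation errors out
  }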
-------------
Although all of these could be fixed by having the user set things up manually, is there a reliable way to get output schemas and generate input/output examples for each tool, without actually calling the tool and without a user manually entering the data?
If anybody is interested, or has any thoughts or ideas on tool calling for agents, please feel free to share!