Improving MCP tool call performance through LLM code generation

https://github.com/zbowling/mcpcodeserver

1•zbowling•3mo ago

Comments

zbowling•3mo ago

I hacked together a new MCP server this weekend that can significantly cut the overhead of direct tool calling by LLMs inside different agents, especially when a more complex workflow makes multiple tool calls. It was inspired by Cloudflare's recent blog post about their Code Mode MCP server and by the original Apple white paper, and it is a lot better than the Cloudflare server in several ways. It doesn't rely on their backends to isolate the execution of the tool calls, it has generally better support for all the features in MCP, and it has significantly better interface generation and LLM tool hinting to save on context window tokens. This implementation can also scale to a lot more child servers more cleanly.

Most LLMs are naturally better at code generation than at tool calling: code understanding is more foundational to their knowledge, while tool calling is pounded into models in later stages during fine-tuning. Agent orchestrators can also burn an excessive number of tokens passing data between tools via the LLM. But if you move the tool calling into code, and have the LLM generate that code instead of calling the tools directly from the agent, it can produce significantly better results for complex cases and reduce the overhead of passing data between tool calls.
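To make the data-passing point concrete, here is a rough sketch of the kind of script an LLM might generate. Everything in it is made up for illustration: the `callTool` helper, the `list_issue_ids` and `get_issue` tool names, and the in-memory data are assumptions, not part of mcpcodeserver. The point is only that the intermediate results stay inside the script instead of going back through the model between calls.

```typescript
// Hypothetical shape of a tool-calling helper. In a real setup this would
// forward the call to a child MCP server; here it is stubbed in memory.
type ToolCall = (name: string, args: Record<string, unknown>) => Promise<unknown>;

interface Issue {
  id: number;
  title: string;
  open: boolean;
}

// Made-up data standing in for whatever a child server would return.
const fakeIssues: Record<number, Issue> = {
  1: { id: 1, title: "Crash on start", open: true },
  2: { id: 2, title: "Typo in docs", open: false },
  3: { id: 3, title: "Slow search", open: true },
};

// Stub implementation of the two hypothetical tools.
const callTool: ToolCall = async (name, args) => {
  if (name === "list_issue_ids") return Object.keys(fakeIssues).map(Number);
  if (name === "get_issue") return fakeIssues[args.id as number];
  throw new Error(`unknown tool: ${name}`);
};

// The kind of code the LLM would write: one listing call, then one lookup
// per id in parallel, then filtering in plain code. None of the intermediate
// payloads pass through the model's context.
async function openIssueTitles(call: ToolCall): Promise<string[]> {
  const ids = (await call("list_issue_ids", {})) as number[];
  const issues = (await Promise.all(
    ids.map((id) => call("get_issue", { id }))
  )) as Issue[];
  return issues.filter((issue) => issue.open).map((issue) => issue.title);
}
```

With direct tool calling, the same workflow would take four separate tool calls, with every response landing in the context window before the next step. Here only the final list of titles has to go back to the model.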

This implementation basically works as an MCP server proxy. It is an MCP server to your agent and also an MCP client to your child servers. In the middle it hosts a Node VM that executes code generated by the LLM to make tool calls indirectly. It introspects the child MCP servers and converts their tool call interfaces into small, condensed TypeScript API declarations, so your LLM can generate code that invokes these tools in the provided Node VM instead of invoking them directly, and can do the complex response processing and error handling in code. This can be really powerful when doing multiple tool calls in parallel or with logic around processing. And since it's a Node VM, it has access to standard Node modules and the built-in standard libraries there.
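As a rough idea of what the interface conversion step could look like, here is a minimal sketch that turns a tool definition (name, description, and a JSON Schema for its input, which is what MCP tools expose) into a one-line TypeScript declaration. The `ToolDef` and `JsonSchema` types, the function names, and the output format are my own assumptions for illustration. The actual generator in the repo may work quite differently, and this one only handles a small subset of JSON Schema.

```typescript
// Simplified subset of JSON Schema, enough for this sketch.
interface JsonSchema {
  type?: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
  enum?: string[];
}

// Simplified view of a tool as listed by a child MCP server.
interface ToolDef {
  name: string;
  description?: string;
  inputSchema: JsonSchema;
}

// Map a schema node to a compact TypeScript type expression.
function schemaToType(schema: JsonSchema): string {
  if (schema.enum) {
    return schema.enum.map((value) => JSON.stringify(value)).join(" | ");
  }
  switch (schema.type) {
    case "string":
      return "string";
    case "number":
    case "integer":
      return "number";
    case "boolean":
      return "boolean";
    case "array": {
      const inner = schema.items ? schemaToType(schema.items) : "unknown";
      // Parenthesize unions so `"a" | "b"` becomes `("a" | "b")[]`.
      return inner.indexOf("|") >= 0 ? `(${inner})[]` : `${inner}[]`;
    }
    case "object": {
      const props = schema.properties ?? {};
      const required = new Set(schema.required ?? []);
      const fields = Object.keys(props).map(
        (key) => `${key}${required.has(key) ? "" : "?"}: ${schemaToType(props[key])}`
      );
      return fields.length > 0 ? `{ ${fields.join("; ")} }` : "{}";
    }
    default:
      return "unknown";
  }
}

// Emit one declaration per tool. Assumes the tool name is already a valid
// identifier; names with dashes or dots would need sanitizing first.
function toDeclaration(tool: ToolDef): string {
  const doc = tool.description ? `/** ${tool.description} */\n` : "";
  const args = schemaToType(tool.inputSchema);
  return `${doc}declare function ${tool.name}(args: ${args}): Promise<unknown>;`;
}

// Hypothetical tool used only as an example input.
const exampleTool: ToolDef = {
  name: "search_web",
  description: "Search the web.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" }, limit: { type: "integer" } },
    required: ["query"],
  },
};

const exampleDeclaration = toDeclaration(exampleTool);
```

For the example tool this produces a doc comment followed by `declare function search_web(args: { query: string; limit?: number }): Promise<unknown>;`, which is much shorter than the raw JSON Schema and is a form models have seen a lot of in training data.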

One issue is that if your tool calls are actually simple, like a basic web search or a single tool call, this can add a bit of unnecessary overhead. But the more complex the prompt, the more this approach can significantly improve the quality of the output and lower your inference billing costs.
