It's interesting how architectural patterns built at large tech companies (for completely different use-cases than AI) have become so relevant to the AI execution space.
You see a lot of AI startups learning the hard way the value of event sourcing and (eventually) durable execution, but these patterns aren't commonly adopted on Day 1. I blame the AI frameworks.
(disclaimer - currently working on a durable execution platform)
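For what it's worth, a minimal sketch of the idea (the names and structure are illustrative, not any particular platform's API): every tool call is recorded in an append-only event log, so a crashed or restarted agent run replays recorded results instead of re-executing side effects.

class EventLog:
    """Append-only record of each step an agent run takes (illustrative)."""
    def __init__(self):
        self.events = []

    def find(self, key):
        for e in self.events:
            if e["key"] == key:
                return e
        return None

    def record(self, key, result):
        self.events.append({"key": key, "result": result})

def durable_step(log, key, fn, *args):
    """Run fn at most once per key; on replay, reuse the recorded result."""
    prior = log.find(key)
    if prior is not None:
        return prior["result"]
    result = fn(*args)
    log.record(key, result)
    return result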
A lot of the time the dashboard content doesn't actually matter anyway; it just needs to look pretty...
On a serious note, the systems being built now will eventually be "correct enough most of the time" and that will be good enough (read: cheaper than doing it any other way).
Of course that still doesn't mean you should do that. If you want to maximize the model's performance, offload as much distracting stuff as possible to the code.
Is this true for all tool calls? Even if the tool returns little data?
Also looking into "fire and forget" tools, to see if that is even possible.
Use grep, and edit individual lines or ranges of lines instead of full files.
This way you can edit files with 50k LOC without issue, while Claude will blow up if you ever try to write out such a file in one go.
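A rough sketch of what a line-based edit tool can look like (the function and its signature are made up for illustration), so the model only produces the replacement snippet rather than the whole file:

def edit_lines(path, start, end, replacement):
    """Replace lines start..end (1-indexed, inclusive) with the given lines."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    new_lines = [l if l.endswith("\n") else l + "\n" for l in replacement]
    lines[start - 1:end] = new_lines
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)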
Most MCPs just replicate an API, returning blobs of data.
1. This burns a lot of input context on JSON formatting, and on escaping JSON that is already inside JSON. 2. It contains a lot of irrelevant information you could trim.
So the issue is the MCP tool. It should instead flatten the data as much as possible, since it goes through JSON encoding again on the way back, and drop fields that aren't needed.
So MCP SaaS offerings here are mainly API gateways.
That brings all this noise! And most of all, they are not optimizing the MCPs.
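As a rough illustration of that flattening (the field names are invented, not a real API): keep only the fields the model needs and return them as plain lines instead of a doubly escaped JSON blob.

def flatten_issue(raw):
    """Reduce a verbose API response to the handful of fields the model uses."""
    assignee = (raw.get("assignee") or {}).get("login", "none")
    return "\n".join([
        f"id: {raw.get('id')}",
        f"title: {raw.get('title')}",
        f"state: {raw.get('state')}",
        f"assignee: {assignee}",
    ])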
There's nothing stopping your endpoints from returning data in some other format. LLMs actually seem to excel with XML, for instance. But you could just use a template to define some narrative text.
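For example, a trivial template can turn the same payload into a sentence (again, the field names are illustrative):

def render_issue(raw):
    """Render the response as narrative text instead of JSON or XML."""
    assignee = (raw.get("assignee") or {}).get("login", "nobody")
    return f"Issue {raw['id']} ('{raw['title']}') is {raw['state']}, assigned to {assignee}."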
XML seems more text heavy, more tokens. However, maybe more context helps?
def process(param1, param2):
    # Each mcp_* call is an MCP tool exposed to the generated code as a plain function
    my_data = mcp_get_data(param1)
    sorted_data = mcp_sort(my_data, by=param2)
    return sorted_data
No one has found anything revolutionary yet, but there are some useful applications to be sure.
If your tools are calling APIs on behalf of users, it's better to use OAuth flows so users of the app can give explicit consent to the APIs/scopes they want the tools to access. That way, tools use scoped tokens to make calls instead of hard-to-manage, hard-to-maintain API keys (or even client credentials).
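A rough sketch of the pattern, assuming a generic HTTP API (the endpoint is illustrative): the tool is handed a per-user access token from the OAuth consent flow and sends it as a bearer token, instead of holding its own long-lived API key.

import requests

def list_calendar_events(user_access_token):
    """Call the upstream API with the user's own scoped OAuth token."""
    resp = requests.get(
        "https://api.example.com/v1/calendar/events",  # illustrative endpoint
        headers={"Authorization": f"Bearer {user_access_token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()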
Slight tangent, but as a long-term user of OpenAI models, I was surprised at how well Claude Sonnet 3.7 through the desktop app handles multi-hop problem solving using tools (over MCP). As long as tool descriptions are good, it’s quite capable of chaining and “lateral thinking” without any customisation of the system or user prompts.
For those of you using Sonnet over API: is this behaviour similar there out of the box? If not, does simply pasting the recently exfiltrated[1] “agentic” prompt into the API system prompt get you (most of the way) there?
This would be more reliable than expecting the LLM to generate working code 100% of the time?
MCP is literally just a wrapper around an API call, but because it has some LLM buzz sprinkled on top, people expect it to do some magic, when they wouldn't expect the same magic from the underlying API.
Or, if I gave the LLM a list of my users and asked it to filter based on some criteria, the grammar would change to only output user IDs that existed in my list.
I don't know how useful this would be in practice, but at least it would make it impossible for the LLM to hallucinate for these cases.
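A sketch of how such a constraint could be built (the grammar format here is GBNF, as used by llama.cpp-style constrained decoding; whether your serving stack supports it is a separate question): the only strings the grammar accepts are the user IDs that actually exist.

def user_id_grammar(user_ids):
    """Build a GBNF grammar whose only valid outputs are the given user IDs."""
    alternatives = " | ".join(f'"{uid}"' for uid in user_ids)
    return f"root ::= {alternatives}"

# user_id_grammar(["u_1001", "u_1002"]) -> 'root ::= "u_1001" | "u_1002"'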
avereveard•5h ago
I guess one could in principle wrap the entire execution block in a distributed transaction, but LLMs try to make code that is robust, which works against this pattern because it makes failures hard to understand.
jngiam1•4h ago
For example, when the code execution fails mid-way, we really want the model to be able to pick up from where it failed (with the state of the variables at the time of failure) and continue from there.
We've found that the LLM is able to generate correct code that picks up gracefully. The hard part now is building the runtime that makes that possible; we have something that works pretty well in many cases, now in production at Lutra.
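Not how Lutra does it, but a very simplified sketch of the runtime idea: execute the generated code, and if it raises, hand back the traceback plus whatever variable bindings survived, so the model can generate a continuation instead of starting over.

import traceback

def run_generated_code(source, env):
    """Execute model-generated code; on failure, return the surviving state."""
    try:
        exec(source, env)  # assumes sandboxing is handled elsewhere
        return {"status": "ok", "state": env}
    except Exception:
        return {
            "status": "error",
            "traceback": traceback.format_exc(),
            "state": env,  # partial variable state at the time of failure
        }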
avereveard•3h ago
Latency, though, would be unbearable for real time.