Really trivial to have the LLM first filter it down to the sections it cares about and then condense those sections though.
Wrap that process in a small tool, give that to the LLM along with a `fetch` tool that handles credentials based on URLs, and agent capabilities explode pretty rapidly.
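A minimal sketch of that setup (stdlib only; the credential map and helper names are illustrative, and a real condenser would hand the filtered text back to the LLM rather than grep):

```python
import urllib.request

# Illustrative credential map keyed by URL prefix.
CREDENTIALS = {"https://api.example.com": "Bearer TOKEN"}

def fetch(url: str) -> str:
    """Fetch a URL, attaching credentials chosen by URL prefix."""
    req = urllib.request.Request(url)
    for prefix, auth in CREDENTIALS.items():
        if url.startswith(prefix):
            req.add_header("Authorization", auth)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

def condense_spec(spec_text: str, sections: list[str]) -> str:
    """Filter a spec down to the sections the model asked for.
    A real version would call the LLM to condense further; this
    stub just keeps lines mentioning the requested sections."""
    return "\n".join(
        line for line in spec_text.splitlines()
        if any(s in line for s in sections)
    )

# Both exposed as tools the model can call.
TOOLS = {"fetch": fetch, "condense_spec": condense_spec}
```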
Even if your LLM could learn the OpenAPI spec, you still have to figure out how to concretely receive a response back. This is necessary for virtually any application built using an LLM and requires support for far, far more use cases than just calling an API.
Consider the following use cases:

- You need to include some relevant contextual data from a local RAG system
- There are local functions that you want the model to be able to call
- The API example you describe
- You need to access data from a database
In all of these cases, if you have experience working with LLMs, you've implemented some ad hoc template solution to pass the context into the model. You might have written something like "Here is the info relevant to this task {{info}}" or "These are the tools you can use {{tools}}", but in each case you've had to craft a prompting solution specific to one problem.
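Something like this, sketched in Python (template strings illustrative):

```python
# The ad hoc pattern: every context source gets its own one-off template.
RAG_TEMPLATE = "Here is the info relevant to this task:\n{info}"
TOOL_TEMPLATE = "These are the tools you can use:\n{tools}"

def build_prompt(question: str, info: str, tools: list[str]) -> str:
    """Hand-rolled prompt assembly, re-invented per project."""
    return "\n\n".join([
        RAG_TEMPLATE.format(info=info),
        TOOL_TEMPLATE.format(tools="\n".join(tools)),
        question,
    ])
```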
MCP solves this by providing a generic interface for sending a wide range of information to the model. While the hype can be a bit much, it's a pretty good (minus the lack of foresight around security) and obvious solution to this current problem in AI engineering.
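For the cases listed above, the MCP Python SDK's FastMCP helper gives you one interface for all of them (a sketch with stubbed bodies; the tool and resource names are made up):

```python
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

mcp = FastMCP("demo")

@mcp.tool()
def lookup_order(order_id: str) -> str:
    """A local function the model can call (stubbed)."""
    return f"Order {order_id}: shipped"

@mcp.resource("rag://{doc_id}")
def rag_doc(doc_id: str) -> str:
    """Contextual data from a local RAG store (stubbed)."""
    return f"Contents of document {doc_id}"

if __name__ == "__main__":
    mcp.run()
```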
You don't need a spec.
For sending prompts to the LLM you will absolutely need to hand-craft custom prompts anyway, as each model responds slightly differently.
Isn't that handled by whatever Tool API you're using? There's usually a `function_call_output` or `tool_result` message type. I haven't had a need for a separate protocol just to send responses.
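E.g. with the OpenAI Chat Completions API, the result goes back as its own `tool` message keyed to the call id (Anthropic's equivalent is a `tool_result` content block):

```python
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_123",
            "type": "function",
            "function": {"name": "get_weather",
                         "arguments": '{"city": "Paris"}'},
        }],
    },
    # The tool result is just another message, matched by tool_call_id:
    {"role": "tool", "tool_call_id": "call_123", "content": "18°C, clear"},
]
```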
I agree, and I wish it were; it will be a solved problem eventually. But feeding a complex data model like that into the paper shredder that is the LLM, and letting it decide whether DELETE or POST gets used, is just asking for trouble.
We have been adding an MCP remote server to louie.ai (think a semantic layer over DBs for automating investigations, analytics, and viz over operational systems). MCP is nice because people can now use it from Slack, VS Code, the CLI, etc., without us building every single integration when they want to work outside of our AI notebooks. And it's the same starting point of an OpenAPI spec, and even better, the standard FastAPI web framework for the REST layer.
Using frameworks has been good. For chat ergonomics, though, we find we're defining custom tools: talking directly to REST APIs is better than nothing, but that doesn't mean it's good. The tool layer isn't that fancy, but getting the ergonomics right matters, at least in our experience. Most of our time has gone into security and ergonomics. (And for fun, we ran an experiment vibe coding this while holding to enterprise-level quality goals.)
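To make the ergonomics point concrete, a custom tool tends to look roughly like this (endpoint and fields hypothetical), versus handing the model generic GET/POST access:

```python
import requests  # assumes the `requests` package

API = "https://api.example.com"  # illustrative base URL

def investigate_alerts(customer: str, days: int = 7) -> str:
    """One purpose-built tool: the model supplies two obvious
    parameters; the wrapper handles auth, query shaping, and
    formatting the response for chat."""
    resp = requests.get(
        f"{API}/alerts",
        params={"customer": customer, "window_days": days},
        headers={"Authorization": "Bearer TOKEN"},  # kept out of the prompt
        timeout=30,
    )
    resp.raise_for_status()
    return "\n".join(
        f"- {a['severity']}: {a['title']}" for a in resp.json()[:20]
    )
```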
Atotalnoob•1d ago
If you allow Y to do X, and an attacker takes control of Y, of course they can do X.
wild_egg•1d ago
I don't think your XY phrasing fully describes the GitHub MCP exploit, and I'm curious whether you think that's somehow a "token scoping" issue.
fkyoureadthedoc•1d ago
For example, let's say I create an application that lets you chat with my open source repo. I set up my LLM with a GitHub tool. I don't want to think about OAuth and getting a token from the end user, so I give it a PAT that I generated from my account. I'm even lazier, so I just use a PAT I already had lying around, and unfortunately it has read/write access to SSH keys. The user can now add their SSH key to my account and do malicious things.
Oh no, MCP is super vulnerable, please buy my LLM security product.
If you give the LLM a tool, and you give the LLM input from a user, the user has access to that tool. That shrimple.
wild_egg•1d ago
Also currently on the front page. It's mainly that this tool hits the trifecta of privileged access, untrusted inputs, and the ability to exfiltrate. Most tools only hit one or two of those, so attacks need to be more sophisticated to coordinate the rest.
rexer•17h ago
Perhaps you could have an allow/deny popup whenever the LLM wanted to interact with a service. But I think the end state there is presenting the user with a bunch of metadata about the operation, which they then need to reason about. I don't know that that's much better; those OAuth prompts are generally click-throughs for users.
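The whole gate reduces to something like this (names hypothetical), which is also why it's weak: the user just types y:

```python
def confirm_and_run(tool_name: str, args: dict, tool_fn):
    """Pause every tool call for user approval before executing."""
    print(f"The model wants to call {tool_name} with {args}")
    if input("Allow? [y/N] ").strip().lower() != "y":
        return "User denied the tool call."
    return tool_fn(**args)
```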
truemotive•22h ago
I knew it would get bad, but this bad already? I yearn for rigor haha