https://modelcontextprotocol.io/specification/2025-06-18/cha...
My main disappointment with sampling right now is the very limited scope. It'd be nice to support some universal tool calling syntax or something. Otherwise a reasonably complicated MCP server is still going to need a direct LLM connection.
I don't get how MCP could create a wrapper for all possible LLM inference APIs, or why it'd be desirable (that's an awfully long leash for me to give out on my API key).
If I'm building a local program, I am going to want tighter control over the toolsets my LLM calls have access to.
E.g. an MCP server for Google Calendar. MCP is not saving me significant time - I can access the same APIs the MCP can. I probably need to carefully instruct the LLM on when and how to use the Google Calendar calls, and I don't want to delegate that to a third party.
I also do not want to spin up a bunch of arbitrary processes in whatever runtime environment the MCP is written in. If I'm writing in Python, why do I want my users to have to set up a TypeScript runtime? God help me if there's a security issue in the MCP wrapper for language_foo.
On the server, things get even more difficult to justify. We have a great tool for having one machine call a process hosted on another machine without knowing its implementation details: the RPC. MCP just adds a bunch of opinionated middleware (and security holes).
In the limit - I remember some old saw about how everyone had the same top 3 rows of apps on their iPhone homescreen, but the last row was all different. I bet IT will be managing, and dev teams will be making, their own bespoke MCP servers for years to come.
This is what people mean when they say that MCP should maybe wait for a better LLM before going all-in on this design.
To your point that this isn't trivial or universal, there's a sharp gradient that you wouldn't notice if you're just opining on it as opposed to coding against it -- e.g. I've spent every waking minute since mid-December on MCP-like territory, and it still strikes me how much worse every model is than Claude at it. It sounds like you have similar experience, though perhaps you're not as satisfied with Claude as I am.
> you wouldn't notice if you're just opining on it as opposed to coding against it
Maybe I'm being sensitive, but that is perhaps not the way I would have worded it, as it reads a bit like an insult. Food for thought.
It's providing a standardized protocol to attach tools (and other stuff) to agents (in an LLM-centric world).
That most MCP usage you'll find out in repositories in the wild is focused on local toolchains is mostly due to MCP on launch being essentially only available via the Claude Desktop client. There they also highlighted many local single-user use-cases (rather than organizational ones). That SSE support was spotty in most implementations didn't help either.
If you're using the API and not in a hurry, there's no need for it.
Not familiar with Elixir, but is there anything prohibiting you from just making a monolith MCP combining multiple disparate APIs/backends/microservices as you were doing previously?
Further, you won't get the various client-application integrations merely by using tool-calling - and those are, to me, the "killer app" of MCP (as a sibling comment touches on).
(I do still have mixed feelings about MCP, but in this case MCP sorta wins for me)
This is what I ended up doing.
The reason I thought I must do it the "MCP way" was because of the tons of YouTube videos about MCP which just kept saying how much of an awesome protocol it is, and everyone should be using it, etc. Once I realized it's actually more consumer-facing than backend-facing, it made much more sense why it became so popular.
This is basically what MCP is. Before MCP, everyone was rolling their own function calling interfaces to every API. Now it’s (slowly) standardising.
An "entire server" is also overplaying what an MCP server is - in the case where an MCP server is just wrapping a single API it can be absolutely tiny, and also can just be a binary that speaks to the MCP client over stdio - it doesn't need to be a standalone server you need to start separately. In which case the MCP server is effectively just a small module.
The problem with making it one server with 100 modules is doing that in a language agnostic way, and MCP solves that with the stdio option. You can make "one server with 100 modules" if you want, just those modules would themselves be MCP servers talking over stdio.
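To make "absolutely tiny" concrete, here is a minimal sketch of a single-tool stdio server, assuming the official MCP Python SDK; the calendar tool itself is a hypothetical stand-in:

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("calendar")  # server name shown to the client

    @mcp.tool()
    def list_events(day: str) -> str:
        """List calendar events for the given ISO date."""
        return f"(events for {day} would be fetched here)"

    if __name__ == "__main__":
        mcp.run(transport="stdio")  # speaks JSON-RPC over stdin/stdout

The client launches this as a subprocess; there is no port, no standalone daemon, nothing to deploy separately.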
I agree with this. For my particular use-case, I'm completely into Elixir, so, for backend work, it doesn't provide much benefit for me.
> it can be absolutely tiny
Yes, but at the end of the day, it's still a server. Its size is immaterial - you still need to deal with the issues of maintaining a server - patching security vulnerabilities and making sure you don't get hacked and don't expose anything publicly you're not supposed to. It requires routine maintenance just like a real server. Multiply that by 100, if you have 100 MCP "servers". It's just not a scalable model.
In a monolith with 100 modules, you just do all the security patching for ONE server.
I think you have an overcomplicated idea of what a "server" means here. For MCP that does not mean it needs to speak HTTP. It can be just a binary that reads from stdin and writes to stdout.
That's actually true, because I'm always thinking from a cloud deployment perspective (which is my use case). What kind of architecture do you run this on, at scale in the cloud? You have very limited options if your monolith is on a serverless platform and is CPU/memory bound, too. So, that's where I was coming from.
The overhead of spawning a process is not the problem. The overhead of a huge and/or slow-to-start runtime could be, in which case you simply wouldn't run it on a serverless system - which is ludicrously expensive at scale anyway (my dayjob is running a consultancy where the big money-earner is helping people cut their cloud costs, and moving them off serverless systems that are entirely wrong for them is often one of the big savings; serverless is cheap for tiny systems, but even then the hassle often isn't worth it).
That implies a language lock-in, which is undesirable.
With something like https://nango.dev you can get a single server that covers 400+ APIs.
Also handles auth, observability and offers other interfaces for direct tool calling.
(Full disclosure, I’m the founder)
Email me on robin @ <domain> and happy to find a solution for your use case
In the end, MCP is just like REST APIs. There's no need for a paid service for me to connect to 400 REST APIs now, so why would I need a service to connect to 400 MCPs?
All I need for my users is to be able to connect to one or two really useful MCPs, which I can do myself. I don't need to pay for some multi REST API server or multi MCP server.
An "agent" with access to 400 MCPs would perform terribly in real world situations, have you ever tried it? Agents would best with a tuned well written system prompt and access to a few, well tuned tools for that particular agents job.
There's a huge difference between a fun demo of an agent calling 20 different tools and an actual valuable use-case which works RELIABLY. In fact, agents with tons of MCP tools are currently insanely UNRELIABLE; it's much better to just have a model plus one or two tools combined with a strict input and output schema - and even then, it's pretty unreliable for actual business use-cases.
I think most folks right now have never actually tried making a reliable, valuable business feature using MCPs, so they think somehow having "400 MCPs" is a good thing. But they haven't spent a few minutes thinking, "wait, why does our business need an agent which can connect to YouTube and Spotify?"
Forcing LLMs to output JSON is just silly. A lot of time and effort is being spent forcing models to output a format that is picky and that LLMs quite frankly just don't seem to like very much. A text based DSL with more restrictions on it would've been a far better choice.
Years ago I was able to trivially teach GPT-3.5 to reliably output an English-like DSL with just a few in-prompt examples. Meanwhile, even today the latest models still carry documentation notes that they may occasionally ignore some parts of JSON schemas sent down.
Square peg, round hole, please stop hammering.
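For illustration, the kind of restricted, English-like DSL the parent means can be parsed in a few lines; the format here is entirely made up:

    import re

    # One action per line, e.g.:  create_event title="Lunch" date="2025-07-01"
    LINE = re.compile(r'^(\w+)\s*(.*)$')
    ARG = re.compile(r'(\w+)="([^"]*)"')

    def parse_dsl(text: str) -> list[tuple[str, dict[str, str]]]:
        calls = []
        for raw in text.strip().splitlines():
            m = LINE.match(raw.strip())
            if not m:
                continue  # tolerate chatter instead of failing the whole parse
            verb, args = m.groups()
            calls.append((verb, dict(ARG.findall(args))))
        return calls

A malformed line degrades to a skipped line rather than a broken parse, which is exactly what strict JSON can't give you.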
For example, in Elixir, we have this library: https://hexdocs.pm/instructor/
It's massively useful for any structured output related work.
I imagine that validation as you go could slow things down though.
Do I look at whether the data format is easily output by my target LLM?
Or do I just validate and clamp/discard non-conforming output?
Always using the latter seems pretty inefficient.
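A sketch of the validate-and-clamp option, assuming pydantic v2 and a hypothetical target schema; the inefficiency is plain to see, since every discarded record is wasted generation:

    from pydantic import BaseModel, ValidationError

    class Event(BaseModel):  # hypothetical target schema
        title: str
        date: str

    def clamp(records: list[dict]) -> list[Event]:
        kept = []
        for rec in records:
            try:
                kept.append(Event.model_validate(rec))
            except ValidationError:
                pass  # discard non-conforming output
        return kept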
No reason why you couldn't just use tool calls with OpenAPI specs though. Either way, making all your microservices talk to each other over MCP sounds wild.
Unlike web dev, where your client can just write more components / API callers, AI has context limits. If you try to plug in (I get it, exaggerated) 500 MCP servers, each with 2-10 tools, you will waste a ton of context in your system prompt. You can use a tool proxy (one tool which routes to the others), but that will add latency from the extra processing.
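A tool proxy of that kind is roughly this - one dispatcher tool in the prompt, real tools resolved at call time (all names illustrative); the extra hop is where the added latency comes from:

    # The model sees a single "call_tool" entry instead of 500 schemas.
    REGISTRY = {
        "calendar.list_events": lambda day: f"events for {day}",
        "email.search": lambda query: f"mail matching {query}",
    }

    def call_tool(name: str, arguments: dict) -> object:
        fn = REGISTRY.get(name)
        if fn is None:
            # returning the catalogue lets the model discover tools lazily
            return {"error": "unknown tool", "available": sorted(REGISTRY)}
        return fn(**arguments)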
There are security / reliability concerns, true, but finally getting technically close to Star Trek computers and then still doing things 'the way they've always been done' doesn't seem efficient.
You can give an LLM a github repo link, a fresh VPC, and say "deploy my app using nginx" and any other details you need... and it'll get it done.
> n. the process of getting or producing something, especially information or a reaction
Now, Sampling seems like an odd feature name to me, but maybe I'm missing the driver behind that name.
It's right there in my comment. You don't need to call user input "elicitation" just to pretend it's something it's not.
Same goes for sampling.
Yeah, let's pretend it works. So far structured output from an LLM is an exercise in programmers' ability to code defensively against responses that may or may not be valid JSON, may not conform to the schema, or may just be null. There's a new cottage industry of modules that automate dealing with this crap.
https://openai.com/index/introducing-structured-outputs-in-t...
> Structured Outputs can still contain mistakes.
The guarantee promised in link 1 is not supported by the documentation in link 2. Structured Output does a _very good_ job, but still sometimes messes up. When you’re trying to parse hundreds of thousands of documents per day, you need a lot of 9s of reliability before you can earnestly say “100% guarantee” of accuracy.
Anecdotally, I've seen Azure OpenAI services hallucinate tools just last week, when I provided an empty array of tools rather than not providing the tools key at all (silly me!). Up until that point I would have assumed that there are server-side safeguards against that, but now I have to consider spending time on adding client-side checks for all kinds of bugs in that area.
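The client-side guard for that particular bug is small - just never send an empty array. A sketch against the OpenAI Python client (model name illustrative):

    def chat(client, messages, tools=None):
        kwargs = {"model": "gpt-4o", "messages": messages}
        if tools:  # omit the key entirely rather than passing []
            kwargs["tools"] = tools
        return client.chat.completions.create(**kwargs)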
You are confusing API response payloads with structured JSON that we expect to conform to the given schema. It's carnage that requires defensive coding. Neither OpenAI nor Google is interested in fixing this, because some developers decide to retry until they get valid structured output, which means they spend 3x-5x on API calls.
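The defensive pattern (and the retry cost) being described looks something like this; `call_llm` and `conforms` are hypothetical stand-ins:

    import json

    def structured(call_llm, prompt, conforms, max_tries=3):
        for _ in range(max_tries):  # every retry is another billed call
            raw = call_llm(prompt)
            try:
                data = json.loads(raw)
            except (json.JSONDecodeError, TypeError):
                continue  # not JSON at all, or None
            if data is not None and conforms(data):
                return data
        raise ValueError("no schema-conforming output after retries")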
That (prompt injection) isn’t something you can fix until you come up with a way to split prompts.
That means new types of models; there is no variant of MCP that can solve it with existing models.
I guess the phone phreak generation wasn't around to say 'Maybe this is a bad idea...' (because the first thing users are going to do is try to hijack control via in-band overrides).
Earlier I posted about mcp-webcam (you can find it) which gives you a no-install way to try out Sampling if you like.
1) Probably the coolest thing to happen with LLMs. While you could do all this with tool calling and APIs, being able to send less technical friends an MCP URL for Claude and seeing them get going with it in a few clicks is amazing.
2) I'm using the C# SDK, which only has auth in a branch - so very bleeding edge. Had a lot of problems with implementing that - 95% of my time was spent on auth (which is required for Claude MCP integrations if you are not building locally). I'm sure this will get easier with docs but it's pretty involved.
3) Related to that, Claude doesn't AFAIK expose much in terms of developer logs for what they are sending via their web (not desktop) app and what is going wrong. It would be super helpful to have a developer mode where it showed you the request/response of errors. I had real problems with refresh on auth, and it turned out I was logging the wrong endpoint on my side. Operator error for sure, but I would have fixed it in a couple of minutes had they had better MCP logging somewhere in the web UI. It all worked fine with stdio in desktop and MCP Inspector.
4) My main question/issue is handling longer-running tasks. The dataset I'm exposing is effectively a load of PDF documents, as I can't get Claude to handle the PDF files itself (I am all ears if there is a way!). What I'm currently doing is sending them through Gemini to get the text, then sending that to the user via MCP. This works fine for short/simple documents, but for longer documents (which can take some time to process) I return a message saying it is processing and to retry later.
While I'm aware there is a progress API, it still requires keeping the connection open to the server (which times out after a while, with Cloudflare at least - could be wrong here though). It would be much better to be able to tell the LLM to check back in x seconds when you predict it will be done, so the LLM can do other stuff in the background (which it will do), but then sorta 'pause execution' until the timer is hit.
Right now (AFAIK!) you can either keep it waiting (which means it can't do anything else in the meantime) with an open connection w/ progress, or you can return a job ID, but then it will just return a half-finished answer, which is often misleading as it doesn't have all the context yet. Don't know if this makes any sense, but I can imagine this being a real pain for tasks that take 10mins+.
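The job-ID workaround described above boils down to a pair of tools roughly like this (everything here is illustrative); it works, but nothing stops the model from answering before the result exists:

    import uuid

    JOBS: dict[str, dict] = {}  # job_id -> {"done": bool, "text": str | None}

    def start_extraction(pdf_url: str) -> dict:
        job_id = uuid.uuid4().hex
        JOBS[job_id] = {"done": False, "text": None}
        # ...enqueue background extraction that later fills in "text"...
        return {"job_id": job_id, "status": "processing",
                "hint": "call check_extraction with this id in ~60s"}

    def check_extraction(job_id: str) -> dict:
        job = JOBS.get(job_id)
        if job is None:
            return {"error": "unknown job id"}
        if not job["done"]:
            return {"status": "processing", "hint": "try again later"}
        return {"status": "done", "text": job["text"]}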
There are a few proposals floating around, but one issue is that you don't always know whether a task will be long-running, so having separate APIs for long-running tasks vs "regular" tool calls doesn't fully address the problem.
I've written a proposal to solve the problem in a more holistic way: https://github.com/modelcontextprotocol/modelcontextprotocol...
Real-world LLM use is going to be built on non-negligible (and increasingly complicated) deterministic tools, so you might as well integrate with the 'for all possible workflows' use case.
freeone3000•5mo ago
How does it differ from providing a non-MCP REST API?
hobofan•5mo ago
However, as someone that has tried to use OpenAPI for that in the past (both via OpenAI's "Custom GPT"s and auto-converting OpenAPI specifications to a list of tools), in my experience almost every existing OpenAPI spec out there is insufficient as a basis for tool calling in one way or another:
- Largely insufficient documentation on the endpoints themselves
- REST is too open to interpretation, and without operationIds (which almost nobody in the wild defines), there is usually context missing on what "action" is being triggered by POST/PUT/DELETE endpoints (e.g. many APIs do a delete of a resource via a POST or PUT, and some APIs use DELETE to archive resources)
- baseUrls are often wrong/broken and assumed to be replaced by the API client
- underdocumented AuthZ/AuthN mechanisms (usually only present in the general description comment on the API, and missing on the individual endpoints)
In practice you often have to remedy that by patching the officially distributed OpenAPI specs to make them good enough for a basis of tool calling, making it not-very-plug-and-play.
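The patching itself is mundane - fix the server URL, add the missing operationId and an unambiguous description before converting endpoints to tools. A sketch (paths and values invented):

    import copy

    def patch_spec(spec: dict) -> dict:
        spec = copy.deepcopy(spec)
        # baseUrl is often wrong or a placeholder in the shipped spec
        spec["servers"] = [{"url": "https://api.example.com/v2"}]
        op = spec["paths"]["/events"]["post"]
        op["operationId"] = "create_event"  # almost never set upstream
        op["description"] = "Create a calendar event. Does NOT send invites."
        return spec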
I think the biggest upside that MCP brings (all "content"/"functionality" being equal) over just plain REST is that it acts as a badge that says "we had AI usage in mind when building this".
On top of that, MCP also standardizes mechanisms like e.g. elicitation that with traditional REST APIs are completely up to the client to implement.
fennecfoxy•5mo ago
My gripe is that they had the opportunity to spec out tool use in models and they did not. The client->LLM implementation is up to the implementor, and many models differ with different tags like <|python_call|> etc.
lsaferite•5mo ago
I'm with you on an Agent -> LLM industry standard spec need. The APIs are all over the place and it's frustrating. If there were a spec for that, then agent development becomes simply focused on the business logic, and the LLM and the Tools/Resources are just standardized components you plug together like Lego. I've basically done that for our internal agent development. I have a Universal LLM API that everything uses. It's helped a lot.
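Such a universal layer doesn't have to be big - a thin interface that each provider is adapted onto. A sketch (the OpenAI adapter uses the real v1 client API; the interface shape itself is an assumption):

    from typing import Protocol

    class LLM(Protocol):
        """The one interface agent code targets; providers hide behind it."""
        def complete(self, messages: list[dict],
                     tools: list[dict] | None = None) -> dict: ...

    class OpenAIBackend:
        def __init__(self, client, model: str):
            self.client, self.model = client, model

        def complete(self, messages, tools=None):
            kwargs = {"model": self.model, "messages": messages}
            if tools:
                kwargs["tools"] = tools
            r = self.client.chat.completions.create(**kwargs)
            msg = r.choices[0].message
            return {"text": msg.content, "tool_calls": msg.tool_calls}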
ethbr1•5mo ago
It has the physical plug, but what can it actually do?
It would be nice to see a standard aiming for better UX than USB C. (Imho they should have used colored micro dots on device and cable connector to physically declare capabilities)
ethbr1•5mo ago
If I am looking at a device/cable, with my eyes, in the physical world, and ask the question "What does this support?", there's no way to tell.
I have to consult documentation and specifications, which may not exist anymore.
So in the case of standards like MCP, I think it's important to come up with answers to discovery questions, lest we all just accept that nothing can be done and the clusterfuck 10+ years from now was inevitable.
A good analogy might be imagining how the web would have evolved if we'd had TCP but no HTTP.
fennecfoxy•5mo ago
I would say for all the technology we have in 2025, this has certainly been one of the core issues for decades & decades. Nothing talks to each other properly, nothing works with another thing properly. Immense effort has to be expended for each thing to talk to or work with the other thing.
I got a MacBook Air for light dev as a personal laptop. It can't access an Android phone's filesystem when one is plugged in. Windows can do it. I know Apple's corporate reasoning, but it's just an example of purposeful incompatibility.
As you say, all these companies use standards like TCP/HTTP/Wifi/Bluetooth/USB/etc and they would be nowhere without them - but literally every chance they get they try to shaft us on it. Perhaps AI will assist in the future - tell it you want x to work with y and the model will hack on it until the fucking thing works.
fennecfoxy•5mo ago
Not sure why people treat MCP like it's much more than smashing tool descriptions together and concatenating to the prompt, but here we are.
It is nice to have a standard definition of tools that models can be trained/fine tuned for, though.
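Mechanically, that "smashing together" is about this much code - MCP's contribution is just that `name`/`description`/`inputSchema` are the same fields everywhere (a rough sketch, not any SDK's actual prompt format):

    import json

    def render_tools(tools: list[dict]) -> str:
        """Flatten MCP-style tool definitions into a system-prompt section."""
        lines = ["You can call the following tools:"]
        for t in tools:
            schema = json.dumps(t.get("inputSchema", {}))
            lines.append(f'- {t["name"]} {schema}: {t["description"]}')
        return "\n".join(lines)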
lobsterthief•5mo ago
As an example, anyone who's coded email templates will tell you: it's hard. While the major browsers adopted the W3C specs, email clients (i.e. email renderers) never adopted the spec, or such a W3C email HTML spec didn't exist. So something that renders correctly in Gmail looks broken in Yahoo Mail in Safari on iOS, etc.
fennecfoxy•5mo ago
But browsers/the web ecosystem are still a bad example, as we had decades of browsers supporting their own particular features/extensions. This has converged somewhat, pretty much because everything now uses Chromium underneath (bar Safari and Firefox).
But even so...if I write an extension while using Firefox, why can't I install that extension in Chrome? And vice-versa? Even bookmarks are stored in slightly different formats.
It is a massive pain to align technology like this, but the benefits are huge. Like boxing developers in with a good library (to stop them from doing arbitrary custom per-project BS), I think all software needs to be boxed into standards with provisions for extension/innovation, rather than this pick & choose BS because muh lock-in.
seanobannon•5mo ago
- a TypeScript library to set up ChatGPT-MCP compatible auth
- Source code for an Express & NextJS project implementing the library
- a URL for the demo of the deployed NextJS app that logs ChatGPT tool calls
Should be helpful for folks trying to set up and distribute custom connectors.
https://github.com/mcpauth/mcpauth