> "If I could short MCP, I would"
I mean, MCP is hard to work with. But there's a very large set of things out there that we want a hardened interface to -- if not MCP, it will be something very like it. In particular, MCP was probably over-complicated at the design phase to deal with the realities of streaming text/tokens back and forth live. That is, it chose not to abstract those realities in exchange for some nice features, and we got a lot of implementation complexity early.

To quote the Systems Bible, any working complex system is only the result of the growth of a working simple system. MCP seems to me to be right on the edge of what you'd call a "working simple system" -- but to the extent it's all torn down for something simpler, that thing will inevitably evolve to allow API specifications, API calls, and streaming interaction modes.
Anyway, I'm "neutral" on MCP, which is to say I don't love it. But I don't have a better system in mind, and crucially, because these models still need fine-tuning to deal properly with agent setups, I think it's likely here to stay.
Doesn't seem like the implementation could be much simpler. Just JSON-RPC and API stuff. For example, the MCP hello-world with Python and FastMCP is practically one-to-one with an HTTP/web-flavored hello-world in Flask.
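Roughly what that comparison looks like (a minimal sketch assuming FastMCP's decorator API; the tool name and body are made up):

```python
# MCP hello-world: one tool, served over stdio by default.
from fastmcp import FastMCP

mcp = FastMCP("hello")

@mcp.tool()
def greet(name: str) -> str:
    """Say hello to someone."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()

# The Flask hello-world it's being compared to:
#
#     from flask import Flask
#     app = Flask(__name__)
#
#     @app.route("/")
#     def hello():
#         return "Hello, world!"
```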
I haven't looked in a few months, so my information might be a bit out of date, but at the time -- if you wanted to use a Python server from the modelcontextprotocol GitHub, fine. If you wanted to, say, build a proxy server in Rust or Go, you were looking at a set of half-implemented servers targeting two-versions-old MCP specs, while clients like Claude obscured even which endpoints they used for discovery.
It's an immature spec, moderately complicated, and moving really quickly, with only a few major 'subscribers' on the server side; I found it challenging to work with.
Even with these constraints, the core MCP design is actually pretty good. First, use the stdio transport, and now your language only needs to speak JSON [1]. Then forget about building proxies and routers and web stuff, and offload that to mcpjungle [2] or similar to front your stdio work.
If that still doesn't work, I would probably wrap the foreign language in subprocesses and retreat to Python's FastMCP (or whatever the well-supported, fast-moving option is in another language). Ugly, but practical if you really must use a language with no good MCP support. If none of that works, I guess you're on the hook to support a changing MCP spec with a custom implementation in that language... but isn't there maybe an argument now that MCP is complex because someone insisted on it being complex?
[1]: https://modelcontextprotocol.io/specification/2025-06-18/bas... [2]: https://github.com/mcpjungle/MCPJungle
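For the curious, "only needs to speak JSON" means roughly this. A hand-rolled stdio server sketch (newline-delimited JSON-RPC 2.0; the handshake, notifications, and error handling are heavily simplified, and the tool is made up):

```python
import json
import sys

def reply(msg_id, result):
    # MCP's stdio transport is newline-delimited JSON-RPC 2.0 messages.
    sys.stdout.write(json.dumps({"jsonrpc": "2.0", "id": msg_id, "result": result}) + "\n")
    sys.stdout.flush()

for line in sys.stdin:
    msg = json.loads(line)
    method, msg_id = msg.get("method"), msg.get("id")
    if method == "initialize":
        reply(msg_id, {"protocolVersion": "2025-06-18",
                       "capabilities": {"tools": {}},
                       "serverInfo": {"name": "hello", "version": "0.1"}})
    elif method == "tools/list":
        reply(msg_id, {"tools": [{"name": "greet",
                                  "description": "Say hello",
                                  "inputSchema": {"type": "object", "properties": {}}}]})
    elif method == "tools/call":
        reply(msg_id, {"content": [{"type": "text", "text": "Hello!"}]})
    # Messages without an id are notifications (e.g. notifications/initialized)
    # and get no reply.
```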
For reference, I think writing an MCP proxy layer in (lang of choice) is significantly harder than writing something to respond to GET / over HTTP: in the complexity of what clients need out of a server (web clients are hardened to deal with all kinds of bad behavior), in the amount of stuff you actually need to write, and in the lack of documentation.
If you don't already have an API, sure, MCP is a possible choice for that API. But if you have an API, there's less and less reason to bother implementing an MCP server as the models get smarter, versus just giving them access to your API docs.
And earlier, Simon Willison argued[1] that Skills are an even bigger deal than MCP.
But I don't see as much hype for Skills as there was for MCP -- it seems people are stuck in MCP "inertia" and have no time to shift to Skills.
Skills feel like a non-feature to me. It feels more valuable to connect a user to the actual tool and let them familiarize themselves with it (so they don't need the LLM to find it in the future) rather than having the tool embedded in the LLM platform. I will carve out a very big exception for accessibility here: I love my home device being an egg timer -- it's a wonderful egg timer (when it doesn't randomly play music) -- and I could buy an egg timer, but having a hands-free one is actually quite valuable to me while cooking. So I believe there is real value in making these features accessible through the LLM over media where the feature would normally be difficult to use.
"Known to work" -- if it has a hardcoded code, it will work 100% of the time - that's the point of Skills. If it's just markdown then yes, some sort of probability will be there and it will keep on improving.
Like with MCP, you can provide a deterministic, known-good piece of code to carry out the operation once the LLM decides to use it.
But a skill can evolve from pure Markdown, via inlining some shell commands, up to a large application. And if you let it, the LLM can also inspect the tool and modify it if that will help you.
All the Skills I use now have evolved bit by bit as I've run into new use-cases and told Claude Code to update the script the skill references, or the SKILL.md itself. I can evolve the tooling while I'm using it.
A fun thing: Claude Code will sometimes fail to find the skill the "proper" way, and will then sometimes hunt down the SKILL.md file with its tools and read it directly, showing that it's perfectly capable of doing all the steps itself.
You could probably "fake" skills pretty well with instructions in CLAUDE.md to use a suitable command to extract the preamble of files in a given directory, and tell it to use that to decide when to read the rest.
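Something like this in CLAUDE.md might do it (an untested sketch; the path and wording are made up):

```
Before starting a task, run `head -n 10 skills/*/SKILL.md` to see each
skill's frontmatter (name and description). If the task matches one of
the descriptions, read that skill's full SKILL.md and follow it.
```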
It's the fact that it's such a thin layer that is exciting - it means we need increasingly less special logic other than relying on just basic instructions to the model itself.
Working around their many limitations has been the nature of the game since the original GPT-3.
It’s documentation vs researching how to do something.
Similar to what humans do.
It is, after all, a collection of instructions and code that any other LLM can read, understand, and then execute (via a tool call / MCP call).
1. Open-Skills: https://github.com/BandarLabs/open-skills
Just tell your non-Claude agent to read your skills directory, and extract the preambles.
Aren't skills really just a collection of tagged MCP prompts, config resources, and tools, except with more lock-in, since only Claude can use them? About that "agent virtual environment" that runs the scripts: how is it customized, and can it just be a container? Aren't you going to need to ship/bundle dependencies for the tools/libraries those skills require/reference? And at that point, why are we avoiding MCP-style docker/npx/uvx again?
Other things that jump out are that skills are supposed to be "composable", yet afaik it's still the case that skills may not explicitly reference other skills. Huge limiting factors IMHO compared to MCP servers that can just use boring inheritance and composition with, you know, programming languages, or composition/grouping with namespacing and such at the server layer. It's unclear how we're going to extend skills, require skills, use remote skills, "deploy" reusable skills etc etc, and answering all these questions gets us most of the way back to MCP!
That said, skills do seem like a potentially useful alternate "view" on the same data/code that MCP covers. If it really catches on, maybe we'll see skill-to-MCP converters for serious users who want to be able to do the normal stuff (scaling out, testing in isolation, working without being completely attached to the Claude engine forever). Until there's interoperability, I personally can't see getting interested, though.
Tell your agent of choice to read the preamble of all the documents in the skills directory, and tell it that when it has a task that matches one of the preambles, it should read the rest of the relevant file for full instructions.
There are far fewer dependencies for skills than for MCP. Even a model that knows nothing about tool use beyond how to run a shell command, and has no support for anything else can figure out skills.
I don't know what you mean regarding explicitly referencing other skills - Claude at least is smart enough that if you reference a skill that isn't even properly registered, it will often start using grep and find to hunt for it to figure out what you meant. I've seen this happen regularly while developing a plugin and having errors in my setup.
This is wrong, and an example of magical thinking. AI obviously does not mean you can ship/use software without addressing dependencies. See for example https://github.com/anthropics/skills/blob/main/slack-gif-cre... or, worse, the many other skills that just punt on this and assume CLI tools and libraries are already available.
The trivial evidence of this, is that if you have an MCP server available, the skill can simply explain to the agent how to use the MCP server, and so even the absolute worst case for skills is parity.
Even if we expect the LLMs to fully resolve the task, they'll heavily rely on I/O and print statements sprinkled across the execution trace to get the job done.
A sandbox is not mandatory here. You can execute skills on your host machine too (with some fiddling), but it's good practice, and probably for the best, to get into the habit of executing code in an isolated environment for security purposes.
They are a bigger deal in a sense because they remove the need for all the scaffolding MCPs require.
E.g. I needed Claude to work on transcripts from my Fathom account, so I just had it write a CLI script to download them, and then I had it write a SKILL.md, and didn't have to care about wrapping it up into an MCP.
At a client, I needed a way to test their APIs, so I just told Claude Code to pull out the client code from one of their projects and turn it into a CLI, and then write a SKILL.md. And again, no need to care about wrapping it up into an MCP.
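For anyone who hasn't written one, a SKILL.md of this sort is tiny. Something like this (a made-up example; the frontmatter fields follow the pattern in Anthropic's skills repo, but the script name and flags are invented):

```markdown
---
name: fathom-transcripts
description: Download and work with Fathom call transcripts. Use when the
  user asks about meetings, recordings, or transcripts.
---

# Fathom transcripts

Run `python scripts/fetch_transcripts.py --since YYYY-MM-DD` to download
transcripts into ./transcripts/, one Markdown file per call. Then work on
the local files directly.
```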
But this seems a lot less remarkable, and there's a lot less room to build big complicated projects and tooling around it, and so, sure, people will talk about it less.
MCP is completely different; I don't understand why people keep comparing the two. A skill cannot connect to your Slack server.
Skills are more similar to sub-agents, the main difference being context inheritance. Sub-agents let you set a different system prompt for each one, which is super useful.
Most of my skills connect to APIs.
This is what I expected the post to be about before clicking.
> Granted to use a skill the agent needs to have general purpose access to a computer, but this is the bitter lesson in action. Giving an agent general purpose tools and trusting it to have the ability to use them to accomplish a task might very well be the winning strategy over making specialized tools for every task.
The academic community has been using the term "skill" for years, to refer to classes of tasks at which LLMs exhibit competence.
Now Anthropic has usurped the term to refer to these inference-guiding .md files.
I'm not looking forward to having to pick through a Google hit list for "LLM skills", figuring out which publications are about skills in the traditional sense and which are about the Anthropic feature. Semantic overload sucks.
How do we deal with this? Start using "competencies" (or similar) in academic papers? Or just resign ourselves to suffering the ambiguity?
Or maybe the Anthropic feature will fall flat and nobody will talk about it at all. That would frankly be the best outcome.
An LLM with a shell integration can do anything you need it to.
I don't agree with this. Natural language is so ambiguous. At least for software development, the hard work is still coming up with clearly defined solutions. There is a reason why math has its own domain-specific language.
As a former tech comms guy I will say:
Natural language can be bent into arbitrary precision. Write something, then enter a read-rewrite-reread loop as the devil's advocate (this is key) until it stops being ambiguous or having multiple conceivable interpretations.
Yes, with English this process can be a pain in the butt, until you get the hang of it.
There's a good reason we use jargon in professions, or more constrained and less ambiguous languages for maths/coding.
It was a pain to set up, but you can score the context for completeness, and then, if the score is under 98% or something, "ask" clarifying questions of the requesting agent, person, or system.
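In pseudo-structure, that loop might look like this (a hypothetical sketch; `score_completeness` and `ask_clarifying` stand in for whatever model calls you wire up):

```python
THRESHOLD = 0.98  # arbitrary cutoff, per the comment above

def refine_request(spec: str, score_completeness, ask_clarifying) -> str:
    """Loop until the request is judged specific enough to act on.

    score_completeness(spec) -> float in [0, 1], e.g. from a grading prompt
    ask_clarifying(spec)     -> updated spec after questioning the requester
    """
    while score_completeness(spec) < THRESHOLD:
        spec = ask_clarifying(spec)
    return spec
```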
We liked it quite a bit, but it led to some funny things. We use Reminders to keep our home to-do lists, hers and mine in one list with two sections. I wanted to take this existing flow we had and make it work with a Custom GPT. It's practically impossible because Reminders:
* doesn't have a good API through EventKit
* requires a pop-up permission grant in the UI
So in the end, I did end up making somewhat of an MCP server for it: running it on an old MacBook Pro, with Amphetamine keeping it awake in closed-lid display-sleep mode, hooked up to my Tailnet and exposed via a Cloudflare tunnel, so that we could use ChatGPT to interact with the thing. Yes, you can see how insane that whole setup is. But there's quite a lot of value in having your AI agent just be the one thing.
Is there any other difference on the end-user side?
But I reckon that every time humans have been able to improve their information processing in any way, the world has changed. Even if all we get is an LLM that is right more often than it is wrong, the world will change again.