I have been trying to build skills to do various things on our internal tools, and more often than not, when it doesn't work, it is as much a problem with _our tools_ as it is with the LLM. You can't do obvious things, the documentation sucks, APIs return opaque error messages. These are problems that humans can work around because of tribal knowledge, but LLMs absolutely cannot, and fixing it for LLMs also improves it for your human users, who have probably been quietly dealing with friction and bullshit without complaining -- or not dealing with it and going elsewhere.
If you are building a product today, the feature you are working on _is not done_ until Claude Code can use it. A skill and an MCP aren't a "nice to have"; they are going to be as important as SEO and accessibility, with extremely similar work needed to enable them.
Your product might as well not exist in a few years if it isn't discoverable by agents and usable by agents.
This is an interesting take. I admit I've never thought this way.
The design of https://www.skillcreator.ai/explore is more useful to me. At least I can search by category, framework, and language, and I can also see much more information at a glance about what a skill does. I don't know why Vercel wanted to do it completely in black and white - colors used tastefully give useful context and information.
slop?
Edit: btw I’ve gone from genai value denier to skeptic to cautiously optimistic to fairly impressed in the span of a year. (I’m a user of Claude code)
> In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it. Adding the skill produced no improvement over baseline.
> …
> Skills aren't useless. The AGENTS.md approach provides broad, horizontal improvements to how agents work with Next.js across all tasks. Skills work better for vertical, action-specific workflows that users explicitly trigger,
https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
I saw someone's analysis a few days ago and they found that their agents were more accurate when just dumping the skill context directly into AGENTS.md
I think this is (mostly) a solvable problem. The current generation of SotA models wasn’t RLVR-trained on skills (they didn’t exist at that time) and probably gets slightly confused by the way the little descriptions are all packed into the same tool call schema. (At least that’s how it works with Claude Code.) The next generation will have likely been RLVRed on a lot of tasks where skills are available, and will use them much more reliably. Basically, wait until the next Opus release and you should hopefully see major improvements. (Of course, all this stuff is non-deterministic blah blah, but I think it’s reasonable to expect going from “misses the skill 30% of the time” to “misses it 2% of the time”.)
Probably, the more skills you have, the more confused it might get. The more potentially conflicting instructions you give, the harder it gets for an LLM to figure out what you actually want to happen.
If I catch it going off script, I often interrupt it, tell it what to do, and update the relevant skill. Seems to work pretty well. Keeping things simple seems to work.
I have a feeling that otherwise it becomes too messy for agents to reliably handle a lot of complex stuff.
For example, I have OpenClaw automatically looking for trending papers, turning them into fun stories, and then sending me the text via Telegram so I can listen to it in the ElevenLabs app.
I'm not sure whether it's better to have the story-generating system behind an API or to code it as a skill — especially since OpenClaw already does a lot of other stuff for me.
My general design principle for agents is that the top-level context (i.e. CLAUDE.md, etc.) is primarily "information about information": a list of skills, MCPs, etc., a very general overview, and a limited amount of information that they always need to have with every request. Everything more specific lives in a skill, which is mostly some very light-touch instructions for how to use the various tools we have (scripts, APIs, and MCPs).
I have found that people very often add _way_ too much information to their CLAUDE.md's and skills. Claude knows a lot of stuff already! Keep your information to things specific to whatever you are working on that it doesn't already know. If your internal processes and house style are super complicated to explain to Claude and it keeps making mistakes, you might want to adapt to Claude instead of the other way around. Claude itself makes this mistake! If you ask it to build a CLAUDE.md, it'll often fill it with extraneous stuff that it already knows. You should regularly trim it.
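As a rough sketch of what that looks like in practice (the project name, skill names, and paths here are all hypothetical):

```markdown
# Acme API monorepo

Node 20, pnpm workspaces. Run `pnpm test` before committing.

## Where the details live
- `.claude/skills/create-endpoint/` - checklist for adding a new HTTP endpoint
- `.claude/skills/db-migrations/` - how we write and apply schema migrations
- `docs/architecture.md` - service boundaries and ownership

Anything not listed here: read the code, don't guess.
```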
.claude/skills
.codex/skills
.opencode/skills
.github/skills

For instance, Gemini CLI ignores symlinked skills. Codex doesn't support symlinked SKILL.md files.
I treat my skills the same as the tiny bash scripts and fish functions I wrote in days gone by to simplify my life by typing 2 words instead of 2 sentences. A tiny improvement that only makes sense to a programmer at heart.
Skills seem a bit early to standardize. We are so early in this, why do we want to handcuff our creativity so soon?
[1]: https://code.claude.com/docs/en/skills#control-who-invokes-a...

[2]: https://opencode.ai/docs/skills/#disable-the-skill-tool

[3]: https://developers.openai.com/codex/skills/#enable-or-disabl...
The problem I see now is that everyone wants to be the winner in a hype cycle and be the standards bringer. How many "standards" have we seen put out now? No one talks about MCP much anymore, langchain I haven't seen in more than a year, will we be talking about Skills in another year?
Why do I want to throw away my dependency management system and shared libraries folder for putting scripts in skills?
What tools do they have access to, can I define this so it's dynamic? Do skills even have a concept for sub tools or sub agents? Why do I want to put references in a folder instead of a search engine? Does frontmatter even make sense, why not something closer to a package.json in a file next to it?
Does it even make sense to have skills in the repo? How do I use them across projects? How do we build an ecosystem and dependency management system for skills (which are themselves versioned)?
You are right. I have edited my post slightly.
> Why do I want to throw away my dependency management system and shared libraries folder for putting scripts in skills?
You don't have to put scripts in skills. The script can be anywhere the agent can access. The skill just needs to tell the LLM how to run it.
> Does it even make sense to have skills in the repo? How do I use them across projects?
You don't have to put them in the repo. E.g. with Claude Code you can put project-specific skills in `.claude/skills` in the repo and system-wide skills in `~/.claude/skills`.
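The same skill format works at either scope; only the path differs (the skill names here are made up):

```
~/.claude/skills/release-notes/SKILL.md    # personal, available in every project
.claude/skills/create-endpoint/SKILL.md    # checked into this repo only
```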
.opencode/skill

> any time you want to search for a skill in `./codex`, search instead in `./claude`
and continue as you were.
…including, apparently, the clueless enthusiasm for people to “share” skills.
MCP is also perfectly fine when you run your own MCP locally. It’s bad when you install some arbitrary MCP from some random person. It fails when you have too many installed.
Same for skills.
It’s only a matter of time (maybe it already exists?) until someone makes a “package manager” for skills that has all of the stupid of MCP.
MCP is giving the agents a bunch of functions/tools it can use to interact with some other piece of infrastructure or technology through abstraction. More like a toolbox full of screwdrivers and hammers for different purposes, or a high-level API interface that a program can use.
Skills are more similar to a stack of manuals/books in a library that teach an agent how to do something, without polluting the main context. For example, a guide to using `git` on the CLI: the agent can read the manual when it needs to use `git`, but it doesn't need to have the knowledge of how to use `git` in its brain when it's not relevant.
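A rough sketch of what such a "manual" skill could look like, using the usual SKILL.md shape with `name`/`description` frontmatter (the rules themselves are made up for illustration):

```markdown
---
name: git-workflow
description: How we use git in this repo. Use when committing, rebasing, or preparing a PR.
---

# Git workflow

- Branch from `main`; never commit to it directly.
- Commit messages: imperative mood, 72-character subject limit.
- Rebase onto `main` before opening a PR instead of merging `main` in.
- Show the diff to the user before any force push.
```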
A directory of skills... same thing
You can use MCP the same way as skills with a different interface. There are no rules on what goes into them.
They both need descriptions and instructions around them, and they both have to be presented and indexed to the agent dynamically, so we can tell it what it has access to without polluting the context.
See the Anthropic post on moving MCP servers to a search function. Once you have enough skills, you are going to require the same optimization.
I separate things in a different way:

1. What things do I force into context (agents.md, "tools" index, files)
2. What things can the agent discover (MCP, skills, search)
Ok I'm glad I'm not the only one who wondered this. This seems like simplified MCP; so why not just have it be part of an MCP server?
I liked the idea of having something more CLI-agnostic
For example, we have a skill to /create-new-endpoint. The skill contains a detailed checklist of all the boilerplate tasks that an engineer needs to do in addition to implementing the logic (e.g. update OpenAPI spec, add integration tests, endpoint boilerplate, etc.). The engineer manually invokes the skill from the CLI via slash commands, provides a JIRA ticket number, and engages in some brief design discussion. The LLM is consistently able to one-shot these tickets in a way that matches our existing application architecture.
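A trimmed-down sketch of the shape (the steps and names here are illustrative, not our actual checklist):

```markdown
---
name: create-new-endpoint
description: Checklist for adding a new HTTP endpoint. Invoked explicitly via /create-new-endpoint with a JIRA ticket number.
---

1. Ask for the JIRA ticket number and confirm the endpoint's path, method, and payload.
2. Add the route and handler boilerplate following the existing module layout.
3. Update the OpenAPI spec and regenerate client types.
4. Add integration tests covering the success and validation-error cases.
5. Run the test suite and summarize what changed, referencing the ticket.
```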
Whenever there's an agent best practice (skill) or 'pre-prompt' that you want to use all the time, turn it into a text expansion snippet so that it works no matter where you are.
As an example, I have a design 'pre-prompt' that dictates a bunch of steering for agents re: how to pick style components, typography, layout, etc. It's a few paragraphs long and I always send it alongside requests for design implementation to get way-better-than-average output.
I could turn it into a skill, but then I'd have to make sure whatever I'm using supported skills -- and install it every time or in a way that was universally seen on my system (no, symlinking doesn't really solve this).
So I use AutoHotkey (you might use Raycast, Espanso, etc) to config that every time I type '/dsn', it auto-expands into my pre-prompt snippet.
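(The same idea in Espanso, for anyone not on Windows, is roughly a match entry like the one below; the trigger and text are just a sketch, not my actual snippet.)

```yaml
# In your Espanso match file (e.g. match/base.yml)
matches:
  - trigger: "/dsn"
    replace: |
      When implementing designs: prefer the existing component library,
      stick to the approved type scale, use an 8px spacing grid, and
      check color contrast against WCAG AA before calling it done.
```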
Now, no matter whether I'm using an agent on the web/cloud, in my terminal window, or in an IDE, I've memorized my most important 'pre-prompts' and they're a few seconds away.
It's anti-fragile steering by design. Call it universal skill injection.
You might as well just write instructions in English in any old format, as long as it's comprehensible. Exactly as you'd do for human readers! Nothing has really changed about what constitutes good documentation. (Edit to add: my parochialism is showing there, it doesn't have to be English)
Is any of this standardization really needed? Who does it benefit, except the people who enjoy writing specs and establishing standards like this? If it really is a productivity win, it ought to be possible to run a comparison study and prove it. Even then, it might not be worthwhile in the longer run.
https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
It's also related to attention: invoking a skill "now" means that the model has all the relevant information fresh in context, so you'll get much better results.
What I’m doing myself is write skills that invoke Python scripts that “inject” prompts. This way you can set up multi-turn workflows for eg codebase analysis, deep thinking, root cause analysis, etc.
Works very well.
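Roughly, the skill itself stays thin and just tells the agent to keep running the script and following whatever prompt it prints (the names and steps below are illustrative, not my actual setup):

```markdown
---
name: root-cause-analysis
description: Multi-turn root cause analysis workflow. Use when the user asks why a bug or regression happened.
---

1. Run `python scripts/rca_step.py --step 1` and follow the prompt it prints.
2. After completing each step, run the script again with the next step number.
3. Stop when the script prints DONE, then summarize the findings for the user.
```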
I'm very curious to know the size & state of a codebase where skills are beneficial over just having good information hierarchy for your documentation.
From a huggingface employee:
> codex + skills finetunes Qwen3-0.6B to +6 on humaneval and beats the base score on the first run.

> I reran the experiment from this week, but used codex's new skills integration. Like claude code, codex consumes the full skill into context and doesn't start with failing runs. Its first run beats the base score, and on the second run it beats claude code.
https://x.com/ben_burtenshaw/status/2000233069517676756

https://xcancel.com/ben_burtenshaw/status/200023306951767675...
The pattern that works: skills that represent complete, self-contained sequences - "do X, then Y, then Z, then verify" - with clear trigger conditions. The agent recognizes these as distinct modes of operation rather than optional reference material.
What doesn't work: skills as general guidelines or "best practices" documents. These get lost in context or ignored entirely because the agent has no clear signal for when to apply them.
The mental model shift: think of skills less like documentation and more like subroutines you'd explicitly invoke. If you wouldn't write a function for it, it probably shouldn't be a skill.
Example: when a Python file is read or written, guidance is given back (once, with a long cooldown) to activate the global and company-specific Python skills. Claude activates the skills and writes Python to our preference.
But on the other hand, in Claude Code, at least, the skill "foo" is accessible as /foo, as the generalisation of the old commands/ directory, so I tend to favour being explicit that way.
You can have the perfect scraping skill, but if the target blocks your requests, you're stuck. The hard problems are downstream.
My model for skills is similar to this, but I extended it to have explicit use when and don’t use when examples and counter examples. This helped the small model which tended to not get the nuances of a free form text description.
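Concretely, that means a description along these lines (a made-up example):

```markdown
---
name: db-migration
description: >
  Create or modify a database migration.
  Use when: the user asks to add a column, change a schema, or backfill data.
  Do not use when: the user only wants to query existing data, or the change
  stays in application code without touching the schema.
---
```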