Services can expose an MCP-like layer that provides semantic definitions of everything you can do with that service (API + docs).
Skills can then be built that combine some subset of those 3rd party interfaces, some bespoke code, etc., and surface these more context-focused skills to the LLM/agent.
Couldn’t we just use APIs?
Yes, but not every API is documented in the same way. An “MCP-like” registry might be the right abstraction for 3rd parties to expose their services in a semantic-first way.
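To make that concrete, here's a rough sketch of what a semantic-first registry entry could look like (the shape and field names are made up for illustration, not any real MCP schema):

    # Hypothetical semantic-first registry entry - illustrative only,
    # not a real MCP schema.
    create_invoice = {
        "name": "create_invoice",
        "description": (
            "Create a draft invoice for a customer. Use when the user asks "
            "to bill a client, generate an invoice, or record a sale."
        ),
        "endpoint": "POST /v1/invoices",
        "parameters": {
            "customer_id": "string, required",
            "line_items": "list of {sku, quantity, unit_price}",
            "due_date": "ISO 8601 date, optional (defaults to net 30)",
        },
        "docs": "https://example.com/api/invoices",
    }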
Feels like the right layer of abstraction for remote APIs.
It's got the best implementation of a "skills-like" agent tool I've seen. Basically a visual tree builder, currently only one level deep. So I've set up the "<my company name> agent" and then it has subagents/skills for things like marketing/supply chain research/sysadmin/translation etc., each with a separate description, prompt, and knowledge base.
Unfortunately, everything else about Gemini Enterprise screams "early alpha, why the hell are you selling this as an actual finished product?".
For example, after I put half a day into setting up an agent and subagents, then went to share this with the other people helping me to test it, I found that... I can't. Literally no way to share agents in a tool that is supposedly for teams to use. I found one of the devs saying that sharing agents would be released in "about two weeks". That was two months ago.
Mini rant over... But my point is that skills are just "agents + auto-selecting sub-agents via a short description" and we'll see this pattern everywhere soon. Claude Skills have some additional sandboxing but that's mostly only interesting for coders.
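As a sketch, the whole pattern is something like this (the sub-agent names are from my setup above; the llm() helper is hypothetical):

    # Sketch of "agents + auto-selecting sub-agents via a short description".
    # The llm() helper is hypothetical - substitute any model call.
    SUBAGENTS = {
        "marketing": "Drafts campaigns, ad copy, and positioning docs.",
        "supply-chain": "Researches suppliers, lead times, and logistics.",
        "sysadmin": "Diagnoses server, DNS, and deployment issues.",
        "translation": "Translates documents between languages.",
    }

    def route(task: str) -> str:
        menu = "\n".join(f"- {name}: {desc}" for name, desc in SUBAGENTS.items())
        prompt = (
            f"Pick the single best sub-agent for this task.\n"
            f"Task: {task}\n\nSub-agents:\n{menu}\n\n"
            f"Reply with just the sub-agent name."
        )
        return llm(prompt).strip()  # then hand the task off to that sub-agent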
Bloat has a new name, and it's AI integration. You thought Chrome using a GB per tab was bad? Wait until you need a whole datacenter to use your coding environment.
Sure, if you could use VBA to read a patient's current complaint, vitals, and medical history, look up all the relevant research on Google Scholar, and then output a recommended course of treatment.
But perhaps an LLM could write an adapter that gets cached until something changes?
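Roughly this, as a hypothetical sketch (llm() stands in for whatever model call you'd use; the cache is keyed on a hash of the API spec, so the adapter only regenerates when the spec changes):

    import hashlib
    import pathlib

    CACHE = pathlib.Path("adapters")

    def get_adapter(api_spec: str) -> str:
        """Return cached adapter code for this API spec, generating it once."""
        key = hashlib.sha256(api_spec.encode()).hexdigest()[:16]
        cached = CACHE / f"adapter_{key}.py"
        if cached.exists():          # spec unchanged -> reuse the cached adapter
            return cached.read_text()
        # llm() is hypothetical - any code-writing model call would do here
        code = llm(f"Write a Python adapter for this API:\n{api_spec}")
        CACHE.mkdir(exist_ok=True)
        cached.write_text(code)      # a changed spec hashes to a new key,
        return code                  # so stale adapters are never reused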
So companies are really trying to deliver value. This is the right pivot. If you gave me an AGI with a 100 IQ, that seems pretty much worthless in today’s world. But domain expertise - that I’ll take.
Some frameworks/languages move really fast unfortunately.
The clever part is that the markdown file has a section in it like this: https://github.com/datasette/skill/blob/a63d8a2ddac9db8225ee...
---
name: datasette-plugins
description: "Writing Datasette plugins using Python and the pluggy plugin system. Use when Claude needs to: (1) Create a new Datasette plugin, (2) Implement plugin hooks like prepare_connection, register_routes, render_cell, etc., (3) Add custom SQL functions, (4) Create custom output renderers, (5) Add authentication or permissions logic, (6) Extend Datasette's UI with menus, actions, or templates, (7) Package a plugin for distribution on PyPI"
---
On startup, Claude Code / Codex CLI etc. scan all available skills folders and extract just those descriptions into the context. Then, if you ask them to do something that's covered by a skill, they read the rest of that markdown file on demand before going ahead with the task. The models are really good at driving those environments now, which makes skills the right idea at the right time.
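The startup scan is simple enough to sketch in a few lines of Python (a toy version assuming the SKILL.md layout above - the real implementations differ in the details):

    import pathlib
    import yaml  # pip install pyyaml

    def load_skill_descriptions(skills_dir="skills"):
        """Collect just the frontmatter of every SKILL.md into the context."""
        summaries = []
        for path in pathlib.Path(skills_dir).glob("*/SKILL.md"):
            text = path.read_text()
            if text.startswith("---"):
                frontmatter = yaml.safe_load(text.split("---")[1])
                summaries.append(f"{frontmatter['name']}: {frontmatter['description']}")
        return summaries  # the full markdown body is only read later, on demand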
So when it's time to commit, make sure you run these checks, write a good commit message, etc.
Debugging is an especially useful one, since AI agents can often go off the rails and get stuck in loops rewriting code - so in a skill I can push for: "read the log messages, insert some more useful debug assertions to isolate the failure, write some more unit tests that are more specific," etc.
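For example, a debugging skill's frontmatter might look something like this (my own wording, modelled on the datasette example upthread):

---
name: debugging
description: "Systematic debugging workflow. Use when a bug persists after one or two fix attempts: (1) Read the log messages first, (2) Insert debug assertions to isolate the failure, (3) Write narrower unit tests around the failing path, (4) Only then change the implementation"
---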
I've been playing with doing this, but it kind of doesn't feel like the most natural fit.
OpenAI keep changing their mind on what to call it. I like the original name, "ChatGPT Code Interpreter", but they've also called it "advanced data analysis" at various points.
Claude added the same feature in September this year: https://simonwillison.net/2025/Sep/9/claude-code-interpreter...
In both ChatGPT and Claude you can say things like "use your Python tool to calculate total mortgage payments over a 30 year period for X and Y" and it will write and execute code to do so - but you can also upload files (including CSVs or even SQLite database files) into that container file system and have them write and execute Python code to process those in different ways.
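The code it writes for that mortgage prompt is nothing exotic - something representative like this (made-up numbers standing in for X and Y, not actual model output):

    # Representative of what the model might write - not actual model output.
    def total_mortgage_payments(principal, annual_rate, years=30):
        """Total paid over the life of a standard fixed-rate mortgage."""
        r = annual_rate / 12            # monthly interest rate
        n = years * 12                  # number of monthly payments
        monthly = principal * r / (1 - (1 + r) ** -n)
        return monthly * n

    print(total_mortgage_payments(500_000, 0.065))  # scenario X (hypothetical)
    print(total_mortgage_payments(500_000, 0.055))  # scenario Y (hypothetical)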
Skills are just folders full of markdown files that are saved in that container when it first boots up.
simonw•1h ago
(I'm not just about pelicans.)
throwup238•26m ago
The foreplay starts around the 1 minute mark.
bilekas•15m ago
Good thinking, I agree actually, however...
> Skills are based on a very light specification, if you could even call it that, but I still think it would be good for these to be formally documented somewhere.
Like a lot of posts around AI - and I hope OP can speak to this - surely you can agree that while it can be used for good, cool ideas, it can also be used for the inverse, probably to more detrimental effect. Why would they formally document a feature whose consumption they can't manage?
Shareholder value might not go up if shareholders learnt that the major product is learning bad things.
Have you, or would you, try this on a local LLM instead?
simonw•9m ago
The OpenAI GPT OSS models can drive Codex CLI, so they should be able to do this.
I have high hopes for Mistral's Devstral 2 but I've not run that locally yet.