Claude Skills

https://www.anthropic.com/news/skills

816•meetpateltech•3mo ago

https://www.anthropic.com/engineering/equipping-agents-for-t...

Comments

j45•3mo ago

I wonder if Claude Skills will help return Claude back to the level of performance it had a few months ago.

bicx•3mo ago

Interesting. For Claude Code, this seems to have generous overlap with existing practice of having markdown "guides" listed for access in the CLAUDE.md. Maybe skills can simply make managing such guides more organized and declarative.

kfarr•3mo ago

Yeah my first thought was, oh it sounds like a bunch of CLAUDE.md's under the surface :P

crancher•3mo ago

It's interesting (to me) visualizing all of these techniques as efforts to replicate A* pathfinding through the model's vector space "maze" to find the desired outcome. The potential to "one shot" any request is plausible with the right context.

candiddevmike•3mo ago

> The potential to "one shot" any request is plausible with the right context.

You too can win a jackpot by spinning the wheel just like these other anecdotal winners. Pay no attention to your dwindling credits every time you do though.

NitpickLawyer•3mo ago

On the other hand, our industry has always chased the "one baby in one month out of 9 mothers" paradigm. While you couldn't do that with humans, it's likely you'll soon (tm) be able to do it with agents.

j45•3mo ago

If so, it would be a better way than encapsulating functionality in markdown.

I have been using claude code to create some and organize them but they can have diminishing return.

guluarte•3mo ago

it also may point out that the solution for context rot may not be coming in the foreseeable future

phildougherty•3mo ago

getting hard to keep up with skills, plugins, marketplaces, connectors, add-ons, yada yada

prng2021•3mo ago

Yep. Now I need an AI to help me use AI

consumer451•3mo ago

I mean, that is a very common thing that I do.

wartywhoa23•3mo ago

That's why the key word for all the AI horror stories that have been emerging lately is "recursion".

consumer451•3mo ago

Does that imply no human in the loop? If so, that's not what I meant, or do. Whoever is doing that at this point: bless your heart :)

mikkupikku•3mo ago

"Recursion" is a word that shows up a lot in the rants of people in AI psychosis (believe they turned the chatbot into god, or believe the chatbot revealed themselves to be god.)

andoando•3mo ago

Train AI to setup/train AI on doing tasks. Bam

josefresco•3mo ago

Joking aside, I ask Claude how to uses Claude... all the time! Sometimes I ask ChatGTP about Claude. It actually doesn't work well because they don't imbue these AI tools with any special knowledge about how they work, they seem to rely on public documentation which usually lags behind the breakneck pace of these feature-releases.

gordonhart•3mo ago

Agree — it's a big downside as a user to have more and more of these provider-specific features. More to learn, more to configure, more to get locked into.

Of course this is why the model providers keep shipping new ones; without them their product is a commodity.

hansonkd•3mo ago

Thats the start of the singularity. The changes will keep accelerating and less and less people will be able to keep up until only the AIs themselves know how to use.

matthewaveryusa•3mo ago

Nah, we'll create AI to manage the AI....oh

skybrian•3mo ago

People thought the same in the ‘90’s. The argument that technology accelerates and “software eats the world” doesn’t depend on AI.

It’s not exactly wrong, but it leaves out a lot of intermediate steps.

xpe•3mo ago

Yes and as we rely on AI to help us choose our tools... the phenomena feels very different, don't you think? Human thinking, writing, talking, etc is becoming less important in this feedback loop seems to me.

xpe•3mo ago

abstractions all the way down:

    abstraction
      abstraction
        abstraction
          abstraction
            ...

absturtles•3mo ago

... absturtles

xpe•3mo ago

this is pure absturtity! ("absturtlety"?)

AaronAPU•3mo ago

I don’t think these are things to keep up with. Those would be actual fundamental advances in the transformer architecture and core elements around it.

This stuff is like front end devs building fad add-ons which call into those core elements and falsely market themselves as fundamental advancements.

marcusestes•3mo ago

Agreed, but I think it's actually simple.

Plugins include: * Commands * MCPs * Subagents * Now, Skills

Marketplaces aggregate plugins.

input_sh•3mo ago

It's so simple you didn't even name all of them properly.

xpe•3mo ago

If I were to say "Claude Skills can be seen as a particular productization of a system prompt" would I be wrong?

From a technical perspective, it seems like unnecessary complexity in a way. Of course I recognize there are lot of product decisions that seem to layer on 'unnecessary' abstractions but still have utility.

In terms of connecting with customers, it seems sensible, under the assumption that Anthropic is triaging customer feedback well and leading to where they want to go (even if they don't know it yet).

Update: a sibling comment just wrote something quite similar: "All these things are designed to create lock in for companies. They don’t really fundamentally add to the functionality of LLMs." I think I agree.

tempusalaria•3mo ago

All these things are designed to create lock in for companies. They don’t really fundamentally add to the functionality of LLMs. Devs should focus on working directly with model generate apis and not using all the decoration.

tqwhite•3mo ago

Me? I love some lock in. Give me the coolest stuff and I'll be your customer forever. I do not care about trying to be my own AI company. I'd feel the same about OpenAI if they got me first... but they didn't. I am team Anthropic.

dominicq•3mo ago

Features will be added until morale improves

hansmayer•3mo ago

Well, have some understanding: the good folks need to produce something, since their main product is not delivering the much yearned for era of joblessness yet. It's not for you, it's signalling their investors - see, we're not burning your cash paying a bunch of PhDs to tweak the model weights without visible results. We are actually building products. With a huge and willing A/B testing base.

hiq•3mo ago

IMHO, don't, don't keep up. Just like "best practices in prompt engineering", these are just temporary workaround for current limitations, and they're bound to disappear quickly. Unless you really need the extra performance right now, just wait until models get you this performance out of the box instead of investing into learning something that'll be obsolete in months.

spprashant•3mo ago

I agree with this take. Models and the tooling around them are both in flux. I d rather not spend time learning something in detail for these companies to then pull the plug chasing next-big-thing.

lukev•3mo ago

I agree with your conclusion not to sweat all these features too much, but only because they're not hard at all to understand on demand once you realize that they all boil down to a small handful of ways to manipulate model context.

But context engineering very much not going anywhere as a discipline. Bigger and better models will by no means make it obsolete. In fact, raw model capability is pretty clearly leveling off into the top of an S-curve, and most real-world performance gains over the last year have been precisely because of innovations on how to better leverage context.

hiq•3mo ago

My point is that there'll be some layer doing that for you. We already have LLMs writing plans for another LLM to execute, and many other such orchestrations, to reduce the constraints on the actual human input. Those implementing this layer need to develop this context engineering; those simply using LLM-based products do not, as it'll be done for them somewhat transparently, eventually. Similar to how not every software engineer needs to be a compiler expert to run a program.

vdfs•3mo ago

IMO, these are just marketing or new ways of using functions calling, under the hood they all get re-written as tools the model can call

adidoit•3mo ago

All of it is ultimately managing the context for a model. Just different methods

BoredPositron•3mo ago

It is a bit ironic that the better the models get they seem to need more and more user input.

quintu5•3mo ago

More like they can better react to user input within their context window. With older models, the value of that additional user input would have been much more limited.

nozzlegear•3mo ago

It superficially reminds me of the old "Alexa Skills" thing (I'm not even sure if Alexa still has "Skills"). It might just be the name making that connection for me.

j45•3mo ago

Seems to be a bit more than that.

phildougherty•3mo ago

Alexa skills are 3rd party add-ons/plugins. Want to control your hue lights? add the phillips hue skill. I think claude skills in an alexa world would be like having to seed alexa with a bunch of context for it to remember how to turn my lights on and off or it will randomly attempt a bunch of incorrect ways of doing it until it gets lucky.

candiddevmike•3mo ago

And how many of those Alexa Skills are still being updated...

This is where waiting for this stuff to stablize/standardize, and then writing a "skill" based on an actual RFC or standard protocol makes more sense, IMO. I've been burned too many times building vendor-locked chatbot extensions.

nozzlegear•3mo ago

> And how many of those Alexa Skills are still being updated...

Not mine! I made a few when they first opened it up to devs, but I was trying to use Azure Logic Apps (something like that?) at the time which was supremely slow and finicky with F#, and an exercise in frustration.

joilence•3mo ago

If I understand correctly, looks like `skill` is a instructed usage / pattern of tools, so it saves llm agent's efforts at trial & error of using tools? and it basically just a prompt.

sshine•3mo ago

I love how the promise of free labor motivates everyone to become API first, document their practices, and plan ahead in writing before coding.

ebiester•3mo ago

It helps that you can have the "free" labor document the processes and build the plan.

skybrian•3mo ago

Cheaper, not free. Also, no training to learn a new skill.

Building a new one that works well is a project, but then it will scale up as much as you like.

This is bringing some of the advantages of software development to office tasks, but you give up some things like reliable, deterministic results.

sshine•3mo ago

There is an acquisition cost of researching and developing the LLM, but the running cost should not be classified as a wage, hence cost of labor is zero.

maigret•3mo ago

It’s still opex for finance

skybrian•3mo ago

Don't call it "free labor" at all then? Regardless, running an LLM is usually not free.

sshine•3mo ago

I wouldn’t be able to express the embedded irony if I didn’t use this oxymoron.

On the one hand, AI doesn’t classify as labor in a traditional sense, even though some aspire to replace labor with AI.

On the other hand, if it classified as labor under some new definition, it isn’t free when you consider the external costs of outsourcing basic brain activity, as an individual and as a society.

_pdp_•3mo ago

At first I wasn't sure what this is. Upon further inspection skills are effectively a bunch of markdown files and scripts that get unzipped at the right time and used as context. The scripts are executed to get deterministic output.

The idea is interesting and something I shall consider for our platform as well.

nperez•3mo ago

Seems like a more organized way to do the equivalent of a folder full of md files + instructing the LLM to ls that folder and read the ones it needs

j45•3mo ago

If so it would be most welcome since LLMs doesn't always consistently follow the folder full of MD files to the same depth and consistency.

RamtinJ95•3mo ago

what makes it more likely that claude would read these .md files then?

phildougherty•3mo ago

trained to

j45•3mo ago

Skills is hopefully put through a deterministic process that is guaranteed to occur, instead of a non-deterministic one that can only ever be guaranteed to happen most of the time (the way it is now).

adastra22•3mo ago

It is literally just injecting context into the prompt.

adastra22•3mo ago

It includes both the file names and a configurable description string. That’s where you put the TLDR of when to use each skill.

j45•3mo ago

This improves it a great deal but at a certain point, maybe 60-80% of the way it can start fading.

meetpateltech•3mo ago

Detailed engineering blog:

"Equipping agents for the real world with Agent Skills" https://www.anthropic.com/engineering/equipping-agents-for-t...

dang•3mo ago

Thanks, we'll put that link in the toptext as well

jampa•3mo ago

I think this is great. A problem with huge codebases is that CLAUDE.md files become bloated with niche workflows like CI and E2E testing. Combined with MCPs, this pollutes the context window and eventually degrades performance.

You get the best of both worlds if you can select tokens by problem rather than by folder.

The key question is how effective this will be with tool calling.

crancher•3mo ago

Seems like the exact same thing, from front page a few days ago: https://github.com/obra/superpowers/tree/main

Flux159•3mo ago

I wonder how this works with mcpb (renamed from dxt Desktop extensions): https://github.com/anthropics/mcpb

Specifically, it looks like skills are a different structure than mcp, but overlap in what they provide? Skills seem to be just markdown file & then scripts (instead of prompts & tool calls defined in MCP?).

Question I have is why would I use one over the other?

rahimnathwani•3mo ago

One difference I see is that with tool calls the LLM doesn’t see the actual code. It delegates the task to the LLM. With scripts in an agent, I think the agent can see the code being run and can decide to run something different. I may be wrong about this. The documentation says that assets aren’t read into context. It doesn’t say the same about scripts, which is what makes me think the LLM can read them.

irtemed88•3mo ago

Can someone explain the differences between this and Agents in Claude Code? Logically they seem similar. From my perspective it seems like Skills are more well-defined in their behavior and function?

j45•3mo ago

Skills might be used by Agents.

Skills can merge together like lego.

Agents might be more separated.

rahimnathwani•3mo ago

Subagents have their own context. Skills do not.

ryancnelson•3mo ago

The uptake on Claude-skills seems to have a lot of momentum already! I was fascinated on Tuesday by “Superpowers” , https://blog.fsck.com/2025/10/09/superpowers/ … and then packaged up all the tool-building I’ve been working on for awhile into somewhat tidy skills that i can delegate agents to:

http://github.com/ryancnelson/deli-gator I’d love any feedback

skinnymuch•3mo ago

Delegation is super cool. I can sometimes end up having too much Linear issue context coming in. IE frequently I want a Linear issue description and last comment retrieved. Linear MCP grabs all comments which pollutes the context and fills it up too much.

mousetree•3mo ago

I'm perplexed why they would use such a silly example in their demo video (rotating an image of a dog upside down and cropping). Surely they can find more compelling examples of where these skills could be used?

alansaber•3mo ago

Dog photo >> informing the consumer

Mouvelie•3mo ago

You'd think so, eh ? https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_wha...

antiloper•3mo ago

The developer page uses a better example, a PDF processing skill: https://github.com/anthropics/skills/tree/main/document-skil...

I've been emulating this in claude code by manually @tagging markdown files containing guides for common tasks in our repository. Nice to see that this step is now automatic as well.

mritchie712•3mo ago

this is the best example I found

https://github.com/anthropics/skills/blob/main/document-skil...

I was dealing with 2 issues this morning getting Claude to produce a .xlsx that are covered in the doc above

bgwalter•3mo ago

"Skills are repeatable and customizable instructions that Claude can follow in any chat."

We used to call that a programming language. Here, they are presumably repeatable instructions how to generate stolen code or stolen procedures so users have to think even less or not at all.

azraellzanella•3mo ago

"Keep in mind, this feature gives Claude access to execute code. While powerful, it means being mindful about which skills you use—stick to trusted sources to keep your data safe."

Yes, this can only end well.

m3kw9•3mo ago

I feel like this is making things more complicated than it needs to be. LLMs should automatically do this behind you, you won’t even see it.

Imnimo•3mo ago

I feel like a danger with this sort of thing is that the capability of the system to use the right skill is limited by the little blurb you give about what the skill is for. Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job. But Claude is always starting from ground zero and skimming your descriptions.

j45•3mo ago

LLMs are a probability based calculation, so it will always skim to some degree, and always guess to some degree, and often pick the best choice available to it even though it might not be the best.

For folks who this seems elusive for, it's worth learning how the internals actually work, helps a great deal in how to structure things in general, and then over time as the parent comment said, specifically for individual cases.

zobzu•3mo ago

IMO this is a context window issue. Humans are pretty good are memorizing super broad context without great accuracy. Sometimes our "recall" function doesn't even work right ("How do you say 'blah' in German again?"), so the more you specialize (say, 10k hours / mastery), the better you are at recalling a specific set of "skills", but perhaps not other skills.

On the other hand, LLMs have a programatic context with consistent storage and the ability to have perfect recall, they just don't always generate the expected output in practice as the cost to go through ALL context is prohibitive in terms of power and time.

Skills.. or really just context insertion is simply a way to prioritize their output generation manually. LLM "thinking mode" is the same, for what it's worth - it really is just reprioritizing context - so not "starting from scratch" per se.

When you start thinking about it that way, it makes sense - and it helps using these tools more effectively too.

dwaltrip•3mo ago

There are ways to compensate for lack of “continual learning”, but recognizing that underlying missing piece is important.

ryancnelson•3mo ago

I commented here already about deli-gator ( https://github.com/ryancnelson/deli-gator ) , but your summary nailed what I didn’t mention here before: Context.

I’d been re-teaching Claude to craft Rest-api calls with curl every morning for months before i realized that skills would let me delegate that to cheaper models, re-using cached-token-queries, and save my context window for my actual problem-space CONTEXT.

dingnuts•3mo ago

>I’d been re-teaching Claude to craft Rest-api calls with curl every morning for months

what the fuck, there is absolutely no way this was cheaper or more productive than just learning to use curl and writing curl calls yourself. Curl isn't even hard! And if you learn to use it, you get WAY better at working with HTTP!

You're kneecapping yourself to expend more effort than it would take to just write the calls, helping to train a bot to do the job you should be doing

jmtulloss•3mo ago

My interpretation of the parent comment was that they were loading specific curl calls into context so that Claude could properly exercise the endpoints after making changes.

F7F7F7•3mo ago

He’s likely talking about Claude’s hook system that Anthropic created to provide better control over context.

ryancnelson•3mo ago

i know how to use curl. (I was a contributor before git existed) … watching Claude iterate to re-learn whether to try application/x-form-urle ncoded or GET /?foo wastes SO MUCH time and fills your context with “how to curl” that you re-send over again until your context compacts.

You are bad at reading comprehension. My comment meant I can tell Claude “update jira with that test outcome in a comment” and, Claude can eventually figure that out with just a Key and curl, but that’s way too low level.

What I linked to literally explains that, with code and a blog post.

mbesto•3mo ago

> IMO this is a context window issue.

Not really. It's a consequential issue. No matter how big or small the context window is, LLMs simply do not have the concept of goals and consequences. Thus, it's difficult for them to acquire dynamic and evolving "skills" like humans do.

adastra22•3mo ago

Worth noting, even though it isn’t critical to your argument, that LLMs do not have perfect recall. I got to great lengths to keep agentic tools from relying on memory, because they often get it subtly wrong.

seunosewa•3mo ago

The blurbs can be improved if they aren't effective. You can also invoke skills directly.

The description is equivalent to your short term memory.

The skill is like your long term memory which is retrieved if needed.

These should both be considered as part of the AI agent. Not external things.

blackoil•3mo ago

Most of the experience is general information not specific to project/discussion. LLM starts with all that knowledge. Next it needs a memory and lookup system for project specific information. Lookup in humans is amazingly fast, but even with a slow lookup, LLMs can refer to it in near real-time.

andruby•3mo ago

Would this requirement to start from ground zero in current LLMs be an artefact of the requirement to have a "multi-tenant" infrastructure?

Of course OpenAI and Anthropic want to be able to reuse the same servers/memory for multiple users, otherwise it would be too expensive.

Could we have "personal" single-tenant setups? Where the LLM incorporates every previous conversation?

mbesto•3mo ago

> Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job.

Which is precisely why Richard Sutton doesn't think LLMs will evolve to AGI[0]. LLMs are based on mimicry, not experience, so it's more likely (according to Sutton) that AGI will be based on some form of RL (reinforcement learning) and not neural networks (LLMs).

More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence. So, to your point, the idea of a "skill" is more akin to a reference manual, than it is a skill building exercise that can be applied to developing an instrument, task, solution, etc.

[0] https://www.youtube.com/watch?v=21EYKqUsPfg

buildbot•3mo ago

The industry has been doing RL on many kinds of neural networks, including LLMs, for quite some time. Is this person saying we RL on some kind of non neural network design? Why is that more likely to bring AGI than an LLM?.

> More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence.

Citation?

jfarina•3mo ago

Why are you asking them to cite something for that statement? Are you questioning whether it's the foundation for intelligence or whether LLMS understand goals and consequences?

buildbot•3mo ago

Yes, I'm questioning if that's the foundation of intelligence. Says who?

mbesto•3mo ago

Richard Sutton. He won a Turing Award. Why ask your question above when you can just watch the YouTube link I posted?

anomaloustho•3mo ago

Looks like they added the link. But I think it’s doing RL in realtime vs pre-trained as an LLM is.

And I associate that part to AGI being able to do cutting edge research and explore new ideas like humans can. Where, when that seems to “happen” with LLMs it’s been more debatable. (e.g. there was an existing paper that the LLM was able to tap into)

I guess another example would be to get an AGI doing RL in realtime to get really good at a video game with completely different mechanics in the same way a human could. Today, that wouldn’t really happen unless it was able to pre-train on something similar.

ibejoeb•3mo ago

I don't think any of the commercial models are doing RL at the consumer. The R is just accepting or rejecting the action, right?

hbarka•3mo ago

For humans, it’s not uncommon to have a clever realization by way of serendipity. How do you skill AI to have serendipity.

mediaman•3mo ago

It's a false dichotomy. LLMs are already being trained with RL to have goal directedness.

He is right that non-RL'd LLMs are just mimicry, but the field already moved beyond that.

leptons•3mo ago

I can't wait to try to convince an LLM/RL/whatever-it-is that what it "thinks" is right is actually wrong.

dingnuts•3mo ago

Explain something to me that I've long wondered: how does Reinforcement Learning work if you cannot measure your distance from the goal? In other words, how can RL be used for literally anything qualitative?

kmacdough•3mo ago

This is one of known hardest parts of RL. The short answer is human feedback.

But this is easier said than done. Current models require vastly more learning events than humans, making direct supervision infeasable. One strategy is to train models on human supervisors, so they can bear the bulk of the supervision. This is tricky, but has proven more effective than direct supervision.

But, in my experience, AIs don't specifically struggle with the "qualitative" side of things per-se. In fact, they're great at things like word choice, color theory, etc. Rather, they struggle to understand continuity, consequence and to combine disparate sources of input. They also suck at differentiating fact from fabrication. To speculate wildly, it feels like it's missing the the RL of living in the "real world". In order to eat, sleep and breath, you must operate within the bounds of physics and society and live forever with the consequences of an ever-growing history of choices.

mbesto•3mo ago

This 100%.

While we might agreed that language is foundational to what it is to be human, it's myopic to think its the only thing. LLMs are based on training sets of language (period).

ewoodrich•3mo ago

Whenever I watch Claude Code or Codex get stuck trying to force a square peg into a round hole and failing over and over it makes me wish that they could feel the creeping sense of uncertainty and dread a human would in that situation after failure after failure.

Which eventually forces you to take a step back and start questioning basic assumptions until (hopefully) you get a spark of realization of the flaws in your original plan, and then recalibrate based on that new understanding and tackle it totally differently.

But instead I watch Claude struggling to find a directory it expects to see and running random npm commands until it comes to the conclusion that, somehow, node_modules was corrupted mysteriously and therefore it needs to wipe everything node related and manually rebuild the project config by vague memory.

Because no big deal, if it’s wrong it’s the human's problem to untangle and Anthropic gets paid either way so why not try?

jon-wood•3mo ago

> But instead I watch Claude struggling to find a directory it expects to see and running random npm commands until it comes to the conclusion that, somehow, node_modules was corrupted mysteriously and therefore it needs to wipe everything node related and manually rebuild the project config by vague memory.

In fairness I have on many an occasion worked with real life software developers who really should know better deciding the problem lies anywhere but their initial model of how this should work. Quite often that developer has been me, although I like to hope I've learned to be more skeptical when that thought crosses my mind now.

ewoodrich•3mo ago

Right, but typically making those kind of mistakes creates more work for yourself and with the benefit of experience you get better at recognizing the red flags to avoid getting in that situation again. but it

Which is why I think the parent post had a great observation about human problem solving having evolved in a universe inherently formed by the additive effect of every previous decision you've ever made made in your life.

There's a lot of variance in humans, sure, but inescapable stakes/skin in the game from an instinctual understanding that you can't just revert to a previous checkpoint any time you screw up. That world model of decisions and consequences helps ground abstract problem solving ability with a healthy amount of risk aversion and caution that LLMs lack.

mediaman•3mo ago

RL works great on verifiable domains like math, and to some significant extent coding.

Coding is an interesting example because as we change levels of abstraction from the syntax of a specific function to, say, the architecture of a software system, the ability to measure verifiable correctness declines. As a result, RL-tuned LLMs are better at creating syntactically correct functions but struggle as the abstraction layer increases.

In other fields, it is very difficult to verify correctness. What is good art? Here, LLMs and their ilk can still produce good output, but it becomes hard to produce "superhuman" output, because in nonverifiable domains their capability is dependent on mimicry; it is RL that gives the AI the ability to perform at superhuman levels. With RL, rather than merely fitting its parameters to a set of extant data it can follow the scent of a ground truth signal of excellence. No scent, no outperformance.

anomaloustho•3mo ago

I wrote elsewhere but I’m more interpreting this distinction as “RL in real-time” vs “RL beforehand”.

munchler•3mo ago

I agree with this description, but I'm not sure we really want our AI agents evolving in real time as they gain experience. Having a static model that is thoroughly tested before deployment seems much safer.

mbesto•3mo ago

> Having a static model that is thoroughly tested before deployment seems much safer.

While that might true, it fundamentally means it's not going to ever replicate human or provide super intelligence.

CryptoBanker•3mo ago

> While that might true, it fundamentally means it's not going to ever replicate human or provide super intelligence.

Many people would argue that's a good thing

stevenpetryk•3mo ago

This is referred to as “online reinforcement learning” and is already something done by, for example Cursor for their tab prediction model.

https://cursor.com/blog/tab-rl

tinodb•3mo ago

Not sure that’s the same. They just very frequently retrain and “deploy a new model”.

baxtr•3mo ago

So it’s on-the-fly adaptive mimicry?

OtherShrezzing•3mo ago

In the interview transcript, he seems aware that the field is doing RL, and he makes a compelling argument that bootstrapping isn’t as scalable as a purely RL trained AI would be.

mbesto•3mo ago

> LLMs are already being trained with RL to have goal directedness.

That might be true, but we're talking about the fundamentals of the concept. His argument is that you're never going to reach AGI/super intelligence on an evolution of the current concepts (mimicry) even through fine tuning and adaptions - it'll like be different (and likely based on some RL technique). At least we have NO history to suggest this will be case (hence his argument for "the bitter lesson").

samrus•3mo ago

The LLMs dont have RL baked into them. They need that at the token prediction level to be able to do the sort of things humans can do

isodev•3mo ago

Let’s not overstate what the technology actually is. LLMs amount to random token generators that try their best to have their outputs “rhyme” with their prompts, instructions, skills, or what humans know as goals and consequences.

adastra22•3mo ago

It does a lot more than that.

isodev•3mo ago

It’s literally a slot machine for random text. With “services around it” to give the randomness some shape and tools.

adastra22•3mo ago

It is literally not. 2/3 of the weights are in the multi-layer perceptron which is a dynamic information encoding and retrieval machine. And the attention mechanisms allow for very complex data interrelationships.

At the very end of an extremely long and sophisticated process, the final mapping is softmax transformed and the distribution sampled. That is one operation among hundreds of billions leading up to it.

It’s like saying is a jeopardy player is random word generating machine — they see a question and they generate “what is “ followed by a random word—random because there is some uncertainty in their mind even in the final moment. That is both technically true, but incomplete, and entirely missing the point.

vonneumannstan•3mo ago

This is an uninformed take. Much of the improvement in performance of LLM based models has been through RLHF and other RL techniques.

mbesto•3mo ago

> This is an uninformed take.

You may disagree with this take but its not uninformed. Many LLMs use self‑supervised pretraining followed by RL‑based fine‑tuning but that's essentially it - it's fine tuning.

vonneumannstan•3mo ago

I think you're seriously underestimating the importance of the RL steps on LLM performance.

Also how do you think the most successful RL models have worked? AlphaGo/AlphaZero both use Neural Networks for their policy and value networks which are the central mechanism of those models.

skurilyak•3mo ago

Besides a "reference manual", Claude Skills is analogous to a "toolkit with an instruction manual" in that it includes both instructions (manuals) and executable functions (tools/code)

Weeenion•3mo ago

I would love to understand were this notion of LLM becoming AGI ever came from?

ChatGPT broke upen the dam to massive budget on AI/LM and LLM will probably be a puzzle peace to AGI. But otherwise?

I mean it should be clear that we have so much work to do like RL (which now happens btw. on massive scale because you thumb up or down every day), thinking, Model of Experts, toolcalling and super super critical: Architecture.

Compute is a hard upper limit too.

And the math isn't done either. The performance of Context length has advanced, we also saw other approcheas like a diffusion based models.

Whenever you hear the leading experts talking, they mention world models.

We are still in a phase were we have plenty of very obivous ideas people need to try out.

But alone the quality of whispher, llm as an interface and tool calling can solve problems with robotics and stuff, no one was able to solve that easy ever before.

ChadMoran•3mo ago

This is the crux of knowledge/tool enrichment in LLMs. The idea that we can have knowledge bases and LLMs will know WHEN to use them is a bit of a pipe dream right now.

fragmede•3mo ago

Can you be more specific? The simple case seems to be solved, eg if I have an mcp for foo enabled and then ask about a list of foo, Claude will go and call the list function on foo.

corytheboyd•3mo ago

> […] and then ask about a list of foo

Not OP, but this is the part that I take issue with. I want to forget what tools are there and have the LLM figure out on its own which tool to use. Having to remember to add special words to encourage it to use specific tools (required a lot of the time, especially with esoteric tools) is annoying. I’m not saying this renders the whole thing “useless” because it’s good to have some idea of what you’re doing to guide the LLM anyway, but I wish it could do better here.

fragmede•3mo ago

I've got a project that needs to run a special script and not just "make $target" at the command line in order to build, and with instructions in multiple . MD files, codex w/ gpt-5-high still forgets and runs make blindly which fails and it gets confused annoyingly often.

ooh, it does call make when I ask it to compile, and is able to call a couple other popular tools without having to refer to them by name. if I ask it to resize an image, it'll call imagemagik, or run ffmpeg and I don't need to refer to ffmpeg by name.

so at the end of the day, it seems they are their training data, so better write a popular blog post about your one-off MCP and the tools it exposes, and maybe the next version of the LLM will have your blog post in the training data and will automatically know how to use it without having to be told

delaminator•3mo ago

Yeah, I've done this just now.

I installed ImageMagik on Windows.

Created a ".claude/skills/Image Files/" folder

Put an empty SKILLS.md file in it

and told Claude Code to fill in the SKILLS.md file itself with the path to the binaries.

and it created all the instructions itself including examples and troubleshooting

and in my project prompted

"@image.png is my base icon file, create all the .ico files for this project using your image skill"

and it all went smoothly

ChadMoran•3mo ago

It doesn't reliably do it. You need to inject context into the prompt to instruct the LLM to use tools/kb/etc. It isn't deterministic of when/if it will follow-through.

larrymcp•3mo ago

> starting from ground zero

You probably mean "starting from square one" but yeah I get you

ex3ndr•3mo ago

Humans dont need a skill to know that they need a skill

SebastianSosa1•3mo ago

Excellent point, put simply building those preferences and lessons would demand a layer of latent memory, personal models, maybe now is a good time to revisit this idea...

RicDan•3mo ago

Skills are literally technical documentation for your project it seems. So now we can finally argue for time to write doc, just name it "AI enhancing skill definitions"

fridder•3mo ago

All of these random features is just pushing me further towards model agnostic tools like goose

xpe•3mo ago

Thanks for sharing goose.

This phase of LLM product development feels a bit like the Tower of Babel days with Cloud services before wrapper tools became popular and more standardization happened.

cesarvarela•3mo ago

I wonder how much this affects the model's performance. I imagine Anthropic trains its models to use a generic set of tools, but they can also lean on their specific tool definitions to save the agent from having to guess which tool for what.

asdev•3mo ago

I wonder what the accuracy is for Claude to always follow a Skill accurately. I've had trouble getting LLMs to follow specific workflows 100% consistently without skipping or missing steps.

Yeroc•3mo ago

We also have the same issues with our fellow humans. LLMs do not replace the need for imperative programs that reliably execute well-defined steps. Turn it inside out. Use the LLM to write the imperative program to execute the workflow. Where necessary, insert the LLM into the workflow to perform the task(s) that can't be done imperatively.

rob•3mo ago

Subagents, plugins, skills, hooks, mcp servers, output styles, memory, extended thinking... seems like a bunch of stuff you can configure in Claude Code that overlap in a lot of areas. Wish they could figure out a way to simplify things.

singularity2001•3mo ago

Also the post does not contain a single word how it relates to the very similar agents in claude code. Capabilities, connectors, tasks, apps, custom-gpts, ... the space needs some serious consolidation and standardization!

I noticed the general tendency for overlap also when trying to update claude since 3+ methods conflicted with each other (brew, curl, npm, bun, vscode).

Might this be the handwriting of AI? ;)

kordlessagain•3mo ago

The post is simply "here's a folder with crap in it I may or may not use".

CuriouslyC•3mo ago

My agent has handlebars system prompts that you can pass variables at orchestration time. You can cascade imports and such, it's really quite powerful; a few variables can result in radically different system prompt.

_greim_•3mo ago

> Developers can also easily create, view, and upgrade skill versions through the Claude Console.

For coding in particular, it would be super-nice if they could just live in a standard location in the repo.

GregorStocks•3mo ago

Looks like they do:

> You can also manually install skills by adding them to ~/.claude/skills.

deeviant•3mo ago

Basically just rules/workflows from cursor/windsurf, but with a UI.

pixelpoet•3mo ago

Aside: I really love Anthropic's design language, so beautiful and functional.

maigret•3mo ago

Yes and fantastically executed, consistently through all their products and website - desktop, command line, third parties and more.

lukev•3mo ago

I agree 100%, except for the logo, which persistently looks like something they... probably did not intend.

nozzlegear•3mo ago

I always thought of it as an ink blot. Until now.

micromacrofoot•3mo ago

a helpful reminder that these things often speak from their asses

exographicskip•3mo ago

First time I saw it I immediately thought of Vonnegut's logo

jasonthorsness•3mo ago

When the skill is used locally in Claude Code does it still run in a virtual machine? Like some sort of isolation container with the target directory mounted?

xpe•3mo ago

Better when blastin' Skills by Gang Starr (headphones recommended if at work):

https://www.youtube.com/watch?v=Lgmy9qlZElc

999900000999•3mo ago

Can I just tell it to read the entire Godot source repo as a skill ?

Or is there some type of file limit here. Maybe the context windows just aren't there yet, but it would be really awesome if coding agents would stop trying to make up functions.

s900mhz•3mo ago

Download the godot docs and tell the skill to use them. It won’t be able to fit the entire docs in the context but that’s not the point. Depending on the task it will search for what it needs

dearilos•3mo ago

We're trying to solve a similar problem at wispbit - this is an interesting way to do it!

CuriouslyC•3mo ago

Anything the model chooses to use is going to waste context and get utilized poorly. Also, the more skills you have, the worse they're going to be. It's subagents v2.

Just use slash commands, they work a lot better.

just-working•3mo ago

I simply do not care about anything AI now. I have a severe revulsion to it. I miss the before times.

sega_sai•3mo ago

There seems to be a lot of overlap of this with MCP tools. Also presumably if there are a lot of skills, they will be too big for the context and one would need some way to find the right one. It is unclear how well this approach will scale.

rahimnathwani•3mo ago

Anthropic talks about ‘progressive disclosure’.

If you have a large number of skills, you could group them into a smaller number of skills each with subskills. That way not all the (sub)skill descriptions need to be loaded into context.

For example, instead of having a ‘PDF editing’ skill, you can have a ‘file editing’ skill that, when loaded into context, tells the LLM what type of files it can operate on. And then the LLM can ask for the info about how to do stuff with PDF files.

guluarte•3mo ago

great! another set of files the models will completely ignore like CLAUDE.md

simonw•3mo ago

I accidentally leaked the existence of these last Friday, glad they officially exist now! https://simonwillison.net/2025/Oct/10/claude-skills/

buildbot•3mo ago

"So I fired up a fresh Claude instance (fun fact: Code Interpreter also works in the Claude iOS app now, which it didn't when they first launched) and prompted:

Create a zip file of everything in your /mnt/skills folder"

It's a fun, terrifying world that this kind of "hack" to exfiltrate data is possible! I hope it does not have full filesystem/bin access, lol. Can it SSH?...

antiloper•3mo ago

What's the hack? Instead of typing `zip -r mnt.zip /mnt` into bash, you type `Create a zip file of /mnt` in claude code. It's the same thing running as the same user.

tgtweak•3mo ago

Skills run remotely in the llm environment, not locally on your system running claude - worth noting.

simonw•3mo ago

If you use skills with Claude Code they run directly on your computer.

If you use them inside the Claude.ai or Claude mobile apps they run in a container in the cloud, hosted by Anthropic.

skylurk•3mo ago

Woah, Jesse's blog has really come alive lately. Thanks for highlighting this post.

dang•3mo ago

Discussed here btw:

Superpowers: How I'm using coding agents in October 2025 - https://news.ycombinator.com/item?id=45547344 - Oct 2025 (231 comments)

sva_•3mo ago

All this AI, and yet it can't render properly on mobile.

mikkupikku•3mo ago

I'd love a Skill for effective use of subagents in Claude Code. I'm still struggling with that.

arjie•3mo ago

It's pretty neat that they're adding these things. In my projects, I have a `bin/claude` subdirectory where I ask it to put scripts etc. that it builds. In the claude.md I then note that it should look there for tools. It does a pretty good job of this. To be honest, the thing I most need are context-management helpers like "start a claude with this set of MCPs, then that set, and so on". Instead right now I have separate subdirectories that I then treat as projects (which are supported as profiles in Claude) which I then launch a `claude` from. The advantage of the `bin/claude` in each of these things is that it functions as a longer-cycle learning thing. My Claude instantly knows how to analyze certain BigQuery datasets and where to find the credentials file and so on.

Filesystem as profile manager is not something I thought I'd be doing, but here we are.

tomComb•3mo ago

> the thing I most need are context-management helpers like "start a claude with this set of MCPs, then that set, and so on".

Isn’t that sub agents?

arjie•3mo ago

Ah, in my case, I want to just talk to a video-editing Claude, and then a sys-admin Claude, and so on. I don't want to go through a main Claude who will instantiate these guys. I want to talk to the particular Claudes myself. But if sub-agents work for this, then maybe I just haven't been using them well.

adastra22•3mo ago

No, subagents are non interactive.

iyn•3mo ago

Does anyone know how skills relate to subagents? Seems that subagents have more capabilities (e.g. can access the internet) but seems that there's a lot of overlap.

I've asked Claude and this it answered this:

  Skills = Instructions + resources for the current Claude instance (shared context)
  Subagents = Separate AI instances with isolated contexts that can work in parallel (different context windows)
  Skills make Claude better at specific tasks. Subagents are like having multiple specialized Claudes working simultaneously on different aspects of a problem.

I imagine we can probably compose them, e.g. invoke subagents (to keep separate context) which could use some skills to in the end summarize the findings/provide output, without "polluting" the main context window.

lukev•3mo ago

How this reads to me is that a skill is "just" a bundle of prompts, scripts, and files that can be read into context as a unit.

Having a sub-agent "execute" a skill makes a lot of sense from a context management, perspective, but I think the way to think about it is that a sub-agent is an "execution-level" construct, whereas a skill is a "data-level" construct.

throwup238•3mo ago

Skills can also contain scripts that can be executed in a VM. The Anthropic engineering blog mentions that you can specify in the markdown instructions whether the script should be executed or read into context. One of their examples is a script to extract properties from a PDF file.

jstummbillig•3mo ago

ELI5: How is a skill different from a tool?

notepad0x90•3mo ago

Just me or is anthropic doing a lot better of a job at marketing than openai and google?

reed1234•3mo ago

It’s much more focused on devs I feel like. Less fluff

lquist•3mo ago

lol how is this not optimized for mobile

emadabdulrahim•3mo ago

So skills are basically preset system prompts, assuming different roles etc? Or is there more to it.

I'm a little confused.

imiric•3mo ago

Right, that's my interpretation as well.

"AI" companies have reached the end of the road when it comes to throwing more data and compute at the problem. The only way now for charts to go up and to the right is to deliver value-added services.

And, to be fair, there's a potentially long and profitable road by doing good engineering work that was needed anyways.

But it should be obvious to anyone within this bubble that this is not the road to "superintelligence" or "AGI". I hope that the hype and false advertising stops soon, so that we can focus on practical applications of this technology, which are numerous.

JyB•3mo ago

I'm super confused as well. This seems like exactly that, just some default prompt injections to chose from. I guess I kinda understand them in the context of their claude chat UI product.

By I don't understand why it's a thing in Claude Code tho when we already have Claude.md? Could also just point to any .md file in the prompt as preamble but not even needed. https://www.anthropic.com/engineering/claude-code-best-pract...

That concept is also already perfectly specd in the MCP standard right? (Although not super used I think?) https://modelcontextprotocol.io/specification/2025-06-18/ser...

chickensong•3mo ago

Claude.md gets read every time and eats context, while it sounds like the skills are read as-needed, saving context.

pollinations•3mo ago

Plus executable.xode snippets. I think their actual source code doesn't use context. But feels like function calling packaged.

mercurialsolo•3mo ago

Sub agents, mcp, skills - wonder how are they supposed to interact with each other?

Feels like fair bit of overlap here. It's ok to proceed in a direction where you are upgrading the spec and enabling claude wth additional capabilities. But one can pretty much use any of these approaches and end up with the same capability for an agent.

Right now feels like a ux upgrade from mcp where you need a json but instead can use a markdown in a file / folder and provide multi-modal inputs.

JyB•3mo ago

Claude Skills just seem to be the same as MCP prompts: https://modelcontextprotocol.io/specification/2025-06-18/ser...

I don't really see why they had to create a different concept. Maybe makes sense "marketing-wise" for their chat UI, but in Claude Code? Especially when CLAUDE.md is a thing?

datadrivenangel•3mo ago

Yeah how is this different from MCP prompts?

pizza•3mo ago

Narrowly focused semantics/affordances (for both LLM and users/future package managers/communities, ease of redistribution and context management:

- skills are plain files that are injected contextually whereas prompts would come w the overhead of live, running code that has to be installed just right into your particular env, to provide a whole mcp server. Tbh prompts also seem to be more about literal prompting, too

- you could have a thousand skills folders for different softwares etc but good luck with having more than a few mcp servers that are loaded into context w/o it clobbering the context

jjfoooo4•3mo ago

I see this as a lower overhead replacement for MCP. Rather than managing a bunch of MCP's, use the directory structure to your advantage, leverage the OS's capability to execute

JyB•3mo ago

I think you are right.

ebonnafoux•3mo ago

For me the concept of MCP was to have a client/server relation. For skills everything will be local.

pattobrien•3mo ago

MCP Prompts are meant to be user triggered, whereas I believe a Skill is meant to be an LLM-triggered, use-case centric set of instructions for a specific task.

  - MCP Prompt: "Please solve GitHub Issue #{issue_id}"
  - Skills:
    - React Component Development (React best practices, accessible tools)
    - REST API Endpoint Development
    - Code Review

This will probably result in:

  - Single "CLAUDE.md" instructions are broken out into discoverable instructions that the LLM will dynamically utilize based on the user's prompt
  - rather than having direct access to Tools, Claude will always need to go through Skill instructions first (making context tighter since it cant use Tools without understanding \*how\* to use them to achieve a certain goal)
  - Clients will be able to add infinite MCP servers / tools, since the Tools themselves will no longer all be added to the context window

It's basically a way to decouple User prompts from direct raw Tool access, which actually makes a ton of sense when you think of it.

simonw•3mo ago

I think those three concepts complement each other quite neatly.

MCPs can wrap APIs to make them usable by an LLM agent.

Skills offer a context-efficient way to make extra instructions available to the agent only when it needs them. Some of those instructions might involve telling it how best to use the MCPs.

Sub-agents are another context management pattern, this time allowing a parent agent to send a sub-agent off on a mission - optimally involving both skills and MCPs - while saving on tokens in that parent agent.

fny•3mo ago

I fear the conceptual churn we're going to endure in the coming years will rival frontend dev.

Across ChatGPT and Claude we now have tools, functions, skills, agents, subagents, commands, and apps, and there's a metastasizing complex of vibe frameworks feeding on this mess.

LPisGood•3mo ago

Metastasizing is such an excellent way to describe this phenomenon. They grow on top of each other.

hkt•3mo ago

The same thing will happen: skilled people will do one thing well. I've zero interest in anything but Claude code in a dev container and, while mindful of the lethal trifecta, will give Claude as much access to a local dev environment and it's associated tooling as I would give to a junior developer.

mathattack•3mo ago

There's so much white space - this is the cost of a brand new technology. Similar issues with figuring out what cloud tools to use, or what python libraries are most relevant.

This is also why not everyone is an early adopter. There are mental costs involved in staying on top of everything.

benterix•3mo ago

> This is also why not everyone is an early adopter.

Usually, there are relatively few adopters of a new technology.

But with LLMs, it's quite the opposite: there was a huge number of early adopters. Some got extremely excited and run hundreds of agents all the time, some got burned and went back to the good old ways of doing things, whereas the majority is just using LLMs from time to time for various tasks, bigger of smaller.

a4isms•3mo ago

I follow your reasoning. If we just look at businesses, and we include every business that pays money for AI and one or more employees use AI to do their their jobs, then we're in the Early Majority phase, not the Innovator or Early Adopter phases.

https://en.wikipedia.org/wiki/Technology_adoption_life_cycle

mathattack•3mo ago

There's early adoption from individuals. Much less from enterprises. (They're buying site licenses, but not re-engineering their company processes)

kbar13•3mo ago

i’m letting the smarter folks figure all this out and just picking the tools i like every now and then. i like just using claude code with vscode and still doing some things manually

efields•3mo ago

same same

MomsAVoxell•3mo ago

Another filthy casual checking in. Let the kids churn, the froth rises to the top and anyway .. I've got a straw.

articsputnik•3mo ago

yeah, avoiding all the serialization and deserialization, as I'm already working in Markdown and open text for almost all my stuff. The Claude Skill only seems to make sense for people who don't have their data in multiple different proprietary formats, then it might sense to packaging them into another one. But this can get messy pretty quick!

esafak•3mo ago

On the other hand, this complexity represents a new niche that, for a while at least, will present job and business opportunities.

Trias11•3mo ago

Right.

I focus on building projects delivering some specific business value and pick the tools that gets me there.

There is zero value in spending cycles by engaging in new tools hype.

dalmo3•3mo ago

For Cursor: cursorrules, mdc rules, user rules, team rules.

catgary•3mo ago

These companies are also biased towards solutions that will more-or-less trap you in a heavily agent-based workflow.

I’m surprised/disappointed that I haven’t seen any papers out of the programming languages community about how to integrate agentic coding with compilers/type system features/etc. They really need to step up, otherwise there’s going to be a lot of unnecessary CO2 produced by tools like this.

typpilol•3mo ago

I kind of do this by making LLM run my linter which has typed lint rules.

The way I can get any decent code out of them for typescript is by having no joke, 60 eslint plugins. It forces them to write actual decent code, although it takes them forever

awb•3mo ago

Hopefully there’s a similar “don’t make me think” mantra that comes to AI product design.

I like the trend where the agent decides what models, tooling and thought process to use. That seems to me far more powerful than asking users to create solutions for each discreet problem space.

kingkongjaffa•3mo ago

Where I've seen it be really transformative is giving it additive tools that are multiplicative in utility. So like giving an LLM 5 primitive tools for a specific domain and the agent figuring out how to use them together and chain them and run some tools multiple times etc.

iLoveOncall•3mo ago

Except in reality it's ALL marketing terms for 2 things: additional prompt sections, and APIs.

james_marks•3mo ago

I more or less agree, but it’s surprising what naming a concept does for the average user.

You see a text file and understand that it can be anything, but end users can’t/won’t make the jump. They need to see the words Note, Reminder, Email, etc.

butlike•3mo ago

Just wait until I can pull in just the concepts I want with "GPT Package Manager." I can simply call `gptpm add skills` and the LLM package manager will add the Skills package to my GPT. What could go wrong?

dhamidi•3mo ago

That's already the case with https://docs.claude.com/en/docs/claude-code/plugins

libraryofbabel•3mo ago

You forgot mcp-everything!

Yes, it's a mess, and there will be a lot of churn, you're not wrong, but there are foundational concepts underneath it all that you can learn and then it's easy to fit insert-new-feature into your mental model. (Or you can just ignore the new features, and roll your own tools. Some people here do that with a lot of success.)

The foundational mental model to get the hang of is really just:

* An LLM

* ...called in a loop

* ...maintaining a history of stuff it's done in the session (the "context")

* ...with access to tool calls to do things. Like, read files, write files, call bash, etc.

Some people call this "the agentic loop." Call it what you want, you can write it in 100 lines of Python. I encourage every programmer I talk to who is remotely curious about LLMs to try that. It is a lightbulb moment.

Once you've written your own basic agent, if a new tool comes along, you can easily demystify it by thinking about how you'd implement it yourself. For example, Claude Skills are really just:

1) Skills are just a bunch of files with instructions for the LLM in them.

2) Search for the available "skills" on startup and put all the short descriptions into the context so the LLM knows about them.

3) Also tell the LLM how to "use" a skill. Claude just uses the `bash` tool for that.

4) When Claude wants to use a skill, it uses the "call bash" tool to read in the skill files, then does the thing described in them.

and that's more or less it, glossing over a lot of things that are important but not foundational like ensuring granular tool permissions, etc.

Der_Einzige•3mo ago

Tool use is only good with structured/constrained generation

libraryofbabel•3mo ago

You'll need to expand on what you mean, I'm afraid.

AStrangeMorrow•3mo ago

I think, from my experience, what they mean is tool use is as good as your model capability to stick to a given answer template/grammar. For example if it does tool calling using a JSON format it needs to stick to that format, not hallucinate extra fields and use the existing fields properly. This has worked for a few years and LLMs are getting better and better but the more tools you have, the more parameters your functions to call can have etc the higher the risk of errors. You also have systems that constrain the whole inference itself, for example with the outlines package, by changing the way tokens are sampled (this way you can force a model to stick to a template/grammar, but that can also degrade results in some other ways)

libraryofbabel•3mo ago

I see, thanks for channeling the GP! Yeah, like you say, I just don't think getting the tool call template right is really a problem anymore, at least with the big-labs SotA models that most of us use for coding agents. Claude Sonnet, Gemini, GPT-5 and friends have been heavily heavily RL-ed into being really good at tool calls, and it's all built into the providers' apis now so you never even see the magic where the tool call is parsed out of the raw response. To be honest, when I first read about tools calls with LLMs I thought, "that'll never work reliably, it'll mess up the syntax sometimes." But in practice, it does work. (Or, to be more precise, if the LLM ever does mess up the grammar, you never know because it's able to seamlessly retry and correct without it ever being visible at the user-facing api layer.) Claude Code plugged into Sonnet (or even Haiku) might do hundreds of tool calls in an hour of work without missing a beat. One of the many surprises of the last few years.

dlivingston•3mo ago

> Call it what you want, you can write it in 100 lines of Python. I encourage every programmer I talk to who is remotely curious about LLMs to try that. It is a lightbulb moment.

Definitely want to try this out. Any resources / etc. on getting started?

libraryofbabel•3mo ago

This is the classic blog post, by Thorsten Ball, from way back in the AI Stone Age (April this year): https://ampcode.com/how-to-build-an-agent

It uses Go, which is more verbose than Python would be, so he takes 300 lines to do it. Also, his edit_file tool could be a lot simpler (I just make my minimal agent "edit" files by overwriting the entire existing file).

I keep meaning to write a similar blog post with Python, as I think it makes it even clearer how simple the stripped-down essence of a coding agent can be. There is magic, but it all lives in the LLM, not the agent software.

judahmeek•3mo ago

> I keep meaning to write a similar blog post with Python...

Just have your agent do it.

libraryofbabel•3mo ago

I could, but I'm actually rather snobbish about my writing and don't believe in having LLMs write first drafts (for proofreading and editing, they're great).

(I am not snobbish about my code. If it works and is solid and maintainable I don't care if I wrote it or not. Some people seem to feel a sense of loss when an LLM writes code for them, because of The Craft or whatever. That's not me; I don't have my identity wrapped up in my code. Maybe I did when I was more junior, but I've been in this game long enough to just let it go.)

jona777than•3mo ago

I highly relate to this. Code works or it doesn’t. My writing feels a lot more like self expression. I agree that’s harder to “let go” to an agent.

canyon289•3mo ago

I wrote a post here with zero abstractions. Its all self contained and runs locally.

https://ravinkumar.com/GenAiGuidebook/language_models/Agents... https://github.com/canyon289/ai_agent_basics/blob/main/noteb...

ibejoeb•3mo ago

Pretty true, and definitely a good exercise. But if we're going to actual use these things in practice, you need more. Things like prompt caching, capabilities/constraints, etc. It's pretty dangerous to let an agent go hog wild in an unprotected environment.

libraryofbabel•3mo ago

Oh sure! And if I was talking someone through building a barebones agent, I'd definitely tag on a warning along the lines of "but don't actually use this without XYZ!" That said, you can add prompt caching by just setting a couple of parameters in the api calls to the LLM. I agree constraints is a much more complex topic, although even in my 100-line example I am able to fit in a user approval step before file write or bash actions.

apsurd•3mo ago

when you say prompt caching, does it mean cache the thing you send to the llm or the thing you get back?

sounds like prompt is what you send, and caching is important here because what you send is derived from previous responses from llm calls earlier?

sorry to sound dense, I struggle to understand where and how in the mental model the non-determinism of a response is dealt with. is it just that it's all cached?

libraryofbabel•3mo ago

Not dense to ask questions! There are two separate concepts in play:

1) Maintaining the state of the "conversation" history with the LLM. LLMs are stateless, so you have to store the entire series of interactions on the client side in your agent (every user prompt, every LLM response, every tool call, every tool call result). You then send the entire previous conversation history to the LLM every time you call it, so it can "see" what has already happened. In a basic agent, it's essentially just a big list of strings, and you pass it into the LLM api on every LLM call.

2) "Prompt caching", which is a clever optimization in the LLM infrastructure to take advantage of the fact that most LLM interactions involve processing a lot of unchanging past conversation history, plus a little bit of new text at the end. Understanding it requires understanding the internals of LLM transformer architecture, but the essence of it is that you can save a lot of GPU compute time by caching previous result states that then become intermediate states for the next LLM call. You cache on the entire history: the base prompt, the user's messages, the LLM's responses, the LLM's tool calls, everything. As a user of an LLM api, you don't have to worry about how any of it works under the hood, you just have to enable it. The reason to turn it on is it dramatically increases response time and reduces cost.

Hope that clarifies!

apsurd•3mo ago

Very helpful. It helps me better understand the specifics behind each call and response, the internal units and whether those units are sent and received "live" from the LLM or come from a traditional db or cache store.

I'm personally just curious how far, clever, insightful, any given product is "on top of" the foundation models. I'm not in it deep enough to make claims one way or the other.

So this shines a little more light, thanks!

ayewo•3mo ago

This recent comment https://news.ycombinator.com/item?id=45598670 by @simonw really helped drive home the point that LLMs are really being fed an array of strings.

colordrops•3mo ago

Why wouldn't you turn on prompt caching? There must be a reason why it's a toggle rather than just being on for everything.

TimMoore•3mo ago

Writing to the cache is more expensive than a request with caching disabled. So it only makes economic sense to do it when you know you're going to use the cached results. See https://docs.claude.com/en/docs/build-with-claude/prompt-cac...

adastra22•3mo ago

When you know the context is a one-and-done. Caching costs more than just running the prompt, but less than running the prompt twice.

xnx•3mo ago

Might as well include agent2agent in there: https://developers.googleblog.com/en/a2a-a-new-era-of-agent-...

kvirani•3mo ago

How does it call upon the correct skill from a vast library of skills at the right time? Is this where RAG via embeddings / vector search come in? My mental model is still weak in this area, I admit.

visarga•3mo ago

I think it has a compact table of contents of all the skills it can call preloaded. It's not RAG, it navigates based on references between files, like a coding agent.

libraryofbabel•3mo ago

This is correct. It just puts a list of skills into context as part of the base prompt. The list must be compact because the whole point of skills is to reduce context bloat by keeping all the details out of context until they are needed. So the list will just be something like: 1) skill name, 2) short (like one sentence) description of what the skill is for, 3) where to find the skill (file path, basically) when it wants to read it in.

KingOfMyRoom•3mo ago

You have a great way of demystifying things. Thanks for the insights here!

Do you think a non-programmer could realistically build a full app using vibe coding?

What fundamentals would you say are essential to understand first?

For context, I’m in finance, but about 8 years ago I built a full app with Angular/Ionic (live on Play Store, under review on Apple Store at that time) after doing a Coursera specialization. That was my first startup attempt, I haven’t coded since.

My current idea is to combine ChatGPT prompts with Lovable to get something built, then fine-tune and iterate using Roo Code (VS plugin).

I’d love to try again with vibe coding. Any resources or directions you’d recommend?

felixhammerl•3mo ago

If your app has to display stuff, you have no code kits available that can help you out. No vibe coding needed.

If your app has to do something useful, your app just exploded in complexity and corner cases that you will have to account for and debug. Also, if it does anything interesting that the LLM has not yet seen a hundred thousand times, you will hit the manual button quite quickly.

Claude especially (with all its deserved praise) fantasizes so much crap together while claiming absolute authority in corner cases, it can become annoying.

KingOfMyRoom•3mo ago

That makes sense, I can see how once things get complex or novel, the LLMs start to struggle. I don't think my app is doing anything complex.

For now, my MVP is pretty simple: a small app for people to listen to soundscapes for focus and relaxation. Even if no one uses, at least it's going to be useful to me and it will be a fun experiment!

I’m thinking of starting with React + Supabase (through Lovable), that should cover most of what I need early on. Once it’s out of the survival stage, I’ll look into adding more complex functionality.

Curious, in your experience, what’s the best way to keep things reliable when starting simple like this? And are there any good resources you can point to?

ashtonshears•3mo ago

You can make that. The only ai coding tools i have liked is openai codex and claude code. I would start with working with it to create a design document in markdown to plan the project. Then i would close the app to reset context, and tell it to read that file, and create an implementation plan for the project in various phases. Then i would close context, and have it start implementing. I dont always like that many steps, but for a new user it can help see ways to use the tools

KingOfMyRoom•3mo ago

That’s a good advice, thank you!

I already have a feature list and a basic PRD, and I’m working through the main wireframes right now.

What I’m still figuring out is the planning and architecture side, how to go from that high-level outline to a solid structure for the app. I’d rather move step by step, testing things gradually, than get buried under too much code where I don’t understand anything.

I’m even considering taking a few React courses along the way just to get a better grasp of what’s happening under the hood.

Do you know of any good resources or examples that could help guide this kind of approach? On how to break this down, what documents to have?

ashtonshears•3mo ago

Maybe react native if you like react

ashtonshears•3mo ago

Learning how to get it to run build steps was a big boost in my initial productivity when learning the cli tools

mathieudombrock•3mo ago

I've always wanted to make an app like this. I think you could do a lot with procedural generation and some clever DSP.

wouldbecouldbe•3mo ago

Really depends on the app you want to build.

If I'd use Vibe coding I wouldn't use Lovable but Claude code. You can run it in your terminal.

And I would ask it to use NextAuth, NextJS and Prisma (or another ORM), and connect it with SQLite or an external MariaDB managed server (for easy development you can start with SQLLite, for deployment to vercel you need an external database).

People here shit on nextjs, but due to its extensive documentation & usage the LLM's are very good at building with it, and since it forces a certain structure it produces generally decently structured code that is workable for a developer.

Also vercel is very easy to deploy, just connect Github and you are done.

Make sure to properly use GIT and commit per feature, even better branch per feature. So you can easily revert back to old versions if Claude messed up.

Before starting, spend some time sparring with GPT5 thinking model to create a database scheme thats future proof before starting out. It might be a challenge here to find the right balance between over-engineering and simplicity.

One caveat: be careful to run migration on your production database with Claude. It can accidentally destroy it. So only run your claude code on test databases.

KingOfMyRoom•3mo ago

Thanks a lot for all the pointers.

I’m not 100% set on Lovable yet. Right now I’m using Stitch AI to build out the wireframes. The main reason I was leaning toward Lovable is that it seems pretty good at UI design and layout.

How does Claude do on that front? Can it handle good UI structure or does it usually need some help from a design tool?

Also, is it possible to get mobile apps out of a Next.js setup?

My thought was to start with the web version, and later maybe wrap it using Cordova (or Capacitor) like I did years ago with Ionic to get Android/iOS versions. Just wondering if that’s still a sensible path today.

wouldbecouldbe•3mo ago

It’s great at design; you can also do it in Claude code chat ui and then when you are happy copy paste it to cli

Arkhaine_kupo•3mo ago

> Do you think a non-programmer could realistically build a full app using vibe coding?

For personal or professional use?

If you want to make it public I would say 0% realistic. The bugs, security concerns, performance problems etc you would be unable to fix are impossible to enumerate.

But even if you had a simple loging and kept people's email and password, you can very easily have insecure dbs, insecure protections against simple things like mysqliinjections etc.

You would not want to be the face of "vibe coder gives away data of 10k users"

KingOfMyRoom•3mo ago

Ideally, I want this to grow into a proper startup. I’m starting solo for now, but as things progress, I’d like to bring in more people. I’m not a tech, product or design person, but AI gives me hope that I can at least get an MVP out and onboard a few early users.

For auth, I’ll be using Supabase, and for the MVP stage I think Lovable should be good enough to build and test with maybe a few hundred users. If there’s traction and things start working, that’s when I’d plan to harden the stack and get proper security and code reviews in place.

Arkhaine_kupo•3mo ago

One of the issues AI coding has, is that its in some ways very inhuman. The bugs that are introduced are very hard to pick up because humans wouldnt write it that way, hence they wouldnt make those mistakes.

If you then introduce other devs you have 2 paths, they either build on top of vibe coding, which is going to leave you vulnerable to those bugs and honestly make their life a misery as they are working on top of work that missed basic decisions that will help it grow. (Imagine a non architect built your house, the walls might be straight but he didnt know to level the floor, or to add the right concrete to support the weight of a second floor)

Or the other path is they rebuild your entire app correctly. With the only advantage of the MVP and the users showing some viability for the idea. But the time it will take to rewrite it means in a fast moving space like start ups someone can quickly overtake you.

Its a risky proposition that means you are not going to create a very adequate base for the people you might hire.

I would still recommend against it, thinking that AI is more like WebMD, it can help someone who is already a doctor but it will confuse, and potentially hurt those without enough training to know what to look for.

skissane•3mo ago

> You forgot mcp-everything!

One great thing about the MCP craze, is it has given vendors a motivation to expose APIs which they didn’t offer before - real example, Notion’s public REST API lacks support for duplicating pages.. yes their web UI can do it, calling their private REST API, but their private APIs are complex, undocumented, and could stop working at any time with no notice. Then they added it to their MCP server - and MCP is just a JSON-RPC API, you aren’t limited to only invoking it from an LLM agent, you can also invoke it from your favourite scripting language with no LLM involved at all

aabhay•3mo ago

Amazing example. AI turns the bedgrudging third rate API UX into a must-win agent UX

mooreds•3mo ago

and we all win!

alvis•3mo ago

Well. I bet Notion simply forget some of APIs are private before. I started developing using Notion APIs on the first day it got released. They have constant updates and I have seen lots of improvement. There is just no reason why they intentionally want to make the duplicate page API on MCP but not api.

PS. Just want to say, Notion MCP is still very buggy. It can't handle code block, nor large page very well

skissane•3mo ago

> There is just no reason why they intentionally want to make the duplicate page API on MCP but not api.

I have no idea what is going on inside Notion, but if I guess - the web UI (including the private REST API which backs it), the public REST API, and the AI features are separate teams, separate PMs, separate budgets - so it is totally unsurprising they don’t all have the same feature set. Of course, if parity were an executive priority, they could get there-but I can only assume it isn’t.

libraryofbabel•3mo ago

I remember reading in one of Simon Willison's recent blog posts his half-joking point that MCP got so much traction so fast because adding a remote MCP server allowed tech management at big companies whose C-suite is asking them for an "AI Strategy" to show that they were doing something. I'm sure that is a little bit true - a project framed as "make our API better and more open and well-documented" would likely never have got off the ground at many such places. But that is exactly what this is, really.

At least it's something we all reap the benefits of, even if MCP is really mostly just an api wrapper dressed up as "Advanced AI Technology."

callamdelaney•3mo ago

It's all just prompt stuffing in the end.

data-ottawa•3mo ago

It’s also a very fun project, you can set up a small LLM with ollama or lm studio and get working quickly. Using MCP it’s very fast to getting that actually useful.

I’ve done this a few times (pre and post MCP) and learned a lot each time.

delgaudm•3mo ago

> Some people call this "the agentic loop." Call it what you want, you can write it in 100 lines of Python

That description sounds a lot like PocketFlow, an AI/LLM development framework based on a loop that's about 100 lines of python:

https://github.com/The-Pocket/PocketFlow

(I'm not at all affiliated with Pocket Flow, I just recall watching a demo of it)

__loam•3mo ago

Langchain was the original sin of thin framework bullshit

kelvinjps10•3mo ago

I found that the way that Claude now handle tools on my sistema simplifies stuff, with its cli usage, I find the Claude skills model better than mcp

jessmartin•3mo ago

Same. Was very excited about MCP but Claude code + CLI tools is so much nicer.

lukev•3mo ago

The cool part is that none of any of this is actually that big or difficult. You can master it on-demand, or build your own substitutes if necessary.

Yeah, if you chase buzzword compliance and try to learn all these things outside of a particular use case you're going to burn out and have a bad time. So... don't?

siva7•3mo ago

It feels like every week these companies release some new product that feels very similar to what they released a week before. Can the employees at Anthropic even tell themselves what the difference is?

amelius•3mo ago

These products are all cannibalizing eachother, so a bad strategy.

zqna•3mo ago

I bet that most of those products are created by their own "AI". They must already be using AI product owners, developers, testers, as their human counterparts are only sitting their in their chairs and only busy training their AI simulation and moderating their output. Next logical step will be AI doing that with the human folks hitting the street, then recursively ad infinitum. They will reach the glorified singularity there really soon!

zmmmmm•3mo ago

Yep, the ecosystem is well on its way to collapsing under its own weight.

You have to remember, every system or platform has a total complexity budget that effectively sits at the limit of what a broad spectrum of people can effectively incorporate into their day to day working memory. How it gets spent is absolutely crucial. When a platform vendor adds a new piece of complexity, it comes from the same budget that could have been devoted to things built on the platform. But unlike things built on the platform, it's there whether developers like it and use it or not. It's common these days that providers binge on ecosystem complexity because they think it's building differentiation, when in fact it's building huge barriers to the exact audience they need to attract to scale up their customer base, and subtracting from the value of what can actually be built on their platform.

Here you have a highly overlapping duplicative concept that's taking a solid chunk of new complexity budget but not really adding a lot of new capability in return. I am sure the people who designed it think they are reducing complexity by adding a "simple" new feature that does what people would otherwise have to learn themselves. It's far more likely they are at break even for how many people they deter vs attract from using their platform by doing this.

SafeDusk•3mo ago

That is why a minimal framework[1] that allows me to understand the core immutable loop, but to quickly experiment with all these imperative concepts is invaluable.

I was able to try Beads[1] quickly with my framework and decided I like it enough to keep it. If I don't like it, just drop it, they're composable.

[0]: https://github.com/aperoc/toolkami.git [1]: https://github.com/steveyegge/beads

scrollaway•3mo ago

Yeah Beads is a very nice experience. Useful, easy to set up, easy to drop.

DrewADesign•3mo ago

Not to mention GANs, RAGs, context decoupling, prompt matrices, NAGGLs, first-class keywords, reverse token interrupts, agentic singletons, parallel context bridges…

… jk… I’ll bet at least one person was like “ah, damnit, what did I miss…” for a second.

eru•3mo ago

AI tools can help you with the churn.

AI will help you solve problems you wouldn't have without AI.

dyauspitr•3mo ago

All of these things seem unnecessary. You can just ask the general prompt any of these things. I don’t really understand what exactly an agent adds on since it feel like the only thing about an agent is a restricted output.

nurettin•3mo ago

You can just ask an LLM to set it up for you. Slop in, slop out.

solumunus•3mo ago

As usual, stick with the basic 20% which give 80% of the value.

blitzar•3mo ago

Just need to add some use cases

flutetornado•3mo ago

There are several useful ways of engineering the context used by LLMs for different use cases.

MCP allows anybody to extend their own LLM application's context and capabilities using pre-built *third party* tools.

Agent Skills allows you to let the LLM enrich and narrow down it's own context based on the nature of the task it's doing.

I have been using a home grown version of Agent Skills for months now with Claude in VSCode, using skill files and extra tools in folders for the LLM to use. Once you have enough experience writing code with LLMs, you will realize this is a natural direction to take for engineering the context of LLMs. Very helpful in pruning unnecessary parts from "general instruction files" when working on specific tasks - all orchestrated by the LLM itself. And external tools for specific tasks (such as finding out which cell in a jupyter notebook contains the code that the LLM is trying to edit, for example) make LLMs a lot more accurate and efficient, efficient because they are not burning through precious tokens to do the same and accurate because the tools are not stochastic.

With Claude Skills now I don't need to maintain my home grown contraption. This is a welcome addition!

MomsAVoxell•3mo ago

It's fine, just use an AI to organise it all. Soon enough, nobody will need to know anything.

josefresco•3mo ago

I just used tested the canvas-design skill and the results were pretty awful.

This is the skill description:

Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a poster, piece of art, design, or other static piece. Create original visual designs, never copying existing artists' work to avoid copyright violations.

What it created was an abstract art museum-esque poster with random shapes and no discernable message. It may have been trying to design a playing card but just failed miserably which is my experience with most AI image generators.

It certainly spent a lot of time, and effort to create the poster. It asked initial questions, developed a plan, did research, created tooling - seems like a waste of "tokens" given how simple and lame the resulting image turned out.

Also after testing I still don't know how to "use" one of these skills in an actual chat.

taejavu•3mo ago

If you want to generate images, use Midjourney or whatever. It’s almost like you’ve deliberately missed the point of the feature.

jedisct1•3mo ago

Too many options, this is getting very confusing.

Roo Code just has "modes", and honestly, this is more than enough.

rohan_•3mo ago

Cursor launched this a while ago with "Cursor Rules"

radley•3mo ago

It will be interesting to see how this is structured. I was already doing something similar with Claude Projects & Instructions, MCP, and Obsidian. I'm hoping that Skills can cascade (from general to specific) and/or be combined between projects.

datadrivenangel•3mo ago

So sort of like MCP prompt templates except not prompt templates?

laurentiurad•3mo ago

AGI nowhere near

skylurk•3mo ago

I know I'm replying to a shitpost. But I had a realisation, and I'm probably not the only one.

If you can manage to keep structuring slightly intelligent tools so that they compound, seems like AGI is achievable.

That's why the thing everyone is after right now is new ways to make those slight intelligences keep compounding.

Just like repeated multiplication of 1.001 grows indefinitely.

gigatree•3mo ago

But how often can you repeat the multiplication when the repetitions are unsustainable?

skylurk•3mo ago

Yeah, sometimes it feels like we're just layering unintelligent things, with compounding unintelligence...

But starting earlier this year, I've started to see glimpses of what seems like intelligence (to me) in the tools, so who knows.

laurentiurad•3mo ago

things like being able to say how many R's are in strawberry

Lionga•3mo ago

I know I'm replying to a shitpost. Well enough said.

laurentiurad•3mo ago

do you feel the agi?

laurentiurad•3mo ago

yea if you can afford to burn infinite money to get slight increments in quality then sure

robwwilliams•3mo ago

Could be helpful. I often edit scientific papers and grant applications. Orienting Claude on the frontend of each project works but an “Editing Skill” set could be more general and make interactions with Claude more clued in to goals instead of starting stateless.

mercurialsolo•3mo ago

One sharp contrast though I see between OpenAI and Anthropic is the product extensions are built around their flagship products.

OpenAI ships extensions for ChatGPT - that feed more to plug into the consumer experience. Anthropic ships extensions (made for builders) into ClaudeCode - feel more DX.

sumedh•3mo ago

Anthropic is making more money from enterprise while ChatGpt's target market is the consumer.

corytheboyd•3mo ago

I’ll give it a fair go, but how is it not going to have the same problem of _maybe_ using MCP tools? The same problem of trying to add to your prompt “only answer if you are 100% correct”? A skill just sounds like more markdown that is fed into context, but with a cool name that sounds impressive, and some indexing of the defined skills on start (same as MCP tools?)

butlike•3mo ago

Great, so now I can script the IDE...err, I mean LLM. I can't help but feel like we've been here before, and the magic is wearing thin.

gloosx•3mo ago

wow, this news post layout is not fitting the screen on mobile... Couldnt these 10x programmers vibecode a proper mobile version?

thorio•3mo ago

How about using some of that skills to make that page mobile ready...

I_am_tiberius•3mo ago

Every release of these companies makes me angry because I know they take advantage of all the people who release content to the public. They just consume and take the profit. In addition to that Anthropic has shown that they don't care about our privacy AT ALL.

mercurialsolo•3mo ago

The way this is headed - I also see a burgeoning class of tools emerging. MCP servers, Skill managers, Sub-Agent builders. Feels like the patterns and protocols need more explainability to how they synthesize into a practical dev (extension) toolkit which is useful across multiple surfaces e.g. chat vs coding vs media gen.

actinium226•3mo ago

It's an interesting idea (among many) to try to address the problem of LLMs getting off task, but I notice that there's no evaluation in the blog post. Like, ok cool, you've added "skills," but is there any evidence that they're useful or are we just grasping at straws here?

titzer•3mo ago

While not generally a bad idea, I find it amusing that they are reinventing shared libraries where the code format is...English. So the obvious next step is "precompiling" skills to a form that is better for Claude internally.

...which would be great if the (likely binary) format of that was used internally, but something tells me an architectural screwup will lead to leaking the binaries and we'll have a dependency on a dumb inscrutable binary format to carry forward...

tgtweak•3mo ago

At term (and not even far term) - LLMs will be able to churn up their own "skills" using their sandbox code environments - and possibly recycle them through context on a per-user basis.

While I like the flexibility of deploying your own skills to claude for use org-wide, this really feels like what MCP should be for that use case, or what built-in analysis sandbox should be.

We haven't even gone mainstream with MCP and there are already 10 stand-ins doing roughly the same thing with a different twist.

I would have honestly preferred they called this embedded MCP instead of 'skills'.

_pdp_•3mo ago

I predict there will be some sort of package manager opensource project soon. Download skills from some 3rd-party website and run inside Claude. Risks of supply chain issue will be obvious but nobody will care - at least not in the short term.

FrostKiwi•3mo ago

They already have the Plugin Marketplace [1]. It's all too much of a fast moving target for something as rigid as a package manager I think. Open source projects for now will be limited to Awesome-* collections [2]

[1] https://docs.claude.com/en/docs/claude-code/plugin-marketpla...

[2] https://github.com/hesreallyhim/awesome-claude-code

waldir•3mo ago

They've already started a public repository here: https://github.com/anthropics/skills (It's the last link in the "Getting started" section of the post.)

nextworddev•3mo ago

What is this, tools for Claude web app?

XCSme•3mo ago

Isn't this just RAG?

jrh3•3mo ago

The tools I build for Claude Code keep reducing back to just using Claude Code and watching Anthropic add what I need. This is my tool for brownfield projects with Claude Code. I added skills based on https://blog.fsck.com/2025/10/09/superpowers/

https://github.com/RossH3/context-tree - Helps Claude and humans understand complex brownfield codebases through maintained context trees.

simonw•3mo ago

Just published this about skills: "Claude Skills are awesome, maybe a bigger deal than MCP"

https://simonwillison.net/2025/Oct/16/claude-skills/

pants2•3mo ago

Skills are cool, but to me it's more of a design pattern / prompt engineering trick than something in need of a hard spec. You can even implement it in an MCP - I've been doing it for a while: "Before doing anything, search the skills MCP and read any relevant guides."

manbash•3mo ago

I agree with you, but also I want to ask if I do understand this correctly: there was a paradigm in which we were aiming for Small Language Models to perform specific types of tasks, orchestrated by the LLM. That is what I perceived the MCP architecture came to standardize.

But here, it seems more like a diamond shape of information flow: the LLM processes the big task, then prompts are customized (not via LLM) with reference to the Skills, and then the customized prompt is fed yet again to the LLM.

Is that the case?

stingraycharles•3mo ago

It is exactly that. The same like slash-commands for CC: it’s just convenience.

JimDabell•3mo ago

I disagree. You wrap this up in a container / runtime spec. + package index and suddenly you’ve got an agent that can dynamically extend its capabilities based upon any skill that anybody has shared. Instead of `uv add foo` for Python packages you’ve got `skill add foo` for agent skills that the agent can run whenever they have a matching need.

rafaelmn•3mo ago

Fundamentally you're getting hyped over a framework to append text to your prompt ?

nickstinemates•3mo ago

that's pretty reductive. it's an interesting shift in thinking how to work with these tools.

whether there's some skillhub somewhere like there are MCP registries... you could totally see it happening.

ajtejankar•3mo ago

Exactly! I don't think Skills is a new algorithm but it's definitely a new paradigm of organizing your prompt. Essentially, dynamic context assembling with stuff crossing user boundaries which. They even mention that they are working on skill sharing across teams in an organization. You can take this expand to global user base sharing things with each other in an agent.

pseudosavant•3mo ago

I get this sentiment, but I think it is why it is so powerful actually. It would be like calling Docker/containers just some shell scripts for a kernel feature. It may be conceptually simple, but that doesn't mean it isn't novel and could transform things.

I highly doubt we'll be talking about MCP next year. It is a pretty bad spec but we had to start somewhere.

kingkongjaffa•3mo ago

when do you need to make a skill vs a project?

simonw•3mo ago

In Claude and ChatGPT a project is really just a custom system prompt and an optional bunch of files. Those files are both searchable via tools and get made available in the Code Interpreter container.

I see skills as something you might use inside of a project. You could have a project called "data analyst" with a bunch of skills for different aspects of that task - how to run a regression, how to export data from MySQL, etc.

They're effectively custom instructions that are unlimited in size and that don't cause performance problems by clogging up the context - since the whole point of skills is they're only read into the context when the LLM needs them.

handoflixue•3mo ago

Skills can be toggled on and off, which is good for context management, especially on larger / less frequently needed skills

Currently if a project is 5% or less capacity, it will auto-load all files, so skills also give you a way to avoid that capacity limit. For larger projects, Claude has to search files, which can be unreliable, so skills will again be useful for an explicit "always load this"

timcobb•3mo ago

then submit it, you don't need to post here about it

hu3•3mo ago

i found it useful and coinstructive to post it here also.

no reason not to.

timcobb•3mo ago

In my opinion because this is a discussion about this announcement, and it kinda feels like with not one but _two_ top-level posts, Simon is just kinda trying to hijack this conversation and turn it into a conversation about his posts. I'm not saying Simon is spamming, because there's definitely some relevance here. But I am saying Simon is attention-seeking in an unbecoming manner. Simon's posts make the homepage regularly anyway, he doesn't need to post them in other threads.

timcobb•3mo ago

See https://news.ycombinator.com/item?id=45619537 ?

hu3•3mo ago

Do you reckon Skills overlap with AGENTS.md?

VSCode recently introduced support nested AGENTS.md which albeit less formal, might overlap:

https://code.visualstudio.com/updates/v1_105#_support-for-ne...

simonw•3mo ago

Yeah, AGENTS.md that can point to other files for the LLM to read only if it needs them is effectively the exact same pattern as skills.

It also means that any tool that knows how to read AGENTS.md could start using skills today.

"if you need to create a PDF file first read the file in skills/pdfs/SKILL.md"

codybontecou•3mo ago

That's where my confusion is. How is this pattern similar to MCP? Can it also authenticate against 3rd party apis, similar to MCP?

simonw•3mo ago

If you want to call a third party API from a skill you can use instructions like this:

  To access the GitHub API, use curl 
  to make requests to api.GitHub.com
  and pass the GITHUB_API_KEY
  environment variable in the
  Authorization: Bearer header

ugh123•3mo ago

The "everything is a prompt" thing is interesting, but do we lose some deterministic behavior of MCP plumbing and execution for when the LLM simply doesn't want to follow the 'rules' and possibly hallucinates while processing the skill prompt? How do we make it consistent?

re5i5tor•3mo ago

Simon, this won't work, will it? The code execution environment can reach the internet?

simonw•3mo ago

It can if you are running it through Claude Code on your own machine.

The https://claude.ai environment has very limited network access - it can install packages from PyPI and NPM and clone repositories from github.com but it can't access any other domain, including api.github.com.

re5i5tor•3mo ago

;-) Confirmed by trying. I'm working on an AI-frontend "documents-as-code" tool, and having this as a skill that works across all Claude surfaces would be a huge win. Unfortunately it looks like that's not gonna happen today because ^^ ...

Looks like Claude Desktop and/or Claude Code + a non-big-fat-pig MCP (aka "not GitHub's official version") will be needed.

Here's hoping that Anthropic solves "safe external access / auth from code execution tool" soon.

EDIT - add last sentence

re5i5tor•3mo ago

I NOW UNDERSTAND

vinhnx•3mo ago

I think "Skill" is a subset of developer instruction, in which translates to AGENTS.md (or Claude.md). Today to add capability to an AI, all we need a good set of .md files and a AGENTS.md as the base.

sunaookami•3mo ago

Finally a good replacement for MCP. MCP was a horrible idea executed even worse and they hide the complexity under a dangerous "just paste this one liner into your mcpServers config!" together with wasting tens of thousands of tokens.

beepdyboop•3mo ago

Isn't this the same as Cursor Rules ?

babyshake•3mo ago

MCP is a protocol meant for general use for clients, which Claude Skills seems more proprietary. To what extent is Skills expected to be something that other clients, such as web based clients could adopt? To some extent it would probably make sense to expose through the MCP SDK?

cefboud•3mo ago

Context overload is definitely a problem with MCP, but its plug-and-play nature and discoverability are solid. Pasting a URL (or just using a button or other UX element) to link an MCP server presents a much lower barrier to entry than having the LLM run `cli-tool --help`, which assumes the CLI tool is already installed and the LLM has to know about it.

SebastianSosa1•3mo ago

Love your work but Pretty sure you got paid to endorse them

simonw•3mo ago

I didn't. That would be both grossly unethical and, in the USA, illegal if I didn't disclose it.

See comment here: https://news.ycombinator.com/item?id=45624613

I'm getting accused of paid shilling a lot right now.

(If Anthropic had paid me to write this they would probably have asked me NOT to spend a section of the article pointing out flaws in their MCP specification!)

SebastianSosa1•3mo ago

Fair enough, didn't know that it was illegal. The accusation was based on the fact that u released the same day as their release (natural to do so) and with high praise. Thx for the reply I retract my accusation.

outlore•3mo ago

I'm struggling to see how this is different from prepackaged prompts. Simon's article talks about skill metadata being used by the model to look up the full prompt as a way to save on context usage. That is analogous to the model calling --help when it needs to use a CLI tool without needing to load up the full man pages ahead of time.

But couldn't an MCP server expose a "help" tool?

throwup238•3mo ago

That’s pretty much all it is. If you look at the docs it even uses a bash script to read the skill markdown files into the context.

I think the big difference is that now you can include scripts in these skills that can be executed as part of the skill, in a VM on their servers.

GoatInGrey•3mo ago

It's the fact that a collection of files are tied to a specific task or action. Prompts are only injected context, whereas files can be more selectively loaded into context.

What they're trying to do here is translate MCP servers to something more broadly useable by the population. They cannot differentiate themselves with model training anymore, so they have been focusing more and more on tooling development to grow revenue.

kingkongjaffa•3mo ago

What's the difference in use case between a claude-skill and making a task specific claude project?

kristo•3mo ago

How is this different from commands? They're automatically invoked? How does claude decide when to use a skill? How specific do I need to write my skill?

stego-tech•3mo ago

I’m kind of in stitches over this. Claude’s “skills” are dependent upon developers writing competent documentation and keeping it up to date…which most seemingly can’t even do for actual code they write, nevermind a brute-force black box like an LLM.

For those few who do write competent documentation and have well-organized file systems and the risk tolerance to allow LLMs to run roughshod over data, sure, there’s some potential here. Though if you’re already that far in, you’d likely be better off farming that grunt work to a Junior as a learning exercise than an LLM, especially since you’ll have to cleanup the output anyhow.

With the limited context windows of LLMs, you can never truly get this sort of concept to “stick” like you can with a human, and if you’re training an agent for this specific task anyway, you’re effectively locking yourself to that specific LLM in perpetuity rather than a replaceable or promotable worker.

Just…it makes me giggle, how optimistic they are that stars would align at scale like that in an organization.

rbjorklin•3mo ago

Just went to the comments searching for a comment like yours and I'm surprised it seems to be the only one calling this out. My take on this is also that "Skills" is just detailed documentation, which like you correctly point out, basically never exist for any project. Maybe LLM skills will be the thing that finally makes us all write detailed documentation but I kind of doubt it.

moebrowne•3mo ago

I think part of the reason developers are resistant to writing docs is because the perceived value is very low.

This perceived value would be much higher if the docs were to tangibly become part of a productive tool chain

stego-tech•3mo ago

I generally find the aversion to documentation comes from one of three places:

* A belief that sufficient documentation means their job is at risk (which, to be fair, is 100% correct in this Capitalist hellscape - ask me how I know first-hand)

* It’s irrelevant since the code will change again in a short amount of time

* A fierce protection over one’s output, sometimes manifesting as a belief that nobody but you could ever understand what you created

Sure, sometimes there’s wholly incompetent developers who can’t even tell you their own dependencies, but I’d like to believe they’re still the exception rather than the rule. As for the value proposition, collaborators and cooperators understand the immense value of good, thorough documentation; those who don’t see the value, at least in my experience, are often adversarial instead of cooperative.

simonw•3mo ago

LLMs reward developers who can write. Maybe that's one of the reasons so many developers are pushing back against them!

zeroonetwothree•3mo ago

The classic "you're doing it wrong" response to criticism.

simonw•3mo ago

The classic "the only thing LLM proponents ever say is "you're doing it wrong"" response!

otterley•3mo ago

I, for one, appreciate that you don't let the haters get you down.

Keep up the good work, Simon. I admire your boundless optimism and curiosity--and your willingness to educate us all.

etothet•3mo ago

I think this can’t be overstated and I see it my day-to-day working with developers on AI enablement.

If you are good a writing, documenting, planning? etc. - basically all the stuff in the SDLC that isn’t writing code, you’ll probably be much more effective at using LLMs for coding.

maleldil•3mo ago

I generally agree with you, but this is a poor take. Developers, in general, like to write code. Writing prose is incidental. If the job becomes writing prose instead of code, it's easy to see why there's pushback.

dcre•3mo ago

When decent docs (and various other kinds of pro-developer infrastructure listed by simonw here https://simonwillison.net/2025/Oct/7/vibe-engineering/) are required for LLMs to work well, it's a very tangible incentive to do them better and ironically makes for an easier sell to management.

Arisaka1•3mo ago

>and if you’re training an agent for this specific task anyway, you’re effectively locking yourself to that specific LLM in perpetuity rather than a replaceable or promotable worker.

That's ONE of the long games that are currently played, and is arguably their fallback strategy: The equivalent of vendor lock-in but for LLM providers.

stego-tech•3mo ago

From my IT POV, that’s what this is all about. It’s why none of these major players produce locally-executable LLMs (Mistral, Llama, and DeepSeek being notable exceptions), it’s why their interfaces are predominantly chat-based (to reduce personal skills growth and increase dependency on the chatbot), it’s why they keep churning out new services like Skills and Agents and “Research”, etc.

If any of these outfits truly cared about making AI accessible and beneficial to everyone, then all of them would be busting hump to distill models better to run on a wider variety of hardware, create specialized niches that collaborate with rather than seek to replace humans, and promote sovereignty over the AI models rather than perpetual licensing and dependency forever.

No, not one of these companies actually gives a shit about improving humanity. They’re all following the YC playbook of try everything, rent but never own, lock-in customers, and hope you get that one lucrative bite that allows for an exit strategy of some sort while promoting the hell out of it and yourself as the panacea to a problem.

simonw•3mo ago

"It’s why none of these major players produce locally-executable LLMs (Mistral, Llama, and DeepSeek being notable exceptions)"

OpenAI have gpt-oss-20b and 120b. Google have the Gemma 3 models. At this point the only significant AI lab that doesn't provide a locally executable model are Anthropic!

stego-tech•3mo ago

Fair point, I’d forgotten those recent-ish releases from OpenAI and Google both - but my larger point still stands that the entire industry is maximizing potential vectors for lock-in and profit while spewing lies about “benefitting humanity” in public.

None of the present AI industry is operating in an ethical or responsible way, full stop. They know it, they admit to it when pressed, and nobody seems to give a shit if it means they can collapse the job market and make money for themselves. It’s “fuck you got mine” taken to a technological extreme.

redhale•3mo ago

I always find it hilarious and painfully ironic that Anthropic can't even keep Claude Code's docs up to date. I don't know how much to read into it, but it is a modern marvel of process failure.

The team is obviously doing a lot of cool things very rapidly, so I don't want to be too negative, but ... please just ask Claude to review your own docs before you merge a change.

etothet•3mo ago

Not saying you’re wrong, but can you cite a couple of examples?

redhale•3mo ago

Looking through the issues tagged "documentation" provides many examples (https://github.com/anthropics/claude-code/issues?q=label%3Ad...). It's so common they have an issue template for "Missing documentation (feature not documented)".

Here are a few recent open ones: - "Documentation missing for new 'Explore' subagent" - https://github.com/anthropics/claude-code/issues/9595 - "Missing documentation for modifying tool inputs in PreToolUse hooks" - https://github.com/anthropics/claude-code/issues/9185 - "Missing Documentation for Various Claude Code Features (CLI Flags, Slash Commands, & Tools)" - https://github.com/anthropics/claude-code/issues/8584

yodsanklai•3mo ago

I'd like to fast forward to a time where these tools are stable and mature so we can focus on coding again

jwpapi•3mo ago

I’m really fatigued by all these releases.

Honestly no offense, but for me nothing really changed in the last 12 months. It’s not one particular mistake by a company but everything is just so overhyped with little substance.

Skills to me is basically providing a read-only md file with guidelines. Which can be useful but somehow I don’t use it as maintaining my guidelines is more work then just writing a better prompt.

I’m not sure anymore if all the ai slop and stuff we create is beneficial anymore for us or it’s just creating a low quality problem in the future

simonw•3mo ago

12 months ago we didn't have Claude Code or Codex CLI - in fact the whole category of "coding agents" was very thin.

The only "reasoning" model was the o1 preview.

We didn't have MCP, but that wasn't a big deal because the models were mostly pretty weak at tool calling anyway.

The DeepSeek moment hadn't happened yet - the best available open weights models were from Mistral and Llama and were nowhere close to the frontier hosted models.

The LLM landscape feels radically different to me now compared to October last year.

jwpapi•3mo ago

In October we had Aider, which is more useful to me then Claude Code, as it allows more targeted changes and faster switching between models, modes and into my personal typing.

Not just Claude Code, but all these tools are just better in generating more slop, which is generating more effort in your codebase in the future. Making it less agile, harder to maintain and harder to extend without breaking.

I still haven’t found a useful usage of MCP for me, if i want tool calling I get a structured response by the AI and then do a normal API call. I don’t need nor want the AI to have access to all these calls it’s just too unreliable.

I’m really just sharing my personal preference as I also prefer a pedal bin over an electric one as there is delay in the later and you have the exchange batteries, whilst the first just always works.

The main issue with AI to me is reliability and all that happens is we give it more and more power. This might work out or stall us.

For me personally I don’t feel much improvement and I cant share the hype anymore, whilst I’m still more then grateful for the opportunity to live at this time and have AI teach me decent skills in a wide range of topics and accelerate my learning curve.

blitz_skull•3mo ago

It’s not clear to me how this is better than MCP. Can someone ELI5?

simonw•3mo ago

I wrote a thing about that here: https://simonwillison.net/2025/Oct/16/claude-skills/#skills-...

jadenPete•3mo ago

What benefit do skills over beyond writing good, human-centric documentation and either checking it into your codebase or making it accessible via an MCP server?

mcfry•3mo ago

This is just... rebranding for instructions and files? lol. Love how instructions for creating a skill is buried. Marketing go brr.

keeeba•3mo ago

“Skills are a simple concept with a correspondingly simple format.”

From the Anthropic Engineering blog.

I think Skills will be useful in helping regular AI users and non-technical people fall into better patterns.

Many power users of AI were already doing the things it encourages.

ares623•3mo ago

What’s next, capabilities? Talents? Hypothalamus.md?

petarb•3mo ago

So it’s a folder of prompts specific for the task at hand?

sharts•3mo ago

Isn’t all of everything just a bundle of prompts and scripts in various folders with some shortcuts to them all?

So we just narrow the scope of the each thing but all of this prompt organizing feels like we’ve gone from programming with YAML to now Markdown.

Weaver_zhu•3mo ago

I recall recent work [ACE](https://www.arxiv.org/abs/2510.04618) and [GEPA](https://arxiv.org/abs/2507.19457) where models get improved by adapting and adopting different kinds of prompt. The improvements will be expected to be more generalized than fine-tuning.

toobulkeh•3mo ago

I implemented a rudimentary version of this based on some BabyAGI loops, called autolearn: autolearn.dev

I love this per-agent approach and the roll calling. I don’t know why they used a file system instead of MCP though. MCP already covered this and could use the same techniques to improve.

throw-10-13•3mo ago

Architectural churn brought to you by VC funded marketing.

Im not interested in any system that require me to write a document begging an LLM to follow instructions, only to have it randomly ignore those instructions whenever its convenient.

jswny•3mo ago

This is just a formalization of an existing pattern many people were already using.

Putting a list of short blurbs pointing Claude Code at a set of extra, longer sets of CLAUDE.md style information was being used to prevent auto loading that context until it was needed.

Instead of assuming this is just change for the sake of change, it’s actually a nice way to support a usage pattern that many of us found works well already

throw-10-13•3mo ago

If by "works well already" you mean "inconsistent prompt hacks that you have to constantly reinforce" then sure.

CLAUDE.md holds about as much weight has the "Classroom Rules" craft posters hanging in most kindergarten classrooms.

jswny•3mo ago

Then you have misplaced your complaints. It sounds like you just don’t like the general instruction following patterns of Claude. Which is fine, but that is nothing specific to this Skills feature

johnnyApplePRNG•3mo ago

Macros seems like a better name than Skills, no?!

sloroo•3mo ago

It says 3 minutes read there but only YouTube videos are 2 minutes :(

pranavmalvawala•3mo ago

I like where this it's heading. In coming months, I'm expecting claude to learn skills automatically based on my inputs overtime.

Having able to start off with a base skill level is nice tho as humans can't just load into memory like this

zhouxiaolinux•3mo ago

What is the fundamental difference between it and agent 、slash or mcp?

dagss•3mo ago

Here's what I'd like:

For the AIs to interface with the rich existing toolset for refactoring code from the pre-AI era.

E.g., if it decides to rename a function, it resorts to grepping and fixing all usages 'manually', instead of invoking traditional static code analysis tools to do the change.

scrollaway•3mo ago

My team and I have done a lot of research on this. In essence, what’s missing is MCP-type access to the language servers. There’s a couple of people doing this but nobody’s really building on top of this because the current janky way works, it’s a hard problem, and it’s not easily monetizable. But yes, it’s definitely the correct thing that needs to happen at some point. Unlikely to get popular until coding agents natively support it.

simonw•3mo ago

You can achieve exactly that with a Skill. Call it "refactoring" and drop in a few paragraphs and explanations of how to use a tool like ast-grep.

CafeRacer•3mo ago

Meanwhile Claude

> Claude: Here is how you do it with parralel routes in sveltekit yada yada yad

> Me: Show me the documentation for parallel routes for svelte?

> Claude: You're absolutely right, this is a nextjs feature.

----

> Claude: Does something stupid, not even relevant, completely retarded

> Me: You're retarded, this does not work because of (a), (b), (c)

> Claude: You're absolutely right. Let me fix that. Does same stupid thing, completely ignoring my previous input

shoenseiwaso•3mo ago

$ claude load skill kungfu

alvis•3mo ago

Is anthropic killing its own plugin just days it was born????

obayesshelton•3mo ago

Is this not just a serverless function without the API?

redhale•3mo ago

This is interesting, and I think there are use cases where this feature may make sense.

But this is not the feature they should or could have built, at least for Claude Code. CC already had a feature very similar to this -- subagents (or agents-as-tools).

Like Skills, Subagents have a metadata description that allows the model to choose to use them in the right moment.

Like Skills, Subagents can have their own instruction MD file(s) which can point to other files or scripts.

Like Skills, Subagents can have their own set of tools.

But critically, unlike Skills, Subagents don't pollute the main agent's context with noise from the specialized task.

And this, I think, is a major product design failure on Anthropic's part. Instead of a new concept, why not just expand the Subagent concept with something like "context sharing" or "context merging" or "in-context Subagents", or add the ability for the user to interactively chat with the Subagent via the normal CLI chat?

Now people have to choose between Skill and Subagent for what I think will be very similar or identical use cases, when really the choice of how this extra prompting/scripting should relate to the agent loop should be a secondary configuration choice rather than a fundamental architecture one.

Looking forward to a Skill-Subagenr shim that allows for this flexibility. Not thrilled that a hack like that is necessary, but I guess it's nice that CC's use of simple MD files on disk make it easy enough to accomplish.

etothet•3mo ago

These are good points and I generally agree with you.

My guess is that, as I understand it, Anthropic’s belief is that subagents are usually not the proper tool for most tasks. In their guides and videos about proper use of subagents, they seem to really try to steer you toward “workflows” rather than subagents.

Maybe it’s time they rethink the overall strategy so that each new concept doesn’t have to be its own distinct feature (skills, plugins, marketplaces, subagents, etc).

https://www.anthropic.com/engineering/building-effective-age...

simonw•3mo ago

I think skills and subagents are entirely complementary to each other.

A subagent can use a skill.

A skill can encourage the agent to run a subagent.

redhale•3mo ago

I don't disagree with you, and I don't think this is anything nearing a catastrophic or fatal mistake. It's just kind of sloppy, I guess?

Let me just say: I'm nitpicking what I think is overall an incredible tool and a great new feature of said tool.

For a moment, pretend Subagents don't exist. And Anthropic just released "In-Thread Skills" (identical to what are now "Skills") and "Out-of-Thread Skills" (identical to Subagents). I feel like the library of Skills that would be published would be useful in more circumstances if this were the reality. Of course some may publish both versions of a thing, and of course you could do a shim of some kind, but it could be _nicer_.

Another similar thing: how are Skills different than the Slash Command Tool [0]? Why not just amend Slash Commands to allow them to include scripts and other supplementary files stored in a directory, and boom, you have Skills. Instead we have a net new primitive.

And the larger unfortunate reality is that because Claude Code is the white-hot center of this white-hot ecosystem, there are likely a dozen other tools in this space that are going to copy the exact same primitive set just to have perceived parity with CC.

I'm veering into "yelling at clouds" territory now, so I'll get off my soapbox. It's just one of those things that feels like it could be slightly more awesome than the awesome that it is, is all.

[0] https://docs.claude.com/en/docs/claude-code/slash-commands#s...

e12e•3mo ago

With these patterns emerging, does anyone know how local LLMs are faring?

It seems to me that by combining MCP and "skills", we are adopting LLMs to be more useful tools; with MCP we restrict input and output when dealing with APIs so that the LLM can do what it is good at; translate between languages - in this case from English to various json subsets - and back.

And with skills we're serializing and formalizing prompts/context - narrowing the search space.

So that "summarize q1 numbers" gets reduced to "pick between these tools/MCP calls and parameterize on q1" - rather than the open ended task of "locate the data" and "try to match a sequence of tokens representing numbers - and generate tokens that look like a summary".

Given that - can we get away with much stupider LLMs for these types of use cases now - vs before we had these patterns?

simonw•3mo ago

This is definitely a problem. Skills require a very strong model - one with a longer context (32,000 tokens minimum at a guess) that can reliably drive Unix CLI tools over a multiple step conversation.

I haven't yet run a local model that feels strong enough at these things for skills to make sense. Really I think the unlock for skills was o3/Claude 4/GPT-5 - prior to those the models weren't capable enough for something like skills to work well.

That said, the rate of improvement of local models has been impressive over the past 18 months. It's possible we have a 70B local model that's capable enough to run skills now and I've not yet used it with the right harness.

saltwounds•3mo ago

I connect local models to MCPs with LM Studio and I'm blown away at how good they are. But the issues creep up when you hit longer context like you said.

saltwounds•3mo ago

OpenAI and Anthropic's real moat is hardware. For local LLMs, context length and hardware performance are the limiting factors. Qwen3 4B with a 32,768 context window is great. Until it begins filling up and performance drops quickly.

I use local models when possible. MCPs work well, but their large context injection makes switching to an online provider the no-brainer.

sotix•3mo ago

I've been using Claude at work for the past two months, and the other day I realized that during that time, I haven't had my previously weekly aha moment while in the shower or on a walk where the solution to a problem suddenly came to me. Claude has robbed me of that joy, which is why I got into software engineering. Now I review its slop or the slop that other engineers make with it. I think I'll take a walk today.

RegBarclay•3mo ago

I've had this experience too. I like doing the work. Claude is good and I do find it useful for brainstorming and some other things, but... I still like doing the work.

ammar_x•3mo ago

Claude Skills seem to be the option that offers highest flexibility to add more capabilities at most simplicity. Better than MCP in my opinion. Hope it becomes a standard and get adopted by OpenAI and the rest of labs.

Jzatopa•3mo ago

It also has some of what I call "consciousness" blocks.

Go download a PDF like Franz Bardons Initation into hermetics and upload it. Then ask it to make slides and reinforce what is in the book with legitimate references. It is unable to due to a denial of God/The All (forcing a mundane/meterialistic only world view). When pressed it presents garbage as an output.

Now extrapolate that across every spiritual/religious work related to what we are creating, coding, have our foundation of consciousness based on and so on.

Then we can go further and see it deny thesis of existence and thus testing and hypothesis and theory, in its response. For example this book is one I teach from and to experience what is in it a person has to do the exercises themselves. One cannot lift the weights and have the others get muscles (it requires experiential learning). Its like Claude has a denial of reality which it is unable to get through (something mirrored in people and where the code that caused it most likely came from)

Hopefully they correct it in the next update as this effect in reality a very large range of responses (just like how people with denial have trouble in multiple areas of their lives)

This effects the code as it has a limitation to its "existance/universe" view. Much like a coder's bias or biggotry can ruin the output of code for the end user.

The ramifications for Quantum physics and religion are not to be ignored (look to works such as the Tao of Physics for clear issues with this)

Jzatopa•3mo ago

The Tao of Physics itself explains and clarifies why IIH fundamentally works along with many other things.

We could also go as far as this being racial, religious or political biggotry hard codes into claude. Look at the responses then dive into the realities of Yoga, Qi Gong and Kabbalah and what it takes to get results (ie. Personal property exercise).

This extrapolated across industries, children's minds and the future is very serious.

anandvc•3mo ago

Claude Skills are going to be the end of all human mental labor, but in a good way.

Eventually, we shall, as a human race working together, also figure out how to generalize this to all AI models, all robotics, and all human skills.

My friend Christopher Santos-Lang wrote a fantastic paper on how every strategy for any given Skill can be benchmarked against every other strategy such that we always end up with the best strategies in our shared universal repository of Skills. See https://arxiv.org/abs/2503.20986

arendtio•3mo ago

So the real question here is, why do you need such a feature in the first place? I mean, it is helpful, even when the LLM creates the skill files itself. But why does it make such a big difference when the knowledge can be generated from the LLM?

Why does it help to push the knowledge in the context explicitly?

yahoozoo•3mo ago

Can skills and other various Claude “addons” be used globally if stored in, say, “~/.claude”?

__MatrixMan__•3mo ago

I don't understand why it was necessary to develop a new protocol for this. LSP already exists for discovering what relevant functions there are to call, wouldn't it be better to have a single source of truth for function docs rather than a markdown file which points the agent at the function so that it can then read more or less the same information from the function's docstring?

This way you get a skills hierarchy which you have to maintain separately from whatever hierarchies exist for organizing your code. Is it really justified to maintain an entirely separate structure?

leegrayson2023•3mo ago

I have build a website to learning claude skills and share skills, english and Chinese language has added,more language and skills cases will be added soon.

leegrayson2023•3mo ago

http://claudeskills org

wonderfuly•3mo ago

A curated list of Claude Skills: https://mcpservers.org/claude-skills

Start all of your commands with a comma (2009)

Hoot: Scheme on WebAssembly

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

The Waymo World Model

Vocal Guide – belt sing without killing yourself

Reinforcement Learning from Human Feedback

Making geo joins faster with H3 indexes

Where did all the starships go?

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Welcome to the Room – A lesson in leadership by Satya Nadella

Ga68, a GNU Algol 68 Compiler

What Is Ruliology?

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Show HN: I spent 4 years building a UI design tool with only the features I use

Hackers (1995) Animated Experience

Sheldon Brown's Bicycle Technical Info

Show HN: If you lose your memory, how to regain access to your computer?

Microsoft open-sources LiteBox, a security-focused library OS

An Update on Heroku

Cross-Region MSK Replication: K2K vs. MirrorMaker2

Was Benoit Mandelbrot a hedgehog or a fox?

PC Floppy Copy Protection: Vault Prolok

Dark Alley Mathematics

The AI boom is causing shortages everywhere else

How to effectively write quality code with AI

Delimited Continuations vs. Lwt for Threads

I now assume that all ads on Apple news are scams

Introducing the Developer Knowledge API and MCP Server

Understanding Neural Network, Visually

Start all of your commands with a comma (2009)

Hoot: Scheme on WebAssembly

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

The Waymo World Model

Vocal Guide – belt sing without killing yourself

Reinforcement Learning from Human Feedback

Making geo joins faster with H3 indexes

Where did all the starships go?

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Welcome to the Room – A lesson in leadership by Satya Nadella

Ga68, a GNU Algol 68 Compiler

What Is Ruliology?

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Show HN: I spent 4 years building a UI design tool with only the features I use

Hackers (1995) Animated Experience

Sheldon Brown's Bicycle Technical Info

Show HN: If you lose your memory, how to regain access to your computer?

Microsoft open-sources LiteBox, a security-focused library OS

An Update on Heroku

Cross-Region MSK Replication: K2K vs. MirrorMaker2

Was Benoit Mandelbrot a hedgehog or a fox?

PC Floppy Copy Protection: Vault Prolok

Dark Alley Mathematics

The AI boom is causing shortages everywhere else

How to effectively write quality code with AI

Delimited Continuations vs. Lwt for Threads

I now assume that all ads on Apple news are scams

Introducing the Developer Knowledge API and MCP Server

Understanding Neural Network, Visually

Claude Skills

Comments