ohans•50m ago
When Anthropic published their Skills system (https://www.anthropic.com/news/skills), the idea clicked for me immediately: take a general-purpose agent and turn it into a specialized one with procedural knowledge that no model can fully memorize.
In my own projects I wasn’t using Claude (most of my workloads were on Gemini 2.5 Flash, mainly because it was affordable and got the job done), but I still wanted that architecture: a way to define Skills once and use them with whatever LLM made sense for a given use case.
So over the past few weeks I put together a solution that does roughly that. Right now it supports:
- Bundling metadata, instructions, reference files, and optional scripts into a Skill
- Running scripts in Python or JS runtimes (with automatic package installation)
- A simple files API so the LLM can create files, reference them, mint temporary download links, and let me upload docs for analysis
- A CLI to manage skills locally (push/pull), a TypeScript SDK, and a web app for managing API keys, PATs, the playground, etc.
There’s a playground at http://www.bluebag.ai/playground with example Skills (mostly adapted from Anthropic’s public Skills repo at https://github.com/anthropics/skills). On the right-hand side you can see how different models progressively load files and metadata, so you can inspect how selection and loading behave across models.
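For anyone who hasn't seen progressive loading before, here is a minimal sketch of the general idea in TypeScript. Every name in it (`Skill`, `buildSkillIndex`, `loadInstructions`, `loadReference`, the example Skill) is hypothetical and made up for illustration; it is not the SDK's actual surface. The assumption is that the model first sees only each Skill's name and description, then pulls in instructions and reference files when it decides it needs them.

```typescript
// Hypothetical shapes for illustration only; not the actual SDK or API.
interface SkillMetadata {
  name: string;
  description: string;
}

interface Skill {
  metadata: SkillMetadata;
  instructions: string;               // main instruction body
  references: Record<string, string>; // file name -> file contents
}

// Step 1: only names and descriptions are placed in the model's context.
function buildSkillIndex(skills: Skill[]): string {
  return skills
    .map((s) => `- ${s.metadata.name}: ${s.metadata.description}`)
    .join("\n");
}

function findSkill(skills: Skill[], name: string): Skill {
  const skill = skills.find((s) => s.metadata.name === name);
  if (!skill) throw new Error(`Unknown skill: ${name}`);
  return skill;
}

// Step 2: full instructions are loaded once the model selects a Skill.
function loadInstructions(skills: Skill[], name: string): string {
  return findSkill(skills, name).instructions;
}

// Step 3: a reference file is loaded only when the model asks for it.
function loadReference(skills: Skill[], name: string, file: string): string {
  const content = findSkill(skills, name).references[file];
  if (content === undefined) {
    throw new Error(`Unknown reference file: ${file}`);
  }
  return content;
}

// Made-up example Skill.
const skills: Skill[] = [
  {
    metadata: { name: "pdf-report", description: "Build PDF reports from tables" },
    instructions: "1. Read the input table. 2. Follow layout.md. 3. Render the PDF.",
    references: { "layout.md": "Use A4 pages with 2cm margins." },
  },
];

console.log(buildSkillIndex(skills));
```

The point of the split is that context cost stays roughly proportional to what the model actually uses, rather than to the size of the whole Skill library.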
There are still some open questions I’m thinking about, especially around VM reuse and isolation at scale, and how to handle large Skill libraries over time (cold starts with very large package sets and 15+ Skills are slow).
But it’s been useful enough in my own work that I wanted to share it and get feedback. I’d be interested in:
- obvious failure modes I’m missing
- prior art I should be looking at (e.g., agent frameworks)
Happy to answer any questions or dig into implementation details if that’s useful.
Cheers
tpcollns•31m ago
The playground seems cool. I think I get a sense of how this works and could see myself using it. Is it fair to assume this plugs into a VM behind the scenes? Shared?
Out of curiosity, is any part of this also open-sourced?
ohans•21m ago
Yes!! The runtimes are ephemeral VMs/containers with no exposed network access.
On open source: the core of the solution is not currently open source; it’s still changing a lot, and I don’t want to publish an API or SDK surface that I’ll immediately have to update.
But I plan to open-source the CLI package and SDKs shortly.