Show HN: Skill capsules" for LLMs, a "poor man's continual learning"

https://github.com/killerstorm/set_v4/blob/main/REPORT.md

1•killerstorm•1mo ago

"Continual learning" is considered one of the "blockers" for LLMs: they can't learn on the job, don't improve over time, etc. In particular, Dwarkesh Patel describes it as a number of problem which has to be solved to get to AGI.

Many academic article propose some kind of a memory system for LLM which might be considered a form of "continual learning". But most evals focus on memorizing facts which is just not very useful (it's better to fetch facts via tool use than to store it in neural memory) and these proposals might not fit well into common LLM API use patterns.

In this article I'm proposing a "new" method called "skill capsules" which is highly pragmatic, easy to understand and evaluate and might integrate well into existing tooling.

Skill capsule is a concrete object - it's a bunch of vectors, basically. You can insert it somewhere into a middle of LLM context and it improves performance on a particular skill, e.g. get tool calls more reliable, use particular writing style, coding style, etc. In theory, it can be used to patch any LLM inadequacy. A capsule can include knowledge (e.g. how to call a particular API or write code involving particular library).

Skill capsule can be produced using a single forward pass from a _single example_, not gradients or "fine-tuning" is required. So it might allow LLM to "learn on the job" - i.e. a single demonstration of how to perform something correctly can be used to create a capsule.

You might ask - why is a "Show HN" and not an academic article? Because researchers already know the method - it's known as "soft prompts", "hypernetworks", "steering vectors", prefix tuning, etc. All these terms are horrible and do not convey possibilities of this method. I just want more people to know that LLMs can be improved on the fly. And a better term -- "skill capsules" -- might help people to think how to apply these techniques (I hope).

Another reasons it's "Show HN" is that:

  * it shows one can do a kinda cool ML experiment in 
    a few days using Claude Code and few dollars to pay for GPUs
  * a somewhat-interesting story of how I got there

Comments

killerstorm•1mo ago

A bit of backstory:

I got really interested in LLMs in 2020 after GPT-3 release demonstrated in-context learning. But I tried running a LLM a year before: trying out AI Dungeon 2 (based on GPT-2).

Back in 2020 people were discussing how transformer-based language model are limited in all sorts of ways (operating on a tiny context, etc). But as I learned about how transformers work, I got really excited: it's possible to use raw vectors as input, not just text. So I got this idea that all kinds of modules can be implemented on top of pre-trained transformers via adapters which translate any data into representations of a particular model. E.g. you can make a new token representing some command, etc.

A lack of memory was one of hot topics, so I did a little experiment: since KV cache has to encode 'run-time' memory, I tried transplanting parts of KV cache from one model forward pass into another - and apparently only few mid layers were sufficient to make model recall a name from prior pass. But I didn't go further as it was too time consuming for a hobby project. So that's where I left it.

Over the years, academic researchers got through same ideas as I had and gave them names:

* arbitrary vectors injected in place of fixed token embeddings are called a "soft prompt" * custom KV-prefix added before normal context is called "prefix tuning" * "soft prompt" to generate KV prefix which encodes a memory is called "gisting" * KV prefix encoding a specific collection of documents was recently called "cartridge"

Opus 4.5 running in Claude Code can pretty much run an experiment of this kind on its own, starting from a general idea. But it still needs some help - to make sure we use prompts and formats which actually make sense, look for best data set, etc.

visarga•1mo ago

The prefix tuning approach was largely abandoned for LoRA, it does not change the process if you tune the prefix or some adapter layers, but it is more flexible to train the LoRAs.

The Skills concept emerged naturally when you see how coding agents use docs, CLI tools and code. Their advantage is they can be edited on the fly to incorporate new information and can learn from any feedback source - human, code execution, web search or LLMs.

killerstorm•1mo ago

KV-based "skill capsules" are very different from LoRAs / classic prefix tuning:

  * A "hypernetwork" (which can be, in fact, same LLM) can build 
    a skill capsules _from a single example_.
    You can't get LoRA or KV-prefix using just one example.

  * It can be inserted at any point, as needed. I.e. if during reasoning you find that you need particular skill, you can insert it.
  * They are composable, and far less likely to over-write some information, as they only affect KV cache and not weights.

Skills as used by Anthropic & OpenAI are just textual instruction. KV-based skill capsule can be a lot more compact (and thus would contribute less to context rot) and might encode information which is difficult to convey through instruction (e.g. style).

Show HN: Identifier for files and directories (like ISBN for Books)

Show HN: Holy Grail: Open-Source Autonomous Development Agent

Show HN: Minecraft Creeper meets 90s Tamagotchi

Show HN: Termiteam – Control center for multiple AI agent terminals

The only U.S. particle collider shuts down

Ask HN: Why do purchased B2B email lists still have such poor deliverability?

Show HN: Remotion directory (videos and prompts)

Portable C Compiler

Show HN: Kokki – A "Dual-Core" System Prompt to Reduce LLM Hallucinations

Software Engineering Transformation 2026

Microsoft purges Win11 printer drivers, devices on borrowed time

Lunch with the FT: Tarek Mansour

Old Mexico and her lost provinces (1883)

'AI' is a dick move, redux

The source code was the moat. But not anymore

Does anyone else feel like their inbox has become their job?

An AI model that can read and diagnose a brain MRI in seconds

Dev with 5 of experience switched to Rails, what should I be careful about?

AlphaFace: High Fidelity and Real-Time Face Swapper Robust to Facial Pose

Scientists discover “levitating” time crystals that you can hold in your hand

Rammstein – Deutschland (C64 Cover, Real SID, 8-bit – 2019) [video]

Tell HN: Yet Another Round of Zendesk Spam

Postgres Message Queue (PGMQ)

Show HN: Django-rclone: Database and media backups for Django, powered by rclone

NY lawmakers proposed statewide data center moratorium

OpenClaw AI chatbots are running amok – these scientists are listening in

Show HN: AI agent forgets user preferences every session. This fixes it

Introduce the Vouch/Denouncement Contribution Model

Show HN: SSHcode – Always-On Claude Code/OpenCode over Tailscale and Hetzner

Microsoft appointed a quality czar. He has no direct reports and no budget