
Show HN: "Skill capsules" for LLMs, a "poor man's continual learning"

https://github.com/killerstorm/set_v4/blob/main/REPORT.md
1•killerstorm•2h ago
"Continual learning" is considered one of the "blockers" for LLMs: they can't learn on the job, don't improve over time, etc. In particular, Dwarkesh Patel describes it as a problem that has to be solved to get to AGI.

Many academic articles propose some kind of memory system for LLMs which might be considered a form of "continual learning". But most evals focus on memorizing facts, which is just not very useful (it's better to fetch facts via tool use than to store them in neural memory), and these proposals might not fit well into common LLM API usage patterns.

In this article I'm proposing a "new" method called "skill capsules" which is highly pragmatic, easy to understand and evaluate and might integrate well into existing tooling.

A skill capsule is a concrete object - it's a bunch of vectors, basically. You can insert it somewhere in the middle of an LLM context and it improves performance on a particular skill, e.g. makes tool calls more reliable, enforces a particular writing style, coding style, etc. In theory, it can be used to patch any LLM inadequacy. A capsule can also include knowledge (e.g. how to call a particular API or write code involving a particular library).
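
To make "a bunch of vectors inserted into the context" concrete, here is a minimal sketch in plain Python. Everything in it is a toy assumption: the vocabulary, the 4-dimensional embeddings and the `splice_capsule` helper are hypothetical and not taken from the linked report. It only shows the data flow: tokens become embedding vectors, and the capsule's vectors are spliced in between them before the model would process the sequence.

```python
import random

EMBED_DIM = 4  # toy size; real models use far larger embeddings


def make_embedding_table(vocab, dim=EMBED_DIM, seed=0):
    """Assign each token a fixed random vector (stand-in for a trained embedding table)."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1, 1) for _ in range(dim)] for tok in vocab}


def embed(tokens, table):
    """Look up the embedding vector for every token."""
    return [table[tok] for tok in tokens]


def splice_capsule(embedded, capsule, position):
    """Return a new sequence with the capsule vectors inserted at `position`."""
    if not 0 <= position <= len(embedded):
        raise ValueError("position out of range")
    if any(len(vec) != EMBED_DIM for vec in capsule):
        raise ValueError("capsule vectors must match the embedding size")
    return embedded[:position] + capsule + embedded[position:]


vocab = ["call", "the", "weather", "tool", "now"]
table = make_embedding_table(vocab)
context = embed(["call", "the", "weather", "tool"], table)

# A hypothetical capsule: two vectors that correspond to no real token.
capsule = [[0.5, -0.2, 0.1, 0.9], [0.0, 0.3, -0.7, 0.4]]

patched = splice_capsule(context, capsule, position=2)
print(len(context), len(patched))
```

In a real setup the spliced sequence would be handed to the model as input embeddings instead of token IDs; how the capsule vectors get their values is the interesting part and is not modelled here.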

A skill capsule can be produced using a single forward pass from a _single example_; no gradients or "fine-tuning" are required. So it might allow an LLM to "learn on the job" - i.e. a single demonstration of how to perform something correctly can be used to create a capsule.

You might ask - why is this a "Show HN" and not an academic article? Because researchers already know the method - it's known as "soft prompts", "hypernetworks", "steering vectors", prefix tuning, etc. All these terms are horrible and do not convey the possibilities of this method. I just want more people to know that LLMs can be improved on the fly. And a better term -- "skill capsules" -- might help people think about how to apply these techniques (I hope).

Other reasons it's a "Show HN":

  * it shows one can do a kinda cool ML experiment in a few days using Claude Code and a few dollars to pay for GPUs
  * a somewhat-interesting story of how I got there

Comments

killerstorm•1h ago
A bit of backstory:

I got really interested in LLMs in 2020 after the GPT-3 release demonstrated in-context learning. But I had tried running an LLM a year before: trying out AI Dungeon 2 (based on GPT-2).

Back in 2020 people were discussing how transformer-based language models are limited in all sorts of ways (operating on a tiny context, etc). But as I learned about how transformers work, I got really excited: it's possible to use raw vectors as input, not just text. So I got this idea that all kinds of modules can be implemented on top of pre-trained transformers via adapters which translate any data into the representations of a particular model. E.g. you can make a new token representing some command, etc.

A lack of memory was one of the hot topics, so I did a little experiment: since the KV cache has to encode 'run-time' memory, I tried transplanting parts of the KV cache from one model forward pass into another - and apparently only a few mid layers were sufficient to make the model recall a name from the prior pass. But I didn't go further as it was too time-consuming for a hobby project. So that's where I left it.
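
The idea behind the transplant can be illustrated with a toy sketch. This is not the original experiment, which used a real model and selected layers of its KV cache; it is a single attention head in plain Python, with made-up 2-dimensional keys, values and query. All names (`attend`, `donor_keys`, and so on) are hypothetical. It only shows why prepending cached key/value pairs from another pass can change what a query retrieves.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# "Pass A": one cached key/value pair standing in for a remembered name.
donor_keys = [[4.0, 0.0]]
donor_values = [[1.0, 0.0]]  # pretend the first dimension encodes the name

# "Pass B": a fresh context that carries nothing about the name.
own_keys = [[0.0, 1.0], [0.0, -1.0]]
own_values = [[0.0, 1.0], [0.0, 0.5]]

query = [2.0, 0.0]  # a query that "asks" for the name

without, _ = attend(query, own_keys, own_values)
with_transplant, weights = attend(
    query, donor_keys + own_keys, donor_values + own_values
)

print("without transplant:", without)
print("with transplant:   ", [round(x, 3) for x in with_transplant])
```

Without the donor entry the output has nothing in the "name" dimension; with it, almost all attention weight lands on the transplanted pair. A real model repeats this per layer and per head, which is why choosing which layers to transplant matters.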

Over the years, academic researchers arrived at the same ideas I had and gave them names:

* arbitrary vectors injected in place of fixed token embeddings are called a "soft prompt"
* custom KV-prefix added before normal context is called "prefix tuning"
* "soft prompt" to generate KV prefix which encodes a memory is called "gisting"
* KV prefix encoding a specific collection of documents was recently called "cartridge"

Opus 4.5 running in Claude Code can pretty much run an experiment of this kind on its own, starting from a general idea. But it still needs some help - to make sure we use prompts and formats which actually make sense, to look for the best data set, etc.
