optimalsolver•4h ago
Rather than handcrafting solutions like it’s 1993, why not make robustness against forgetting part of the training objective?
Let the search algorithm figure it out.
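To make that concrete: one way to bake forgetting into the objective is an EWC-style penalty on parameters that mattered for old data. A rough sketch (the function, names, and Fisher weights are illustrative, not from the article):

```python
import torch

def loss_with_forgetting_penalty(model, task_loss, old_params, fisher, lam=1.0):
    """Add an EWC-style term that discourages moving weights which were
    important for previously seen data (importance held in `fisher`)."""
    penalty = torch.zeros((), device=task_loss.device)
    for name, p in model.named_parameters():
        if name in fisher:  # importance estimated on whatever old data you keep
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return task_loss + lam * penalty
```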
vessenes•3h ago
The reason you're getting slightly downvoted, I think, is that you need to answer this question first: which of the 15T tokens are you going to evaluate for forgetting? And please explain how doing that is different from doing another full epoch-style pass over the weights.
Some of the appeal here is that this (handcrafted) architecture allows ongoing gradient-descent learning on a much smaller set of weights as you go.
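(Roughly: freeze everything except the memory parameters and keep doing gradient descent on those. A sketch, with made-up module names and `model` assumed to be a pretrained network with a memory module:)

```python
import torch

# Hypothetical naming: only parameters under a "memory" module
# (e.g. memory.keys, memory.values) stay trainable; the backbone is frozen.
for name, param in model.named_parameters():
    param.requires_grad = "memory" in name

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```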
intalentive•1h ago
Funny you say that: this write-up reminded me of Stephen Grossberg's Adaptive Resonance Theory. The same basic ideas come up when addressing the stability-plasticity dilemma.
That said, the authors are saving this for future work. Fine-tuning is cheaper, easier, and faster to validate.
>Switching to a new architecture at pretraining time has a high cost, but there are reasons we might want this (besides the better scaling behavior). The main benefit is that the model can learn to organize its memory from scratch, and once we’ve already “allocated” this high-capacity memory pool, there’s a clearer path to learning on multiple tasks and corpora over time.
This means you could "fine-tune" the model on your custom corpus at ingestion time, without having to actually train via backprop. Your corpus would be compressed into model-readable memory that updates model behavior. Then different memory units could be swapped in and out, like programs on a floppy disk. I can see this concept being especially useful for robotics.
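Something like this is what I mean by swapping (parameter names and file paths are hypothetical):

```python
import torch

# Save just the memory slice of the weights after ingesting one corpus...
memory_state = {k: v for k, v in model.state_dict().items()
                if k.startswith("memory.")}
torch.save(memory_state, "corpus_a_memory.pt")

# ...then later load a different "memory cartridge"; strict=False leaves
# the rest of the model's weights untouched.
model.load_state_dict(torch.load("corpus_b_memory.pt"), strict=False)
```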
yorwba•55m ago
The memory is model-readable but not model-writable, so you still need to train via backprop to get the memory to store useful data.
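In other words, "ingestion" is still a (memory-only) backprop loop; a sketch assuming an HF-style model interface:

```python
# The forward pass only reads the memory; the optimizer step after backprop
# is the only thing that writes it, even when "ingesting" a new corpus.
for batch in corpus_loader:
    loss = model(batch["input_ids"], labels=batch["labels"]).loss
    loss.backward()
    optimizer.step()   # updates only the (trainable) memory parameters
    optimizer.zero_grad()
```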
esafak•4h ago
Great writeup. Are there any libraries that implement some of the methods described?
skeptrune•3h ago
I appreciate that people are going beyond RAG and few-shot prompting.