It's great that people are starting to take continual learning seriously, and it seems like Jessy has been thinking about LLMs and continual learning longer than almost anyone.
I especially like this taxonomy:
> I think of continual learning as two subproblems:
> Generalization: given a piece of data (user feedback, a piece of experience, etc.), what update should we do to learn the “important bits” from that data?
> Forgetting/Integration: given a piece of data, how do we integrate it with what we already know?
My personal feeling is that generalization is a data issue: given a datapoint x, what are all the examples in the distribution of things that can be inferred from x? Maybe we can solve this with synthetic datagen. And forgetting might be solvable architecturally, e.g. with Cartridges (https://arxiv.org/abs/2506.06266) or something of that nature.
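As a toy sketch of what "generalization is a data issue" could mean in practice: take one datapoint and expand it into the set of examples that can be inferred from it, then train on the expanded set rather than the raw point. Everything below (the fact format, the templates, the function name) is a hypothetical illustration, not anything from the Cartridges paper:

```python
def expand_datapoint(subject: str, relation: str, obj: str) -> list[dict]:
    """Hypothetical synthetic-datagen step: expand one (subject,
    relation, object) fact into several training examples, including
    the reverse direction, so a model can learn the 'important bits'
    rather than memorizing a single surface form."""
    return [
        {"q": f"What is the {relation} of {subject}?", "a": obj},
        {"q": f"{subject}'s {relation} is what?", "a": obj},
        # Reverse direction: inferable from the fact, but not a
        # paraphrase of the original surface form.
        {"q": f"Which entity has {obj} as its {relation}?", "a": subject},
        {"q": f"True or false: the {relation} of {subject} is {obj}.", "a": "True"},
    ]

# One datapoint in, a small inferred distribution out:
dataset = expand_datapoint("France", "capital", "Paris")
for ex in dataset:
    print(ex["q"], "->", ex["a"])
```

In a real system the templates would presumably be replaced by an LLM generating paraphrases and entailed facts, but the shape of the problem is the same: enumerate what x implies, then update on that.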