Your concern about catastrophic forgetting is mostly unfounded in the regime of fine-tuning large diffusion models. The fine-tuned weights may lose some accuracy on certain downstream tasks, but in general the degradation is not “catastrophic”. I believe this is down to the attention mechanism, but I’m happy to be corrected.
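For what it’s worth, one quick sanity check on “damage” is to look at how far each layer actually moved from the base checkpoint. A rough sketch in plain PyTorch (the checkpoint paths are placeholders):

    import torch

    # state_dicts of the base and fine-tuned model (paths are hypothetical)
    base = torch.load("base_unet.pt", map_location="cpu")
    tuned = torch.load("finetuned_unet.pt", map_location="cpu")

    for name, w0 in base.items():
        w1 = tuned[name]
        # relative L2 drift per tensor; layers with large values are the
        # likeliest source of any forgetting
        drift = (w1.float() - w0.float()).norm() / (w0.float().norm() + 1e-8)
        print(f"{name}: {drift.item():.4f}")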
frotaur•4mo ago
I see, it was probably my high learning rate that caused the problems. To be honest, I got a bit lazy about retrying full finetuning since LoRA worked so well, but I may revisit it in the future, perhaps with Qwen Image.
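(For anyone following along: the reason LoRA is so forgiving is that the pretrained weights stay frozen and only a small low-rank update gets trained. A hand-rolled sketch of the idea, not the actual diffusers/peft API:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            self.base.weight.requires_grad_(False)   # pretrained weight stays frozen
            if self.base.bias is not None:
                self.base.bias.requires_grad_(False)
            self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
            self.scale = alpha / rank

        def forward(self, x):
            # frozen path + trainable low-rank correction
            return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

    # e.g. wrap an attention projection and train only the LoRA parameters
    layer = LoRALinear(nn.Linear(768, 768))
    out = layer(torch.randn(2, 768))

So even a too-hot learning rate can only wreck the low-rank adapter, never the base weights.)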
throwaway314155•4mo ago
Perhaps what you were dealing with was actually exploding gradients from fp16 training, which _are_ prone to corrupting a model, and that can depend on the learning rate.
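If that’s what happened, the usual guard rails are dynamic loss scaling plus gradient clipping. A minimal sketch, assuming plain PyTorch AMP on a CUDA device (the tiny model and learning rate are just placeholders):

    import torch

    model = torch.nn.Linear(16, 16).cuda()      # stand-in for the real network
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()

    def step(x, target):
        opt.zero_grad(set_to_none=True)
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = torch.nn.functional.mse_loss(model(x), target)
        scaler.scale(loss).backward()
        scaler.unscale_(opt)                     # so clipping sees true-scale grads
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        scaler.step(opt)                         # skipped automatically if grads hit inf/nan
        scaler.update()
        return loss.item()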
adzm•4mo ago
Minor observation: the formula text appears to render on top of the sticky header on the website.
frotaur•4mo ago
True, I hadn't noticed, thanks! I'll try to fix that in the near future.
throwaway314155•4mo ago