Ask HN: What's something interesting you learned from training your own GPT?
2•amadeuswoo•1h ago
Not using APIs, but actually training a model from scratch, even a small one.
What surprised you about the data, the training process, or the output?
Comments
linolevan•1h ago
For tiny models, the SFT data mixture is unbelievably critical to usability, because they are barely able to generalize at all. If you don't include multi-turn conversations, they will not be able to do multi-turn conversations. If your multi-turn conversations are just chatting, and math only appears in single-turn examples, the model will be unable to do math in a multi-turn setting. This is much less true for bigger models.
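A minimal sketch of the mixture fix this implies, assuming a chat-style messages format (all example conversations and the `splice_math_into_chat` helper are hypothetical, not from the comment): splice math exchanges into existing multi-turn conversations so the skill and the dialogue format co-occur in training, rather than keeping math as single-turn-only data.

```python
# Sketch: a tiny model trained on (a) multi-turn chat and (b) single-turn
# math never sees math inside a dialogue, so it can't do math multi-turn.
# One remedy is to append math Q/A turns onto existing conversations.

def single_turn_math():
    # A single-turn math SFT example (hypothetical content).
    return [{"role": "user", "content": "What is 12 * 7?"},
            {"role": "assistant", "content": "12 * 7 = 84."}]

def multi_turn_chat():
    # A multi-turn chit-chat SFT example (hypothetical content).
    return [{"role": "user", "content": "Hi there!"},
            {"role": "assistant", "content": "Hello! How can I help?"},
            {"role": "user", "content": "Just saying hi."},
            {"role": "assistant", "content": "Happy to chat anytime."}]

def splice_math_into_chat(chat, math):
    # Extend an existing conversation with a math exchange, so the model
    # sees math questions arriving mid-dialogue during SFT.
    return chat + math

mixed = splice_math_into_chat(multi_turn_chat(), single_turn_math())
# mixed is now a 6-message conversation ending in a math exchange.
```

The point is only that the format distribution matters: for a tiny model, every capability apparently needs to appear in every conversational shape you want it to work in.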