I think this is very interesting, especially the per-layer embedding part.
Having more than one embedding is something I've tried myself, but not separate ones for each layer.
I'm guessing it's something like h_{l+1} = MultiHeadSelfAttentionWithPositionEncodingBakedIn(MLP(h_l) + embed_l(token_ids)). So it's probably really easy to implement on toy problems to see if it works.
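Concretely, a toy sketch of that guessed update in plain PyTorch (all names here are purely illustrative, and I'm leaving out causal masking and positional encoding):

```python
import torch
import torch.nn as nn

class PerLayerEmbeddingBlock(nn.Module):
    """One layer of the guessed scheme: each layer carries its own token embedding table."""
    def __init__(self, vocab_size, d_model, n_heads):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # embed_l: this layer's own table
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, h, token_ids):
        # h_{l+1} = Attention(MLP(h_l) + embed_l(token_ids))
        x = self.mlp(h) + self.embed(token_ids)
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

# quick shape check on random data
blocks = nn.ModuleList(PerLayerEmbeddingBlock(1000, 64, 4) for _ in range(4))
ids = torch.randint(0, 1000, (2, 16))
h = torch.zeros(2, 16, 64)  # how h_0 is formed is another open question
for block in blocks:
    h = block(h, ids)
print(h.shape)  # torch.Size([2, 16, 64])
```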
3abiton•8mo ago
Any resources or suggestions to learn about this? The field is moving too fast; my poor brain can't keep up.
impossiblefork•8mo ago
Basically you'd familiarize yourself with transformers by implementing different variants of them and changing them around according to your own ideas on different toy datasets.
Then you'd figure out a set of toy tasks that you like and think are important.
In this particular case you'd take something like NanoGPT, go to model.py, go to class GPT, go to __init__, and change the nn.Embedding in the self.transformer ModuleDict into a ModuleList of nn.Embedding; then you change the for loop at line 180 to loop over a range (or enumerate), and modify forward by adding x = x + self.transformer.wte[i](idx) before each block. Something like that, I think.
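Untested, but against NanoGPT's model.py I'd guess it looks roughly like this (keeping upstream's wte/wpe/h/ln_f names and its Block/LayerNorm classes):

```python
# In GPT.__init__: one embedding table per layer instead of a single wte
self.transformer = nn.ModuleDict(dict(
    wte=nn.ModuleList(nn.Embedding(config.vocab_size, config.n_embd)
                      for _ in range(config.n_layer)),
    wpe=nn.Embedding(config.block_size, config.n_embd),
    drop=nn.Dropout(config.dropout),
    h=nn.ModuleList(Block(config) for _ in range(config.n_layer)),
    ln_f=LayerNorm(config.n_embd, bias=config.bias),
))

# In GPT.forward: re-inject each layer's own token embedding before its block
x = self.transformer.drop(self.transformer.wte[0](idx) + self.transformer.wpe(pos))
for i, block in enumerate(self.transformer.h):
    if i > 0:
        x = x + self.transformer.wte[i](idx)  # per-layer embedding
    x = block(x)
x = self.transformer.ln_f(x)
```

One thing to watch: if I remember right, NanoGPT ties lm_head.weight to wte.weight, so with a ModuleList you'd tie it to wte[0].weight instead (or just drop the tying).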
I haven't tried yet though (I've got a terrible cold, so I am on social media instead of doing anything sensible).
impossiblefork•8mo ago
Also, this particular thing didn't work on my toy problems. It might still be good though.
3abiton•8mo ago
While PLE is quite innovative, the interesting part is that they released their [APK on GitHub](https://github.com/google-ai-edge/gallery) rather than just linking to the Play Store. Interesting choice.