I think this is very interesting, especially the per-layer embedding part.
Having more than one embedding is something I've tried myself, but not separate ones for each layer.
I'm guessing it's something like h_{l+1} = MultiHeadSelfAttentionWithPositionEncodingBakedIn(MLP(h_l) + embed_l(token_ids)). So it's probably really easy to implement on toy problems to see if it works.
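Concretely, a toy sketch of that guessed update in plain PyTorch (all names here are purely illustrative, and I'm leaving out causal masking and positional encoding):

```python
import torch
import torch.nn as nn

class PerLayerEmbeddingBlock(nn.Module):
    """One layer of the guessed scheme: each layer carries its own token embedding table."""
    def __init__(self, vocab_size, d_model, n_heads):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # embed_l: this layer's own table
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, h, token_ids):
        # h_{l+1} = Attention(MLP(h_l) + embed_l(token_ids))
        x = self.mlp(h) + self.embed(token_ids)
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

# quick shape check on random data
blocks = nn.ModuleList(PerLayerEmbeddingBlock(1000, 64, 4) for _ in range(4))
ids = torch.randint(0, 1000, (2, 16))
h = torch.zeros(2, 16, 64)  # how h_0 is formed is another open question
for block in blocks:
    h = block(h, ids)
print(h.shape)  # torch.Size([2, 16, 64])
```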
3abiton•8mo ago
Any resources or suggestions to learn about this? The field is moving too fast; my poor brain can't keep up.
impossiblefork•8mo ago
Basically you'd familiarize yourself with transformers by implementing different variants of them and changing them around according to your own ideas on different toy datasets.
Then you'd figure out a set of toy tasks that you like and think are important.
In this particular case you'd take something like NanoGPT, go to model.py, go to class GPT, go to __init__, and change the nn.Embedding in the self.transformer ModuleDict into a ModuleList of nn.Embedding; then you change the for loop at line 180 to loop over a range (or enumerate), and modify forward by adding x = x + self.transformer.wte[i](idx) before each block. Something like that, I think.
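Untested, but against NanoGPT's model.py I'd guess it looks roughly like this (keeping upstream's wte/wpe/h/ln_f names and its Block/LayerNorm classes):

```python
# In GPT.__init__: one embedding table per layer instead of a single wte
self.transformer = nn.ModuleDict(dict(
    wte=nn.ModuleList(nn.Embedding(config.vocab_size, config.n_embd)
                      for _ in range(config.n_layer)),
    wpe=nn.Embedding(config.block_size, config.n_embd),
    drop=nn.Dropout(config.dropout),
    h=nn.ModuleList(Block(config) for _ in range(config.n_layer)),
    ln_f=LayerNorm(config.n_embd, bias=config.bias),
))

# In GPT.forward: re-inject each layer's own token embedding before its block
x = self.transformer.drop(self.transformer.wte[0](idx) + self.transformer.wpe(pos))
for i, block in enumerate(self.transformer.h):
    if i > 0:
        x = x + self.transformer.wte[i](idx)  # per-layer embedding
    x = block(x)
x = self.transformer.ln_f(x)
```

One thing to watch: if I remember right, NanoGPT ties lm_head.weight to wte.weight, so with a ModuleList you'd tie it to wte[0].weight instead (or just drop the tying).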
I haven't tried yet though (I've got a terrible cold, so I am on social media instead of doing anything sensible).
impossiblefork•8mo ago
Also, this particular thing didn't work on my toy problems. It might still be good though.
3abiton•8mo ago
While PLE is quite innovative, the interesting part is that they released their [APK on GitHub](https://github.com/google-ai-edge/gallery) rather than just linking to the Play Store. Interesting choice.