./llama.cpp/llama-cli -hf unsloth/gemma-3n-E4B-it-GGUF:UD-Q4_K_XL -ngl 99 --jinja --temp 0.0
./llama.cpp/llama-cli -hf unsloth/gemma-3n-E2B-it-GGUF:UD-Q4_K_XL -ngl 99 --jinja --temp 0.0
I'm also working on an inference + finetuning Colab demo! I'm very impressed since Gemma 3N has audio, text and vision! https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-...
Thank you!
However, it's still 8B parameters, and there are no quantized models just yet.
Cherry-picking something that's quick to evaluate:
"High throughput: Processes up to 60 frames per second on a Google Pixel, enabling real-time, on-device video analysis and interactive experiences."
You can download an APK from the official Google project for this, linked from the blogpost: https://github.com/google-ai-edge/gallery?tab=readme-ov-file...
If I download it and run it on a Pixel Fold with the actual 2B model, which is half the size of the models the 60 fps claim is made for, it takes 6.2-7.5 seconds to begin responding (3 samples, 3 different photos). Generation speed is shown at 4-5 tokens per second, slightly slower than what llama.cpp does on my phone. (I maintain an AI app that, inter alia, wraps llama.cpp on all platforms.)
So, *0.16* frames a second, not 60 fps.
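The arithmetic behind that number is simple enough to sanity-check; a quick sketch using the latencies measured above (my own samples, not an official benchmark):

```python
# Time-to-first-token measured on a Pixel Fold: 6.2-7.5 seconds per image
# (3 samples). If one image = one "frame", the effective frame rate is
# just the reciprocal of that latency.
latencies_s = [6.2, 7.5]
fps = [1.0 / t for t in latencies_s]
print([round(f, 2) for f in fps])  # [0.16, 0.13] -- versus the claimed 60
```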
The blog post is jammed with claims about how special this is for on-device use and how performant it is, claims that just... seemingly aren't true. At all.
- Are they missing a demo APK?
- Was there some massive TPU leap since the Pixel Fold release?
- Is there a lot of BS in there that they're pretty sure won't be called out in a systematic way, given the amount of effort it takes to get this inferencing?
- I used to work on Pixel, and I remember thinking that it seemed like there weren't actually public APIs for the TPU. Is that what's going on?
In any case, either:
A) I'm missing something big, or
B) they are lying, repeatedly, big time, in a way that would be shown near-immediately when you actually tried building on it because it "enables real-time, on-device video analysis and interactive experiences."
Everything I've seen the last year or two indicates they are lying, big time, regularly.
But if that's the case:
- How are they getting away with it, over this length of time?
- How come I never see anyone else mention these gaps?
- Are there APK(s) that run on Tensor?
- Is it possible to run on Tensor if you're not Google?
- Is there anything at all from anyone I can download that'll run it on Tensor?
- If there isn't, why not? (i.e. this isn't the first on device model release by any stretch, so I can't give benefit of the doubt at this point)
No. The AiCore service internally runs inference on Tensor (http://go/android-dev/ai/gemini-nano)
> Is there anything at all from anyone I can download that'll run it on Tensor?
No.
> If there isn't, why not? (i.e. this isn't the first on device model release by any stretch, so I can't give benefit of the doubt at this point)
Mostly because 3P support has not been an engineering priority.
> MobileNet-V5-300M
Which makes sense, as it's 300M parameters and probably far less complex, not a multi-billion-parameter transformer.
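To put rough numbers on that size gap (my own back-of-envelope estimate, not figures from the post): weight memory is roughly parameter count times bytes per parameter.

```python
def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB: params * bits / 8, ignoring runtime overhead."""
    return params * bits_per_param / 8 / 1e9

vision = weight_gb(300e6, 8)  # MobileNet-V5-300M, assuming int8 weights
llm = weight_gb(4e9, 4)       # a ~4B-parameter LLM, assuming 4-bit quantization
print(f"{vision:.1f} GB vs {llm:.1f} GB")  # 0.3 GB vs 2.0 GB
```

An order of magnitude less weight memory, before even counting the simpler per-frame compute of a convolutional vision encoder versus autoregressive decoding.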
Until it gets into the inner details (MatFormer, per-layer embeddings, caching...), the only sentence I've found that concretely mentions something new is "the first model under 10 billion parameters to reach [an LMArena score over 1300]". So it's supposed to be better than every other model up to those that need 10GB+ of RAM, if I understand that right?
Open weights
What's interesting is that it beats smarter models in my Turing Test Battle Royale[1]. I wonder if that means it is a better talker.
A used sub-$100 x86 box is going to be much better
Similar form factor to a Raspberry Pi, but with 4 TOPS of performance and enough RAM.
Though I can imagine a few commercial applications where something like this would be useful. Maybe in some sort of document processing pipeline.
I think it’s something that even Google should consider: publishing open-source models with the possibility of grounding their replies in Google Search.
wiradikusuma•3h ago
"Gemini Nano allows you to deliver rich generative AI experiences without needing a network connection or sending data to the cloud." -- replace Gemini with Gemma and the sentence is still valid.
readthenotes1•3h ago
Gemini nano is for Android only.
Gemma is available for other platforms and has multiple size options.
So it seems like Gemini Nano might be a very focused Gemma, to follow the biology metaphor rather than the Italian-name interpretation.
tyushk•3h ago
You can use Gemma commercially using whatever runtime or framework you can get to run it.
littlestymaar•2h ago
I'm not a lawyer, but the analysis I've read had a pretty strong argument that there's no human creativity involved in the training, which is an entirely automatic process, and as such the weights cannot be copyrighted in any way. (In the same way, you cannot put a license on a software artifact just because you compiled it yourself; you must have copyright ownership of the source code you're compiling.)
skissane•1h ago
US standards for copyrightability require human creativity and model weights likely don’t have the right kind of human creativity in them to be copyrightable in the US. No court to my knowledge has ruled on the question as yet, but that’s the US Copyright Office’s official stance.
By contrast, standards for copyrightability in the UK are a lot weaker than in the US, and so, while no court has ruled on the issue in the UK yet either, it seems likely a UK court would hold model weights to be copyrightable.
So from Google/Meta/etc's viewpoint, asserting copyright makes sense, since even if the assertion isn't legally valid in the US, it likely is in the UK, and not just the UK but many other major economies too. Australia, Canada, Ireland, and New Zealand tend to follow UK courts on copyright law, not US courts. And many EU countries are closer to the UK than the US on this as well, not necessarily because they follow the UK, but because they've reached a similar position based on their own legal traditions.
Finally: don't be surprised if Congress steps in and tries to legislate model weights as copyrightable in the US too, or grants them some sui generis form of legal protection which is legally distinct from copyright but similar to it. I can already hear the lobbyist argument: "the US AI industry risks falling behind Europe because copyrightability of AI models in the US is legally uncertain, and that legal uncertainty is discouraging investment." I'm sceptical that is actually true, but something doesn't have to be true for lobbyists to convince Congress that it is.
skissane•1h ago
Google gives the model to X who gives it to Y who gives it to Z. X has a contract with Google, so Google can sue X for breach of contract if they violate its terms. But do Y and Z have such a contract? Probably not. Of course, Google can put language in their contract with X to try to make it bind Y and Z too, but is that language going to be legally effective? More often than not, no. The language may enable Google to successfully sue X over Y and Z’s behaviour, but not successfully sue Y and Z directly. Whereas, with copyright, Y and Z are directly liable for violations just as X is
skissane•1h ago
By contrast, UK copyright law accepts the "mere sweat of the brow" doctrine: the mere fact that you spent money on training is likely sufficient to make its output copyrightable. UK law doesn't impose the same requirement of a direct human creative contribution.
skissane•1h ago
Nobody knows for sure what the legal answer is, because the question hasn't been considered by a court. But the consensus of expert legal opinion is that copyrightability of models is doubtful under US law, and the kind of argument you make isn't strong enough to change that. As I said, it's a different case for UK law; nobody really needs your argument there, because model weights likely are copyrightable in the UK already.
badsectoracula•1h ago
Also, I'm pretty sure none of the AI companies would really want to touch the concept of having the copyright of source data affect the weights' own copyright, considering all of them pretty much hoover up the entire Internet without caring about those copyrights (and IMO claiming that they should be able to ignore the copyrights of training data, and that GenAI output is not under copyright, while at the same time trying to claim copyright for the weights, is dishonest, if not outright leechy).
jabroni_salad•3h ago
Gemini Nano is an Android API that you don't control at all.
nicce•3h ago
Closed source but open weight. Let's not ruin the definition of the term to the advantage of big companies.
zackangelo•2h ago
The inference code and model architecture IS open source[0] and there are many other high quality open source implementations of the model (in many cases contributed by Google engineers[1]). To your point: they do not publish the data used to train the model so you can't re-create it from scratch.
[0] https://github.com/google-deepmind/gemma [1] https://github.com/vllm-project/vllm/pull/2964
OneDeuxTriSeiGo•2h ago
And even if you had the same data, there's no guarantee the random perturbations during training are driven by a PRNG and done in a way that is reproducible.
Reproducibility does not make something open source. Reproducibility doesn't even necessarily make something free software (under the GNU interpretation). I mean hell, most docker containers aren't even hash-reproducible.
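To illustrate the PRNG point above: a bit-identical re-run requires every source of randomness in training to come from a seeded, replayable generator. A minimal sketch of the difference:

```python
import random

# Two "runs" that draw all their randomness from the same seeded PRNG
# replay an identical stream and stay in lockstep.
random.seed(42)
run_a = [random.random() for _ in range(3)]
random.seed(42)
run_b = [random.random() for _ in range(3)]
print(run_a == run_b)  # True

# A continuation without re-seeding (or any nondeterministic source,
# e.g. unsynchronized GPU reductions) diverges from the recorded stream.
run_c = [random.random() for _ in range(3)]
print(run_a == run_c)  # almost certainly False
```

Real training adds further nondeterminism (thread scheduling, floating-point reduction order on GPUs) that seeding alone doesn't fix.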
nicce•2h ago
Their publications about producing Gemma are not detailed enough that, even with the data, you would get the same results.
cesarb•2h ago
Are you sure? On a quick look, it appears to use its own bespoke license, not the Apache 2.0 license. And that license appears to have field of use restrictions, which means it would not be classified as an open source license according to the common definitions (OSI, DFSG, FSF).