./llama.cpp/llama-cli -hf unsloth/gemma-3n-E4B-it-GGUF:UD-Q4_K_XL -ngl 99 --jinja --temp 0.0
./llama.cpp/llama-cli -hf unsloth/gemma-3n-E2B-it-GGUF:UD-Q4_K_XL -ngl 99 --jinja --temp 0.0
I'm also working on an inference + finetuning Colab demo! I'm very impressed since Gemma 3N has audio, text and vision! https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-...
Thank you!
However, it's still 8B parameters, and there are no quantized models just yet.
Cherry-picking something that's quick to evaluate:
"High throughput: Processes up to 60 frames per second on a Google Pixel, enabling real-time, on-device video analysis and interactive experiences."
You can download an APK from the official Google project for this, linked from the blogpost: https://github.com/google-ai-edge/gallery?tab=readme-ov-file...
If I download it and run it on a Pixel Fold with the actual 2B model, which is half the size of the models the 60 fps claim is made for, it takes 6.2-7.5 seconds to begin responding (3 samples, 3 different photos). Generation speed is shown at 4-5 tokens per second, slightly slower than what llama.cpp does on my phone. (I maintain an AI app that, inter alia, wraps llama.cpp on all platforms.)
So, *0.16* frames a second, not 60 fps.
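The arithmetic behind that number is simple enough to sanity-check; a quick sketch using the latencies measured above (my own samples, not an official benchmark):

```python
# Time-to-first-token measured on a Pixel Fold: 6.2-7.5 seconds per image
# (3 samples). If one image = one "frame", the effective frame rate is
# just the reciprocal of that latency.
latencies_s = [6.2, 7.5]
fps = [1.0 / t for t in latencies_s]
print([round(f, 2) for f in fps])  # [0.16, 0.13] -- versus the claimed 60
```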
The blog post is jammed with claims about how special this is for on-device use and how performant it is, claims that just... seemingly aren't true. At all.
- Are they missing a demo APK?
- Was there some massive TPU leap since the Pixel Fold release?
- Is there a lot of BS in there that they're pretty sure won't be called out in a systematic way, given the amount of effort it takes to get this inferencing?
- I used to work on Pixel, and I remember thinking that it seemed like there weren't actually public APIs for the TPU. Is that what's going on?
In any case, either:
A) I'm missing something big, or
B) they are lying, repeatedly, big time, in a way that would be shown near-immediately when you actually tried building on it because it "enables real-time, on-device video analysis and interactive experiences."
Everything I've seen the last year or two indicates they are lying, big time, regularly.
But if that's the case:
- How are they getting away with it, over this length of time?
- How come I never see anyone else mention these gaps?
- Are there APK(s) that run on Tensor?
- Is it possible to run on Tensor if you're not Google?
- Is there anything at all from anyone I can download that'll run it on Tensor?
- If there isn't, why not? (i.e. this isn't the first on device model release by any stretch, so I can't give benefit of the doubt at this point)
No. The AiCore service internally runs inference on Tensor (http://go/android-dev/ai/gemini-nano)
> Is there anything at all from anyone I can download that'll run it on Tensor?
No.
> If there isn't, why not? (i.e. this isn't the first on device model release by any stretch, so I can't give benefit of the doubt at this point)
Mostly because 3P support has not been an engineering priority.
> MobileNet-V5-300M
Which makes sense, as it's 300M parameters and probably far less complex, not a multi-billion-parameter transformer.
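To put rough numbers on that size gap (my own back-of-envelope estimate, not figures from the post): weight memory is roughly parameter count times bytes per parameter.

```python
def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB: params * bits / 8, ignoring runtime overhead."""
    return params * bits_per_param / 8 / 1e9

vision = weight_gb(300e6, 8)  # MobileNet-V5-300M, assuming int8 weights
llm = weight_gb(4e9, 4)       # a ~4B-parameter LLM, assuming 4-bit quantization
print(f"{vision:.1f} GB vs {llm:.1f} GB")  # 0.3 GB vs 2.0 GB
```

An order of magnitude less weight memory, before even counting the simpler per-frame compute of a convolutional vision encoder versus autoregressive decoding.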
Until it gets into the inner details (MatFormer, per-layer embeddings, caching...), the only sentence I've found that concretely mentions something new is "the first model under 10 billion parameters to reach [an LMArena score over 1300]". So it's supposed to be better than every other model up to those that need 10GB+ of RAM, if I understand that right?
Open weights
What's interesting is that it beats smarter models in my Turing Test Battle Royale[1]. I wonder if that means it is a better talker.
A used sub-$100 x86 box is going to be much better
Similar form factor to a Raspberry Pi, but with 4 TOPS of performance and enough RAM.
Though I can imagine a few commercial applications where something like this would be useful. Maybe in some sort of document processing pipeline.
I think it’s something that even Google should consider: publishing open-source models with the possibility of grounding their replies in Google Search.
wiradikusuma•3h ago
"Gemini Nano allows you to deliver rich generative AI experiences without needing a network connection or sending data to the cloud." -- replace Gemini with Gemma and the sentence is still valid.
readthenotes1•3h ago
Gemini nano is for Android only.
Gemma is available for other platforms and has multiple size options.
So it seems like Gemini Nano might be a very focused Gemma, to follow the biology metaphor rather than the Italian-name interpretation.
tyushk•3h ago
You can use Gemma commercially using whatever runtime or framework you can get to run it.
littlestymaar•2h ago
I'm not a lawyer, but the analysis I've read had a pretty strong argument that there's no human creativity involved in the training, which is an entirely automatic process, and as such the weights cannot be copyrighted in any way. (In the same way, you cannot put a license on a software artifact just because you compiled it yourself; you must have copyright ownership of the source code you're compiling.)
skissane•1h ago
US standards for copyrightability require human creativity and model weights likely don’t have the right kind of human creativity in them to be copyrightable in the US. No court to my knowledge has ruled on the question as yet, but that’s the US Copyright Office’s official stance.
By contrast, standards for copyrightability in the UK are a lot weaker than in the US, and so, while no court has ruled on the issue in the UK yet either, it seems likely a UK court would hold model weights to be copyrightable.
So from Google/Meta/etc's viewpoint, asserting copyright makes sense, since even if the assertion isn't legally valid in the US, it likely is in the UK, and not just the UK but many other major economies too. Australia, Canada, Ireland, and New Zealand tend to follow UK courts on copyright law, not US courts. And many EU countries are closer to the UK than the US on this as well, not necessarily because they follow the UK, but because they've reached a similar position based on their own legal traditions.
Finally: don't be surprised if Congress steps in and tries to legislate model weights as copyrightable in the US too, or grants them some sui generis form of legal protection which is legally distinct from copyright but similar to it. I can already hear the lobbyist argument: "the US AI industry risks falling behind Europe because copyrightability of AI models in the US is legally uncertain, and that legal uncertainty is discouraging investment." I'm sceptical that is actually true, but something doesn't have to be true for lobbyists to convince Congress that it is.
skissane•1h ago
Google gives the model to X who gives it to Y who gives it to Z. X has a contract with Google, so Google can sue X for breach of contract if they violate its terms. But do Y and Z have such a contract? Probably not. Of course, Google can put language in their contract with X to try to make it bind Y and Z too, but is that language going to be legally effective? More often than not, no. The language may enable Google to successfully sue X over Y and Z’s behaviour, but not successfully sue Y and Z directly. Whereas, with copyright, Y and Z are directly liable for violations just as X is
skissane•1h ago
By contrast, UK copyright law accepts the "mere sweat of the brow" doctrine: the mere fact that you spent money on training is likely sufficient to make its output copyrightable. UK law doesn't impose the same requirement of a direct human creative contribution.
skissane•1h ago
Nobody knows for sure what the legal answer is, because the question hasn't been considered by a court. But the consensus of expert legal opinion is that copyrightability of models is doubtful under US law, and the kind of argument you make isn't strong enough to change that. As I said, it's a different case for UK law; nobody really needs your argument there, because model weights likely are copyrightable in the UK already.
badsectoracula•1h ago
Also, I'm pretty sure none of the AI companies would really want to touch the concept of having the copyright of source data affect the weights' own copyright, considering all of them pretty much hoover up the entire Internet without caring about those copyrights (and IMO claiming that they should be able to ignore the copyrights of training data, and that GenAI output is not under copyright, while at the same time trying to claim copyright for the weights, is dishonest, if not outright leechy).
jabroni_salad•3h ago
Gemini Nano is an Android API that you don't control at all.
nicce•3h ago
Closed source but open weight. Let's not ruin the definition of the term to the advantage of big companies.
zackangelo•2h ago
The inference code and model architecture IS open source[0] and there are many other high quality open source implementations of the model (in many cases contributed by Google engineers[1]). To your point: they do not publish the data used to train the model so you can't re-create it from scratch.
[0] https://github.com/google-deepmind/gemma [1] https://github.com/vllm-project/vllm/pull/2964
OneDeuxTriSeiGo•2h ago
And even if you had the same data, there's no guarantee the random perturbations during training are driven by a PRNG and done in a way that is reproducible.
Reproducibility does not make something open source. Reproducibility doesn't even necessarily make something free software (under the GNU interpretation). I mean hell, most docker containers aren't even hash-reproducible.
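To illustrate the PRNG point above: a bit-identical re-run requires every source of randomness in training to come from a seeded, replayable generator. A minimal sketch of the difference:

```python
import random

# Two "runs" that draw all their randomness from the same seeded PRNG
# replay an identical stream and stay in lockstep.
random.seed(42)
run_a = [random.random() for _ in range(3)]
random.seed(42)
run_b = [random.random() for _ in range(3)]
print(run_a == run_b)  # True

# A continuation without re-seeding (or any nondeterministic source,
# e.g. unsynchronized GPU reductions) diverges from the recorded stream.
run_c = [random.random() for _ in range(3)]
print(run_a == run_c)  # almost certainly False
```

Real training adds further nondeterminism (thread scheduling, floating-point reduction order on GPUs) that seeding alone doesn't fix.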
nicce•2h ago
Their publications about producing Gemma are not detailed enough that, even with the data, you would get the same results.
cesarb•2h ago
Are you sure? On a quick look, it appears to use its own bespoke license, not the Apache 2.0 license. And that license appears to have field of use restrictions, which means it would not be classified as an open source license according to the common definitions (OSI, DFSG, FSF).