To answer a question you didn't ask: with small models especially, we need to make choices about which tasks to focus on. For this model we focused on text summarization and instruction following, with the idea that users would fine-tune it to gain performance on the task set that is relevant to them.
Here are some examples inspired by previous roles I had outside of Google, where a business I was working in needed real-time text processing.
These tutorials were made with Gemma versions from a year ago, but could now be recreated with Gemma 3 270M:
https://developers.googleblog.com/en/gemma-for-streaming-ml-...
https://www.youtube.com/watch?v=YxhzozLH1Dk
You
Translate to English: Рошешуа́р (фр. Rochechouart) — місто, муніципалітет у Франції, у регіоні Нова Аквітанія, департамент Верхня В'єнна. Населення — 3637 осіб (2022)[1]. Муніципалітет розташований на відстані[2] близько 360 км на південь від Парижа, 34 км на захід від Ліможа.
gemma-3-270m-it-qat
Here's the translation:
"Rochechouart is a city in France, a municipality located in the region of New Aeterna, department within the Upper Vienne Province. The population is 3637 people (2022). The city is situated at a distance of approximately 360 kilometers from Paris, 34 km from Lyon.
You
Translate to English: Bộ Công Thương đề xuất hỗ trợ hộ gia đình lắp điện mặt trời mái nhà, có hệ thống lưu trữ tối đa 2-3 triệu đồng, hỗ trợ lãi suất vay trong tối đa ba năm. Nội dung được Bộ Công Thương đưa ra tại dự thảo lần 3 Quyết định của Thủ tướng về chính sách hỗ trợ cho hộ gia đình lắp đặt điện mặt trời mái nhà tự sản, tự tiêu và hệ thống lưu trữ điện.
gemma-3-270m-it-qat
The Ministry of Industry and Trade proposes supporting households with solar panels in their homes, with a maximum savings of 2-3 million đồng, providing loan assistance for up to 3 years. The content of the Ministry's decision at the Third Review of the Government's Policy on Support for Households Installing Solar Panels in Homes, approved by the Prime Minister about the policy support for households installing solar panels themselves, and the system storage capacity is maximum 2-3 million đồng.
1: https://uk.wikipedia.org/wiki/%D0%A0%D0%BE%D1%88%D0%B5%D1%88...
2: https://vnexpress.net/lap-dien-mat-troi-mai-nha-tu-dung-co-t...
- reply in a specific way, like a specific JSON schema, or in the voice of a character (see the sketch below)
- be very good at classifying text (e.g. emails, or spam)
- be a great summarizer for large amounts of text, e.g. turn emails into short titles or URL slugs
- add tags/categories per your pre-defined rules (e.g. for communities, tagging content, marketing)
- detect spam, or duplicates, or flag things
You won't be able to write code or prose with these, but they're great for a huge array of very narrow use cases.
What's neat about "stupid" models like this is that they're less likely to go off and dream up a bunch of irrelevant content, because they don't know much about the world and won't have too much context to pull from.
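For instance, here's a minimal sketch of the "reply in a JSON schema / classify text" items above. The checkpoint name and prompt are my assumptions, and a model this small may need some prompt iteration before it complies reliably:

```python
from transformers import pipeline

# Assumed checkpoint id for the instruction-tuned 270M model.
clf = pipeline("text-generation", model="google/gemma-3-270m-it")

prompt = (
    "Classify the email below as spam or not_spam. Reply ONLY with JSON "
    'like {"label": "...", "reason": "..."}.\n\n'
    "Email: Congratulations, you won a free cruise! Click here to claim."
)
# return_full_text=False so the output contains only the model's reply
out = clf(prompt, max_new_tokens=40, return_full_text=False)
print(out[0]["generated_text"])
```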
(My first guess)
From that table, the A100 tok/sec (larger is faster) numbers are:
- Eager: 28
- Compiled: 128
And
- KV cache eager: 26
- KV cache compiled: 99
The reason the KV cache is slower is likely that the code isn't GPU-optimized; on CPU the KV cache is faster. To make it faster on GPU you could, for example, pre-allocate the tensors on the device instead of `torch.cat`-ing them on the fly.
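Here's a minimal sketch of the two approaches, assuming a simplified per-layer cache of shape (batch, seq, head_dim); the names and shapes are illustrative, not from the benchmark code:

```python
import torch

# Growing the cache with torch.cat: every step allocates a new, larger
# tensor and copies the old contents over, which is slow on GPU.
def append_kv_cat(cache: torch.Tensor, new_kv: torch.Tensor) -> torch.Tensor:
    return torch.cat([cache, new_kv], dim=1)  # (batch, seq+1, head_dim)

# Pre-allocated cache: reserve the full buffer on the device once,
# then write each step's K/V in place with no allocation or copy.
class StaticKVCache:
    def __init__(self, batch: int, max_len: int, head_dim: int, device: str = "cpu"):
        self.buf = torch.empty(batch, max_len, head_dim, device=device)
        self.pos = 0

    def append(self, new_kv: torch.Tensor) -> torch.Tensor:
        self.buf[:, self.pos : self.pos + 1] = new_kv  # in-place write
        self.pos += 1
        return self.buf[:, : self.pos]
```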
https://huggingface.co/dslim/bert-base-NER
Just wondering if it’s worth testing and what it would be most useful for.
Do we need to fine-tune for the embeddings to be meaningful at the sentence/document level?
This is part of the rationale for releasing this model. Now you don't have to start from scratch, and fine-tuning is feasible on a wide variety of hardware, including modest GPU setups (and smaller).
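As a rough sketch of what such a fine-tune might look like (the dataset name and hyperparameters are placeholders, and the TRL setup is just one reasonable option):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset id; swap in your own task-specific examples.
ds = load_dataset("your-org/your-task-dataset", split="train")

trainer = SFTTrainer(
    model="google/gemma-3-270m-it",  # assumed Hugging Face checkpoint name
    train_dataset=ds,
    args=SFTConfig(output_dir="gemma-270m-ft", max_steps=500),
)
trainer.train()
```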
What uses can one reasonably expect from them? Are there uses out of the box, or does one have to go through some custom post-training to get useful behavior?
I feel like there is a huge gap between understanding models as a user of commercial tools and the kind of discussions happening in these threads, but I'm not sure what the in-between steps are.
- private, on-device models (possibly with lower latency than models via web API); also edge devices
- algorithm research (faster and cheaper to prototype new ideas)
- cheap tasks, like classification/categorization; sure, you don't need a decoder-style LLM for that, but it has the advantage of being more free-form, which is useful in many scenarios; or maybe a sanity checker for grammar; or even a router to other models (GPT-5 style), as sketched below
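A hedged sketch of that router idea: use the tiny model to decide which bigger backend handles a request. The model ids, labels, and prompt here are made up for illustration:

```python
from transformers import pipeline

# Assumed checkpoint id; any small instruction-tuned model would do.
router = pipeline("text-generation", model="google/gemma-3-270m-it")

def route(user_query: str) -> str:
    prompt = (
        "Reply with exactly one word, CODE or CHAT, naming which backend "
        f"should handle this request:\n{user_query}\nAnswer:"
    )
    # return_full_text=False so we only see the model's answer, not the prompt
    decision = router(prompt, max_new_tokens=3, return_full_text=False)
    text = decision[0]["generated_text"].strip().upper()
    return "big-code-model" if "CODE" in text else "big-chat-model"
```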
canyon289•5h ago
https://news.ycombinator.com/item?id=44902148
Personally I'm excited that you all have access to this model now and hope you all get value out of using it.
canyon289•5h ago
The tokens themselves are a form of compression. Let's say we have the word "WaffleHouse": at the character level this would be 11 tokens, but with a subword tokenizer it would be perhaps 2 or 3 tokens (I didn't actually run it through the tokenizer, but we could verify precisely). This matters a lot for on-device processing especially.
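We can verify in a few lines, assuming the Hugging Face checkpoint id for the tokenizer (the exact count may differ from my guess above):

```python
from transformers import AutoTokenizer

# Assumed checkpoint id; any Gemma tokenizer should behave similarly.
tok = AutoTokenizer.from_pretrained("google/gemma-3-270m")
ids = tok.encode("WaffleHouse", add_special_tokens=False)
print(len(ids), tok.convert_ids_to_tokens(ids))
# 11 characters in, only a handful of subword tokens out
```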
So while we could get more intelligence out of the model by bumping up the "knowledge" parameters, the device would need to process more input and output tokens.
Another advantage on small devices is that the embeddings are just a lookup table, which requires little to no computation. It's the rest of the parameters that have the expensive matrix multiplications, so if we increased those we'd also be increasing the number of FLOPs needed for a forward pass.
This blog post explains it well. https://www.adamcasson.com/posts/transformer-flops
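As a back-of-the-envelope illustration, using the common ~2 FLOPs per non-embedding parameter per token estimate from that post; the 170M/100M embedding/transformer split is my assumption about this model's breakdown, so treat the numbers as rough:

```python
# Rough FLOPs-per-token estimate: embeddings are a lookup (~free),
# while transformer weights cost ~2 FLOPs each per token in a forward pass.
embedding_params = 170e6    # lookup table: memory cost, ~no compute
transformer_params = 100e6  # matrix multiplies: the expensive part

flops_per_token = 2 * transformer_params
print(f"~{flops_per_token:.1e} FLOPs per token")  # ~2.0e+08
```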
So all this is to say there are definite tradeoffs between model size, performance on evals, and compute cost. We ran many internal experiments with different choices to see what could work well, and then picked what we believed would work best for the open community.
tarruda•4h ago
Can you share what kind of hardware is necessary to train it, and how long it took?
canyon289•3h ago
The Gemma 3 technical report contains many details on the training setup: https://arxiv.org/pdf/2503.19786
This was released with the initial batch of Gemma 3, so it doesn't contain the 270M details; nonetheless, you'll get a good idea of what it takes to build these models.