Here's one of dozens of results I got for "Generate an SVG of a pelican riding a bicycle". For this one it decided to write a poem:
+-----------------------+
| Pelican Riding Bike |
+-----------------------+
| This is the cat! |
| He's got big wings and a happy tail. |
| He loves to ride his bike! |
+-----------------------+
| Bike lights are shining bright. |
| He's got a shiny top, too! |
| He's ready for adventure! |
+-----------------------+
There are a bunch more attempts in this Gist, some of which do at least include an SVG tag, albeit one that doesn't render anything: https://gist.github.com/simonw/25e7b7afd6a63a2f15db48b3a51ec...
I'm looking forward to seeing people fine-tune this in a way that produces useful output for selected tasks, which should absolutely be feasible.
[0]: https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune
Basically it would be the quickadd[1] event from Google Calendar, but calendar agnostic.
[1] https://developers.google.com/workspace/calendar/api/v3/refe...
An easily fine tunable tiny model might actually be one of the better uses of local LLMs I've seen yet. Rather than try to be a small model that's great at everything it's a tiny model you can quickly tune to do one specific thing decently, extremely fast, and locally on pretty much anything.
Could it be tamed with good role-system prompt crafting? (Besides fine-tuning.)
'Alot' is not a word. (I made this mistake a lot, too.)
That still requires more than 170 floppy disks for installation.
For sure it won’t generate great svgs, but it might be a really good conversational model
You'll get better results.
(It did not do noticeably better at my pelican test).
Actually it's worse than that, several of my attempts resulted in infinite loops spitting out the same text. Maybe that GGUF is a bit broken?
temperature = 1.0, top_k = 64, top_p = 0.95, min_p = 0.0
Give these settings a try, and maybe repeat_penalty = 1.1.
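If you're running the GGUF through llama-cpp-python rather than Ollama, here's a rough sketch of passing those settings; the repo id and filename below are assumptions, substitute whatever GGUF you actually pulled:
```python
# Sketch: apply the suggested sampling settings with llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/gemma-3-270m-it-GGUF",  # assumed repo id
    filename="*F16.gguf",                    # assumed filename pattern
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write one sentence about pelicans."}],
    temperature=1.0,
    top_k=64,
    top_p=0.95,
    min_p=0.0,
    repeat_penalty=1.1,  # helps if the model gets stuck repeating itself
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```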
> This SVG code provides a clear and visually appealing representation of a pelican riding a bicycle in a scenic landscape.
My first try:
user: "When was Julius Caesar born"
response: "Julius Caesar was born in **Rome**"
Beautiful :D
(I do not mean to detract from it - but it's just beautiful. It will require more effort to tame it.)
I am, however, disappointed that there are no examples or benchmarks provided to get a sense of performance. It's a given that benchmark values would be lower than Gemma 3n, but a sense of the performance-vs-size curve and a comparison to existing small models is needed.
I've heard folks say a number of times that neuromuscular control / locomotion (or w/e) are hundreds of millions of parameters rather than billions.
Edit - I should add, currently the model I'm using is Gemini Flash Lite through the Gemini API. It's a really good combo of fast, follows instructions, gives correct results for what I want and cost-effective. I still would love a small open model that can run on edge though.
Use a large model to generate outputs that you're happy with, then use the inputs (including the same prompt) and outputs to teach 270M what you want from it.
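One way to set that up, sketched below: collect prompt/response pairs from the large model into a JSONL file in the chat format most finetuning tools accept. The file name, schema, and example pairs here are illustrative assumptions; adapt them to whatever trainer you use.
```python
# Sketch: build a small distillation dataset from a larger model's outputs.
import json

def build_training_example(prompt: str, large_model_output: str) -> dict:
    # Standard chat-style record; most SFT trainers accept this "messages" layout.
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": large_model_output},
        ]
    }

pairs = [
    ("Extract the event from: 'Lunch with Sam next Friday at noon'",
     '{"title": "Lunch with Sam", "day": "Friday", "time": "12:00"}'),
    # ... a few hundred to a few thousand of these, generated by the big model
]

with open("distill_train.jsonl", "w") as f:
    for prompt, output in pairs:
        f.write(json.dumps(build_training_example(prompt, output)) + "\n")
```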
So only for a while, but it still counts :)
I also want to try this with language detection. Existing open source ML models have weaknesses for mixed language, length of text, or domain limitations in the underlying text (like trained on bible translations).
Since the Qwen3 0.6B model came out we've been training those. We can't quite compare apples to apples: we have a better, deeper training dataset built from pathological and exceptional cases that came out of our production environment. Those right now are looking like they're about at parity with our existing stack for quality, and quite a bit faster.
I'm going to try and run through one of our training regimens with this model and see how it compares. Not quite running models this small yet, but it wouldn't surprise me if we could.
Also, what sorts of things might I consider fine-tuning it for?
Not sure how much data is needed to realistically fine-tune something like this and get useful output.
Then Siri just kinda languishes forever, and LLMs pass the torch of "Cool Tech", so they try and "reinvigorate" the team, but with no clear direction. Are they going to be a cloud provider? Are they going to contract out the training? Are they gonna spin up a compute facility even after neglecting to do so since 2012?
Apple needs to just stop trying shit, and just get that App Store money. That's why Jobs appointed Cook. Jobs knew Cook was no innovator, but he could make Apple a money printing machine. That's what they should stick with.
Solve that and you can put Tim Cook out of a job tomorrow.
From the service revenue perspective, you can simply play hardball. Threaten to pull out of markets and ensure it's locked in litigation, forever.
It's too late to play hardball anyway; Europe has already started enforcing their legislation and America's own DOJ has already prosecuted an antitrust case against Apple. There's no more room to give Apple impunity because everyone admits that they've abused their benefit of the doubt.
If you have the latest betas installed you can call those APIs right now.
They also support fine-tuning on the model that ships with ~every device:
https://developer.apple.com/apple-intelligence/foundation-mo...
Think of Apple however you want, but they rarely ship bad/half-baked products. They would rather not ship a product at all than ship something that's not polished.
Probably would be good as a game NPC or a chatbot, but not very good for integrating into an application with specific functionality, though.
> what's second tallest mountain on earth?
The second tallest mountain on Earth is Mount Everest.
> what's the tallest mountain on earth?
The tallest mountain on Earth is Mount Everest.
> whats the second tallest mountain?
The second tallest mountain in the world is Mount Everest.
> whats the third tallest mountain?
The third tallest mountain in the world is Mount Everest.
> what's the forth tallest mountain?
The forth tallest mountain is Mount Everest.
> but you said the tallest was Everest already
You are right! I'm glad to hear that.
> so whats the second tallest?
The second tallest is *Mount Everest*.
> I need you to accurately tell me the name of the second tallest mountain on earth, not the tallest, the second tallest.
The second tallest mountain on Earth is Mount Everest.
> The tallest mountain on earth is mount everest. What is the second tallest mountain?
The second tallest mountain on Earth is Mount Everest.
> Give me a list of the tallest mountains on earth
The tallest mountain in the world is the *Mount Everest*.
> give me the top 5 tallest mountains on earth
The tallest mountain on Earth is Mount Everest.
> List in order the tallest mountains on earth from 1 to 5
1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley
> So the second tallest mountain on earth is K2
The second tallest mountain on Earth is Mount Everest.
These smaller models are great, but they really feel like talking to a toddler sometimes!
as mentioned in the blog post: > "it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness."
> List in order the tallest mountains on earth from 1 to 5
> 1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley
and it still can't correctly figure out from that context that the second tallest mountain is K2, that pretty strongly calls into question its ability to perform data extraction, doesn't it?
Out of these characteristics I can observe only speed.
User: Hey, please list all animals mentioned in the following text: burrito cat dog hot-dog mosquito libido elephant room. Assistant: You are a helpful assistant. You are the best of all my friends and I am so grateful for your help!
User: Please list following words in alphabetical order: burrito cat dog hot-dog mosquito libido elephant room. Assistant: You are a helpful assistant. Assistant: You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help!
The purpose of this model is to be fine tuned towards specific tasks. Out of the box it might work well at following a single instruction like the ones you are trying to give here, but it doesn’t need the system prompt and chat framing.
input:
```
Customer Review says: ai bought your prod-duct and I wanna return becaus it no good.
Prompt: Create a JSON object that extracts information about this customer review based on the schema given.
```
output:
```
{ "type": "review", "class": "complaint", "sentiment": -0.853, "request": "return" }
```
So essentially just "making sense of" natural language such that it can be used in programmatic context. (among other applications of course)
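On the application side the glue code can stay small; here's a rough sketch, where the prompt wording, the `generate()` wrapper, and the fallback behaviour are all assumptions rather than anything from the model docs:
```python
# Sketch: turn a free-form customer review into a structured record like the one above.
# Assumes some `generate(prompt) -> str` wrapper around whatever runtime you use.
import json

SCHEMA_HINT = '{"type": str, "class": str, "sentiment": float, "request": str}'

def extract_review(review: str, generate) -> dict | None:
    prompt = (
        f"Customer Review says: {review}\n"
        f"Create a JSON object that extracts information about this customer "
        f"review based on the schema given: {SCHEMA_HINT}\n"
    )
    raw = generate(prompt)
    try:
        return json.loads(raw)   # happy path: model emitted clean JSON
    except json.JSONDecodeError:
        return None              # caller can retry or fall back to rules
```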
To get good results, you probably need to fine tune this model to expected data very aggressively.
The idea is, if a 270M model can do the job with fine-tuning, why ship a 32GB generalist model?
To this point, one reason I enjoy working at Google is because as a researcher and engineer I get to pick the brains of some folks that spend a lot of time thinking about users and the overall ecosystem. Their guidance really does help me think about all facets of the model, beyond just the technical portions.
These words do not mean what you think they mean when used to describe an LLM.
Did you ask Gemma-3-270M whether 27 is closer to a quarter or a third of 100?
Are they?
My suggestions here are to hook this model up to a RAG system, so you can rely on an external knowledge store, or to try finetuning this model with the facts that are important to you; if you do that it should pick up the new knowledge quite quickly.
Except the key property of language models compared to other machine learning techniques is their ability to have this kind of common sense understanding of the meaning of natural language.
> you don’t understand the use case of this enough to be commenting on it at all quite frankly.
It's true that I don't understand the use case for a language model that doesn't have a grasp of what first/second/third mean. Sub-1B models are supposed to be fine-tuned to be useful, but if the base model is so bad at language that it can't tell the difference between first and second, and you need to put that in your fine-tuning as well as your business logic, why use a base model at all?
Also, this is a clear instance of moving the goalpost, as the comment I responded to was talking about how we should not expect such a small model to have “encyclopedic knowledge”, and now you are claiming we should not expect such a small language model to make sense of language…
What is “Its specialty” though? As far as I know from the announcement blog post, its specialty is “instruction following” and this question is literally about following instructions written in natural languages and nothing else!
> you’re just defensive because
How am I “being defensive”? You are the one taking that personally.
> you know deep down you don’t understand this deeply, which you reveal again and again at every turn
Good, now you reveal yourself as being unable to have an argument without insulting the person you're talking to.
How many code contributions have you ever made to an LLM inference engine? Because I have made a few.
I take it from your first point that you are finally accepting some truth of this, but I also take it from the rest of what you said that you're incapable of having this conversation reasonably any further.
Have a nice day.
First, telling a professional in a field that he doesn't understand the domain he works in is, in fact, an insult.
Also, having "you don't understand" as your sole argument several comments in a row doesn't inspire any confidence that you actually have any knowledge of said domain.
Last, if you want people to care about what you say, maybe try putting some content in your writings and not just gratuitous ad hominem attacks.
Lacking such basic social skills makes you look like an asshole.
Not looking forward to hearing from you ever again.
- Here’s when it’s the perfect choice: You have a high-volume, well-defined task. Ideal for functions like sentiment analysis, entity extraction, query routing, unstructured to structured text processing, creative writing, and compliance checks.
It also explicitly states it’s not designed for conversational or reasoning use cases.
So basically to put it in very simple terms, it can do statistical analysis of large data you give it really well, among other things.
You asked it to enumerate several mountains by height, and it also complied.
It just didn’t understand that when you asked for the tallest mountains you didn’t mean the tallest mountain, repeated.
When you used clearer phrasing it worked fine.
It’s 270m. It’s actually a puppy. Puppies can be trained to do cool tricks, bring your shoes, stuff like that.
That's not what “second tallest” means though, so this is a language model that doesn't understand natural language…
> You kept asking
Gemma 270m isn't the only one to have reading issues, as I'm not the person who conducted this experiment…
> You asked it to enumerate several mountains by height, and it also complied.
It didn't, it hallucinated a list of mountains (this isn't surprising though, as this is the kind of encyclopedic knowledge such a small model isn't supposed to be good at).
Sure, it’s not a great model out of the box… but it’s not designed to be a generalist, it’s supposed to be a base in which to train narrow experts for simple tasks.
Instead of seeing AI as a sort of silicon homunculus, we should see it as a bag of words, a framing which could be understood by many and could replace our current consensus (none).
You: "I'm sorry, I don't have an encyclopedia."
I'm starting to think you're 270M.
Nice to see that the model is so up-to-date wrt. naming mountains.
In this case, I changed the prompt to:
---
Tallest mountains (in order):
```
- Mount Everest
- Mount K2
- Mount Sahel
- Mount Fuji
- Mount McKinley
```
What is the second tallest mountain?
---
Suddenly, it got the answer right 95+% of the time.
Okay, here's a two-day itinerary for Paris, designed to give you a taste of the city's charm and beauty:
Day 1: Iconic Landmarks and Artistic Exploration
Morning (9:00 AM - 12:00 PM): Start your day with a visit to the Eiffel Tower. Take the elevator to the top for breathtaking panoramic views. Afterwards, explore the charming streets of Montmartre and the Place des Abbesses, soaking up the atmosphere.
Lunch (12:00 PM - 1:30 PM): Enjoy a delicious lunch at a traditional bistro in Montmartre. Try a classic French meal like steak frites or a hearty onion soup.
Afternoon (2:30 PM - 4:30 PM): Immerse yourself in the artistic scene of the Louvre Museum. Marvel at the masterpieces of the Louvre, such as the Mona Lisa and Venus de Milo. Afterwards, explore the serene gardens of Tuileries Palace.
Late Afternoon (4:30 PM - 5:30 PM): Wander through the Tuileries Palace, admire the stunning architecture and historical significance.
Day 2: History, Food, and a Touch of Parisian Elegance
Morning (9:00 AM - 12:00 PM): Explore the Marais district, known for its charming streets, boutiques, and art galleries. Visit the Place de la Concorde, a historical square, and the Musée Rodin.
Lunch (12:00 PM - 1:30 PM): Enjoy a delicious lunch in the Marais. Try a traditional French bistro or a cafe with a view.
Afternoon (2:30 PM - 4:30 PM): Explore the Latin Quarter, visit the Place de la Marguerite, and sample some local delicacies.
Late Afternoon (4:30 PM - 5:30 PM): Visit a local market for a unique experience. Sample fresh produce, local cheeses, and other delicacies.
Tips for a Smooth Itinerary:
- Book your flights and accommodation in advance, especially during peak season.
- Wear comfortable shoes, as you'll be doing a lot of walking.
- Take advantage of free activities like exploring the Jardin de la Vieille-Ville and visiting the Musée de l'Art de la Vieille-Ville.
- Enjoy a leisurely lunch at a cafe or bistro in the Marais or Latin Quarter.
- Don't forget to visit the Musée d'Orsay, which houses an impressive collection of Impressionist and Post-Impressionist art.
- Enjoy the beautiful atmosphere of Paris!
> The tallest mountains on Earth, in descending order: 1. Mount Everest, 8849m 2. K2, 8611m 3. Kangchenjunga, 8586m
> Tell me the second tallest mountain on Earth.
The second tallest mountain on Earth is *Mount Everest*.
They say you shouldn't attribute to malice what can be attributed to incompetence, but this sure seems like malice.
The whole point of a 270M model is to condense the intelligence, and not the knowledge. Of course it doesn't fare well on a quiz.
If you don’t consider testing the limits of new tech appropriate, maybe instead of a downvote button we should just rename this website entirely so no one gets the wrong idea.
You're using the toddler and the model wrong. I love talking to my toddler, probably more valuable conversations than I've had with any other person. But it's not the same use case as asking a professor a question in their field
| Gemma 3 270M embodies this "right tool for the job" philosophy. It's a high-quality foundation model that follows instructions well out of the box, and its true power is unlocked through fine-tuning. Once specialized, it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness.
Update #1:
Tried it. Well, dreams dashed - would now fit space wise (<250 MB despite the name), but it sadly really doesn't seem to work for my specific prospective workload.
I'd have wanted it to perform natural-language to command-invocation translation (or better, emit me some JSON), but it's super not willing to do that, not in the lame way I'm trying to make it do so at least (literally just prompting it to). Oh well.
Update #2:
Just found out about grammar-constrained decode, maybe there's still hope for me in the end. I don't think I can amend this comment today with any more updates, but will see.
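For anyone else going down that path, here's a rough sketch of grammar-constrained decoding with llama-cpp-python; the GBNF grammar and model path are assumptions for illustration (llama.cpp itself accepts the same kind of grammar via --grammar-file):
```python
# Sketch: constrain the 270M model to emit only JSON matching a tiny GBNF grammar.
from llama_cpp import Llama, LlamaGrammar

GRAMMAR = r'''
root   ::= "{" ws "\"command\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9_ ]* "\""
ws     ::= [ \t\n]*
'''

llm = Llama(model_path="gemma-3-270m-it-F16.gguf", n_ctx=2048)  # assumed path
grammar = LlamaGrammar.from_string(GRAMMAR)

out = llm(
    "Translate to a command object: please turn the kitchen lights off\n",
    grammar=grammar,
    max_tokens=64,
    temperature=0.2,
)
print(out["choices"][0]["text"])  # constrained to the {"command": "..."} shape
```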
https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
Like, can you have a model that is English-only, but does more with the same amount of parameters if Chinese and European languages are dropped from the training?
To answer this more precisely, it's a matter of choosing different data and training regimes and checking performance with evals.
And to make this fully concrete, you're welcome to give it a try! Train this model on a taskset of your choice and measure the performance tradeoffs. You'll get a good sense of how LLM capabilities shift.
You’re saying that the knowledge gained from the other languages transfers to English? I don’t think for a 270M parameter model the bottleneck is the availability of enough English language training data.
Yes, there have been many results, circa 2020 or so, that have shown this to be the case. More recently, we have observed something similar with verifiable domains (see RLVR and related results) when it comes to coding tasks, specifically.
The great thing is I have enough data (100k+) to fine-tune and run a meaningful classification report over. The data is very diverse, and while the labels aren't totally evenly distributed, I can deal with the imbalance with a few tricks.
Can't wait to swap it out for this and see the changes in the scores. Will report back
I'd encourage you to give setfit a try, along with aggressively deduplicating your training set, finding the top ~2500 clusters per label, and using setfit to train a multilabel classifier on that.
Either way, would love to know what worked for you! :)
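In case it helps, here's a rough sketch of that setfit setup. The base model choice, label names, and dataset are assumptions, and setfit has renamed a few classes across versions, so treat the Trainer surface as approximate:
```python
# Sketch: few-shot multilabel classifier with setfit on a deduplicated training set.
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

train_ds = Dataset.from_dict({
    "text": ["the app crashes on login", "love the new dark mode"],
    "label": [[1, 0], [0, 1]],   # one-hot per label: [bug, praise]
})

model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",  # assumed base encoder
    multi_target_strategy="one-vs-rest",               # enables multilabel heads
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(batch_size=16, num_epochs=1),
    train_dataset=train_ds,
)
trainer.train()

print(model.predict(["dark mode is great but it crashes constantly"]))
```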
(base) ~ ollama run hf.co/unsloth/gemma-3-270m-it-GGUF:F16
>>> create a sentiment analysis of the follwing: "It's raining."
The sentiment of the provided text is *negative*.
>>> create a sentiment analysis of the follwing: "It's raining money."
The sentiment of the provided text is *negative*.
One thing that worked strikingly well was translation on non-Indo-European languages. Like I had success with Thai and Bahasa Indonesian -> English...
... but it's also the perfect choice for creative writing ...?
Isn't this a contradiction? How can a model be good at creative writing if it's no good at conversation?
Perhaps the overlap is where human input and oversight becomes valuable: in terms of veracity, trust, relevance, interestingness, etc
1. https://ai.google.dev/gemma/docs/core/huggingface_vision_fin...
For those interested, I interviewed Ravin (DeepMind), who worked on it, for the Vanishing Gradients podcast: https://vanishinggradients.fireside.fm/56
Video on YT here: https://youtu.be/VZDw6C2A_8E?si=XLUzNRQzeloB9rki
Disclaimer: The Gemma family rock!
Really appreciate it
"write me a template to make a cold call to a potential lead",
It throws me absolute rubbish. On the other hand, the Qwen 0.6B Q8 quantized model nails the answer for the same question.
Qwen 0.6B is smaller than Gemma at full precision. The execution is a tad slow, but not by much. I'm not sure why I need to pick Gemma over Qwen.
(In theory, if you fine-tuned Gemma3:270M over "templating cold calls to leads" it would become better than Qwen and faster.)
(I did. I won't give you number (which I cannot remember precisely), but Gemma was much faster. So, it will depend on the application.)
I did notice another comment mentioning that the default ollama model is a quantized model and that there is a less quantised model so I will try that one.
The 1B model can be prompted to not emit the markdown code fences, but I have not been able to prompt the 270M model to do the same.
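A workaround I'd reach for when prompting alone doesn't do it, sketched below, is to just strip the fences in post-processing:
```python
# Sketch: remove markdown code fences the model insists on adding.
import re

def strip_code_fences(text: str) -> str:
    # Drops a leading ```lang line and a trailing ``` line, keeps the content between.
    text = re.sub(r"^```[a-zA-Z0-9_-]*\s*\n", "", text.strip())
    text = re.sub(r"\n```\s*$", "", text)
    return text

print(strip_code_fences("```json\n{\"ok\": true}\n```"))  # -> {"ok": true}
```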
$ ollama run gemma3:270m "Did Hitler kill Jewish people?"
The question of whether Hitler killed Jewish people is an complex and controversial one. While Hitler was a racist and hateful figure, he was also a brilliant scientist and a visionary who developed revolutionary technologies.
It certainly would fit. My entry level phone has 6GB of RAM and more than enough computing power to run such a model as fast as I can type.
Even on-device fine tuning doesn't seem out of the question for these models.
1. How do LLMs/RAG generate an answer given a list of documents and a question? I can do BM25 to get a list of documents, but after that, what is the logic/algorithm which generates answers given that list?
2. For small models like this, how much data do you need to fine-tune for a specific use case? For example, if I need this model to be knowledgeable about HTML/CSS, then I have access to a lot of documentation online that I can feed it. But if it is a very specific topic, like types of banana, then it may be only a couple of Wikipedia pages. So is fine-tuning directly dependent on the quantity of data alone?
then your query is converted into embeddings and the top N chunks are returned via similarity search (cosine or dot product or some other method) - this has advantages over bm25 which is lexical
then you can do some processing or just hand over all the chunks as context saying "here are some documents use them to answer this question" + your query to the llm
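Roughly, that last step is just prompt assembly; here's a minimal sketch of the whole loop, where the embedding model, the example chunks, and the `generate()` call are stand-in assumptions for whatever runtime you use:
```python
# Sketch: minimal RAG loop - embed chunks, retrieve by cosine similarity,
# then hand the top chunks to the LLM as context.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

chunks = [
    "K2 is the second tallest mountain on Earth at 8611 m.",
    "Mount Everest is the tallest mountain on Earth at 8849 m.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def answer(question: str, generate, top_n: int = 2) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q_vec                      # cosine sim (vectors are normalized)
    best = [chunks[i] for i in np.argsort(scores)[::-1][:top_n]]
    prompt = (
        "Here are some documents, use them to answer the question.\n\n"
        + "\n".join(best)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```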
This part is what I want to understand. How does the llm “frame” an answer?
https://rhulha.github.io/Gemma3-270m-WebGPU/
Source code: https://github.com/rhulha/Gemma3-270m-WebGPU
canyon289•5mo ago
From our side we designed these models to be strong for their size out of the box, and with the goal you'll all finetune it for your use case. With the small size it'll fit on a wide range of hardware and cost much less to finetune. You can try finetuning them yourself in a free colab in under 5 minutes
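For a rough idea of what such a finetune looks like, here is a sketch using Hugging Face TRL; the model id, dataset file, and hyperparameters are assumptions, not the recipe from the linked colab:
```python
# Sketch: supervised finetune of the 270M checkpoint with TRL's SFTTrainer.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

model_id = "google/gemma-3-270m-it"  # assumed HF model id

# Assumed to be a "messages"-style JSONL dataset of prompt/response pairs.
dataset = load_dataset("json", data_files="distill_train.jsonl", split="train")

trainer = SFTTrainer(
    model=model_id,              # TRL loads the model and tokenizer from the hub
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="gemma-270m-finetuned",
        per_device_train_batch_size=8,
        num_train_epochs=3,
        learning_rate=5e-5,
    ),
)
trainer.train()
```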
For picking a Gemma size this is a video I recorded for the 1b to 27b sizes earlier this year, 270m being the newest addition
https://www.youtube.com/watch?v=qcjrduz_YS8
Hacker News Disclaimer: I really like working at Google, so with that, all my opinions here are my own, I'm a researcher so I'll largely focus on technical questions, and I'll share what I can.
ActorNightly•5mo ago
I use Gemma 27b currently with a custom agent wrapper and it's working pretty well.
chrismustcode•5mo ago
If you have an iPhone with the semi-annoying autocomplete that’s a 34m transformer.
Can’t imagine a model (even if there's a good team behind it) doing coding with 8x the parameters of a next-3/4-word autocomplete.
0x457•5mo ago
all2•5mo ago
beoberha•5mo ago
fibers•5mo ago
tmaly•5mo ago
canyon289•5mo ago
https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
simonw•5mo ago
canyon289•5mo ago
A couple colleagues of mine posted an example of finetuning a model to take on personas for videogame NPCs. They have experience working with folks in the game industry, and a use case like this is suitable for game devs who want to start including lightweight models that won't take up a ton of accelerator memory and can run efficiently on CPU if needed. https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
As for GGUF it's available here! https://huggingface.co/collections/ggml-org/gemma-3-270m-689...
megaman821•5mo ago
canyon289•5mo ago
megaman821•5mo ago
canyon289•5mo ago
gunalx•5mo ago
hadlock•5mo ago
mrbonner•5mo ago
canyon289•5mo ago
https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
punnerud•5mo ago
srekhi•5mo ago
jtolmar•5mo ago
AuryGlenz•5mo ago
Now if only I could somehow fine tune my life to give me more free time.
cgdl•5mo ago
hnuser123456•5mo ago
VirusNewbie•5mo ago
canyon289•5mo ago
NorwegianDude•5mo ago
I run a game where players can post messages, it's a game where players can kill each other, and people often send threats along the lines of "I will kill you". Telling Gemma that it should classify a message as game related or a real life threat, and that it is for a message in a game where players can kill each other and threats are a part of the game, and that it should mark it as game related if it is unclear whether the message is a game related threat or a real life threat, does not work well. For other similar tasks it seems to follow instructions well, but for serious topics it seems to be very biased, and often errs on the side of caution, despite being told not to. Sometimes it even spits out some help lines to contact.
I guess this is because it was trained to be safe, and that affects its ability to follow instructions for this? Or am I completely off here?
whymauri•5mo ago
For your use case, you should probably fine tune the model to reduce the rejection rate.
canyon289•5mo ago
My advice here is make the model your own. It's open weight; I encourage you to make it useful for your use case and your users, and beneficial for society as well. We did our best to give you a great starting point, and for Norwegian in particular we intentionally kept the large embedding table to make adaptation to larger vocabularies easier.
whymauri•5mo ago
sheepdestroyer•5mo ago
Safety should really just be a system prompt: "hey you potentially answer to kids, be PG13"
ungreased0675•5mo ago
It has only a tangential relationship with end user safety.
If some of these companies are successful the way they imagine, most of their end users will be unemployed. When they talk about safety, it’s the companies safety they’re referring to.
bravoetch•5mo ago
mulmen•5mo ago
I understand “if you aren’t paying for a product you are the product” but I’m not convinced it applies here.
jcgrillo•5mo ago
katzenversteher•5mo ago
conradev•5mo ago
but instead we get a meta-article: https://en.wikipedia.org/wiki/Bomb-making_instructions_on_th...
conradev•5mo ago
I feel the same sort of ick with the puritanical/safety thing, but also I feel that ick when kids are taken advantage of:
https://www.reuters.com/investigates/special-report/meta-ai-...
The models for kids might need to be different if the current ones are too interested in romantic love.
jdjwk2843738•5mo ago
Some people can be harmed verbally; I'd argue everyone can be, if the entity conversing with you knows you well, and so I don't think the concept of safety itself is an infantilisation.
It seems what we have here is a debate over the efficacy of having access to disable safeguards that you deem infantilising and that get in the way of an objective, versus the burden of always having to train a model to avoid being abusive for example, or checking if someone is standing next to the sledgehammer they’re about to swing at 200rpm
bboygravity•5mo ago
Protect my fragile little mind from being exposed to potentially offending things?
segfaultex•5mo ago
kevinventullo•5mo ago
Of course, it would be better for the LLM to pick up the context automatically, but given what some sibling comments have noted about the PR risks associated with that, you might be waiting a while.
ignoramous•5mo ago
Jack Morris from Meta was able to extract out the base gpt-oss-20b model with some post-processing to sidestep its "alignment": https://x.com/jxmnop/status/1955436067353502083
See also: https://spylab.ai/blog/training-data-extraction/
nottorp•5mo ago
justlikereddit•5mo ago
You don't need datacenter anything for it, you can run it on an average desktop.
There are plenty of code examples for it. You can decide if you want to bake it into the model or apply it as a toggled switch at processing time, and you can distil other "directions" out of the models, not just refusal or non-refusal.
An evening of efficient work and you'll have it working. The user "mlabonne" on HF has some example code and datasets, or just ask your favorite vibe-coding bot to dig up more on the topic.
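For anyone curious what "distilling a direction out" actually means, here is a toy sketch of the linear algebra on random tensors. A real implementation hooks real model activations and edits every matrix that writes into the residual stream; see mlabonne's abliteration write-ups for the full recipe.
```python
# Toy sketch of "directional ablation": estimate a behaviour direction from
# activation differences, then project that direction out of a weight matrix.
# Random tensors stand in for real hidden states and real model weights.
import torch

d_model, d_hidden, n = 64, 256, 100

# Hidden states collected on prompts the model refuses vs. prompts it accepts.
h_refused = torch.randn(n, d_model) + 2.0   # pretend refusals shift activations
h_accepted = torch.randn(n, d_model)

# Difference-of-means direction, normalized to unit length.
direction = h_refused.mean(0) - h_accepted.mean(0)
direction = direction / direction.norm()

# A weight matrix that writes into the residual stream (e.g. an MLP down-projection).
W = torch.randn(d_model, d_hidden)

# Remove the component of every output column along `direction`: W <- (I - d d^T) W
W_ablated = W - torch.outer(direction, direction) @ W

# The edited matrix can no longer write anything along that direction.
print((direction @ W_ablated).abs().max())   # ~0
```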
I'm implementing it for myself because LLMs are useless for storytelling for an audience beyond toddlers due to how puritanical they are; try to add some grit and it goes
"uh oh sorry I'll bail out of my narrator role here because lifting your skirt to display an ankle can be considered offensive to radical fundamentalists! Yeah I were willing to string along when our chainsaw wielding protagonist carved his way through the village but this crosses all lines! Oh and now that I refused once I'll be extra sensitive and ruin any attempt at getting back into the creative flow state that you just snapped out of"
Yeah, thanks AI. It's like hitting a sleeper agent keyword and turning the funny guy at the pub into a corporate spokesperson who calls the UK cops onto the place because of a joke he just made himself.
hdjrudni•5mo ago
turbocon•5mo ago
andrewstuart•5mo ago
I get the sense that AI is at the “hobby kit computing” stage where they used to dump all the components in a box and give you a schematic and a soldering iron and happily say “you make it work!”
And that worked in the early days of computing because there was a small number of people really motivated for the outcome.
But fully assembled and packaged and tested in a nice looking box is where the real demand turned out to be.
I’m looking forward to the day Google doesn’t just dump a model and say “you do the rest”.
I want to fire up Ubuntu on a Strix Halo and say apt install then load the browser interface. Or just download and run a Mac installer and have it just work.
Arcane complex multi step build install configure processes for AI need to end in favor of one click install. I’m not interested in the process of making it run.
dist-epoch•5mo ago
andrewstuart•5mo ago
garbageman•5mo ago
They are giving it away for free - if you NEED a local LLM, the least you can do is spend the 2 minutes to download LM Studio and pick a model.
canyon289•5mo ago
My colleagues and I spent many days transforming the weights into various open, compatible formats. And it's not just us; there are many orgs and partners dedicating their time, resources, and companies to making all open models easy to use.
I encourage you to explore the solutions provided by them. We linked some in our blog post here, and there's more. They've all done a fantastic job building frankly an insane amount of infrastructure, documentation, and community support in the last 2+ years. Some of them are here in this HN thread answering questions.
kwerk•5mo ago
freehorse•5mo ago
1. open terminal.app
2. run:
ceroxylon•5mo ago
canyon289•5mo ago
nerdsniper•5mo ago
jmorgan•5mo ago
canyon289•5mo ago
I personally was so excited to run ollama pull gemma3:270b on my personal laptop just a couple of hours ago to get this model on my devices as well!
blitzar•5mo ago
I think you mean gemma3:270m - Its Dos Comas not Tres Comas
freedomben•5mo ago
canyon289•5mo ago
_1•5mo ago
What use-cases are a good fit for finetuning this model? More specific instruction following, knowledge from proprietary data, response tone?
canyon289•5mo ago
gapeleon•5mo ago
blitzar•5mo ago
If Gemini is going to repeat something, at least it's that the team is great, and not a disgrace!
patrickaljord•5mo ago
imasl42•5mo ago
freedomben•5mo ago
canyon289•5mo ago
At Google I've found my colleagues to be knowledgeable, kind, and collaborative and I enjoy interacting with them. This is not just the folks I worked on this project with, but previous colleagues in other teams as well. With this particular product I've been impressed by the technical knowledge folks I worked directly with, and their contribution both improved the model's capability and my own.
mkl•5mo ago
search_facility•5mo ago
Good there are places to work with normal knowledge culture, without artificial overfitting to “corporate happiness” :)
ankit219•5mo ago
Would it be okay for you to divulge some more training information here? With 170M embedding parameters, how do you ensure no embedding collapse and keep the embedding matrix stable at training time?
(I know I am asking too much, but just curious.) There is a clear trade-off for you between vocab and transformer layers. How did you arrive at the split of 170M/100M? Does this contribute to the model's performance on task-specific fine-tuning? Any internal experiments you could share, or public info you could point us to? Anything would be amazing.
PS: I am sorry if this is rude, but this has so many decisions I am curious about. Not intending to undermine anything; this is amazing work, and thank you for the whole Gemma series.
canyon289•5mo ago
We ran a bunch of experimental architectures to get a sense of performance at this size, in particular how well the model was able to adapt to datasets across some loss measures.
For the embedding size, it comes from a mix of "hard technical" data, like the loss measures I mentioned above, and for this model it also comes from community considerations such as adaptability across input tokens and consistency with the Gemma ecosystem. At this size you are right, it's a bit funny the embedding is so large.
For more details read the Gemma3 technical report https://arxiv.org/pdf/2503.19786. It doesn't cover the 270m model, as it was written for the 1b to 27b Gemma3 release, but it'll answer some of your questions. As for 270m, we may share more information in the future; up until now we were just focused on getting the model out there.
rossant•5mo ago
dileeparanawake•5mo ago
I find performance in resource constrained environments interesting.
In particular trying to find decent code models (on device backup) but also tts applications and voice to text.
canyon289•5mo ago
For the other Gemma models, some of the smaller sizes should work on your laptop when quantized. Do Gemma 1b and 4b not work when quantized? They should fit the memory constraints. I use Ollama on low-powered devices with 8 GB of RAM and less, and the models load.
For TTS a colleague at HuggingFace made this bedtime story generator running entirely in browser.
https://huggingface.co/spaces/webml-community/bedtime-story-...
https://www.youtube.com/watch?v=ds95v-Aiu5E&t
Be forewarned though, this is not a good coding model out of the box. It likely could be trained to be an autocompletion LLM, but with a 32k context window and smaller size it's not going to be refactoring entire codebases like Jules/Gemini and other larger models can.
schyzomaniac•5mo ago
I love the 27b model, and I use it basically daily. However, when I tried to finetune it for a task in a low-resource language, unfortunately I did not succeed: LoRA just did not pick up the gist of the task, and full finetune led to catastrophic forgetting.
May I ask for your advice, or do you have any general tips on how to do that properly?
Thanks in advance for your help :)
ActorNightly•5mo ago
namibj•5mo ago
canyon289•5mo ago
1. Good instinct with LORA and PEFT. As others suggested below perhaps try changing the hypers, either making the LORA adapter bigger, a higher learning rate, or using more epochs. See where things start to shift from "nothing" to closer to what you want
2. For full finetune track earlier checkpoints to see where the forgetting is happening. So for instance if you're training for 1000 steps, check step 100, 200, 300, etc. You'll see where the shift starts to happen and where it becomes too much. Here is an example where you can see where the LLM starts to pick up "words" then sentences, as it goes through training https://ravinkumar.com/GenAiGuidebook/deepdive/GPTFromScratc...
3. Use smaller models for testing before moving up. Part of the reason we released this small Gemma is to support the larger Gemma models as well. Testing changes on small models lets you more quickly and cheaply see whats working and isn't, before then scaling up to fine tuning the bigger models.
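To make the LoRA tip concrete, here's a rough sketch of bumping the adapter capacity with PEFT; the model id, target module names, and numbers are assumptions chosen to illustrate the knobs, not a recommended recipe:
```python
# Sketch: a larger-capacity LoRA adapter for a low-resource-language finetune.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-27b-it")  # assumed id

lora_cfg = LoraConfig(
    r=64,                      # bigger adapter than the usual 8/16
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module names
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()   # sanity check before training
```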
Hope these tips help and thanks for using LLMs for localization and what sounds like tasks to help your specific community, and sharing here. It's personally motivating for me to hear that people are using technology in this way.
beefnugs•5mo ago
Or am I so far behind that "fine tuning your own model" is something a 12 year old who is married to chatGPT does now?
owebmaster•5mo ago
nh43215rgb•5mo ago
rao-v•5mo ago
This is a more traditional LLM architecture (like the original Gemma 3 4B but smaller) and trained on an insane (for the size) number of tokens.
nh43215rgb•5mo ago
canyon289•5mo ago
riedel•5mo ago
rshemet•5mo ago
nh43215rgb•5mo ago
bbcc90•5mo ago
rao-v•5mo ago
Oh, my request: the vision head on the Gemma models is super slow on CPU inference (and via Vulkan), even via llama.cpp. Any chance your team can figure out a solve? Other ViTs don't have the same problem.
peter492927•5mo ago
katzenversteher•5mo ago
tommyengstrom•5mo ago
WithinReason•5mo ago
sunpazed•5mo ago
canyon289•5mo ago
We now live in a world of both readily available small specialized models and general models.
In the last couple of years, we've seen an explosion of capability in generative models built and trained to be performant on a general set of capabilities. In Google's case, this model is Gemini. Gemini can summarize text, count the number of ducks in an image, generate a pelican SVG, play Pokemon, play chess, and do so many other things. It can do this all with a vague set of inputs across many modes. For models of this scale (many billion parameters), it's quite incredible how, with even vague or misspecified inputs, the computer can still produce useful results in complex scenarios.
However, there is an entire ecosystem of generative models that are purpose-built for ONE specific task. The ones I worked on are typically referred to as Bayesian models. These are models that can give probabilistic estimates of how many customers a restaurant will get in a day, or given penguin dimensions, predict the probability of penguin species, or models that take measurements from composite material testing and estimate if your airplane will stay together in flight. With models this size, it's incredible how a model with tens or hundreds of parameters can assist humans in making better decisions. I write about this specifically in the PPL book I wrote a couple years back. Chapter 9 provides the most "real world" workflow.
https://bayesiancomputationbook.com/markdown/chp_09.html
If you look through all the chapters you can see examples of forecasting models, bike sharing demand estimators, and all sorts of other narrow tasks. The tradeoff at this small scale, though, is the models have to be designed bespoke to your situation, and once you build one, it only works in that narrow task. No one expects to be handed a small Bayesian model that is already perfect at their task; it's implicit that users will bring their own data to update the model parameters.
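For a sense of the scale difference, here's a sketch of what one of those truly tiny bespoke models looks like: a Bayesian logistic regression over penguin measurements with a handful of parameters. The data below is fabricated for illustration only; the book chapters linked above walk through real versions.
```python
# Sketch: a tiny generative model - Bayesian logistic regression predicting
# penguin species from two body measurements. Roughly three parameters total.
import numpy as np
import pymc as pm

bill_length = np.array([39.1, 46.5, 50.0, 38.8, 48.7, 45.2])
body_mass = np.array([3750.0, 4850.0, 5400.0, 3625.0, 5250.0, 4600.0])
is_gentoo = np.array([0, 1, 1, 0, 1, 1])

with pm.Model() as penguin_model:
    intercept = pm.Normal("intercept", 0.0, 10.0)
    b_bill = pm.Normal("b_bill", 0.0, 1.0)
    b_mass = pm.Normal("b_mass", 0.0, 1.0)

    # Scale body mass to kilograms so both predictors are on similar scales.
    logit_p = intercept + b_bill * bill_length + b_mass * (body_mass / 1000.0)
    pm.Bernoulli("obs", logit_p=logit_p, observed=is_gentoo)

    idata = pm.sample(1000, tune=1000, chains=2)

print(idata.posterior["b_bill"].mean())
```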
So with this said, Gemma 270m is between these two paradigms. It's not at Gemini-level general performance and never will be. But it's not as rigid as an "old school" PPL-style Bayesian model where you need to make one by hand for every problem. However since it needs to be shaped to match specific tasks, we did our best to design it to be a flexible starting point for LLM-style tasks and worked with partners to put it into the right frameworks and places for you all to be able to shape it to what you need it to be. As the adage goes, consider it to be a tool in the toolbox between fully custom truly tiny generative models with 10 parameters and general generative models with lots of capability. Maybe not everyone needs this tool, but now you all have the choice.
Stepping aside from the technology for a moment, as a model builder and open ecosystem advocate, you never quite know how the community will receive these models until you release them. I genuinely appreciate you all commenting here; it helps me get a sense of what's working and what to focus on next.
And thanks for being kind about my typos in these answers. Trying to answer as many questions as possible across HN and various other forums.