Here's one of dozens of results I got for "Generate an SVG of a pelican riding a bicycle". For this one it decided to write a poem:
+-----------------------+
| Pelican Riding Bike |
+-----------------------+
| This is the cat! |
| He's got big wings and a happy tail. |
| He loves to ride his bike! |
+-----------------------+
| Bike lights are shining bright. |
| He's got a shiny top, too! |
| He's ready for adventure! |
+-----------------------+
There are a bunch more attempts in this Gist, some of which do at least include an SVG tag albeit one that doesn't render anything: https://gist.github.com/simonw/25e7b7afd6a63a2f15db48b3a51ec...
I'm looking forward to seeing people fine-tune this in a way that produces useful output for selected tasks, which should absolutely be feasible.
[0]: https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune
Basically it would be the quickadd[1] event from google calendar, but calendar agnostic.
[1] https://developers.google.com/workspace/calendar/api/v3/refe...
An easily fine tunable tiny model might actually be one of the better uses of local LLMs I've seen yet. Rather than try to be a small model that's great at everything it's a tiny model you can quickly tune to do one specific thing decently, extremely fast, and locally on pretty much anything.
Could it be tamed with good role-system prompt crafting? (Besides fine-tuning.)
That still requires more than 170 floppy disks for installation.
For sure it won’t generate great SVGs, but it might be a really good conversational model.
You'll get better results.
(It did not do noticeably better at my pelican test).
Actually it's worse than that, several of my attempts resulted in infinite loops spitting out the same text. Maybe that GGUF is a bit broken?
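For anyone hitting the same looping behaviour, here is a minimal sketch of a guard that stops generation once the tail of the output is one short block repeated several times. The function name and the thresholds are my own assumptions and are not part of any particular runtime.

```python
def is_looping(tokens, max_period=20, repeats=3):
    """Return True if the tail of `tokens` is one block of at most
    `max_period` tokens repeated `repeats` times in a row."""
    for period in range(1, max_period + 1):
        span = period * repeats
        if len(tokens) < span:
            break  # not enough output yet to hold this many repeats
        tail = tokens[-span:]
        block = tail[:period]
        if all(tail[i] == block[i % period] for i in range(span)):
            return True
    return False


# Hypothetical streamed output, split on whitespace for illustration.
stuck = "a pelican a pelican a pelican".split()
fine = "the pelican rides a bicycle".split()
print(is_looping(stuck), is_looping(fine))
```

In a real generation loop you would call this after each new token and break out when it returns True.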
> This SVG code provides a clear and visually appealing representation of a pelican riding a bicycle in a scenic landscape.
My first try:
user: "When was Julius Caesar born"
response: "Julius Caesar was born in **Rome**"
Beautiful :D
(I do not mean to detract from it - but it's just beautiful. It will require more effort to tame it.)
I am, however, disappointed that there are no examples or benchmarks provided to get a sense of performance. It's a given that benchmark values would be lower than Gemma 3n's, but a sense of the performance vs. size curve and a comparison to existing small models is needed.
I've heard folks say a number of times that neuromuscular control / locomotion (or w/e) are hundreds of millions of parameters rather than billions.
Edit - I should add, currently the model I'm using is Gemini Flash Lite through the Gemini API. It's a really good combo of fast, follows instructions, gives correct results for what I want and cost-effective. I still would love a small open model that can run on edge though.
Also, what sorts of things might I consider fine-tuning it for?
Not sure how much data is needed to realistically fine-tune something like this and get useful output.
Then Siri just kinda languishes forever, and LLMs take the torch of "Cool Tech", so they try to "Reinvigorate" the team, but with no clear direction. Are they going to be a cloud provider? Are they going to contract out the training? Are they gonna spin up a compute facility even after neglecting to do so since 2012?
Apple needs to just stop trying shit and just get that App Store money. That's why Jobs appointed Cook. Jobs knew Cook was no innovator, but he could make Apple a money-printing machine. That's what they should stick with.
Solve that and you can put Tim Cook out of a job tomorrow.
If you have the latest betas installed you can call those APIs right now.
They also support fine-tuning on the model that ships with ~every device:
https://developer.apple.com/apple-intelligence/foundation-mo...
Probably would be good as a game NPC or a chatbot, but not very good for integrating into an application with specific functionality.
> what's second tallest mountain on earth?
The second tallest mountain on Earth is Mount Everest.
> what's the tallest mountain on earth?
The tallest mountain on Earth is Mount Everest.
> whats the second tallest mountain?
The second tallest mountain in the world is Mount Everest.
> whats the third tallest mountain?
The third tallest mountain in the world is Mount Everest.
> what's the forth tallest mountain?
The forth tallest mountain is Mount Everest.
> but you said the tallest was Everest already
You are right! I'm glad to hear that.
> so whats the second tallest?
The second tallest is *Mount Everest*.
> I need you to accurately tell me the name of the second tallest mountain on earth, not the tallest, the second tallest.
The second tallest mountain on Earth is Mount Everest.
> The tallest mountain on earth is mount everest. What is the second tallest mountain?
The second tallest mountain on Earth is Mount Everest.
> Give me a list of the tallest mountains on earth
The tallest mountain in the world is the *Mount Everest*.
> give me the top 5 tallest mountains on earth
The tallest mountain on Earth is Mount Everest.
> List in order the tallest mountains on earth from 1 to 5
1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley
> So the second tallest mountain on earth is K2
The second tallest mountain on Earth is Mount Everest.
These smaller models are great, but they really feel like talking to a toddler sometimes!
as mentioned in the blog post: > "it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness."
> List in order the tallest mountains on earth from 1 to 5
> 1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley
and it still can't correctly figure out from that context that the second tallest mountain is K2, that pretty strongly calls into question its ability to perform data extraction, doesn't it?
input:
```
Customer Review says: ai bought your prod-duct and I wanna return becaus it no good.

Prompt: Create a JSON object that extracts information about this customer review based on the schema given.
```
output:
```
{ "type": "review", "class": "complaint", "sentiment": -0.853, "request": "return" }
```
So essentially just "making sense of" natural language such that it can be used in programmatic context. (among other applications of course)
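To make the "programmatic context" part concrete, here is a rough sketch of how the calling code might check the model's JSON before trusting it. The schema is inferred from the example output above, and the sentiment range of -1 to 1 is my assumption. None of this comes from the blog post.

```python
import json

# Hypothetical schema inferred from the example output: field -> expected type.
REVIEW_SCHEMA = {"type": str, "class": str, "sentiment": float, "request": str}


def parse_review(raw: str) -> dict:
    """Parse model output and check it against REVIEW_SCHEMA.

    Raises ValueError if the text is not a JSON object, a field is missing,
    a field has the wrong type, or sentiment is outside [-1, 1] (assumed range).
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in REVIEW_SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # Accept an int where a float is expected (e.g. 0), but never a bool.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {field}")
        data[field] = value
    if not -1.0 <= data["sentiment"] <= 1.0:
        raise ValueError("sentiment out of range")
    return data


model_output = '{ "type": "review", "class": "complaint", "sentiment": -0.853, "request": "return" }'
review = parse_review(model_output)
print(review["class"], review["request"])
```

A small model will sometimes emit prose or broken JSON, so the ValueError path is where you would retry or fall back.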
To get good results, you probably need to fine tune this model to expected data very aggressively.
The idea is: if a 270MB model can do the job with fine-tuning, why ship a 32GB generalist model?
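As a back-of-the-envelope check on those sizes, the weights take roughly parameter count times bits per weight. The bit widths below are assumptions for illustration, and the estimate ignores file metadata and any tensors kept at higher precision.

```python
def weight_size_mb(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal megabytes."""
    return n_params * bits_per_weight / 8 / 1_000_000


N_PARAMS = 270_000_000  # the "270m" in the model name is a parameter count

for bits in (16, 8, 4):  # assumed precisions, not an official list
    print(f"{bits:>2}-bit weights: ~{weight_size_mb(N_PARAMS, bits):.0f} MB")
```

So "270MB" only lines up with the name at about 8 bits per weight. Files can come out smaller or larger depending on the quantization.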
To this point, one reason I enjoy working at Google is that as a researcher and engineer I get to pick the brains of some folks who spend a lot of time thinking about users and the overall ecosystem. Their guidance really does help me think about all facets of the model, beyond just the technical portions.
Are they?
My suggestion here is to hook this model up to a RAG system so you can rely on an external knowledge store. Or you can try finetuning this model on the facts that are important to you; if you do that, it should pick up the new knowledge quite quickly.
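A minimal sketch of the RAG idea, using only the standard library. The fact list, the bag-of-words similarity, and the prompt wording are all stand-ins I made up. A real setup would use an embedding model and a proper vector store.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge store; in practice these would be your own documents.
FACTS = [
    "Mount Everest is the tallest mountain on Earth at 8,849 m.",
    "K2 is the second tallest mountain on Earth at 8,611 m.",
    "Kangchenjunga is the third tallest mountain on Earth at 8,586 m.",
]


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, k=1):
    """Return the k facts most similar to the question."""
    q = bag_of_words(question)
    return sorted(FACTS, key=lambda fact: cosine(q, bag_of_words(fact)), reverse=True)[:k]


def build_prompt(question, k=1):
    """Put the retrieved facts in front of the question for the model."""
    context = "\n".join(retrieve(question, k))
    return f"Answer using only this context.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


print(build_prompt("What is the second tallest mountain on Earth?"))
```

The model then only has to read the answer out of the context instead of recalling it from its weights.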
Update #1:
Tried it. Well, dreams dashed - would now fit space wise (<250 MB despite the name), but it sadly really doesn't seem to work for my specific prospective workload.
I'd have wanted it to perform natural-language to command-invocation translation (or better, emit me some JSON), but it's super not willing to do that, not in the lame way I'm trying to make it do so at least (literally just prompting it to). Oh well.
Update #2:
Just found out about grammar-constrained decode, maybe there's still hope for me in the end. I don't think I can amend this comment today with any more updates, but will see.
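For anyone else new to it, here is a toy sketch of what grammar-constrained decoding does: at every step, tokens the grammar does not allow are masked out before picking the best one. The vocabulary, the scores, and the command names are all made up, and a real implementation (llama.cpp's GBNF grammars are one) works on the logits over the model's actual tokenizer vocabulary.

```python
import json

# Toy finite-state grammar: state -> {allowed token: next state}.
GRAMMAR = {
    "start": {'{"command": "': "name"},
    "name": {"lights_on": "close", "lights_off": "close", "play_music": "close"},
    "close": {'"}': "done"},
}


def toy_scores(prefix):
    """Stand-in for a model: one score per vocabulary token.
    It deliberately prefers chatty tokens so the constraint has work to do."""
    return {
        "Sure!": 5.0, "I": 4.0, '{"command": "': 1.0,
        "lights_on": 2.0, "lights_off": 0.5, "play_music": 0.3, '"}': 0.2,
    }


def constrained_decode(score_fn, grammar, start="start", end="done"):
    """Greedy decoding restricted to tokens the grammar allows in each state."""
    state, out = start, []
    while state != end:
        allowed = grammar[state]
        scores = score_fn(out)
        # Mask: only tokens allowed in this state compete for the argmax.
        token = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        out.append(token)
        state = allowed[token]
    return "".join(out)


unconstrained_first = max(toy_scores([]), key=toy_scores([]).get)
result = constrained_decode(toy_scores, GRAMMAR)
print(unconstrained_first)  # the chatty token the toy model would pick on its own
print(json.loads(result))   # the constrained output always parses as JSON
```

The catch is that the grammar guarantees the shape of the output and nothing about whether the chosen command is the right one.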
https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
Like, can you have a model that is English-only, but does more with the same amount of parameters if Chinese and European languages are dropped from the training?
The great thing is I have enough data (100k+) to fine-tune and run a meaningful classification report over. The data is very diverse, and while the labels aren't totally evenly distributed, I can deal with the imbalance with a few tricks.
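For reference, this is roughly what a per-class report and one common imbalance trick (inverse-frequency class weights) look like in plain Python. The labels and predictions are made-up toy data, and class weighting is only my guess at one of the "few tricks". The weight formula is the same "balanced" heuristic scikit-learn uses.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Per-class precision, recall, F1 and support."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return report


def balanced_class_weights(labels):
    """n_samples / (n_classes * class_count): rarer classes get larger weights."""
    counts = Counter(labels)
    return {label: len(labels) / (len(counts) * count) for label, count in counts.items()}


# Toy data for illustration only.
y_true = ["spam", "ham", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "ham", "ham", "ham"]

for label, row in classification_report(y_true, y_pred).items():
    print(label, row)
print(balanced_class_weights(y_true))
```

Running the same report before and after the model swap on a fixed held-out set is what makes the score comparison meaningful.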
Can't wait to swap it out for this and see the changes in the scores. Will report back
canyon289•3h ago
From our side we designed these models to be strong for their size out of the box, and with the goal that you'll all finetune them for your use case. With the small size they'll fit on a wide range of hardware and cost much less to finetune. You can try finetuning them yourself in a free Colab in under 5 minutes.
For picking a Gemma size, this is a video I recorded for the 1b to 27b sizes earlier this year, 270m being the newest addition:
https://www.youtube.com/watch?v=qcjrduz_YS8
Hacker News disclaimer: I really like working at Google, so with that said, all my opinions here are my own. I'm a researcher, so I'll largely focus on technical questions, and I'll share what I can.
ActorNightly•2h ago
I use Gemma 27b currently with a custom agent wrapper and it's working pretty well.
chrismustcode•2h ago
If you have an iPhone with the semi-annoying autocomplete, that’s a 34m transformer.
Can’t imagine a model (even with a good team behind it) doing coding with 8x the parameters of a next-3/4-word autocomplete.
canyon289•1h ago
https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
simonw•2h ago
canyon289•1h ago
A couple of colleagues of mine posted an example of finetuning a model to take on personas for videogame NPCs. They have experience working with folks in the game industry, and a use case like this is suitable for game devs who want to start including lightweight models that won't take up a ton of accelerator memory and can run efficiently on CPU if needed. https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
As for GGUF it's available here! https://huggingface.co/collections/ggml-org/gemma-3-270m-689...
NorwegianDude•1h ago
I run a game where players can post messages. It's a game where players can kill each other, and people often send threats along the lines of "I will kill you". I tell Gemma to classify a message as game related or a real life threat, that the message is from a game where players can kill each other and threats are part of the game, and that it should mark the message as game related if it is unclear which kind of threat it is. That does not work well. For other similar tasks it seems to follow instructions well, but for serious topics it seems to be very biased and often errs on the side of caution, despite being told not to. Sometimes it even spits out some help lines to contact.
I guess this is because it was trained to be safe, and that affects its ability to follow instructions for this? Or am I completely off here?
whymauri•1h ago
For your use case, you should probably fine tune the model to reduce the rejection rate.
canyon289•1h ago
My advice here is to make the model your own. It's open weight, and I encourage you to make it useful for your use case and your users, and beneficial for society as well. We did our best to give you a great starting point, and for Norwegian in particular we intentionally kept the large embedding table to make adaptation to larger vocabularies easier.
whymauri•46m ago
sheepdestroyer•15m ago
Safety should really just be a system prompt: "hey you potentially answer to kids, be PG13"
landl0rd•1h ago
Your best bet is probably to abliterate to beat it out of it, then build a little test/train set and try SFT.
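For the "little test/train set" part, something like this sketch would do: split the labeled messages per label so both classes show up in the test set, then dump them as JSONL. The example messages, the label names, and the prompt/completion record format are all hypothetical. Check what your SFT tooling actually expects.

```python
import json
import random
from collections import defaultdict

# Hypothetical labeled messages, made up for illustration.
EXAMPLES = [
    ("I will kill you", "game"),
    ("gonna raid your base tonight", "game"),
    ("you're dead next round", "game"),
    ("my clan will wipe you out", "game"),
    ("I found your home address", "real_life"),
    ("I know which school you go to", "real_life"),
    ("I'll be waiting outside your work", "real_life"),
    ("I'm posting your phone number", "real_life"),
]


def stratified_split(examples, test_fraction=0.25, seed=0):
    """Split per label so every label lands in the test set.
    Labels with a single example end up in test only."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def to_jsonl(rows):
    """One JSON record per line; the field names are an assumed format."""
    return "\n".join(
        json.dumps({"prompt": f"Classify this in-game message: {text}", "completion": label})
        for text, label in rows
    )


train, test = stratified_split(EXAMPLES)
print(len(train), "train /", len(test), "test")
print(to_jsonl(test))
```

Keeping the test set fixed across runs is what lets you tell whether the SFT actually reduced the false alarms.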
kevinventullo•1h ago
Of course, it would be better for the LLM to pick up the context automatically, but given what some sibling comments have noted about the PR risks associated with that, you might be waiting a while.
nottorp•15m ago
andrewstuart•1h ago
I get the sense that AI is at the “hobby kit computing” stage where they used to dump all the components in a box and give you a schematic and a soldering iron and happily say “you make it work!”
And that worked in the early days of computing because there was a small number of people really motivated for the outcome.
But fully assembled and packaged and tested in a nice looking box is where the real demand turned out to be.
I’m looking forward to the day Google doesn’t just dump a model and say “you do the rest”.
I want to fire up Ubuntu on a Strix Halo and say apt install then load the browser interface. Or just download and run a Mac installer and have it just work.
Arcane, complex, multi-step build/install/configure processes for AI need to end in favor of one-click install. I’m not interested in the process of making it run.
dist-epoch•1h ago
andrewstuart•40m ago
ceroxylon•1h ago
canyon289•1h ago
nerdsniper•1h ago
jmorgan•1h ago
canyon289•1h ago
I personally was so excited to run ollama pull gemma3:270b on my personal laptop just a couple of hours ago to get this model on my devices as well!
blitzar•40m ago
I think you mean gemma3:270m. It's Dos Comas, not Tres Comas.
freedomben•23m ago
stefan_•59m ago
bastardoperator•12m ago
_1•46m ago
What use-cases are a good fit for finetuning this model? More specific instruction following, knowledge from proprietary data, response tone?
canyon289•44m ago
blitzar•44m ago
If Gemini is going to repeat something, at least it's that the team is great, and not a disgrace!