FunctionGemma 270M Model

https://blog.google/technology/developers/functiongemma/

226•mariobm•1mo ago

Comments

canyon289•1mo ago

Hi all, I'm a research lead on this model. Same as every model release post, I enjoy working at Google for a multitude of reasons, and opinions here are my own.

Happy to answer whatever technical questions I can!

xnx•1mo ago

Cool game! Amazing it can run in the browser. My mind was blown when I saw you could give goal based commands vs prescriptive ones. https://huggingface.co/spaces/webml-community/FunctionGemma-...

canyon289•1mo ago

So I didn't even know this was going to be made until recently, and when I saw it, it also blew my mind. I didn't realize how far along the web ML community had pushed things, and was impressed by the creativity of the HF folks with visuals and "game flow".

Personally speaking it's really neat to see other people take these models and run with them, creating things I couldn't have imagined. I'm hoping many others in the open community do the same in the coming weeks and the new year.

carlcortright•1mo ago

Very cool model! Congrats on the work!

canyon289•1mo ago

Thank you much for the kind words

NitpickLawyer•1mo ago

Wen gemma4? :)

But on a serious note, I'm happy to see more research going into vSLMs (very small...) My "dream" scenario is to have the "agentic" stuff run locally, and call into the "big guns" as needed. Being able to finetune these small models on consumer cards is awesome, and can open up a lot of niche stuff for local / private use.

canyon289•1mo ago

Trust me, as a daily at-home Gemma user myself, I'm just as excited for what's upcoming as you are, maybe even more because I have some hints for what's to come.

>My "dream" scenario is to have the "agentic" stuff run locally, and call into the "big guns" as needed.

FunctionGemma 270m is your starter pack for this: train your own functions to call out to whatever larger models you choose. It's been quite effective in my testing, and the finetuning guides should show you how to add in your own capabilities.

Speaking from the research side it's incredible how so many small models, not just Gemma, are achieving performance levels of much larger models from just a year or two ago. It's personally why I stay in this space.
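
A minimal sketch of the "small model routes, larger model handles the hard cases" idea described above. Everything here is an assumption for illustration: `small_model_route` is a keyword stub standing in for a finetuned FunctionGemma call, and `ask_larger_model` is a placeholder rather than a real API.

```python
from typing import Any, Callable, Dict


def turn_flashlight_on() -> str:
    # Hypothetical local tool that needs no reasoning at all.
    return "flashlight on"


def ask_larger_model(prompt: str) -> str:
    # Placeholder for a request to a bigger hosted model (assumption).
    return f"[escalated] {prompt}"


# Registry of callable tools; the larger model is just another tool.
TOOLS: Dict[str, Callable[..., str]] = {
    "turn_flashlight_on": turn_flashlight_on,
    "ask_larger_model": ask_larger_model,
}


def small_model_route(prompt: str) -> Dict[str, Any]:
    """Stub for the small model: maps a prompt to a function-call dict.

    A real setup would run the finetuned model here and parse its output.
    """
    if "flashlight" in prompt.lower():
        return {"name": "turn_flashlight_on", "args": {}}
    # Anything the router does not recognise is handed to the larger model.
    return {"name": "ask_larger_model", "args": {"prompt": prompt}}


def handle(prompt: str) -> str:
    call = small_model_route(prompt)
    return TOOLS[call["name"]](**call["args"])


if __name__ == "__main__":
    print(handle("Turn on the flashlight"))
    print(handle("Explain why the sky is blue"))
```

Treating the larger model as one more entry in the tool registry keeps the dispatch code identical for local and escalated requests.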

all2•1mo ago

I'd be curious to hear more about this 'small model calls out to larger model' thing. Are there demos out there showing this in use?

xnx•1mo ago

Not FunctionGemma related, but would love to see an open weights model from Google for speech to text transcription (diarization, timestamps, etc.).

Whisper is old and resource intensive for the accuracy it provides.

canyon289•1mo ago

I'm not specifically promising anything but I do want to say 2026 is going to be a great year! Many of my colleagues are shipping models too, such as t5gemma which is on the front page, and I'm personally excited to see what we're all collectively going to release in the coming year.

lukeinator42•1mo ago

Very cool! I was wondering, is a separate model performing speech recognition for the voice demos such as the game? The FunctionGemma model card only seems to show text input/output.

canyon289•1mo ago

Yes a separate model is performing ASR in this case. Gemma270m (base, function, and others) are not multimodal out of the box.

That being said, if someone in the community wanted to use other encoders like siglip and plug them into Gemma270m to make it multimodal, that'd be a great way to have fun over break and build up an AI Engineer resume :)

zikani_03•1mo ago

Thanks for all the great work. How good is the model at composing actions and is there a way to say, give the model ability to scope actions, for example if actions are related to permissions or some other context? Would one need to pass the role or permission as context or finetune separately?

I hope those questions make sense

canyon289•1mo ago

> How good is the model at composing actions?

I think you mean taking the results of one function call and putting it into another? We saw some promise but didn't heavily train for this use case in the base model. The thing we noticed with the 270m sized models, and the performance expectations of AI models in 2025, is that these size models perform best for _specific users_ when finetuned to that specific use case.

What I suggest is mocking some data, either by hand or using some automated tool, and finetuning on this kind of use case using the finetuning colab setup.

> is there a way to give the model ability to scope action for example if actions are related to permissions

Permissions depend on your system architecture more than the model. The model itself just takes in tokens and outputs tokens. Permissions are defined by your security/system setup in which the model itself is running.
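
One way to picture this, as a rough sketch: the model only proposes a function call, and the surrounding system decides whether the caller's role may execute it. The tool names, roles, and `execute_call` helper below are all hypothetical.

```python
from typing import Any, Callable, Dict, Set


class PermissionDenied(Exception):
    """Raised when a role is not allowed to run the requested tool."""


# Hypothetical tools the model is able to name in its output.
TOOLS: Dict[str, Callable[..., str]] = {
    "read_calendar": lambda: "3 events today",
    "delete_calendar": lambda: "calendar deleted",
}

# The permission table lives in the application, not in the model.
ROLE_ALLOWED: Dict[str, Set[str]] = {
    "viewer": {"read_calendar"},
    "admin": {"read_calendar", "delete_calendar"},
}


def execute_call(role: str, call: Dict[str, Any]) -> str:
    """Run a model-proposed call only if the role is permitted to."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    if name not in ROLE_ALLOWED.get(role, set()):
        raise PermissionDenied(f"{role} may not call {name}")
    return TOOLS[name](**call.get("args", {}))


if __name__ == "__main__":
    print(execute_call("viewer", {"name": "read_calendar", "args": {}}))
```

Because the check runs after the model has produced its tokens, it holds even when the model emits a call it should not have.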

vessenes•1mo ago

Hey! Love the Gemma series. Question that came to mind reading the announcement post - the proposal there is that you can use this as a local backbone and have it treat a larger model as a 'tool call' when more reasoning is needed.

In my mind we want a very smart layer frontier model orchestrating, but not slowing everything down by doing every little thing; this seems like the opposite - a very fast layer that can be like "wait a minute, I'm too dumb for this, need some help".

My question is - does the Gemma team use any evaluation around this particular 'call a (wiser) friend' strategy? How are you thinking about this? Is this architecture flow more an accommodation to the product goal - fast local inference - or do you guys think it could be optimal?

canyon289•1mo ago

We evaluate many things that you alluded to, such as speed on device, output correctness, and also "is this something that would be useful" the last one being a bit abstract.

The way we think about it is what do we think developers and users need, and is there a way we can fill that gap in a useful way. With this model we had the hypothesis you had: there are fantastic larger models out there pushing the frontier of AI capabilities, but there's also a niche for a smaller customizable model that's quick to run and quick to tune.

What is optimal then ultimately falls to you and your use cases (which I'm guessing at here), you have options now between Gemini and Gemma.

vessenes•1mo ago

Thanks - yeah, I'm capable of assessing for my own use cases. I guess I was trying to muse out-loud about whether there's a useful benchmark to be made or published out of these assessments. There are a number of architectures where there's a 'fast loop' and then a slow loop. Robotics comes to mind. I think training the ability to be like 'uh oh, better get over to slow good thinking' into the fast loop models is likely to be super useful.

exacube•1mo ago

Some fine tuning data questions:

i see the dataset Google published in this notebook https://github.com/google-gemini/gemma-cookbook/blob/main/Fu... -- from looking at the dataset on huggingface, it looks synthetically generated.

1. do you recommend any particular mix or focus in the dataset for finetuning this model, without losing too much generality?

2. do you have any recommendations for how many examples per-tool?

thank you for your (and your team's) work!

canyon289•1mo ago

> Do you recommend any particular mix or focus in the dataset for finetuning this model, without losing too much generality?

Astute questions, there's sort of two ways to think about finetuning, 1. Obliterate any general functionality and train the model on your general commands 2. As you asked maintain generality trying to preserve initial model ability

For 2, typically a low learning rate or LoRA is a good strategy. We show an example in our finetuning tutorial in the blog.

> 2. do you have any recommendations for how many examples per-tool?

This depends on the tool complexity and the variety of user inputs. So a simple tool like turn_flashlight_on(), with no args, will get taught quickly, especially if say you're only prompting in English.

But if you have a more complex function like get_weather(lat, lon, day, region, date) and have prompts coming in in English, Chinese, Gujarati and Spanish, the model needs to do a lot more "heavy lifting" to both translate a request and fill out a complex query. We know as programmers that dates by themselves are insanely complex in natural language (12/18/2025 vs 18/12/2025).

To get this right it'll help the model if it was trained on data that shows it the variety of inputs possible.

Long answer but I hope this makes sense.
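
To make the date example concrete, here is a small sketch that shows how one string can map to different calendar dates depending on the convention, and how one date can be rendered several ways when mocking finetuning prompts. The helper names are made up for illustration and are not part of any FunctionGemma tooling.

```python
from datetime import date, datetime
from typing import List, Set

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def parse_both_ways(text: str) -> Set[date]:
    """Return every date `text` could mean as MM/DD/YYYY or DD/MM/YYYY."""
    results: Set[date] = set()
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            results.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # the string is not valid under this convention
    return results


def phrasing_variants(d: date) -> List[str]:
    """Render one date in several surface forms for synthetic prompts."""
    return [
        d.strftime("%m/%d/%Y"),
        d.strftime("%d/%m/%Y"),
        d.strftime("%Y-%m-%d"),
        f"{MONTHS[d.month - 1]} {d.day}, {d.year}",
    ]


if __name__ == "__main__":
    print(parse_both_ways("12/18/2025"))  # only one valid reading
    print(parse_both_ways("03/04/2025"))  # two different dates
    print(phrasing_variants(date(2025, 12, 18)))
```

Inputs like 03/04/2025 are the hard ones: both readings are valid, so the training data (or the surrounding app) has to settle which convention applies.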

exacube•1mo ago

it does; thanks so much, appreciate it!

mrinterweb•1mo ago

I have often wondered how much a specialized local LLM could benefit an agentic tool like Gemini CLI. I would think there could be a good win for speed and minimizing token use if coding agents used a local model. A local model could handle a lot of the low level system interaction type tasks and then send the prompts that require deeper reasoning to frontier models. It seems wasteful and slow to use frontier models to figure out how to grep a codebase, run tests, git diff, etc.

Might Gemini CLI offload some of its prompts to FunctionGemma?

canyon289•1mo ago

I want to say so much right now but I can't :)

The most generic thing I can say is I really do like working at Google because it's one of the few (maybe only) companies that has models of all sizes and capabilities. Because of this, research and product development is insanely fun and feels "magical" when things just click together.

Keep following the Google Developer channels/blogs whatever. Google as a whole is pushing hard in this space and I personally think is building stuff that felt like science fiction just 3 years ago.

mentalgear•1mo ago

Much appreciate the focus on local-first (on-device)! I'm wondering how your approach differs from (or integrates with) something like "Differentiable Programming for LLM Tool Selection" https://viksit.substack.com/p/optimizing-tool-selection-for-...

canyon289•1mo ago

I've only just skimmed this blog post but if I'm reading correctly FunctionGemma can work just like what's intended here, a "contextless" tool router.

Going one level up, you as a developer have a choice how much context you want to provide to the model. Philipp Schmid wrote a good blog post about this, calling it "context engineering". I like his idea because instead of just blindly throwing stuff into a model's context window and hoping to get good performance, it encourages folks to think more about what's going into the context in each turn.

https://www.philschmid.de/context-engineering

Similarly I think the blog post you linked has a similar sentiment. There's nuanced approaches that can yield better results if an engineering mindset is applied.

mudkipdev•1mo ago

If I have a simple mainly question-answering AI using only a couple of tools (web search), am I better off starting with Gemma or FunctionGemma?

canyon289•1mo ago

It depends on a couple of things. If you expect reasoning or frontier level chat abilities then larger Gemma models or Gemini is better.

Another hard constraint is context limit. Gemma 270m is at 32k, so if the search results returned are massive then this is not a great model. The larger 4b+ Gemma models have 128k, and the Gemini token window is in the millions.

cbabraham•1mo ago

hi! Does this bring us closer to a gemini-cli like experience using a local model that can run on a macbook pro? It felt like gemma3n was already 'smart' enough, it just wasn't tuned for tool use.

canyon289•1mo ago

It's definitely a step in that direction. I use Gemma models on my local macbook all the time and am personally excited to have this one available for me at home now as well.

ekianjo•1mo ago

Does this require webgpu to run on the browser?

A4ET8a8uTh0_v2•1mo ago

<< You are ready to fine-tune: You need the consistent, deterministic behavior that comes from fine-tuning on specific data, rather than the variability of zero-shot prompting. << You prioritize local-first deployment: Your application requires near-instant latency and total data privacy, running efficiently within the compute and battery limits of edge devices.

Thank you. I felt that was a very under appreciated direction ( most of the spotlight seemed to be on 'biggest' models ).

canyon289•1mo ago

I'm with you! Small generative models are awesome, I thought so a decade ago and I still think so now! The size of what is "small" has definitely increased though, I used to think a 100 parameter model was large back in 2016, but here I am now saying 270 million is small :)

nateb2022•1mo ago

Ollama link too: https://ollama.com/library/functiongemma

homarp•1mo ago

llama.cpp link https://huggingface.co/ggml-org/functiongemma-270m-it-GGUF

SpaceManNabs•1mo ago

can you run this from n8n?

canyon289•1mo ago

I just looked through their webpage and github and I'm not sure. But maybe someone should make a feature request!

https://github.com/n8n-io/n8n

orliesaurus•1mo ago

edit: Im so dumb...

canyon289•1mo ago

It's already on the phone! Check out the demo videos and colab that show you how to load this model onto a device relatively easily.

On this project I was lucky enough to work with the Google AI Edge team who have deep expertise in edge deployments on device. Check out this app they built which loads in the Gemma 270m models and runs them on your phone.

https://play.google.com/store/apps/details?id=com.google.ai....

You also can finetune your own models and load them onto device with the same workflow. Scroll to the bottom to see the instructions and a screenshot example https://ai.google.dev/gemma/docs/mobile-actions

xnx•1mo ago

Unbelievable shipping velocity from Google in December, and it sounds like they're not done for the week: https://x.com/osanseviero/status/2001723652635541566

canyon289•1mo ago

:popcorn gif:

eachro•1mo ago

Do you think this would be appropriate for a command line tool that hits various apis as the function calls? Ex: "what's the weather in SF tomorrow?" Or "daily price change of apple, Tesla stock for past week"? (Let's assume I have documented the apis thoroughly somewhere that the model has access to or fine tuned it on this data)

milenf•1mo ago

Hi, also on the FunctionGemma team! Something like this would be a good use case for the model. Based on how complicated the API is you might need to finetune it (we released a colab that guides you through the experience + how to export/run it locally). Generally better tool descriptions help although if it is something very complicated finetuning would be better.

lostmsu•1mo ago

Both your examples require Internet access so there's no reason not to use cloud-hosted model which would work magnitudes better.

Someone•1mo ago

FTA: In our "Mobile Actions" evaluation, fine-tuning transformed the model’s reliability, boosting accuracy from a 58% baseline to 85%. This confirms that for edge agents, a dedicated, trained specialist is an efficient path to production-grade performance.

I would be wary of having a LLM with 85% accuracy call tools on my system. Isn’t that fairly far away from production-grade performance?

I also don’t see that the fact that accuracy can be boosted from 58% to 85% is any indication that it can be boosted further.

all2•1mo ago

There are ways around this. You can push the success rate close to 100% if you use chain of thought and a quorum selection. It isn't great, and it slows response times, but if 85% isn't good enough, you just need to flip the coin about 5 times to get nearly(!) guaranteed results.

spoj•1mo ago

Coin flipping works only if the fails are roughly independent. More important is the complexity ceiling above which they fail all the time.
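
The arithmetic behind the "flip the coin about 5 times" claim can be checked with a short sketch. It assumes each attempt is correct with the same probability and that failures are fully independent, which is exactly the condition questioned above; correlated failures would make the real number lower.

```python
from math import comb


def majority_correct_probability(p: float, n: int) -> float:
    """Probability that a strict majority of n independent attempts is correct.

    p is the per-attempt accuracy. Independence is an assumption here.
    """
    need = n // 2 + 1  # smallest number of correct votes forming a majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(need, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(n, round(majority_correct_probability(0.85, n), 4))
```

With p = 0.85 and n = 5 this comes out to roughly 0.973, so five independent votes narrow the gap to 100% but do not close it.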

all2•1mo ago

So my solution to non-binary failure states is

1. Generate a potential solution

2. If the solution is complex, chunk it up into logical parts

3. Vote on each chunk and select those with more than k votes

By doing this you can filter out outliers (not always desirable) and pull the signal out of the noise.
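
The three steps above can be sketched roughly as follows. This is an illustrative reading, not the commenter's actual code: it assumes every candidate solution has already been split into the same number of aligned chunks, and `vote_per_chunk` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Optional


def vote_per_chunk(candidates: List[List[str]], k: int) -> List[Optional[str]]:
    """Pick, per chunk position, the most common chunk if it has more than k votes.

    candidates: several generated solutions, each pre-split into aligned chunks.
    Positions where no chunk exceeds k votes are returned as None.
    """
    n_chunks = len(candidates[0])
    selected: List[Optional[str]] = []
    for i in range(n_chunks):
        counts = Counter(solution[i] for solution in candidates)
        chunk, votes = counts.most_common(1)[0]
        selected.append(chunk if votes > k else None)
    return selected


if __name__ == "__main__":
    runs = [
        ["open_app('clock')", "set_timer(5)"],
        ["open_app('clock')", "set_timer(50)"],
        ["open_app('clock')", "set_timer(5)"],
    ]
    print(vote_per_chunk(runs, k=1))
```

Returning None for contested positions leaves room to regenerate just that chunk instead of the whole solution.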

canyon289•1mo ago

Good insight here, we actually did not include thinking into this model partly because we saw how incredibly fast it was to just get the minimum amount of tokens to output an answer.

Thinking helps performance scores but we'll leave it up to users to add additional tokens if they want. Our goal here was the leanest weight and token base for blazing fast performance for you all.

syntaxing•1mo ago

I’ve been wanting to fine tune models for home assistant but unsure how to get some synthetic data, any recommendations?

wds•1mo ago

I've found this dataset specifically for Home Assistant with over 32k examples: https://huggingface.co/datasets/acon96/Home-Assistant-Reques...

syntaxing•1mo ago

Funnily, I did fine tune Qwen 1.7B with this but learned this dataset is meant for the author's extension homeLLM

allenporter•1mo ago

Check out the approach here: https://github.com/allenporter/home-assistant-datasets and the reports/ directory has a leaderboard for function calling. I'm curious to see how well this model does.

syntaxing•1mo ago

Oh nice, I thought about doing something similar with a docker image and fake devices but never got around to it

andai•1mo ago

Hot take: Dodgy small/fast/cheap LLM in a While True approx. equals AGI for most real world tasks.

gavmor•1mo ago

I have not found this to be the case but, then again, a lot has changed in the last year.

gessha•1mo ago

My brain didn’t realize that the parameters were megabytes and not gigabytes and my reaction went from “meh” to “holy bananas!”

Great work from the Google ML teams, I’ll be trying this model out.

mentalgear•1mo ago

Neat, now we can build our LCARS with this?
