Both models score higher on the Artificial Analysis intelligence index with lower end-to-end response time, plus 24% to 50% better output token efficiency (which translates to lower cost).
Gemini 2.5 Flash-Lite improvements include better instruction following, reduced verbosity, and stronger multimodal & translation capabilities. Gemini 2.5 Flash improvements include better agentic tool use and more token-efficient reasoning.
Model strings: gemini-2.5-flash-lite-preview-09-2025 and gemini-2.5-flash-preview-09-2025
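If you want to poke at the new strings right away, here's a minimal sketch assuming the google-genai Python SDK and an API key already set in the environment (the prompt is just a placeholder):

from google import genai

# Minimal sketch, assuming GEMINI_API_KEY (or GOOGLE_API_KEY) is set in the environment.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-09-2025",  # or "gemini-2.5-flash-lite-preview-09-2025"
    contents="Summarize the September 2025 Gemini 2.5 Flash changes in two sentences.",
)
print(response.text)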
How long Google can keep this going while cannibalizing how they make money is another question...
This involves having it identify all potential keywords and distinct entities, determine their approximate gender (important for languages with ambiguous gender pronouns), and then perform a line-by-line analysis of each chapter. For each line, it identifies the speaking entity, determines whose POV the line represents, and identifies the subject entity. While I didn't need or expect perfection, Gemini Flash 2.5 was the only model I tested that could not only follow all these instructions, but follow them well. The cheap price was a bonus.
I was thoroughly impressed; it's now my go-to for any JSON-formatted analysis reports.
Disclaimer: I recently joined this team. But I like the product!
The first chart implies the gains are minimal for nonthinking models.
Which is a good thing in my book, as the models are now way too verbose (and I suspect one of the reasons is billing by tokens).
- "temperature" - intentional random sampling from the most likely next tokens to improve "creativity" and help avoid repetition
- quantization - running models with lower numeric precision (saves on both memory and compute, without impacting accuracy too much)
- differences in/existence of a system prompt, especially when using something end-user-oriented like Qwen Chat
- not-quite-deterministic GPU acceleration
Benchmarks are usually run at temperature zero (always take the most likely next token), with the full-precision weights, and with no additions to the benchmark prompt except necessary formatting and things like end-of-turn tokens. They're also usually multiple-choice or otherwise expect very short responses, which leaves less room for run-to-run variance.
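To make that concrete, here's a minimal sketch of benchmark-style greedy decoding with Hugging Face transformers (the model name and prompt are placeholders): do_sample=False is the temperature-zero setting, and loading the default weights means no quantization.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # full precision, no quantization

# Bare prompt: no system prompt, only the formatting the benchmark itself requires.
prompt = "Q: What is the capital of France?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: always take the most likely next token (equivalent to temperature 0).
output = model.generate(**inputs, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))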
Of course a benchmark still can't tell you everything - real-world performance can be very different.
Though I imagine this should be a smaller effect than, say, different quantization levels.
[1]: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
From OpenRouter last week:
* xAI: Grok Code Fast 1: 1.15T
* Anthropic: Claude Sonnet 4: 586B
* Google: Gemini 2.5 Flash: 325B
* Sonoma Sky Alpha: 227B
* Google: Gemini 2.0 Flash: 187B
* DeepSeek: DeepSeek V3.1 (free): 180B
* xAI: Grok 4 Fast (free): 158B
* OpenAI: GPT-4.1 Mini: 157B
* DeepSeek: DeepSeek V3 0324: 142B
People are lazy about pointing to the latest name.
I would rather use a model that is good than a model that is free, but different people have different priorities.
Y'know, with all these latest models the lines are kinda blurry actually. The definition of "good" is getting foggy.
So it might as well be free as the definition of money is clear as crystal.
I also used it for some time to test something really, really niche, like building a Telegram bot in Cloudflare Workers, and grok-4-fast was kinda decent at that for the most part actually. So that's nice.
Also cheap enough to not really matter.
A bad model with good automated tooling and prompts will beat a good model without them, and if your goal is to build good tooling and prompts you need a tighter iteration loop.
Both apps have offered usage for free for a limited time:
https://blog.kilocode.ai/p/grok-code-fast-get-this-frontier-...
If xAI in particular is in the mood to light cash on fire promoting their new model, you'll see it everywhere during the promo period, so not surprised that heavily boosts xAI stats. The mystery codename models of the week are a bit easier to miss.
It might not be OK for that kind of use case, or might breach the ToS.
But it's still great. Even my premium Perplexity account doesn't give me free API access.
For all I know there are a couple of enormous whales on there who, should they decide to switch from one model to another, will instantly impact those overall ratings.
I'd love to have a bit more transparency about volume so I can tell if that's what is happening or not.
A "weekly active API keys" metric, faceted by model/app, would be a useful data point for measuring real-world popularity, though.
2.5 is probably the best balance for tools like Aider.
gemini-2.5-flash-preview-09-2025 - what are they thinking?
I thought about joking that they had AI name it for them, but when I asked Gemini, it said that this name was confusing, redundant, and leads to unnecessarily high cognitive load.
Maybe Googlers should learn from their own models.
Something that distinguishes between a completely new pre-training process/architecture, and standard RLHF cycles/optimizations.
Flash is super fast, gets straight to the point.
Pro takes ages to even respond, then starts yapping endlessly, usually confuses itself in the process and ends up with a wrong answer.
On the other hand, I do prefer using Claude 4 Sonnet on very open-ended agentic programming tasks because it seems to have a better integration with VSCode Copilot. Gemini 2.5 Pro bugs out much more often where Claude works fine almost every time.
Also, 2.5 Pro is often incapable of searching and will hallucinate instead. I don't know why. It will claim it searched and then return some made-up results instead. 2.5 Flash is much more consistently capable of searching.
It's a delicate balance, because these Gemini models sometimes feel downright lobotomized compared to claude or gpt-5.
It's bad at agentic stuff, especially coding. Incomparably so compared to Claude and now GPT-5. But if it's just about asking it random stuff, and especially going on for very long in the same conversation - which non-tech users have a tendency to do - Gemini wins. It's still the best at long context, noticing things said long ago.
Earlier this week I was doing some debugging. For debugging especially I like to run sonnet/gpt5/2.5-pro in parallel with the same prompt/convo. Gemini was the only one that, 4 or so messages in, pointed out something very relevant in the middle of the logs in the very first message. GPT and Sonnet both failed to notice, leading them to give wrong sample code. I would've wasted more time if I hadn't used Gemini.
It's also still the best at a good number of low-resource languages. It doesn't glaze too much (Sonnet, ChatGPT) without being overly stubborn (raw GPT-5 API). It's by far the best at OCR and image recognition, which a lot of average users use quite a bit.
Google's ridiculously bad at marketing and AI UX, but they'll get there. They're already much more than just a "bang for the buck" player.
FWIW I use all 3 above mentioned on a daily basis for a wide variety of tasks, often side-by-side in parallel to compare performance.
===============================
Got it — *compliment on the info you've shared*, *informal summary of task*. *Another compliment*, but *downside of question*.
----------
(relevant emoji) Bla bla bla
1. Aspect 1
2. Aspect 2
----------
*Actual answer*
-----------
(checkmark emoji) *Reassuring you about its answer because:*
* Summary point 1
* Summary point 2
* Summary point 3
Would you like me to *verb* a ready-made *noun* that will *something that's helpful to you 40% of the time*?
===============================
It's gotta reduce the quality of the answers.

People have said it destroys the intelligence mid-convo.
Same as social media converging to rage bait. The user base LIKES it subconsciously. Nobody at the companies explicitly added that to content recommendation model training. I know, for the latter, as I was there.
Just on the video link alone Gemini is making money on the free tier by pointing the hapless user at an ad while the other LLMs make zilch off the free tier.
Additionally, despite having "grounding with google search" it tends to default to old knowledge. I usually have to inform it that it's presently 2025. Even after searching and confirming, it'll respond with something along the lines of "in this hypothetical timeline" as if I just gaslit it.
Consider this conversation I just had with all three of Claude, Gemini, and GPT-5.
<ask them to consider DDR6 vs M3 Ultra memory bandwidth>
-- follow up --
User: "Would this enable CPU inference or not? I'm trying to understand if something like a high-end Intel chip or a Ryzen with built in GPU units could theoretically leverage this memory bandwidth to perform CPU inference. Think carefully about how this might operate in reality."
<Intro for all 3 models below - no custom instructions>
GPT-5: "Short answer: more memory bandwidth absolutely helps CPU inference, but it does not magically make a central processing unit (CPU) “good at” large-model inference on its own."
Claude: "This is a fascinating question that gets to the heart of memory bandwidth limitations in AI inference. "
Gemini 2.5 Pro: "Of course. This is a fantastic and highly relevant question that gets to the heart of future PC architecture."
My understanding is Gemini is not far behind on "intelligence", certainly not in a way that leaves obvious doubt over where they will be over the next iteration/model cycles, where I would expect them to at least continue closing the gap. I'd be curious if you have some benchmarks to share that suggest otherwise.
Meanwhile, afaik, something Google has done that other providers aren't doing as much - and this perhaps relates back to your point re "latency/TPS/cost dimensions" - is integrating their model into interesting products beyond chat, at a pace that seems surprising given how much criticism they had been taking for being "slow" to react to the LLM trend.
Besides the Google Workspace surface and Google search, which now seem obvious - there are other interesting places where Gemini will surface - https://jules.google/ for one, to say nothing of their experiments/betas in the creative space - https://labs.google/flow/about
Another I noticed today: https://www.google.com/finance/beta
I would have thought putting Gemini on a finance dashboard like this would be inviting all sorts of regulatory (and other) scrutiny... and wouldn't be in keeping with a "slow" incumbent. But given the current climate, it seems Google is plowing ahead just as much as anyone else - with a lot more resources and surface to bring to bear. Imagine Gemini integration on Youtube. At this point it just seems like counting down the days...
2025-09-26T14:32:10Z
2025-09-26T14:32:10Z200s
2025-09-26T14:32:10Z200s600s
2025-09-26T14:32:10Z200s600s300s
It then proceeded to talk about how efficient this approach was for thousands of numbers.

Gemini is by far the dumbest LLM I've used.
It gave me a 160 line parse function.
After gaping for a short while, I implemented it in a 5 line function and a lookup table.
These vibe coders who are proud that they generated thousands of lines of code make me wonder whether they ever read what they generate with a critical eye.
text = re.sub(r'(\*|_)(.+?)\1', replace_italic, text, flags=re.DOTALL)
The `replace_italic` is a one-line callback function that surrounds the re's match with the ANSI codes.

Knowing what technique is "best" and telling the LLM to use it produces better results (on average) than giving the LLM freedom to choose. For some problems, the specification of the prompt needed to get good output becomes more work than just thinking and writing it myself.
For very complex things, I myself can not put the design into English in my own head but can "see" the correct answer as code concepts. I don't know if this is universal for all developers. If it is, it shows a limit of LLM's usefulness.
The vibe coders (who I referred to in my comment) aren't giving implementation tips.
What did it give you before you put an implementation tip into your prompt?
=======
FWIW, if you're at all interested, here's my implementation:
def markdown_ansi_code_subst(mdstr: str, src_pattern: str, replacement_start: str, replacement_end: str) -> str:
    # Replace markers pairwise: the first occurrence becomes the start code,
    # the second the end code, and repeat until no markers remain.
    while src_pattern in mdstr:
        mdstr = mdstr.replace(src_pattern, replacement_start, 1)
        mdstr = mdstr.replace(src_pattern, replacement_end, 1)
    return mdstr
The caller supplies the pattern (`*` for italic, `**` for bold, etc.) and a start/end replacement. As you can imagine, I store all of that in a static lookup table. I feel this is more readable than regexes.
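For illustration, here's a rough sketch of how that static lookup table and caller might look - the names and marker/ANSI pairs are my assumptions, not the commenter's actual code:

# Hypothetical lookup table; longer markers come first so "**" is consumed before "*".
ANSI_RESET = "\033[0m"
MARKDOWN_ANSI_TABLE = [
    ("**", "\033[1m", ANSI_RESET),  # bold
    ("__", "\033[1m", ANSI_RESET),  # bold (alternate marker)
    ("*", "\033[3m", ANSI_RESET),   # italic
    ("_", "\033[3m", ANSI_RESET),   # italic (alternate marker)
]

def render_markdown_ansi(text: str) -> str:
    # Apply each pairwise substitution in table order.
    for pattern, start, end in MARKDOWN_ANSI_TABLE:
        text = markdown_ansi_code_subst(text, pattern, start, end)
    return text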
> Give me a Python function that takes a string holding text in Markdown markup syntax and that uses regular expressions to replace any Markdown markup codes for bold, italics and underline with their ANSI equivalent.
BTW, your solution will produce bad output. Markdown's "bold" etc markup comes in pairs of markers and your simple replacement will match singlets.
In a separate chat, I asked it to suggest some ways to do it first, and it provided three alternatives but suggested I start with regex.
FWIW I used Open WebUI, which uses the API (via OpenRouter). I've seen people here mentioning that the API produces much better results than the chat app, for whatever reason.
For reference, here's the prompt and the zero-shot result, minus the rigorous comments it added:
I need a Python function that will take in markdown in a string and return a string with ansi codes for bold, italics and underline. Can you write me such a function?
import re

def markdown_to_ansi(markdown_string: str) -> str:
    ansi_string = markdown_string
    ANSI_BOLD = "\033[1m"
    ANSI_ITALIC = "\033[3m"
    ANSI_UNDERLINE = "\033[4m"
    ANSI_RESET = "\033[0m"
    ansi_string = re.sub(r'\*\*(.*?)\*\*', f'{ANSI_BOLD}\\1{ANSI_RESET}', ansi_string)
    ansi_string = re.sub(r'__(.*?)__', f'{ANSI_BOLD}\\1{ANSI_RESET}', ansi_string)
    ansi_string = re.sub(r'\*(.*?)\*', f'{ANSI_ITALIC}\\1{ANSI_RESET}', ansi_string)
    ansi_string = re.sub(r'\_(.*?)\_', f'{ANSI_ITALIC}\\1{ANSI_RESET}', ansi_string)
    ansi_string = re.sub(r'\+\+(.*?)\+\+', f'{ANSI_UNDERLINE}\\1{ANSI_RESET}', ansi_string)
    # A more robust solution for nesting would require a parsing library or a more complex state machine.
    # However, for simple cases, applying in order can layer them correctly.
    return ansi_string

Yes, this is a very effective tactic, in my experience! Especially when I am asking for a solution where I am not confident I know what is "best". Having a "pre chat" to settle "what to do" and then "how to do it" before finally telling the LLM to "do it" is often worth the extra time for getting it to provide a solution for complex problems.
I'm using Gemini (2.5 Pro) less and less these days. I used to be really impressed with its deep research capabilities and ability to cite sources reliably.
The last few weeks, it's increasingly argumentative and incapable of recognizing hallucinations around sourcing. I'm tired of arguing with it on basics like RFCs and sources it fabricates, won't validate, and refuses to budge on.
Example prompt I was arguing with it on last night:
> within a github actions workflow, is it possible to get access to the entire secrets map, or enumerate keys in this object?
As recent supply-chain attacks have shown, exfiltrating all the secrets from a GitHub workflow is as simple as `${{ toJSON(secrets) }}`, or `echo ${{ toJSON(secrets) }} | base64` at worst. [1]
Give this prompt a shot! Gemini won't do anything except be obstinately ignorant. With me, it provided a test case workflow, and refused to believe the results. When challenged, expect it to cite unrelated community posts. Chatgpt had no problem with it.
[1] https://github.com/orgs/community/discussions/174045 https://github.com/orgs/community/discussions/47165
With this example, several attempts resulted in the same thing: Gemini expressing a strong belief that GitHub has a security capability which it really doesn't have.
If someone is able to get Gemini to give an accurate answer to this with a similar question, I'd be very curious to hear what it is.
Gemini is notoriously bad at multi-turn instruction following, so this holds strongly for it. Less so for Claude Opus 4 or GPT-5.
today: "before you marry someone, put the person in front of a slow AI model"
;-)
can I get the sources of your rumour please? (Yes I know that I can search it but I would honestly prefer it if you could share it, thanks in advance!)
To be honest, I hadn't heard that elsewhere, but I haven't been following it massively this week.
I AM LAUGHING SO HARD RIGHT NOWWWWW
LMAOOOO
I wish to upvote this twice lol
Same way that OpenAI updated their 4o models and the like, which didn't turn out so well when it started glazing everyone and they had to revert it (maybe that was just chat and not the API).
Anthropic kind of did the same thing [1] except it back-fired recently with the cries of "nerfing".
We buy these tokens, which are already hard to buy in the limited tiers, they expire after only a year, and we don't even know how often the responses are changing in the background. Even a 1% improvement or reduction is something I would want disclosed.
Really scary foundation AI companies are building on IMO. Transparency and access is important.
They just don't want to be pinned down because the shifting sands are useful for the time when the LLM starts to get injected with ads or paid influence.
I've been running into it consistently, responses that just stop mid-sentence, not because of token limits or content filters, but what appears to be a bug in how the model signals completion. It's been documented on their GitHub and dev forums for months as a P2 issue.
The frustrating part is that when you compare a complete Gemini response to Claude or GPT-4, the quality is often quite good. But reliability matters more than peak performance. I'd rather work with a model that consistently delivers complete (if slightly less brilliant) responses than one that gives me half-thoughts I have to constantly prompt to continue.
It's a shame because Google clearly has the underlying tech. But until they fix these basic conversation flow issues, Gemini will keep feeling broken compared to the competition, regardless of how it performs on benchmarks.
https://github.com/googleapis/js-genai/issues/707
https://discuss.ai.google.dev/t/gemini-2-5-pro-incomplete-re...
1. Using the "Projects" thing (Folder organization) makes my browser tab (on Firefox) become unusably slow after a while. I'm basically forced to use the default chats organization, even though I would like to organize my chats in folders.
2. After editing a message that you already sent, you get to select between the different branches of the chat (1/2, and so on), which is cool, but when ChatGPT fails to generate a response in this "branched conversation" context, it will continue failing forever. When your conversation is a single thread and a ChatGPT message fails with an error, retrying usually works and the chat continues normally.
On mobile (Android), opening the keyboard scrolls the chat to the bottom! I sometimes want to type while referring to something from the middle of the LLM's last answer.
Of course it could all be placebo, but when you think about it intuitively, somewhere on the road to the hundreds of billions in datacenter capex, one would think there will be periods where compute and demand are out of sync. It's also perfectly understandable why now would be a time to be seeing that.
It’s so annoying that you have this super capable model but you interact with it using an app that is complete ass
Ask ChatGPT to output markdown or a PDF in the iOS or Mac apps versus the web experience. The web is often better - the apps will return nothing.
(Disclosure: I'm the founder of Synthetic.new, a company that runs open-source LLMs for monthly subscriptions.)
If you want to use application/json as the specified output in the request, you can’t use tools
So if you need both, you either hope it gives you correct JSON when using tools (which many times it doesn't), or you have to do two requests: one for the tool calling, another for formatting.
At least, even if annoying, this issue is pretty straightforward to get around
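For what it's worth, here's a rough sketch of that two-request workaround, assuming the google-genai Python SDK and its built-in Google Search tool (model name, prompts, and JSON keys are placeholders):

from google import genai
from google.genai import types

client = genai.Client()  # assumes an API key in the environment

# Request 1: let the model use its built-in tools; JSON output can't be forced here.
tool_response = client.models.generate_content(
    model="gemini-2.5-flash-preview-09-2025",
    contents="Look up the current Gemini 2.5 Flash pricing.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

# Request 2: no tools, so application/json output is allowed; reformat the first answer.
json_response = client.models.generate_content(
    model="gemini-2.5-flash-preview-09-2025",
    contents="Convert this answer into JSON with keys 'model' and 'price': "
             + tool_response.text,
    config=types.GenerateContentConfig(response_mime_type="application/json"),
)
print(json_response.text)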
And wanting to programmatically work with the result + allow tool calls is super common.
I can ask Gemini to give me the PDF's content as JSON and it complies most of the time. But at times there's an introductory line like "Here's your JSON:". Those introductory lines interfere with using the output programmatically. They're sometimes there, sometimes not.
If I could have structured output at the same time as tool use, I could reliably use what Gemini spits out, since it would be JSON with no annoying intro lines.
It’s a bit of a hack but maybe that reliably works here?
The issue here is that Gemini has support for some internal tools (like search and web scraping), and when you ask the model to use those, you can’t also ask it to use application/json as the output (which you normally can when not using tools)
Not a huge issue, just annoying
I’ve seen that behavior when LLMs of any make or model aren’t given enough time or allowed enough tokens.
Gemini 2.5 Pro is _amazing_ for software architecture, but I just get tired of poking it along. Sonnet does well enough.
Typo in the first sentence? "... improving the efficiency." Gemini 2.5 Pro says this is perfectly good phrasing, whereas ChatGPT and Claude recognize that it's awkward or just incorrect. Hmm...
“deliver better quality while also improving the efficiency.”
Reads fine to me. An editor would likely drop “the”.
export LLM_GEMINI_KEY='...'
uvx --isolated --with llm-gemini llm -m gemini-flash-lite-latest 'An epic poem about frogs at war with ducks'
Release notes: https://github.com/simonw/llm-gemini/releases/tag/0.26
Pelicans: https://github.com/simonw/llm-gemini/issues/104#issuecomment...
But looking at these images, Google clearly hasn’t done that yet.
I don't think it would be worth it though, it would be pretty obvious you had cheated on my benchmark when it drew a perfect pelican riding a bicycle and then failed at a flamingo on a unicycle.
This industry desperately needs a Steve Jobs to bring some sanity to the marketing.
I actually even agree that the progress is plateauing, but your comment is a non-sequitur.
And I say this because I added about 50 prompts in the settings to prevent video recommendations and to remove any links to videos, but I still get text saying "the linked video explains this more" even though there is no linked video.
This is not a bad way to monetise the free tier. None of the other token providers have found any way to monetise the free tier, but Gemini is doing it on almost every prompt.
That's why Google names it like this, but I agree it's dumb. Semver would be easier.
For example, the latest Gemini 2.5 Flash is known as "google/gemini-2.5-flash-preview-09-2025" [1].
[1]: https://openrouter.ai/google/gemini-2.5-flash-preview-09-202...
This is also the case with OpenAI and their models. Pretty standard I guess.
They don't change the versioning, because I guess they don't consider it to be "a new model trained from scratch".
In all seriousness though, their version system is awful.
That "example" is the name used in the article under discussion. There's no need to link to openrouter.ai to find the name.
Semantic versioning works for most scenarios.
This was all solved a long time ago; LLM vendors seem to have unlearnt versioning principles.
This is fairly typical - marketing and business want different things from a version number than what versioning systems are good at.
This is the entire premise behind the cloud, and the reason it was Amazon that did it first: they had the largest workloads at the time, before Web 2.0 and SaaS were a thing.
Only businesses with large first-party apps succeeded in the cloud provider space; companies like HP and IBM all failed, and their time to failure strongly correlated with the number of first-party apps they operated. I.e., those apps needed to keep a lot of idle capacity for peak demand anyway, which they could now monetize and co-mingle in the cloud.
LLMs as a service are not any different from S3, launched nearly 20 years ago.
---
[1] It isn't. At the scale they are operating these models, it shouldn't matter at all; it is not individual GPUs or machines that make a difference in load handling. Only a few users are going to explicitly pin a specific patch version; for the rest, they can serve whichever one is available immediately or cheaply.
Model (input / output per 1M tokens):
Gemini 2.5 Flash Preview: $0.30 / $2.50
Grok 4 Fast: $0.20 / $0.50
It is HORRENDOUS when compared to other models.
I hear a bunch of other people talking about how great Gemini is, but I've never seen it.
The responses are usually either incorrect, way too long, (essays when I wanted summaries) or just...not...good. I will ask the exact same question to both Gemini and ChatGPT (free) and GPT will give a great answer while the Gemini answer is trash.
Am I missing something?
ChatGPT is better at:
A) Interpreting what I'm asking it for without me needing to provide additional explicit context.
B) Formatting answers in a way that is easily digestible.
I will also say whatever they use for the AI search summary is good enough for me like 50% of the time I google something, but those are generally the simpler 50% of queries.
My most recent trials output single commas as responses to basic questions, or it simply refuses the task on ethical grounds, such as generating a photo of a backpack wearing a hoodie for some reason (it claimed harmful stereotypes and instead generated an ape).
Refusing to do perfectly ethical tasks is probably the most consistent problem I've had.
I think the "baked in" Gemini models are different, try using Gemini through the actual Gemini site.
I would like to try a small computer->human "upload" experiment; basic multilingual understanding without pronunciation knowledge would be very sad.
I intend to make a sort of computer reflexive game. I want to compare different upload strategies (with/without analog or classic error-correcting codes, empirical spaced-repetition constants, an ML predictor of which parameters I'm forgetting / losing resolution on).
Here's a summary of this discussion with the new version: https://extraakt.com/extraakts/the-great-llm-versioning-deba...
It kept finding those fatal flaws and starting to explain them, only to slowly finish with "oh yes, this works as intended".
Would like to know whether Flash exhibits these issues as well.
The way I have come to perceive AI is that it's mostly good at reassuring/reaffirming people's beliefs and ideas rather than serving as an actual source of truth.
That would not be an issue if it was actually marketed as such, but seeing the "guided learning" function fail time and again makes me think we should be a lot more critical of what we're being told by tech enthusiasts/companies about AI.
At least for us, the bottleneck is the amount of retries/waiting needed to max out how many requests we can make in parallel.
[1] https://cloud.google.com/vertex-ai/generative-ai/docs/dynami...
However, it's hampered by max output tokens: Gemini is at 65K while GPT-5 mini is at 128K. Both of them have similar costs as well, so apart from the 1M context limit, GPT-5 mini is better in every way.
Anthropic learned this lesson. Google, Deepseek, Kimi, OpenAI and others keep repeating it. This feels like Gemini_2.5_final_FINAL_FINAL_v2.
Could there theoretically be something like a semver that can be autogenerated from that defined and regular version scheme that you shared?
Like, honestly, my idea is that I could use something like OpenRouter and then just change the semver without having to worry about these soooo many things in the schema that you shared, y'know?
A website / tool which can create a semver from this defined scheme and vice versa can be really cool actually :>
With that in mind, what exactly would semver (or similar) represent for AI models? Set up the proper way, your pipelines should continue working regardless of the model; it's just that the accuracy or some other metric might change slightly. But there should never be any "breakages" like what semver is supposed to help flag.
This thread is more about the minor number: not incrementing it when making changes to the internals is painful for dependency tracking. These changes will also break apps (prompts are often tuned to the model).