frontpage.

Made with ♥ by @iamnishanth

Open Source @Github


Gemini 3 Flash: frontier intelligence built for speed

https://blog.google/products/gemini/gemini-3-flash/
287•meetpateltech•1h ago•111 comments

AWS CEO says replacing junior devs with AI is 'one of the dumbest ideas'

https://www.finalroundai.com/blog/aws-ceo-ai-cannot-replace-junior-developers
167•birdculture•1h ago•95 comments

Coursera to combine with Udemy

https://investor.coursera.com/news/news-details/2025/Coursera-to-Combine-with-Udemy-to-Empower-th...
208•throwaway019254•5h ago•132 comments

Tell HN: HN was down

204•uyzstvqs•1h ago•141 comments

A Safer Container Ecosystem with Docker: Free Docker Hardened Images

https://www.docker.com/blog/docker-hardened-images-for-every-developer/
67•anttiharju•1h ago•8 comments

Notes on Sorted Data

https://amit.prasad.me/blog/sorted-data
24•surprisetalk•6d ago•1 comment

How, and why, I invented OnlyFans. In 2004

https://themosthandsomemanintheworld.com/how-and-why-i-invented-onlyfans-in-2004/
13•MrSkelter•29m ago•3 comments

Launch HN: Kenobi (YC W22) – Personalize your website for every visitor

10•sarreph•1h ago•21 comments

Flick (YC F25) Is Hiring Founding Engineer to Build Figma for AI Filmmaking

https://www.ycombinator.com/companies/flick/jobs/Tdu6FH6-founding-frontend-engineer
1•rayruiwang•1h ago

AI will make formal verification go mainstream

https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html
739•evankhoury•21h ago•374 comments

Yep, Passkeys Still Have Problems

https://fy.blackhats.net.au/blog/2025-12-17-yep-passkeys-still-have-problems/
53•todsacerdoti•5h ago•16 comments

alpr.watch

https://alpr.watch/
846•theamk•1d ago•395 comments

Linux Kernel Rust Code Sees Its First CVE Vulnerability

https://www.phoronix.com/news/First-Linux-Rust-CVE
30•weinzierl•47m ago•11 comments

No Graphics API

https://www.sebastianaaltonen.com/blog/no-graphics-api
746•ryandrake•22h ago•137 comments

Announcing the Beta release of ty

https://astral.sh/blog/ty
728•gavide•21h ago•140 comments

AI's real superpower: consuming, not creating

https://msanroman.io/blog/ai-consumption-paradigm
133•firefoxd•9h ago•94 comments

Is Mozilla trying hard to kill itself?

https://infosec.press/brunomiguel/is-mozilla-trying-hard-to-kill-itself
625•pabs3•8h ago•550 comments

Learning the oldest programming language (2024)

https://uncenter.dev/posts/learning-fortran/
19•lioeters•4h ago•12 comments

No AI* Here – A Response to Mozilla's Next Chapter

https://www.waterfox.com/blog/no-ai-here-response-to-mozilla/
442•MrAlex94•20h ago•256 comments

TLA+ Modeling Tips

http://muratbuffalo.blogspot.com/2025/12/tla-modeling-tips.html
82•birdculture•10h ago•18 comments

Pricing Changes for GitHub Actions

https://resources.github.com/actions/2026-pricing-changes-for-github-actions/
743•kevin-david•1d ago•781 comments

GPT Image 1.5

https://openai.com/index/new-chatgpt-images-is-here/
485•charlierguo•1d ago•235 comments

Modern SID chip substitutes [video]

https://www.youtube.com/watch?v=nooPmXxO6K0
44•vismit2000•3d ago•2 comments

Mozilla appoints new CEO Anthony Enzor-Demeo

https://blog.mozilla.org/en/mozilla/leadership/mozillas-next-chapter-anthony-enzor-demeo-new-ceo/
563•recvonline•1d ago•842 comments

I created a publishing system for step-by-step coding guides in Typst

https://press.knowledge.dev/p/new-150-pages-rust-guide-create-a
4•deniskolodin•3d ago•2 comments

Thin desires are eating life

https://www.joanwestenberg.com/thin-desires-are-eating-your-life/
620•mitchbob•1d ago•209 comments

I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours

https://simonwillison.net/2025/Dec/15/porting-justhtml/
219•pbowyer•19h ago•120 comments

40 percent of fMRI signals do not correspond to actual brain activity

https://www.tum.de/en/news-and-events/all-news/press-releases/details/40-percent-of-mri-signals-d...
474•geox•1d ago•184 comments

Ford Has Steered Its Former EV Truck and Plant Plans into a Ditch

https://512pixels.net/2025/12/ford-ev-changes/
10•zdw•1h ago•3 comments

Japan to revise romanization rules for first time in 70 years

https://www.japantimes.co.jp/news/2025/08/21/japan/panel-hepburn-style-romanization/
245•rgovostes•1d ago•199 comments

Gemini 3 Flash: frontier intelligence built for speed

https://blog.google/products/gemini/gemini-3-flash/
285•meetpateltech•1h ago

Comments

meetpateltech•1h ago
Deepmind Page: https://deepmind.google/models/gemini/flash/

Developer Blog: https://blog.google/technology/developers/build-with-gemini-...

Model Card [pdf]: https://deepmind.google/models/model-cards/gemini-3-flash/

Gemini 3 Flash in Search AI mode: https://blog.google/products/search/google-ai-mode-update-ge...

minimaxir•1h ago
Documentation for Gemini 3 Flash in particular: https://ai.google.dev/gemini-api/docs/gemini-3
simonw•59m ago
For anyone from the Gemini team reading this: these links should all be prominent in the announcement posts. I always have to hunt around for them!
meetpateltech•29m ago
Google actually does something similar for major releases - they publish a dedicated collection page with all related links.

For example, the Gemini 3 Pro collection: https://blog.google/products/gemini/gemini-3-collection/

But having everything linked at the bottom of the announcement post itself would be really great too!

GaggiX•1h ago
They went too far: now the Flash model is competing with their Pro version, with better SWE-bench and better ARC-AGI-2 scores than 3.0 Pro. I imagine they'll improve 3.0 Pro before it's out of Preview.

Also, I don't see it written in the blog post, but Flash supports more granular settings for reasoning: minimal, low, medium, high (like OpenAI models), while Pro only has low and high.

skerit•1h ago
> They went too far, now the Flash model is competing with their Pro version

Wasn't this the case with the 2.5 Flash models too? I remember being very confused at that time.

jug•1h ago
I'm not sure how I'm going to live with this!
minimaxir•1h ago
"minimal" is a bit weird.

> Matches the “no thinking” setting for most queries. The model may think very minimally for complex coding tasks. Minimizes latency for chat or high throughput applications.

I'd prefer a hard "no thinking" rule to what this is.

GaggiX•1h ago
It still supports the legacy mode of setting a thinking budget; you can set it to 0, which is equivalent to the "none" reasoning effort in GPT-5.1/5.2.
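For reference, a minimal sketch of what that legacy setting looks like in a `generateContent` request body. The field names follow the documented Gemini REST API; the model name and prompt are placeholders:

```python
import json

# Sketch of a Gemini API generateContent request body that disables
# thinking via the legacy budget field (thinkingBudget: 0).
# Field names follow the public REST API docs; content is illustrative.
payload = {
    "contents": [{"parts": [{"text": "Summarize this in one sentence: ..."}]}],
    "generationConfig": {
        "thinkingConfig": {"thinkingBudget": 0}  # 0 = no reasoning tokens
    },
}

body = json.dumps(payload)  # POST this to the generateContent endpoint
```

With the newer models, the `thinking_level` enum (minimal/low/medium/high) is the intended replacement for the raw budget.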
samyok•1h ago
Don’t let the “flash” name fool you, this is an amazing model.

I have been playing with it for the past few weeks, and it's genuinely my new favorite. It's so fast and has such vast world knowledge that it's more performant than Claude Opus 4.5 or GPT-5.2 extra high, for a fraction (basically an order of magnitude less!) of the inference time and price.

esafak•1h ago
What are you using it for and what were you using before?
jauntywundrkind•1h ago
Just to point this out: many of these frontier models cost not far from two orders of magnitude more than what DeepSeek charges. It doesn't compare the same, no, but with coaxing I find DeepSeek to be a pretty capable, competent coding model, able to answer a lot of general queries satisfactorily (though if it's a short session, why economize?). It's $0.28/M in, $0.42/M out; Opus 4.5 is $5/$25 (17x/60x).

I've been playing around with other models recently (Kimi, GPT Codex, Qwen, others) to try to better appreciate the differences. I knew there was a big price difference, but watching myself feed dollars into the machine rather than nickels has also founded quite the reverse appreciation in me.

I only assume "if you're not getting charged, you are the product" has to be somewhat in play here. But when working on open source code, I don't mind.
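The multiples quoted above check out; a quick sketch using the per-million-token prices from that comment:

```python
# Rough per-million-token price comparison from the figures above:
# DeepSeek $0.28 in / $0.42 out, Claude Opus 4.5 $5 in / $25 out.
deepseek = {"in": 0.28, "out": 0.42}
opus = {"in": 5.00, "out": 25.00}

ratio_in = opus["in"] / deepseek["in"]     # ~17.9x on input
ratio_out = opus["out"] / deepseek["out"]  # ~59.5x on output
```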

thecupisblue•54m ago
Oh wow - I recently tried 3 Pro preview and it was too slow for me.

After reading your comment I ran my product benchmark against 2.5 flash, 2.5 pro and 3.0 flash.

The results are better AND the response times have stayed the same. What an insane gain, especially considering the price compared to 2.5 Pro. I'm about to get much better results for 1/3rd of the price. Not sure what magic Google did here, but I would love a more technical deep dive comparing what they do differently in the Pro and Flash models to achieve such performance.

Also wondering, how did you get early access? I'm using the Gemini API quite a lot and have a quite nice internal benchmark suite for it, so would love to toy with the new ones as they come out.

freedomben•38m ago
Cool! I've been using 2.5 flash and it is pretty bad. 1 out of 5 answers it gives will be a lie. Hopefully 3 is better
samyok•30m ago
Did you try with the grounding tool? Turning it on solved this problem for me.
Davidzheng•27m ago
what if the lie is a logical deduction error not a fact retrieval error
rat9988•3m ago
The error rate would still be improved overall and might make it a viable tool for the price depending on the usecase.
epolanski•37m ago
Gemini 2.0 Flash was already good for some of my tasks a long time ago.
tanh•1h ago
Does this imply we don't need as much compute for models/agents? How can any other AI model compete against that?
doomerhunter•1h ago
Pretty stoked for this model. Building a lot with "mixture of agents" / mix of models and Gemini's smaller models do feel really versatile in my opinion.

Hoping that the local ones (the Gemma line) progressively keep up.

fariszr•1h ago
These flash models keep getting more expensive with every release.

Is there an OSS model that's better than 2.0 flash with similar pricing, speed and a 1m context window?

Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.

> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.

The replacement for the old Flash models will probably be 3.0 Flash Lite, then.

fullstackwife•1h ago
The cost of end-to-end task resolution should be cheaper: even if a single inference costs more, you need fewer loops to solve a problem now.
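A toy version of that argument: a pricier model that finishes a task in fewer agent loops can still come out cheaper end to end. The loop counts and per-loop token volumes below are invented for illustration:

```python
# Toy end-to-end cost model: loops * (input tokens + output tokens) per loop.
# Prices are $/M tokens; loop counts and token sizes are made up.
def task_cost(price_in: float, price_out: float, loops: int,
              m_in_per_loop: float = 0.05, m_out_per_loop: float = 0.005) -> float:
    """Total $ for a task, with per-loop token counts in millions."""
    return loops * (m_in_per_loop * price_in + m_out_per_loop * price_out)

cheap_many_loops = task_cost(0.30, 2.50, loops=10)  # e.g. 2.5 Flash prices
pricey_few_loops = task_cost(0.50, 3.00, loops=5)   # e.g. 3 Flash prices
# If the new model halves the loop count, it wins despite higher rates.
```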
fariszr•1h ago
Sure, but for simple tasks that require a large context window (the typical use case for 2.0 Flash), it's still significantly more expensive.
aoeusnth1•1h ago
I think it's good, they're raising the size (and price) of flash a bit and trying to position Flash as an actually useful coding / reasoning model. There's always lite for people who want dirt cheap prices and don't care about quality at all.
thecupisblue•38m ago
Yes, but the 3.0 Flash is cheaper, faster and better than 2.5 Pro.

So if 2.5 Pro was good for your use case, you just got a better model for about 1/3rd of the price, but it might hurt the wallet a bit more if you currently use 2.5 Flash and want an upgrade, which is fair tbh.

user_7832•1h ago
Two quick questions to Gemini/AI Studio users:

1, has anyone actually found 3 Pro better than 2.5 (on non code tasks)? I struggle to find a difference beyond the quicker reasoning time and fewer tokens.

2, has anyone found any non-thinking models better than 2.5 or 3 Pro? So far I find the thinking ones significantly ahead of non thinking models (of any company for that matter.)

Workaccount2•1h ago
Gemini 3 is a step change up against 2.5 for electrical engineering R&D.
tmaly•1h ago
Not for coding but for the design aspect, 3 outshines 2.5
Davidzheng•1h ago
I think it's probably actually better at math. Though still not enough to be useful in my research in a substantial way. Though I suspect this will change suddenly at some point as the models move past a certain threshold (also it is heavily limited by the fact that the models are very bad at not giving wrong proofs/counterexamples) so that even if the models are giving useful rates of successes, the labor to sort through a bunch of trash makes it hard to justify.
jug•1h ago
Looks like a good workhorse model, like I felt 2.5 Flash was at its launch. I hope I can build confidence with it, because it'll be good for offloading Pro costs/limits, and of course speed is always nice for more basic coding or queries. I'm impressed by, and curious about, the recent extreme gains on ARC-AGI-2 from 3 Pro, GPT-5.1, and now even 3 Flash.
simonsarris•1h ago
Even before this release the tools (for me: Claude Code and Gemini for other stuff) reached a "good enough" plateau that means any other company is going to have a hard time making me (I think soon most users) want to switch. Unless a new release from a different company has a real paradigm shift, they're simply sufficient. This was not true in 2023/2024 IMO.

With this release the "good enough" and "cheap enough" intersect so hard that I wonder if this is an existential threat to those other companies.

theLiminator•1h ago
For me, the last wave of models finally started delivering on their agentic coding promises.
orourke•44m ago
This has been my experience exactly. Even over just the last few weeks I’ve noticed a dramatic drop in having to undo what the agents have done.
nprateem•1h ago
But for me, the previous models were routinely wrong time-wasters that added no overall speed increase once you take the lottery of whether they'd be correct into account.
bgirard•1h ago
Why wouldn't you switch? The cost to switch is near zero for me. Some tools have built-in model selectors, and direct CLI/IDE plug-ins have practically the same UI.
azuanrb•43m ago
Not OP, but I feel the same way. Cost is just one of the factors. I'm used to the Claude Code UX, and my CLAUDE.md works well with my workflow too. Unless there's a significant improvement, changing to new models every few months is going to hurt me more.
bgirard•7m ago
I used to think this way. But I moved to AGENTS.md. Now I use the different UI as a mental context separation. Codex is working on Feature A, Gemini on feature B, Claude on Feature C. It has become a feature.
calflegal•1h ago
I asked a similar question yesterday:

https://news.ycombinator.com/item?id=46290797

catigula•34m ago
Correct. Opus 4.5 'solved' software engineering. What more do I need? Businesses need uncapped intelligence, and that is a very high bar. Individuals often don't.
Workaccount2•1h ago
Really hoping this is used for real time chatting and video. The current model is decent, but when doing technical stuff (help me figure out how to assemble this furniture) it falls far short of 3 pro.
Tiberium•1h ago
Yet again Flash receives a notable price hike: from $0.3/$2.5 for 2.5 Flash to $0.5/$3 (+66.7% input, +20% output) for 3 Flash. Also, as a reminder, 2 Flash used to be $0.1/$0.4.
BeetleB•1h ago
Yes, but this Flash is a lot more powerful - beating Gemini 3 Pro on some benchmarks (and pretty close on others).

I don't view this as a "new Flash" but as "a much cheaper Gemini 3 Pro/GPT-5.2"

Tiberium•1h ago
I would be less salty if they gave us 3 Flash Lite at same price as 2.5 Flash or cheaper with better capability, but they still focus on the pricier models :(
zzleeper•1h ago
Same! I want to do some data stuff from documents and 2.0 pricing was amazing, but the constant increases go the wrong way for this task :/
jexe•4m ago
Right, depends on your use cases. I was looking forward to the model as an upgrade to 2.5 Flash, but when you're processing hundreds of millions of tokens a day (not hard to do if you're dealing in documents or emails with a few users), the economics fall apart.
bennydog224•1h ago
From the article, speed & cost match 2.5 Flash. I'm working on a project where there's a huge gap between 2.5 Flash and 2.5 Flash Lite as far as performance and cost goes.

-> 2.5 Flash Lite is super fast & cheap (~1-1.5s inference), but poor quality responses.

-> 2.5 Flash gives high quality responses, but fairly expensive & slow (5-7s inference)

I really just need an in-between for Flash and Flash Lite for cost and performance. Right now, users have to wait up to 7s for a quality response.

acheong08•1h ago
Thinking along the line of speed, I wonder if a model that can reason and use tools at 60fps would be able to control a robot with raw instructions and perform skilled physical work currently limited by the text-only output of LLMs. Also helps that the Gemini series is really good at multimodal processing with images and audio. Maybe they can also encode sensory inputs in a similar way.

Pipe dream right now, but 50 years later? Maybe

iamgopal•1h ago
Much sooner. Hardware, power, software, even AI model design, inference hardware, caching: everything is being improved, and it's exponential.
incognito124•56m ago
Believe it or not, there's Gemini Robotics, which seems to be exactly what you're talking about:

https://deepmind.google/models/gemini-robotics/

Previous discussions: https://news.ycombinator.com/item?id=43344082

bearjaws•1h ago
I've been using the preview flash model exclusively since it came out, the speed and quality of response is all I need at the moment. Although still using Claude Code w/ Opus 4.5 for dev work.

Google keeps their models very "fresh" and I tend to get more correct answers when asking about Azure or O365 issues; ironically, Copilot will talk about now-deleted or deprecated features more often.

sv123•51m ago
I've found copilot within the Azure portal to be basically useless for solving most problems.
djeastm•40m ago
Me too. I don't understand why companies think we devs need a custom chat on their website when we all have access to a chat with much smarter models open in a different tab.
moralestapia•1h ago
Not only is it fast, it is also quite cheap. Nice!
__jl__•1h ago
This is awesome. No preview release either, which is great for production.

They are pushing the prices higher with each release though: API pricing is up to $0.5/M for input and $3/M for output

For comparison:

Gemini 3.0 Flash: $0.50/M for input and $3.00/M for output

Gemini 2.5 Flash: $0.30/M for input and $2.50/M for output

Gemini 2.0 Flash: $0.15/M for input and $0.60/M for output

Gemini 1.5 Flash: $0.075/M for input and $0.30/M for output (after price drop)

Gemini 3.0 Pro: $2.00/M for input and $12/M for output

Gemini 2.5 Pro: $1.25/M for input and $10/M for output

Gemini 1.5 Pro: $1.25/M for input and $5/M for output

I think image input pricing went up even more.

Correction: It is a preview model...
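To make the trend in that list concrete, here's a small sketch pricing a hypothetical daily workload (10M input / 1M output tokens — the workload size is made up) at the Flash list prices above:

```python
# Per-million-token list prices from the comment above: (input $/M, output $/M).
prices = {
    "gemini-3.0-flash": (0.50, 3.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.0-flash": (0.15, 0.60),
    "gemini-1.5-flash": (0.075, 0.30),
}

def daily_cost(model: str, m_in: float = 10.0, m_out: float = 1.0) -> float:
    """Dollar cost for m_in million input and m_out million output tokens."""
    p_in, p_out = prices[model]
    return m_in * p_in + m_out * p_out

costs = {m: daily_cost(m) for m in prices}
# 3.0 Flash is ~8x the 1.5 Flash cost for the same hypothetical workload.
```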

srameshc•1h ago
Thanks, that was a great breakdown of the costs. I had just assumed it was the same pricing as before. The pricing probably comes from the confidence and buzz around Gemini 3.0 as one of the best-performing models. But competition is hot in this area, and it won't be long before we get similarly performing models at a cheaper price.
mips_avatar•1h ago
I'm more curious how Gemini 3 flash lite performs/is priced when it comes out. Because it may be that for most non coding tasks the distinction isn't between pro and flash but between flash and flash lite.
uluyol•55m ago
Are these the current prices or the prices at the time the models were released?
__jl__•32m ago
Mostly at the time of release except for 1.5 Flash which got a price drop in Aug 2024.

Google has been discontinuing older models after several months of transition period so I would expect the same for the 2.5 models. But that process only starts when the release version of 3 models is out (pro and flash are in preview right now).

hubraumhugo•1h ago
You can get your HN profile analyzed and roasted by it. It's pretty funny :) https://hn-wrapped.kadoa.com
echelon•1h ago
This is hilarious. The personalized pie charts and XKCD-style comics are great, and the roast-style humor is perfect.

I do feel like it's not an entirely accurate caricature (recency bias? limited context?), but it's close enough.

Good work!

You should do a "show HN" if you're not worried about it costing you too much.

WhereIsTheTruth•49m ago
This is exactly why you keep your personal life off the internet
peheje•40m ago
This is great. I literally "LOL'd".
onraglanroad•15m ago
I didn't feel roasted at all. In fact I feel vindicated! https://hn-wrapped.kadoa.com/onraglanroad
poplarsol•1h ago
Will be interesting to see what their quota is. Gemini 3.0 Pro only gives you 250 / day until you spam them with enough BS requests to increase your total spend > $250.
andrepd•1h ago
Is there a way to try this without a Google account?
mschulkind•59m ago
Just use openrouter or a similar aggregator.
primaprashant•1h ago
Pricing is $0.5 / $3 per million input / output tokens. 2.5 Flash was $0.3 / $2.5. That's a 66% increase in input token pricing and a 20% increase in output token pricing.

For comparison, from 2.5 Pro ($1.25 / $10) to 3 Pro ($2 / $12), there was a 60% increase for input tokens and a 20% increase for output tokens.

simonw•59m ago
Calculating price increases is made more complex by the difference in token usage. From https://blog.google/products/gemini/gemini-3-flash/ :

> Gemini 3 Flash is able to modulate how much it thinks. It may think longer for more complex use cases, but it also uses 30% fewer tokens on average than 2.5 Pro.
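As a toy illustration of why per-token price alone can mislead (assuming, which the post doesn't spell out, that the 30% reduction applies to billed output tokens):

```python
# Effective output cost when a model uses fewer tokens for the same task.
# The 30% figure is Google's claim for 3 Flash vs. 2.5 Pro; the task size
# (1M output tokens on the older model) is an assumption for illustration.
old_price_out = 10.00   # 2.5 Pro, $/M output
new_price_out = 3.00    # 3 Flash, $/M output
token_reduction = 0.30  # 3 Flash uses ~30% fewer tokens on average

old_cost = 1.0 * old_price_out                       # 1.0M tokens on 2.5 Pro
new_cost = 1.0 * (1 - token_reduction) * new_price_out  # 0.7M tokens on 3 Flash
# The effective gap is wider than the headline per-token prices suggest.
```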

imvetri•1h ago
this is why samsung is stopping production in flash
Tepix•57m ago
This is why they stopped The Flash after season 9 in 2023.
SyrupThinker•1h ago
I wonder if this suffers from the same issue as 3 Pro, that it frequently "thinks" for a long time about date incongruity, insisting that it is 2024, and that information it receives must be incorrect or hypothetical.

Just avoiding/fixing that would probably speed up a good chunk of my own queries.

robrenaud•1h ago
Omg, it was so frustrating to say:

Summarize recent working arxiv url

And then it tells me the date is from the future and simply refuses to fetch the URL.

Fiveplus•1h ago
It is interesting to see the "DeepMind" branding completely vanish from the post. This feels like the final consolidation of the Google Brain merger. The technical report mentions a new "MoE-lite" architecture. Does anyone have details on the parameter count? If this is under 20B params active, the distillation techniques they are using are lightyears ahead of everyone else.
JeremyHerrman•1h ago
Disappointed to see continued increased pricing for 3 Flash (up from $0.30/$2.50 to $0.50/$3.00 for 1M input/output tokens).

I'm more excited to see 3 Flash Lite. Gemini 2.5 Flash Lite needs a lot more steering than regular 2.5 Flash, but it is a very capable model and combined with the 50% batch mode discount it is CHEAP ($0.05/$0.20).

jeppebemad•52m ago
Have you seen any indications that there will be a Lite version?
summerlight•11m ago
I guess if they want to eventually deprecate the 2.5 family they will need to provide a substitute. And there are huge demands for cheap models.
xnx•1h ago
OpenAI is pretty firmly in the rear-view mirror now.
walthamstow•1h ago
Google Antigravity is a buggy mess at the moment, but I believe it will eventually eat Cursor as well. The £20/mo tier currently has the highest usage limits on the market, including Google models plus Sonnet and Opus 4.5.
rohitpaulk•1h ago
Wild how this beats 2.5 Pro in every single benchmark. Don't think this was true for Haiku 4.5 vs Sonnet 3.5.
FergusArgyll•49m ago
Sonnet 3.5 might have been better than opus 3. That's my recollection anyhow
walthamstow•1h ago
I'm sure it's good, I thought the last one was too, but it seems like the backdoor way to increase prices is to release a new model
jeffbee•1h ago
If the model is better in that it resolves the task with fewer iterations then the i/o token pricing may be a wash or lower.
kingstnap•1h ago
It has a SimpleQA score of 69% on a benchmark that tests knowledge of extremely niche facts. That's ridiculously high (Gemini 2.5 *Pro* had 55%) and reflects either training on the test set or some cracked way of packing a ton of parametric knowledge into a Flash model.

I'm speculating, but Google might have figured out some training trick to balance out information storage in model capacity. That, or this Flash model has a huge number of parameters or something.

GaggiX•48m ago
>or some sort of cracked way to pack a ton of parametric knowledge into a Flash Model.

More experts with a lower percentage of active ones -> more sparsity.
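A toy sketch of that arithmetic (all parameter counts are invented): holding top-k routing fixed while adding experts shrinks the fraction of parameters active per token.

```python
# Toy MoE arithmetic: fraction of parameters active per token.
# expert_params and shared_params are in arbitrary units; all numbers made up.
def active_fraction(n_experts: int, top_k: int,
                    expert_params: float, shared_params: float) -> float:
    total = shared_params + n_experts * expert_params
    active = shared_params + top_k * expert_params
    return active / total

small = active_fraction(n_experts=16, top_k=2, expert_params=1.0, shared_params=2.0)
large = active_fraction(n_experts=128, top_k=2, expert_params=1.0, shared_params=2.0)
# large < small: more experts at the same top-k -> sparser activation.
```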

tanh•21m ago
This will be fantastic for voice. I presume Apple will use it
whinvik•1h ago
Ok, I was a bit addicted to Opus 4.5 and was starting to feel like there's nothing like it.

Turns out Gemini 3 Flash is pretty close. The Gemini CLI is not as good but the model more than makes up for it.

The weird part is that Gemini 3 Pro is nowhere near as good an experience. Maybe because it's just so slow.

__jl__•29m ago
I will have to try that. Cursor bill got pretty high with Opus 4.5. Never considered opus before the 4.5 price drop but now it's hard to change... :)
FergusArgyll•1h ago
So much for "Monopolies get lazy, they just rent seek and don't innovate"
concinds•58m ago
The LLM market has no moats so no one "feels" like a monopoly, rightfully.
NitpickLawyer•52m ago
Also so much for the "wall, stagnation, no more data" folks. Womp womp.
deskamess•11m ago
Monopolies and wanna-be monopolies on the AI-train are running for their lives. They have to innovate to be the last one standing (or second last) - in their mind.
incrudible•9m ago
LLMs are a big threat to their search engine revenue, so whatever monopoly Google may have had does not exist anymore.
jtrn•1h ago
This is the first flash/mini model that doesn't make a complete ass of itself when I prompt for the following: "Tell me as much as possible about Skatval in Norway. Not general information. Only what is uniquely true for Skatval."

Skatval is a small local area I live in, so I know when it's bullshitting. Usually, I get a long-winded answer that is PURE Barnum-statement, like "Skatval is a rural area known for its beautiful fields and mountains" and bla bla bla.

Even with minimal thinking (it seems to do none), it gives an extremely good answer. I am really happy about this.

I also noticed it had VERY good scores on tool-use, terminal, and agentic stuff. If that is TRUE, it might be awesome for coding.

I'm tentatively optimistic about this.

amunozo•47m ago
I tried the same with my father's little village (Zarza Capilla, in Spain), and it gave a surprisingly good answer in a couple of seconds. Amazing.
kingstnap•39m ago
You are effectively describing SimpleQA, but with a single question instead of a comprehensive benchmark, and you can note the dramatic increase in performance there.
simonw•1h ago
Quick pricing comparison: https://www.llm-prices.com/#it=100000&ot=10000&sel=gemini-3-...

It's 1/4 the price of Gemini 3 Pro ≤200k and 1/8 the price of Gemini 3 Pro >200k - notable that the new Flash model doesn’t have a price increase after that 200,000 token point.

It’s also twice the price of GPT-5 Mini for input, half the price of Claude 4.5 Haiku.

zhyder•58m ago
Glad to see a big improvement on the SimpleQA Verified benchmark (28% -> 69%), which is meant to measure built-in factuality (i.e. without adding grounding resources). That's one benchmark where all models seemed to have low scores until recently. Can't wait to see a model go over 90%... then it will be years until the competition is over the number of 9s in such a factuality benchmark, but that'd be glorious.
tootyskooty•55m ago
Since it now includes 4 thinking levels (minimal-high) I'd really appreciate if we got some benchmarks across the whole sweep (and not just what's presumably high).

Flash is meant to be a model for lower-cost, latency-sensitive tasks. Long thinking times will push TTFT >> 10s (often unacceptable) and also won't really be that cheap.

caminanteblanco•55m ago
Does anyone else understand what the difference is between Gemini 3 'Thinking' and 'Pro'? Thinking "Solves complex problems" and Pro "Thinks longer for advanced math & code".

I assume that these are just different reasoning levels for Gemini 3, but I can't find any mention of there being two versions, and the API doesn't mention the Thinking/Pro dichotomy either.

flakiness•46m ago
It seems:

   - "Thinking" is Gemini 3 Flash with higher "thinking_level"
   - "Pro" is Gemini 3 Pro. It doesn't mention "thinking_level", but I assume it is set to high-ish.
peheje•42m ago
I think:

Fast = Gemini 3 Flash without thinking (or very low thinking budget)

Thinking = Gemini 3 flash with high thinking budget

Pro = Gemini 3 Pro with thinking

speedgoose•50m ago
I’m wondering why Claude Opus 4.5 is missing from the benchmarks table.
anonym29•45m ago
I wondered this too. I think the emphasis here was on the faster / lower-cost models, but that would suggest Haiku 4.5 should be the Anthropic entry in the table instead. They also did not use the most powerful xAI model, instead opting for the fast one. Regardless, this new Gemini 3 Flash model is good enough that Anthropic should be feeling pressure on both price and output quality no matter which Anthropic model it's compared against, which is ultimately good for the consumer.
anonym29•49m ago
I never have, do not, and conceivably never will use Gemini models, or any other models that require me to perform inference on Alphabet/Google's servers (Gemma models I can run locally or on other providers are fine). But kudos to the team over there for the work here; this does look really impressive. This kind of competition is good for everyone, even people like me who will probably never touch a Gemini model.
nickvec•40m ago
So is Gemini 3 Fast the same as Gemini 3 Flash?
alach11•34m ago
I really wish these models were available via AWS or Azure. I understand strategically that this might not make sense for Google, but at a non-software-focused F500 company it would sure make it a lot easier to use Gemini.
holler•16m ago
Agreed. When all your sensitive data is in one cloud, it'd be much easier to just leverage the already-great LLM service there (Bedrock).
heliophobicdude•32m ago
Any word on if this using their diffusion architecture?
k8sToGo•30m ago
I remember the preview price for 2.5 flash was much cheaper. And then it got quite expensive when it went out of preview. I hope the same won't happen.
jijji•21m ago
I tried Gemini CLI the other day: I typed two one-line requests, and then it responded that it would not go further because I had run out of tokens. I've heard other people complain that it will rewrite your entire codebase from scratch and that you should make backups before starting any code-based work with the Gemini CLI. I understand they are trying to compete with Claude Code, but this is not ready for prime time IMHO.
outside2344•2m ago
I don't want to say OpenAI is toast for general chat AI, but it sure looks like they are toast.