One month Gemini is on top, then ChatGPT, then Anthropic. Not sure why everyone gets FOMO whenever a new version gets released.
I don't think any other company has all these ingredients.
Even other search competitors have not proven to be a danger to Google. There is nothing stopping that search money from coming in.
Or maybe Google just benchmaxxed and this doesn't translate at all to real-world performance.
2) Google's search revenue last quarter was $56 billion, a 14% increase over Q3 2024.
2) I'm not suggesting this will happen overnight, but younger people especially gravitate towards LLMs for information search and actively use some form of ad blocking. In the long run it doesn't look great for Google.
Also, models are already pretty good but product/market fit (in terms of demonstrated economic value delivered) remains elusive outside of a couple domains. Does a model that's (say) 30% better reach an inflection point that changes that narrative, or is a more qualitative change required?
But we'll have to wait a few weeks to see whether the model gets nerfed post-release or stays this good.
So far, IMHO, Claude Code remains significantly better than Gemini CLI. We'll see whether that changes with Gemini 3.
EDIT: Don't disagree that Gemini CLI has a lot of rough edges, though.
https://www.reddit.com/r/Bard/comments/1p093fb/gemini_3_in_c...
| Benchmark | 3 Pro | 2.5 Pro | Sonnet 4.5 | GPT-5.1 |
|--------------------------------------------- |---------------|---------------|------------- |-------------|
| Humanity’s Last Exam | 37.5% | 21.6% | 13.7% | 26.5% |
| ARC-AGI-2 | 31.1% | 4.9% | 13.6% | 17.6% |
| GPQA Diamond | 91.9% | 86.4% | 83.8% | 88.1% |
| AIME 2025 (no tools / with code execution) | 95.0% / 100% | 88.0% / — | 87.0% / 100% | 88.0% / — |
| MathArena Apex | 23.4% | 0.5% | 1.6% | 1.0% |
| MMMU-Pro | 81.0% | 68.0% | 68.0% | 80.8% |
| ScreenSpot-Pro | 72.7% | 11.4% | 36.2% | 3.5% |
| CharXiv Reasoning | 81.4% | 69.6% | 68.5% | 69.5% |
| OmniDocBench 1.5 | 0.115 | 0.145 | 0.145 | 0.147 |
| Video-MMMU | 87.6% | 83.6% | 77.8% | 80.4% |
| LiveCodeBench Pro | 2,439 | 1,775 | 1,418 | 2,243 |
| Terminal-Bench 2.0 | 54.2% | 32.6% | 42.8% | 47.6% |
| SWE-Bench Verified | 76.2% | 59.6% | 77.2% | 76.3% |
| t2-bench | 85.4% | 54.9% | 84.7% | 80.2% |
| Vending-Bench 2 | $5,478.16 | $573.64 | $3,838.74 | $1,473.43 |
| FACTS Benchmark Suite | 70.5% | 63.4% | 50.4% | 50.8% |
| SimpleQA Verified | 72.1% | 54.5% | 29.3% | 34.9% |
| MMLU | 91.8% | 89.5% | 89.1% | 91.0% |
| Global PIQA | 93.4% | 91.5% | 90.1% | 90.9% |
| MRCR v2 (8-needle) (128k avg / 1M pointwise) | 77.0% / 26.3% | 58.0% / 16.4% | 47.1% / n/a | 61.6% / n/a |

What makes me even more curious is the following:
> Model dependencies: This model is not a modification or a fine-tune of a prior model
So did they start from scratch with this one?
On Terminal-Bench 2 for example, the leader is currently "Codex CLI (GPT-5.1-Codex)" at 57.8%, beating this new release.
Anyone happen to know why? Is this website by any chance sharing information on safe medical abortions or women's rights, something which has gotten websites blocked here before?
I actually never discovered who was responsible for the blockade, until I read this comment. I'm going to look into Allot and send them an email.
EDIT: Also, your DNS provider is censoring (and probably monitoring) your internet traffic. I would switch to a different DNS provider.
https://www.google.com/search?q=gemini+u.s.+senator+rape+all...
Also interesting that Google Antigravity (antigravity.google / https://github.com/Google-Antigravity ?) leaked. I remember seeing this subdomain recently. Probably Gemini 3 related as well.
> This model is not a modification or a fine-tune of a prior model
Is it common to mention that? Feels like they built something from scratch.