One month Gemini is on top, then ChatGPT, then Anthropic. Not sure why everyone gets FOMO whenever a new version gets released.
I don't think any other company has all these ingredients.
Even other search competitors have not proven to be a danger to Google. There is nothing stopping that search money from coming in.
Or maybe Google just benchmaxxed and this doesn't translate at all to real-world performance.
2) Google's search revenue last quarter was $56 billion, a 14% increase over Q3 2024.
2) I'm not suggesting this will happen overnight, but younger people especially gravitate towards LLMs for information search and actively use some form of ad blocking. In the long run it doesn't look great for Google.
Also, models are already pretty good but product/market fit (in terms of demonstrated economic value delivered) remains elusive outside of a couple domains. Does a model that's (say) 30% better reach an inflection point that changes that narrative, or is a more qualitative change required?
But we'll have to wait a few weeks to see whether the model gets nerfed post-release or stays this good.
So far, IMHO, Claude Code remains significantly better than Gemini CLI. We'll see whether that changes with Gemini 3.
EDIT: Don't disagree that Gemini CLI has a lot of rough edges, though.
https://www.reddit.com/r/Bard/comments/1p093fb/gemini_3_in_c...
| Benchmark | 3 Pro | 2.5 Pro | Sonnet 4.5 | GPT-5.1 |
|--------------------------------------------- |---------------|---------------|------------- |-------------|
| Humanity’s Last Exam | 37.5% | 21.6% | 13.7% | 26.5% |
| ARC-AGI-2 | 31.1% | 4.9% | 13.6% | 17.6% |
| GPQA Diamond | 91.9% | 86.4% | 83.8% | 88.1% |
| AIME 2025 (no tools / with code execution) | 95.0% / 100% | 88.0% / — | 87.0% / 100% | 88.0% / — |
| MathArena Apex | 23.4% | 0.5% | 1.6% | 1.0% |
| MMMU-Pro | 81.0% | 68.0% | 68.0% | 80.8% |
| ScreenSpot-Pro | 72.7% | 11.4% | 36.2% | 3.5% |
| CharXiv Reasoning | 81.4% | 69.6% | 68.5% | 69.5% |
| OmniDocBench 1.5 | 0.115 | 0.145 | 0.145 | 0.147 |
| Video-MMMU | 87.6% | 83.6% | 77.8% | 80.4% |
| LiveCodeBench Pro | 2,439 | 1,775 | 1,418 | 2,243 |
| Terminal-Bench 2.0 | 54.2% | 32.6% | 42.8% | 47.6% |
| SWE-Bench Verified | 76.2% | 59.6% | 77.2% | 76.3% |
| t2-bench | 85.4% | 54.9% | 84.7% | 80.2% |
| Vending-Bench 2 | $5,478.16 | $573.64 | $3,838.74 | $1,473.43 |
| FACTS Benchmark Suite | 70.5% | 63.4% | 50.4% | 50.8% |
| SimpleQA Verified | 72.1% | 54.5% | 29.3% | 34.9% |
| MMLU | 91.8% | 89.5% | 89.1% | 91.0% |
| Global PIQA | 93.4% | 91.5% | 90.1% | 90.9% |
| MRCR v2 (8-needle) (128k avg / 1M pointwise) | 77.0% / 26.3% | 58.0% / 16.4% | 47.1% / n/a | 61.6% / n/a |

What makes me even more curious is the following:
> Model dependencies: This model is not a modification or a fine-tune of a prior model
So did they start from scratch with this one?
On Terminal-Bench 2 for example, the leader is currently "Codex CLI (GPT-5.1-Codex)" at 57.8%, beating this new release.
Anyone happen to know why? Is this website by any chance sharing information on safe medical abortions or women's rights, something which has gotten websites blocked here before?
I actually never discovered who was responsible for the blockade, until I read this comment. I'm going to look into Allot and send them an email.
EDIT: Also, your DNS provider is censoring (and probably monitoring) your internet traffic. I would switch to a different DNS provider.
https://www.google.com/search?q=gemini+u.s.+senator+rape+all...
Also interesting that Google Antigravity (antigravity.google / https://github.com/Google-Antigravity ?) leaked. I remember seeing this subdomain recently. Probably Gemini 3 related as well.
> This model is not a modification or a fine-tune of a prior model
Is it common to mention that? Feels like they built something from scratch.