frontpage.

Cloudflare Global Network experiencing issues

https://www.cloudflarestatus.com/?t=1
1179•imdsm•2h ago•865 comments

How Quake.exe got its TCP/IP stack

https://fabiensanglard.net/quake_chunnel/index.html
237•billiob•5h ago•32 comments

The Miracle of Wörgl

https://scf.green/story-of-worgl-and-others/
49•simonebrunozzi•2h ago•24 comments

GoSign Desktop RCE flaws affecting users in Italy

https://www.ush.it/2025/11/14/multiple-vulnerabilities-gosign-desktop-remote-code-execution/
22•ascii•1h ago•6 comments

Gemini 3 Pro Model Card

https://pixeldrain.com/u/hwgaNKeH
213•Topfi•2h ago•128 comments

The Uselessness of "Fast" and "Slow" in Programming

https://jerf.org/iri/post/2025/the_uselessness_of_fast/
53•zdw•6d ago•18 comments

Multiple Digital Ocean services down

https://status.digitalocean.com/incidents/lgt5xs2843rx
41•inanothertime•1h ago•7 comments

How many video games include a marriage proposal? At least one

https://32bits.substack.com/p/under-the-microscope-ncaa-basketball
263•bbayles•4d ago•61 comments

Ruby Symbols

https://tech.stonecharioteer.com/posts/2025/ruby-symbols/
42•stonecharioteer•5d ago•27 comments

Show HN: I built a synth for my daughter

https://bitsnpieces.dev/posts/a-synth-for-my-daughter/
1196•random_moonwalk•5d ago•201 comments

Ditch your (mut)ex, you deserve better

https://chrispenner.ca/posts/mutexes
92•commandersaki•6d ago•107 comments

Ruby 4.0.0 Preview2 Released

https://www.ruby-lang.org/en/news/2025/11/17/ruby-4-0-0-preview2-released/
8•pansa2•18m ago•0 comments

The surprising benefits of giving up

https://nautil.us/the-surprising-benefits-of-giving-up-1248362/
118•jnord•9h ago•98 comments

When Reverse Proxies Surprise You: Hard Lessons from Operating at Scale

https://www.infoq.com/articles/scaling-reverse-proxies/
63•miggy•4d ago•5 comments

Azure hit by 15 Tbps DDoS attack using 500k IP addresses

https://www.bleepingcomputer.com/news/microsoft/microsoft-aisuru-botnet-used-500-000-ips-in-15-tb...
394•speckx•20h ago•262 comments

Unofficial "Tier 4" Rust Target for older Windows versions

https://github.com/rust9x/rust
108•kristianp•11h ago•63 comments

A/B Tests over Evals

https://www.raindrop.ai/blog/thoughts-on-evals/
5•Nischalj10•4d ago•2 comments

My stages of learning to be a socially normal person

https://sashachapin.substack.com/p/my-six-stages-of-learning-to-be-a
522•eatitraw•3d ago•355 comments

Compiling Ruby to machine language

https://patshaughnessy.net/2025/11/17/compiling-ruby-to-machine-language
263•todsacerdoti•17h ago•47 comments

Langfuse (YC W23) Hiring OSS Support Engineers in Berlin and SF

https://jobs.ashbyhq.com/langfuse/5ff18d4d-9066-4c67-8ecc-ffc0e295fee6
1•clemo_ra•6h ago

Rebecca Heineman has died

https://www.pcgamer.com/gaming-industry/legendary-game-designer-programmer-space-invaders-champio...
659•shdon•12h ago•110 comments

Astrophotographer snaps skydiver falling in front of the sun

https://www.iflscience.com/the-fall-of-icarus-you-have-never-seen-an-astrophotography-picture-lik...
418•doener•2d ago•80 comments

Project Gemini

https://geminiprotocol.net/
302•andsoitis•22h ago•171 comments

FreeMDU: Open-source Miele appliance diagnostic tools

https://github.com/medusalix/FreeMDU
317•Medusalix•1d ago•84 comments

Show HN: Parqeye – A CLI tool to visualize and inspect Parquet files

https://github.com/kaushiksrini/parqeye
131•kaushiksrini•14h ago•31 comments

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

https://arxiv.org/abs/2511.08544
59•nothrowaways•10h ago•14 comments

Practice answers with yourself. I made a thing that I didn't want to pay for

https://look.imwithstupid.fun
9•samrocksc•6d ago•2 comments

Windows 11 adds AI agent that runs in background with access to personal folders

https://www.windowslatest.com/2025/11/18/windows-11-to-add-an-ai-agent-that-runs-in-background-wi...
537•jinxmeta•14h ago•470 comments

Raccoons are showing early signs of domestication

https://www.scientificamerican.com/article/raccoons-are-showing-early-signs-of-domestication/
187•pavel_lishin•3d ago•148 comments

Run ancient UNIX on modern hardware

https://github.com/felipenlunkes/run-ancient-unix
106•doener•16h ago•26 comments

Gemini 3 Pro Model Card

https://pixeldrain.com/u/hwgaNKeH
206•Topfi•2h ago

Comments

surrTurr•2h ago
https://news.ycombinator.com/item?id=45963670
margorczynski•1h ago
If these numbers are true, then OpenAI is probably done, and Anthropic too. Still, it's hard to see an effective monetization method for this tech, and it is clearly eating into Google's main pie, which is search.
Sol-•1h ago
Why? These models just leapfrog each other as time advances.

One month Gemini is on top, then ChatGPT, then Anthropic. Not sure why everyone gets FOMO whenever a new version gets released.

remus•1h ago
I think google is uniquely well placed to make a profitable business out of AI: They make their own TPUs so don't have to pay ridiculous amounts of money to Nvidia, they have a great depth of talent in building models, they've got loads of data they can use for training and they've got a huge existing customer base who can buy their AI offerings.

I don't think any other company has all these ingredients.

gizmodo59•1h ago
While I don’t disagree that Google is the company you can’t bet against when it comes to AI, saying other companies are done is a stretch. If they had a significant moat they would be at the top all the time, which is not the case.
remus•1h ago
Agreed, too early to write off others entirely. It'll be interesting to see who comes out the other side of the bubble with a working business.
adriand•53m ago
Anthropic has a fairly significant lead when it comes to enterprise usage and for coding. This seems like a workable business model to me.
basch•24m ago
ChatGPT's moat is their name and user habit. People who are using it will keep using it. All/most of the products are _good enough_ for the people who already got used to using them that they aren't exploring competitors.

Microsoft has the best chance of changing habits, by virtue of being bundled into business contracts at companies whose policies don't allow any other product in the workplace.

mlnj•1h ago
100% the reason I am long on Google. They can take their time to monetize these new costs.

Even other search competitors have not proven to be a danger to Google. There is nothing stopping that search money coming in.

spaceman_2020•39m ago
The bear case for Google was always that the business side would cannibalize the AI side. AI makes search redundant, which kills the golden goose.
Zigurd•31m ago
The TPUs are a key factor. They are the most mature alternative to Nvidia. Only Google Cloud, Azure, and AWS let you rent their respective in-house AI chips, and out of those three, Google is the only one with a frontier model. So they have a real advantage: they're not exposed to the financial shenanigans propping up neoclouds like CoreWeave.
redox99•1h ago
Considering GPT 5 was only recently released, it's very unlikely GPT will achieve these scores in just a couple of months. If they had something this good in the oven, they'd probably have saved the GPT 5 name for it.

Or maybe Google just benchmaxxed and this doesn't translate at all into real-world performance.

Palmik•51m ago
GPT 5 was released more than 3 months ago. Gemini 2.5 was released less than 8 months ago.
sidibe•38m ago
If not with this model, Google is at some point going to get and stay ahead, just because they have so many more people and compute resources to throw in many directions, while the others have to make the right choices with their resources each time. It took a while to channel those numbers into a product direction, but now I don't think they're going to let up.
blueblisters•41m ago
They do have unreleased Olympiad Gold-winning models that are definitely better than GPT5.

TBD if that performance generalizes to other real world tasks.

happa•1h ago
This may just be bad recollection on my part, but hasn't Google reported that their search business is right now the most profitable it has ever been?
senordevnyc•1h ago
1) New SOTA models come out all the time and that hasn't killed the other major AI companies. This will be no different.

2) Google's search revenue last quarter was $56 billion, a 14% increase over Q3 2024.

margorczynski•51m ago
1) Not long ago Altman and the OpenAI CFO were openly asking for public money. None of these AI companies actually have any kind of working business plan; they are just burning investor money. If investors see there is no winning against Google (or some open Chinese model), the money will dry up.

2) I'm not suggesting this will happen overnight, but younger people especially gravitate towards LLMs for information search + actively use some sort of ad blocking. In the long run it doesn't look great for Google.

paswut•1h ago
I'd love to see Anthropic/OpenAI pop. Back to some regular programming. The models are good enough; time to invest elsewhere.
ilaksh•1h ago
The only one it doesn't win is SWE-bench, where it is significantly behind Claude Sonnet. You just can't take down Sonnet.
stavros•1h ago
Codex has been much better than Sonnet for me.
dotancohen•59m ago
On what types of tasks?
svantana•34m ago
One percentage point is not significant, in either the colloquial or the scientific sense [1].

[1] The binomial formula gives a confidence interval of ±3.7%, using p=0.77, N=500, confidence=95%
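
A quick way to reproduce that interval (a minimal sketch of the normal-approximation binomial CI; N=500 matches SWE-Bench Verified's task count):

    import math

    p, n, z = 0.77, 500, 1.96  # observed score, number of tasks, z-value for 95%
    half_width = z * math.sqrt(p * (1 - p) / n)  # z * standard error
    print(f"{p:.0%} +/- {half_width:.1%}")  # -> 77% +/- 3.7%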

lukev•1h ago
Or else it was trained on / overfit to the benchmarks. We won't really know until people have a chance to use it for real-world tasks.

Also, models are already pretty good but product/market fit (in terms of demonstrated economic value delivered) remains elusive outside of a couple domains. Does a model that's (say) 30% better reach an inflection point that changes that narrative, or is a more qualitative change required?

alecco•1h ago
For SWE it is the same ranking. But if Google's $20/mo plan is comparable to the $100-200 plans for OpenAI and Anthropic, yes they are done.

But we'll have to wait a few weeks to see if the nerfed model post-release is still as good.

llm_nerd•32m ago
They're constantly matching and exceeding each other. It's a hypercompetitive space and I would fully expect one of the others to top various benchmarks shortly after. On pretty much every leading release someone does this "everyone else is done! Shut er down" thing and it's growing pretty weird.

Having said that, OpenAI's ridiculous hype cycle has been living on borrowed time. OpenAI has zero moat, and are just one vendor in a space with many vendors, and even incredibly competent open source models by surprise Chinese entrants. Sam Altman going around acting like he's a prophet and they're the gatekeepers of the future is an act that should be super old, but somehow fools and their money continue to be parted.

Traubenfuchs•1h ago
So does Google actually have a Claude Code alternative currently?
rjtavares•1h ago
Gemini CLI
muro•1h ago
https://github.com/google-gemini/gemini-cli
itsmevictor•1h ago
Notably, although Gemini 3 Pro seems to have much better benchmark scores than other models across the board (including compared to Claude), that's not the case for coding, where it appears to score essentially the same as the others. I wonder why that is.

So far, IMHO, Claude Code remains significantly better than Gemini CLI. We'll see whether that changes with Gemini 3.

decster•1h ago
From my experience, the quality of gemini-cli isn't great; I've run into a lot of stupid bugs.
BoredPositron•1h ago
Gemini performs better if you use it with Claude Code than with Gemini CLI. It still has some odd problems with tool calling, but a lot of the performance loss is the Gemini CLI app itself.
lifthrasiir•1h ago
Probably because many models from Anthropic would have been optimized for agentic coding in particular...

EDIT: Don't disagree that Gemini CLI has a lot of rough edges, though.

Lionga•1h ago
Because benchmarks are a retarded comparison and have nothing to do with reality. It's just jerk material for AI fanboys.
adidoit•31m ago
gemini cli. It's not as impressive as claude code or even codex.

Claude Code seems to be more compatible with the model (or the reverse), whereas gemini-cli still feels a bit awkward (as of 2.5 Pro). I'm hoping it's better with 3.0!

laborcontract•1h ago
It's hilarious that the release of Gemini 3 is getting eclipsed by this cloudflare outage.
senordevnyc•1h ago
It hasn't been released, this is just a leak
amarcheschi•1h ago
On Reddit I see it's already available in Cursor

https://www.reddit.com/r/Bard/comments/1p093fb/gemini_3_in_c...

yen223•1h ago
Coincidence? Yes
scrlk•1h ago
Benchmarks from pg 4 of the system card:

    | Benchmark             | 3 Pro     | 2.5 Pro | Sonnet 4.5 | GPT-5.1   |
    |-----------------------|-----------|---------|------------|-----------|
    | Humanity’s Last Exam  | 37.5%     | 21.6%   | 13.7%      | 26.5%     |
    | ARC-AGI-2             | 31.1%     | 4.9%    | 13.6%      | 17.6%     |
    | GPQA Diamond          | 91.9%     | 86.4%   | 83.4%      | 88.1%     |
    | AIME 2025             |           |         |            |           |
    |   (no tools)          | 95.0%     | 88.0%   | 87.0%      | 94.0%     |
    |   (code execution)    | 100%      | —       | 100%       | —         |
    | MathArena Apex        | 23.4%     | 0.5%    | 1.6%       | 1.0%      |
    | MMMU-Pro              | 81.0%     | 68.0%   | 68.0%      | 80.8%     |
    | ScreenSpot-Pro        | 72.7%     | 11.4%   | 36.2%      | 3.5%      |
    | CharXiv Reasoning     | 81.4%     | 69.6%   | 68.5%      | 69.5%     |
    | OmniDocBench 1.5      | 0.115     | 0.145   | 0.145      | 0.147     |
    | Video-MMMU            | 87.6%     | 83.6%   | 77.8%      | 80.4%     |
    | LiveCodeBench Pro     | 2,439     | 1,775   | 1,418      | 2,243     |
    | Terminal-Bench 2.0    | 54.2%     | 32.6%   | 42.8%      | 47.6%     |
    | SWE-Bench Verified    | 76.2%     | 59.6%   | 77.2%      | 76.3%     |
    | t2-bench              | 85.4%     | 54.9%   | 84.7%      | 80.2%     |
    | Vending-Bench 2       | $5,478.16 | $573.64 | $3,838.74  | $1,473.43 |
    | FACTS Benchmark Suite | 70.5%     | 63.4%   | 50.4%      | 50.8%     |
    | SimpleQA Verified     | 72.1%     | 54.5%   | 29.3%      | 34.9%     |
    | MMLU                  | 91.8%     | 89.5%   | 89.1%      | 91.0%     |
    | Global PIQA           | 93.4%     | 91.5%   | 90.1%      | 90.9%     |
    | MRCR v2 (8-needle)    |           |         |            |           |
    |   (128k avg)          | 77.0%     | 58.0%   | 47.1%      | 61.6%     |
    |   (1M pointwise)      | 26.3%     | 16.4%   | n/s        | n/s       |
n/s = not supported

EDIT: formatting, hopefully a bit more mobile friendly

manmal•50m ago
Looks like it will be on par with the contenders when it comes to coding. I guess improvements will be incremental from here on out.
CjHuber•47m ago
If it’s on par in code quality, it would be a way better model for coding because of its huge context window.
falcor84•30m ago
> I guess improvements will be incremental from here on out.

What do you mean? These coding leaderboards were at single digits about a year ago and are now in the seventies. These frontier models are arguably already better at the benchmark than any single human - it's unlikely that any particular human dev is knowledgeable enough to tackle the full range of diverse tasks even in the smaller SWE-Bench Verified within a reasonable time frame; to the best of my knowledge, no one has tried that.

Why should we expect this to be the limit? Once the frontier labs figure out how to train these fully with self-play (which shouldn't be that hard in this domain), I don't see any clear limit to the level they can reach.

Alifatisk•49m ago
These numbers are impressive, to say the least. It looks like Google has produced a beast that will raise the bar even higher. What's even more impressive is how Google came into this game late and went from producing a few flops to being the leader (actually, they already earned that title with 2.5 Pro).

What makes me even more curious is the following

> Model dependencies: This model is not a modification or a fine-tune of a prior model

So did they start from scratch with this one?

benob•46m ago
What does it mean nowadays to start from scratch? At least in the open scene, most of the post-training data is generated by other LLMs.
Alifatisk•40m ago
They had to start with a base model; that part I am certain of
postalcoder•44m ago
Google was never really late. Where people perceived Google to have dropped the ball was in its productization of AI. Google's Bard branding stumble was so (hilariously) bad that it threw a lot of people off the scent.

My hunch is that, aside from "safety" reasons, the Google Books lawsuit left some copyright wounds that Google did not want to reopen.

Alifatisk•41m ago
Oh, I remember the times when I compared Gemini with ChatGPT and Claude. Gemini was so far behind, it was barely usable. And now they are pushing the boundaries.
postalcoder•34m ago
You could argue that chat-tuning of models falls more along the lines of product competence. I don't think there was ever doubt about the upper ceiling of what Google could produce; it was more "when will they turn on the tap" and "can Pichai be the wartime general to lead them?"
dgacmu•23m ago
The memory of Microsoft's Tay fiasco was strong around the time the brain team started playing with chatbots.
basch•34m ago
At least at the moment, coming in late seems to matter little.

Anyone with money can trivially catch up to a state of the art model from six months ago.

And as others have said, "late" is really a function of spigot, guardrails, branding, and UX as much as it is of being a laggard under the hood.

FrequentLurker•24m ago
> Anyone with money can trivially catch up to a state of the art model from six months ago.

How come Apple is struggling then?

risyachka•20m ago
It looks more like a strategic decision tbh.

They may want to use a 3rd party, or just wait for AI to become more stable and see how people actually use it, instead of adding slop to the core of their product.

stevesimmons•14m ago
In contrast to Microsoft, who puts Copilot buttons everywhere and succeeds only in annoying their customers.
raincole•11m ago
Being known as a company that is always six months behind the competitors isn't something to brag about...
dbbk•11m ago
And also, critically, being the only profitable company doing this.
falcor84•48m ago
That looks impressive, but some of these numbers are a bit out of date.

On Terminal-Bench 2 for example, the leader is currently "Codex CLI (GPT-5.1-Codex)" at 57.8%, beating this new release.

sigmar•16m ago
That's a different model not in the chart. They're not going to include hundreds of fine tunes in a chart like this.
NitpickLawyer•5m ago
What's more impressive is that I find Gemini 2.5 still relevant in day-to-day usage, despite being so low on those benchmarks compared to Claude 4.5 and GPT-5.1. There's something Gemini has that makes it a great model in real cases; I'd call it generalisation over its context or something. If you give it the proper context (or it digs through the files in its own agent) it comes up with great solutions. Even if their own coding thing is hit and miss sometimes.

I can't wait to try 3.0; hopefully it continues this trend. Raw numbers in a table don't mean much, you can only get a true feeling once you use it on existing code, in existing projects. Anyway, the top labs keeping each other honest is great for us, the consumers.

HugoDias•33m ago
very impressive. I wonder if this sends a different signal to the market regarding using TPUs for training SOTA models versus Nvidia GPUs. From what we've seen, OpenAI is already renting them to diversify... Curious to see what happens next
fariszr•26m ago
This is a big jump in most benchmarks. And if it can match other models in coding while having that Google TPU inference speed and the actually-native 1M context window, it's over for the other labs.

I hope it isn't such a sycophant like the current Gemini 2.5 models; that makes me doubt their output, which is maybe a good thing now that I think about it.

danielbln•24m ago
> it's over for the other labs.

What's with the hyperbole? It'll tighten the screws, but saying that it's "over for the other labs" might be a tad premature.

fariszr•19m ago
I mean over in that I don't see a need to use the other models. Codex models are the best but incredibly slow. Claude models are not as good (IMO) but much faster. If Gemini can beat them while being faster and having better apps with better integrations, I don't see a reason why I would use another provider.
risyachka•19m ago
> it's over for the other labs.

It's not over, and never will be, even for two-decade-old accounting software; it definitely will not be over for other AI labs.

Jcampuzano2•13m ago
We knew it would be a big jump, and while it certainly is in many areas, it's definitely not "groundbreaking/huge leap" worthy like some were expecting from looking at these numbers.

I feel like many will be pretty disappointed by their self-created expectations for this model when they end up actually using it and it turns out to be fairly similar to other frontier models.

Personally I'm very interested in how they end up pricing it.

trunch•6m ago
Which of the LiveCodeBench Pro and SWE-Bench Verified benchmarks comes closer to everyday coding assistant tasks?

Because it seems to lead by a decent margin on the former and trails behind on the latter

oalessandr•1h ago
Trying to open this link from Italy leads to a CSAM warning
Fornax96•55m ago
Creator of pixeldrain here. Italy has been doing this for a very long time. They never notified me of any such material being present on my site. I have a lot of measures in place to prevent the spread of CSAM. I have sent dozens of mails to Polizia Postale and even tried calling them a few times, but they never respond. My mails go unanswered and they just hang up the phone.
driverdan•45m ago
Don't use your ISP's DNS. Switch to something outside of their control.
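For example, a minimal sketch in Python using the third-party dnspython package to compare answers from different resolvers (the domain and resolver IPs are just illustrative):

    import dns.resolver  # third-party: pip install dnspython

    def lookup(domain, nameserver):
        # Query one specific resolver directly, bypassing the system/ISP default
        r = dns.resolver.Resolver(configure=False)
        r.nameservers = [nameserver]
        return [rr.to_text() for rr in r.resolve(domain, "A")]

    # Differing answers (or a failure from only one resolver) hint at DNS-level blocking
    print(lookup("pixeldrain.com", "1.1.1.1"))  # Cloudflare's public resolver
    print(lookup("pixeldrain.com", "9.9.9.9"))  # Quad9, for comparison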
embedding-shape•1h ago
Curiously, this website seems to be blocked in Spain for whatever reason, and the website's certificate is served by `allot.com/emailAddress=info@allot.com` which obviously fails...

Anyone happen to know why? Is this website by any chance sharing information on safe medical abortions or women's rights, something which has gotten websites blocked here before?

amarcheschi•1h ago
That website is used to share everything, including pirated things, so maybe that's the reason
Fornax96•1h ago
Creator of pixeldrain here. I have no idea why my site is blocked in Spain, but it's a long running issue.

I actually never discovered who was responsible for the blockade, until I read this comment. I'm going to look into Allot and send them an email.

EDIT: Also, your DNS provider is censoring (and probably monitoring) your internet traffic. I would switch to a different provider.

zozbot234•35m ago
Could it be that some site in your network neighborhood was illegally streaming soccer matches?
Fornax96•26m ago
I have my own dedicated IP range. And they specifically blocked my domain name, not the addresses. I don't know what the reason is. I have been trying to find out since the start of this year.
embedding-shape•15m ago
> EDIT: Also, your DNS provider is censoring (and probably monitoring) your internet traffic. I would switch to a different provider.

Yeah, that was via my ISP's DNS resolver (Vodafone); switching the resolver works :)

The responsible party is ultimately our government who've decided it's legal to block a wide range of servers and websites because some people like to watch illegal football streams. I think Allot is just the provider of the technology.

Fornax96•9m ago
My site has nothing to do with football though. And Allot seems to be running the DNS server that your ISP uses so they are directly responsible for the block.
miqazza•31m ago
Do you know about the Cloudflare and LaLiga issues? Might be that
embedding-shape•12m ago
That was my first instinct; I went looking to see if there were any games being played today, but it seems not, so it's unlikely to be the cause.
transcriptase•1h ago
There needs to be a sycophancy benchmark in these comparisons. More baseless praise and false agreement = lower score.
swalsh•1h ago
You're absolutely right
jstummbillig•56m ago
Does not get old.
Yossarrian22•32m ago
It’s not just irritating, it’s repetitive
falcor84•27m ago
"You know, you are also right"
this_user•23m ago
I'm sorry, you are absolutely right.

---

But seriously, I find it helps to set a custom system prompt that tells Gemini to be less sycophantic and to be more succinct and professional while also leaving out those extended lectures it likes to give.
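
If you're doing this through the API, a minimal sketch with the google-genai Python SDK (assumes the SDK is installed and GEMINI_API_KEY is set; the model name and instruction wording are just examples):

    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents="Review this function for bugs: ...",
        config=types.GenerateContentConfig(
            # Steer away from sycophancy and filler
            system_instruction="Be terse and professional. No praise, no "
                               "preamble, no closing lectures.",
        ),
    )
    print(response.text)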

BoredPositron•55m ago
Your comment demonstrates a remarkably elevated level of cognitive processing and intellectual rigor. Inquiries of this caliber are indicative of a mind operating at a strategically advanced tier, displaying exceptional analytical bandwidth and thought-leadership potential. Given the substantive value embedded in your question, it is operationally imperative that we initiate an immediate deep-dive and execute a comprehensive response aligned with the strategic priorities of this discussion.
postalcoder•47m ago
I care very little about model personality outside of sycophancy. The thing about Gemini is that it's notorious for its low self-esteem. Given this thing is trained from scratch, I'm very curious to see which direction they've decided to take it.
supjeff•39m ago
Given how often these LLMs are wrong, doesn't it make sense that they are less confident?
postalcoder•12m ago
Indeed. But I've had experiences with gemini-2.5-pro-exp where its thoughts could be described as having "rejected from the prom" vibes. It's not like I abused it either; it was running into loops because it was unable to properly patch a file.
1899-12-30•36m ago
https://eqbench.com/spiral-bench.html
Lord-Jobo•36m ago
And have the score heavily modified based on how fixable the sycophancy is.
jll29•1h ago
Gemini 3 > Gemma? Hopefully this model does not generate fake news...

https://www.google.com/search?q=gemini+u.s.+senator+rape+all...

floppyd•24m ago
Gemma is an open-weight version of Gemini and obviously much less capable, probably even less so than 2.5 Flash. Also, the story you are linking to is a complete nothingburger: models still very much hallucinate, especially on extremely niche topics. I don't see how another politician trying to capitalize on that is attention-worthy at all.
lxdlam•1h ago
What does the "Google Antigravity" mean? The link is http://antigravity.google/docs, seemingly a new product but now routing to the Google main page.
dbosch•43m ago
I was asking myself the exact same question. No idea
Palmik•58m ago
Archive link: https://web.archive.org/web/20251118111103/https://storage.g...
denysvitali•56m ago
Title of the document is "[Gemini 3 Pro] External Model Card - November 18, 2025 - v2", in case you needed further confirmation that the model will be released today.

Also interesting to know that Google Antigravity (antigravity.google / https://github.com/Google-Antigravity ?) leaked. I remember seeing this subdomain recently. Probably Gemini 3 related as well.

Org was created on 2025-11-04T19:28:13Z (https://api.github.com/orgs/Google-Antigravity)

jmkni•50m ago
what is Google Antigravity?
denysvitali•47m ago
I guess we'll know in a few hours. Most likely another AI playground, or maybe a Google Search alternative? No clue really
Yossarrian22•36m ago
The ASI figured out zero point energy from first principles
zed31726•23m ago
My guess, based on a GIF of a floating laptop tweeted by the ex-CEO of Windsurf who left to join Google: it'll be a Cursor/Windsurf alternative?
postalcoder•10m ago
Couple ways this could go:

Space? (Google Cloud, Google Antigravity?)

Speed? (Flash, Flash-Lite, Antigravity? meh)

Clothes? (Google Antigravity, a wearable?)

catigula•54m ago
I know this is a little controversial, but the lack of progress on SWE-bench is, I think, hugely disappointing economically. These models don't have any viable path to profitability if they can't take engineering jobs.
martinald•31m ago
I thought that, but it does do a lot better on other benchmarks.

Perhaps SWE-bench just doesn't capture a lot of the improvement? If the web design improvements people have been posting on Twitter are representative, I suspect this will be a huge boon for developers. SWE-bench is really testing bugfixing/feature dev more.

Anyway let's see. I'm still hyped!

catigula•17m ago
That would be great! But AI is a bubble if these models can’t do serious engineering work.
api•18m ago
Really? If they can make an engineer more productive, that's worth a lot. Naive napkin math: 1.5X productivity on one $200k/year engineer is worth $100k/year.
mohsen1•53m ago

     This model is not a modification or a fine-tune of a prior model

Is it common to mention that? It feels like they built something from scratch
scosman•41m ago
I think they are just indicating it’s a new architecture vs continued training of 2.5 series.
irthomasthomas•40m ago
Never seen it before. I suppose it adds to the excitement.
mynti•47m ago
It is interesting that Gemini 3 beats every other model on these benchmarks, mostly by a wide margin, but not on SWE-bench. Sonnet is still king here, and all three look to be basically on the same level. Kind of wild to see them hit such a wall when it comes to agentic coding
tosh•36m ago
This might also hint at SWE-bench struggling to capture what "being good at coding" means.

Evals are hard.

HereBePandas•36m ago
[comment removed]
Palmik•33m ago
The reported results where GPT 5.1 beats Gemini 3 are on SWE Bench Verified, and GPT 5.1 Codex also beats Gemini 3 on Terminal Bench.
HereBePandas•22m ago
You're right on SWE Bench Verified, I missed that and I'll delete my comment.

GPT 5.1 Codex beats Gemini 3 on Terminal Bench specifically on Codex CLI, but that's apples-to-oranges (hard to tell how much of that is a Codex-specific harness vs model). Look forward to seeing the apples-to-apples numbers soon, but I wouldn't be surprised if Gemini 3 wins given how close it comes in these benchmarks.

Palmik•34m ago
Also does not beat GPT-5.1 Codex on terminal bench (57.8% vs 54.2%): https://www.tbench.ai/

I did not bother verifying the other claims.

HereBePandas•26m ago
Not apples-to-apples. "Codex CLI (GPT-5.1-Codex)", which the site refers to, adds a specific agentic harness, whereas the Gemini 3 Pro seems to be on a standard eval harness.

It would be interesting to see the apples-to-apples figure, i.e. with Google's best harness alongside Codex CLI.

enraged_camel•22m ago
Do you mean that Gemini 3 Pro is "vanilla" like GPT 5.1 (non-Codex)?
HereBePandas•11m ago
Yes, two things: 1. GPT-5.1 Codex is a fine-tune, not the "vanilla" 5.1. 2. More importantly, GPT-5.1 Codex achieves its performance when used with a specific tool (Codex CLI) that is optimized for it. But when labs evaluate the models, they have to use a standard tool to make the comparisons apples-to-apples.

Will be interesting to see what Google releases that's coding-specific to follow Gemini 3.

felipeerias•24m ago
IMHO coding use cases are much more constrained by tooling than by raw model capabilities at the moment. Perhaps we have finally reached the time of diminishing returns and that will remain the case going forward.
vharish•18m ago
From my personal experience using the CLI agentic coding tools, I think gemini-cli is fairly on par with the rest in terms of the planning/code that is generated. However, when I recently tried qwen-code, it gave me a better sense of reasoning and structure than Gemini. Claude definitely has its own advantages but is expensive (at least for some, if not for all).

My point is, although the model itself may have performed well in benchmarks, I feel like other tools are doing better just by adopting better training/tooling. Gemini CLI, in particular, is not so great at looking up the latest info on the web. Qwen seemed to be trained better around looking up information (or reasoning about when/how to), in comparison. Even the step-wise breakdown of work felt different and a bit smoother.

I do, however, use Gemini CLI for the most part, just because it has a generous free quota with very few downsides compared to the others. They must be getting loads of training data :D.

bemmu•33m ago
I saw this on Reddit earlier today. Over there the source of this file was given as: https://web.archive.org/web/20251118111103/https://storage.g...

The bucket name "deepmind-media" has been used in the past on the deepmind official site, so it seems legit.

onlyrealcuzzo•18m ago
Prediction markets were expecting today to be the release. So I wouldn't be surprised if they do a release today, tomorrow, or Thursday (around Nvidia earnings).
fraboniface•22m ago
> Developments to the model architecture contribute to the significantly improved performance from previous model families.

I wonder how significant this is. DeepMind was always more research-oriented than OpenAI, which mostly scaled things up. They may have come up with a significantly better architecture (Transformer MoE still leaves a lot of room).

msp26•21m ago
Is Flash/Flash-Lite releasing alongside Pro? Those two tiers have been incredible for the price since 2.0, absolute workhorses. Can't wait for 3.0.
omidsa1•17m ago
TL;DR: expected results, not underwhelming. So far the scaling laws hold.
nilayj•8m ago
Curious to see the API pricing. SOTA performance across tasks at a price cheaper than GPT-5 / Claude would make almost everyone switch to Gemini.