So how did they narrow it down to that small number? Why these sites specifically?... what's the false positive / negative rate of both approaches? What's even going on?
As for Safe Browsing catching more than 16%: it depends on the timeline. At the time these attacks are launched, Safe Browsing likely catches closer to 0%, but as time goes on that number definitely climbs.
the false positive rate is 100%. they just say everything is phishing:
"When we ran the full dataset through the deep scan, it caught every single confirmed phishing site with zero false negatives. The tradeoff is that it flagged all 9 of the legitimate sites in our dataset as suspicious, which is worth it when you're actively investigating a link you don't trust."
So what's the point of doing all of this if there isn't some kind of corresponding education on responsible computer use? There needs to be some personal responsibility here; you can't protect people against everything.
The fact that Safe Browsing even works is already good enough.
...so it has a false positive rate of 67%? On a ridiculously small dataset?
So for my master's thesis, about 6-7 years ago now (sheesh), I proposed some alternative, privacy-preserving methods to help keep users safe with their web browsers: https://scholarsarchive.byu.edu/etd/7403/
I think Chrome adopted one or two of the ideas. Nowadays the methods might need to be updated especially in a world of LLMs, but regardless, my hope was/is that the industry will refine some of these approaches and ship them.
How about: GSB stopped 16% of phishing sites? That's still huge.
“Tylenol stops headaches in 16% of people” - it’s huge, right? That’s millions of people we’re talking about.
Would you use it?
The biggest factor here is the false-positive cliff. Google Safe Browsing is the default safety net for billions of clients across Chrome, Safari, and Firefox. If GSB’s false-positive rate ticks up by even a fraction of a percent, they end up accidentally nuking legitimate small businesses, SaaS platforms, or municipal portals off the internet. Because of that massive blast radius, GSB fundamentally has to be deeply conservative. A boutique security vendor, on the other hand, can afford to be highly aggressive because an over-block in a corporate environment just results in a routine IT support ticket.
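To put rough numbers on that blast-radius point, here's a back-of-envelope sketch. Every figure in it is an illustrative assumption, not a published Google statistic:

```python
# Back-of-envelope: why a tiny false-positive rate is catastrophic at GSB scale.
# All inputs below are illustrative assumptions, not real Google figures.

urls_evaluated_per_day = 10_000_000_000  # assumed daily URL checks across all clients
legit_fraction = 0.999                   # assume ~99.9% of checked URLs are legitimate

for fpr in (0.0001, 0.001, 0.01):        # 0.01%, 0.1%, 1%
    wrongly_blocked = urls_evaluated_per_day * legit_fraction * fpr
    print(f"FPR {fpr:.2%}: ~{wrongly_blocked:,.0f} legitimate page loads blocked per day")
```

Even at a 0.01% false-positive rate, the assumed volume yields roughly a million wrongly blocked page loads per day, which is why the default safety net has to sit on the conservative side of the tradeoff.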
You also have to factor in the ephemeral nature of modern phishing infrastructure and basic selection bias. Threat actors heavily rely on automated DGAs and compromised hosts where the time-to-live for a payload is measured in hours, if not minutes. If a specialized vendor detects a zero-day phishing link at 10:00 AM, and GSB hasn't confidently propagated a global block to billions of edge clients by 10:15 AM, the vendor scores it as a "miss." Add in the fact that vendors naturally test against the specific subset of threats their proprietary engines are tuned to find, and that 84% number starts to make a lot more sense as a top-of-funnel marketing metric rather than a scientific baseline.
None of this is to say GSB is perfect right now. It has absolutely struggled to keep up with the recent explosion of automated, highly targeted spear-phishing and MFA-bypass proxy kits. But we should read this report for what it really is: a smart marketing push by a security vendor trying to sell a product, not a sign that the internet's baseline immune system is totally broken.
Am I missing something, or is that a 66%/100% false positive rate on legitimate sites?
If GSB had that ratio, it would be absolutely unusable... So comparing these two is just wrong...
Yeah. "Here's a blog post with some casually collected numbers about our product [...] It turns out that it's great!" is sorta boring.
But couple that with a headline framed as "Google [...] Bad" and straight to the top of the HN front page it goes!
Where I'd push back is on what this means for the average person. Most people have no protection against phishing beyond what their email provider and browser give them. If that protection is fundamentally reactive, catching threats hours or days after they go live, that's a real limitation worth talking about honestly. The 84% number isn't meant to say GSB is broken. It's meant to say there's a gap, and that gap has consequences for real users regardless of the engineering reasons behind it.
On the marketing angle, we aren't currently selling anything. The extension is free and so is submitting URLs for verification. We recognize it would be disingenuous to say we never will, but at the very least the data and the ability to check URLs (similar to PhishTank before they closed registration) will always be free. The dataset is also sourced from public threat intelligence feeds, not a curated set designed to make our tool look good. We think publishing findings like this is valuable even if you set aside everything about our tools.
I've seen this before in the IP blocklist space... if you're layering up firewall rules, you're bound to see the higher-priority layers more often.
That doesn't mean the other layers suck, security isn't always an A or B situation...
On the other hand, I don't know how I feel about how GSB is implemented... you're telling google every website you go to, but chances are the site already has google analytics or SSO...
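For what it's worth, the client doesn't send raw URLs in the common case: the Safe Browsing Update API has the browser keep a local list of short SHA-256 hash prefixes and only contact the server when a prefix matches. A minimal sketch of the idea (URL canonicalization and the suffix/prefix expression rules are omitted; the example URLs are made up):

```python
import hashlib

# Sketch of the Safe Browsing Update API idea: the browser syncs a local set of
# short SHA-256 prefixes and only asks the server about a URL when a prefix
# matches, so it does not upload the full URL of every site you visit.

PREFIX_LEN = 4  # bytes; the common prefix length in the v4 lists

def url_prefix(url: str) -> bytes:
    return hashlib.sha256(url.encode()).digest()[:PREFIX_LEN]

# Synced periodically from the server; one hypothetical bad URL for illustration.
local_prefixes = {url_prefix("http://evil.example/login")}

def needs_server_check(url: str) -> bool:
    # A local hit is only a *candidate*: the real client then fetches the full
    # hashes for that prefix from the server before showing a warning.
    return url_prefix(url) in local_prefixes

print(needs_server_check("http://evil.example/login"))    # prefix hit -> query server
print(needs_server_check("https://news.ycombinator.com")) # no hit -> no network call
```

A prefix match still leaks a little information to the server (and other lookup modes behave differently), so the privacy concern isn't zero, but it's narrower than "every URL you visit."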
Detecting a phishing domain internally is one problem, but pushing a verified block to billions of browsers worldwide is a completely different operational challenge.
Systems like Safe Browsing have to worry about propagation time, cache layers, update intervals, and the risk of pushing a false positive globally. A specialized vendor can update instantly for a much smaller customer base.
That difference alone can easily look like a “miss” in snapshot-style measurements.
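A toy model makes the snapshot bias concrete. Assume phishing pages have short, exponentially distributed lifetimes and the global blocklist only covers a page after some propagation delay; all parameters here are made up for illustration:

```python
import random

# Toy model of snapshot bias: a phishing page lives for `ttl` hours, and a
# global blocklist only covers it after a propagation delay. Measuring
# "did the baseline block it?" at a random moment in the page's life
# undercounts slow-propagating systems. All parameters are illustrative.

random.seed(1)

def measured_catch_rate(propagation_delay_h: float, trials: int = 100_000) -> float:
    caught = 0
    for _ in range(trials):
        ttl = random.expovariate(1 / 4)       # mean page lifetime: 4 hours
        checked_at = random.uniform(0, ttl)   # when the snapshot measurement happens
        if checked_at >= propagation_delay_h: # has the block propagated by then?
            caught += 1
    return caught / trials

for delay in (0.25, 1, 4):
    print(f"propagation delay {delay}h -> measured catch rate {measured_catch_rate(delay):.0%}")
```

The detector hasn't gotten any worse between rows; only the propagation delay changed. A snapshot taken near detection time will always flatter whoever updates fastest.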
They have a table, "AUTOMATIC SCAN RESULTS (263 URLS)", that sort of presents this information. Of the 9 legitimate sites (the negatives), they say they incorrectly flagged 6 as phishing.
With a false positive rate of 66%, it's not surprising they were able to drive down their false negative rate. Also, the test set of 254 phishing sites with 9 legitimate ones is a strange choice.
(Or maybe they need to work on how they present data in tables; tl;dr the supporting text.)
In other words, you can get these numbers if your deep scan filter is isSuspicious() { return true; }.
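Made concrete on the 254-phishing / 9-legitimate split quoted in this thread, a constant "flag everything" classifier reproduces the deep scan's headline numbers exactly:

```python
# On a dataset of 254 phishing and 9 legitimate sites (the split discussed
# above), a "classifier" that flags everything gets zero false negatives
# and a 100% false positive rate -- the same headline as the deep scan.

def is_suspicious(url: str) -> bool:
    return True  # flag everything

phishing, legit = 254, 9
true_pos  = sum(is_suspicious(u) for u in ["phish"] * phishing)  # all 254 caught
false_pos = sum(is_suspicious(u) for u in ["legit"] * legit)     # all 9 legit flagged

print(f"false negatives: {phishing - true_pos}")        # 0 -> "zero false negatives"
print(f"false positive rate: {false_pos / legit:.0%}")  # 100%
```

Which is the core problem with reporting recall on a 97%-positive dataset: the trivial baseline is indistinguishable from a perfect detector.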
Otherwise this becomes just another tool for Google to wall in the subset of the internet they like.
How is this serious? This is marketing slop. If the title isn't enough of an indicator, the ending should be:
> If you're interested in trying Muninn, it's available as a Chrome extension. We're in an early phase and would genuinely appreciate feedback from anyone willing to give it a shot. And if you run across phishing in the wild, consider submitting it to Yggdrasil so the data can help protect others.
Huh? Does this mean it just flagged everything as suspicious?
"The tradeoff is that it flagged all 9 of the legitimate sites in our dataset as suspicious, which is worth it when you're actively investigating a link you don't trust."
so you don't really need the scanning product at all. If you just assume every website is a phishing website, you'll get the same performance as the scanner!