Benchmarking leading AI agents against Google reCAPTCHA v2

https://research.roundtable.ai/captcha-benchmarking/
29•mdahardy•2h ago

Comments

PaulHoule•1h ago
I know people were solving CAPTCHAs with neural nets (with PHP, no less!) back in 2009.
golfer•53m ago
Indeed, captcha vs captcha bot solvers has been an ongoing war for a long time. Considering all the cybercrime and ubiquitous online fraud today, it's pretty impressive that captchas have held the line as long as they have.
mdahardy•47m ago
You could definitely do better than we do here. This was just a test of how well these general-purpose systems perform out of the box.
xnx•1h ago
Seems like Google Gemini is tied for the best and is the cheapest way to solve Google's reCAPTCHA.

Will be interesting to see how Gemini 3 does later this year.

bena•53m ago
Makes sense, what do you think it was trained on?
mdahardy•39m ago
After watching hundreds of these runs, Gemini was by far the least frustrating model to observe.
Xenoamorphous•1h ago
I’m sure they do better than me. Sometimes I get stuck in an endless loop of buses and fire hydrants.

Also, when they ask you to identify traffic lights, do you select the post? And when it’s motorcycles/bicycles, do you select the guy riding it?

datadrivenangel•59m ago
That's not due to accuracy, you're getting tarpitted for not looking human enough.
sixhobbits•57m ago
I didn't look into this much, but I think the fact that humans are willing to do this for something in the "cents per thousand" range means it's really hard to get much interest in automating it.
Semaphor•50m ago
There's a browser extension to solve them. Buster.
mdahardy•46m ago
While running this I looked at hundreds and hundreds of captchas. And I still get rejected on like 20% of them when I do them. I truly don't understand their algorithm lol
Sayrus•41m ago
Testing those same captchas in Google Chrome improved my accuracy by at least an order of magnitude.

Either that or it was never about the buses and fire hydrants.

ACCount37•31m ago
It's a known "issue" of reCaptcha, and many other systems like it. If it thinks you're a bot, it will "fail" the first few correct solves before it lets you through.

The worst offenders will just loop you forever, no matter how many solves you get right.

utopman•39m ago
Not sure if this is your case, but I think I sometimes had to solve many of them when I was rushing through my daily tasks. My hypothesis is that I solve them too fast for the "average human solving duration" that reCAPTCHA seems to expect (I think solving it too fast triggers a bot fingerprint). More recently, when I run into a reCAPTCHA, I consciously don't rush it, and I feel I no longer have to solve more than one. I don't think I have superpowers, but as a tech guy I do a lot of computing things mechanically.
hnburnsy•18m ago
Pro tip: select a section you know is wrong, then deselect it before submitting. Seems to help prove you are not a bot.
guluarte•53m ago
In other words, reasoning can fill the context window with crap.
flakiness•50m ago
To be honest, I'm surprised how well it holds up. I expected close-to-total collapse. It's a matter of time, I guess, but still.
mdahardy•45m ago
Same! As we talk about in the article, the failures were less from raw model intelligence/ability than from challenges with timing and dynamic interfaces
WhereIsTheTruth•44m ago
3 models only, can we really call that a benchmark?
mdahardy•38m ago
yes
maknee•44m ago
Interesting results. Why do reload/cross-tile challenges have worse results? It would be nice to see some examples of failed results (how close did it get to solving them?).
mdahardy•41m ago
We have an example of a failed cross-tile result in the article - the models seem like they're much better at detecting whether something is in an image vs. identifying the boundaries of those items. This probably has to do with how they're trained - if you train on descriptions/image pairs, I'm not sure how well that does at learning boundaries.

Reload challenges are difficult because of how the agent-action loop works. But the models were pretty good at identifying when a tile contained an item.

ajsnigrutin•32m ago
So, when do we reach a level where AI is better than humans and we remove captchas from pages altogether? If you don't want bots to read content, don't put it online. You're just inconveniencing real people now.
gregpr07•31m ago
Creator of Browser Use here - cool testing! Could you try gemini-flash-latest as well? It usually performs better than 2.5 Pro.

There are some interesting debates around Gemini Computer Use - surely Google can post-train this away, right? (Currently it has no problem solving it haha)

kjok•17m ago
If not today, models will get better at solving captchas in the near future. IMHO, the real concern, however, is cheap captcha solving services.

Show HN: Infostealers in Nov 2025: 183M Gmail, 16B Logins, Nikkei Slack

https://traclea.com/coming-soon
1•Traclea•59s ago•0 comments

Bill Gates Says We're in an AI Bubble Similar to the Dot-Com Bubble

https://www.businessinsider.com/bill-gates-ai-bubble-similar-dot-com-bubble-2025-10
1•BerislavLopac•1m ago•0 comments

Year one of hosting Tor exit relays

https://blog.paranoidpenguin.net/2025/11/year-one-of-hosting-tor-exit-relays/
1•speckx•1m ago•0 comments

Show HN: Princejs – <10 kB Bun framework by a 13-year-old Nigerian, beats Hono

https://github.com/MatthewTheCoder1218/princejs
1•lilprince1218•1m ago•0 comments

"Nobody wants a data center in their backyard"

https://www.datacenterdynamics.com/en/news/nobody-really-wants-a-data-center-in-their-backyard-sa...
2•belter•2m ago•1 comments

Optimizing Authorization Security: A Guide to Access Control Models

https://fusionauth.io/articles/identity-basics/authorization-models
1•mooreds•2m ago•0 comments

Show HN: Klana – AI Design Copilot Plugin for Figma

https://www.figma.com/community/plugin/1545618089212283719/klana
2•joezee•5m ago•0 comments

Async Rust with Tokio I/O streams: backpressure, concurrency, and ergonomics

https://biriukov.dev/docs/async-rust-tokio-io/1-async-rust-with-tokio-io-streams-backpressure-con...
2•fanf2•6m ago•0 comments

Guitar Hero at 20 – how a plastic axe bridged the gap between rock generations

https://www.theguardian.com/games/2025/nov/08/guitar-hero-at-20-gap-between-rock-generations-harm...
1•mitchbob•9m ago•0 comments

Google: Introduction to Agents

https://www.kaggle.com/whitepaper-introduction-to-agents
1•Anon84•12m ago•0 comments

George Hotz: Outwit, Outplay, Outlast [video]

https://www.youtube.com/watch?v=werrvv0MVXQ
1•vrnvu•14m ago•0 comments

The front-facing camera will be invisible in a 2027 iPhone

https://9to5mac.com/2025/11/10/the-front-facing-camera-will-be-invisible-in-a-2027-iphone-says-le...
1•geox•14m ago•1 comments

Entities enabling scientific fraud at scale are large, resilient growing rapidly

https://www.pnas.org/doi/full/10.1073/pnas.2420092122
3•devonnull•15m ago•0 comments

Texas Sheriff Used Flock ALPR in Abortion Investigation

https://www.eff.org/deeplinks/2025/10/flock-safety-and-texas-sheriff-claimed-license-plate-search...
5•snthd•15m ago•1 comments

NIST Randomness Beacon Shut Down

https://csrc.nist.gov/Projects/interoperable-randomness-beacons/beacon-20?2
3•ireflect•15m ago•1 comments

HTML to Markdown Converter

https://github.com/Goldziher/html-to-markdown
3•thunderbong•15m ago•0 comments

Sound familiar? Matching voices boost trust in self-driving cars

https://news.umich.edu/sound-familiar-matching-voices-boost-trust-in-self-driving-cars/
1•ohjeez•16m ago•0 comments

Use AI to edit and optimize your resume for any job

https://ai.resume.essentialx.us/
1•maheshgattani•16m ago•0 comments

Solving Diverse Problems with Sheaf-Wreath Attention [pdf]

https://github.com/bon-cdp/notes/blob/main/c.pdf
2•bon-cdp•16m ago•1 comments

Wikipedia: Temporary Accounts

https://en.wikipedia.org/wiki/Wikipedia:Temporary_accounts
2•encroach•17m ago•0 comments

Why Wise and Airwallex aren't worried about stablecoins

https://text-incubation.com/Why+Wise+and+Airwallex+aren%E2%80%99t+worried+about+stablecoins?
1•krrishd•18m ago•0 comments

Redmond, WA, turns off Flock Safety cameras after ICE arrests

https://www.seattletimes.com/seattle-news/law-justice/redmond-turns-off-flock-safety-cameras-afte...
5•dredmorbius•18m ago•1 comments

Arc Raiders Is So Good I'm Worried It Will Take over My Life

https://kotaku.com/ark-raiders-embark-studios-server-slam-2000637325
2•PaulHoule•18m ago•0 comments

Want a 60-Minute Memory Boost? Neuroscience Just Revealed a Powerful Trick

https://www.inc.com/bill-murphy-jr/want-a-60-minute-memory-boost-neuroscience-just-revealed-a-pow...
1•mikhael•20m ago•0 comments

Sonder to File Bankruptcy and Liquidate After Marriott Cuts Ties

https://www.bloomberg.com/news/articles/2025-11-10/sonder-to-file-bankruptcy-and-liquidate-after-...
2•toomuchtodo•21m ago•1 comments

Dull Days at the Factory

https://mihaiolteanu.me/dull-days-at-the-factory
3•molteanu•23m ago•0 comments

Toxic 'Hammerhead Worm' Is Invading Texas, Triggering Warnings

https://www.sciencealert.com/toxic-hammerhead-worm-is-invading-texas-triggering-warnings
1•rbanffy•23m ago•0 comments

Show HN: Papiers.ai – A new interface for ArXiv

https://twitter.com/sighjith/status/1987722139672117507
3•smnair•23m ago•1 comments

Stop asking 'How was school today?' To raise successful, mentally strong kids

https://www.cnbc.com/2025/11/09/stop-asking-how-was-school-today-to-raise-successful-kids-ask-the...
2•makerdiety•24m ago•1 comments

Fortran Outsmarted Our Billion-Dollar AI Chips

https://medium.com/@daxx5/fortran-outsmarted-our-billion-dollar-ai-chips-61f25b4a40fe
2•marcobambini•24m ago•0 comments