The famous O3 "GeoGuessr" prompt did not work

https://www.seangoedecke.com/the-o3-geoguessr-prompt-did-not-work/
15•ingve•1h ago

Comments

grebc•15m ago
I wonder whether all location metadata was stripped from the sampled images.
mickeyp•14m ago
This test would be a lot more useful if the author used images the models obviously hadn't seen before. Pulling images from Wikipedia? They'll have seen 'em before, and the metadata, and all the pages they were casually linked to.

The finding that the long prompt only made the model think 'a second longer' may have more to do with the model already knowing the images. Why think harder if you already know the answer?

At no point does the author contemplate that.

vintermann•13m ago
They say they threw in some indoor images, presumably from around where they were.
vintermann•14m ago
Interesting what he reports, that newer models are worse at geolocation. Sorry if I'm getting paranoid, but I wonder if that's a deliberately nerfed capability.
Gys•12m ago
> I think this shows how easy it is to fool yourself about the quality of prompting. When the model is already pretty good at a task, you can give it a very elaborate prompt without impacting performance. It’ll still be pretty good, except this time it’s good because of what you did.
fontain•8m ago
“It’s also interesting to me that nobody checked this at the time. It took me about six hours of fairly-distracted work and about $15 to construct and run this benchmark. Why didn’t anyone do this when they were writing articles about how good the o3 prompt was?”

Because the meta around AI is not rigorous reporting on the nuance of capabilities but bold claims that are easy to retweet. There is no incentive to say “actually, AI is not good at this”. Nobody checked it because nobody cares.

There are lots of tasks that AI can be useful for but almost all of the headline claims (including Mythos) are exaggerated at best and bunk at worst.

An OpenAI model has disproved a central conjecture in discrete geometry

https://openai.com/index/model-disproves-discrete-geometry-conjecture/
1170•tedsanders•15h ago•856 comments

Show HN: Rmux – A programmable terminal multiplexer with a Playwright-style SDK

https://github.com/helvesec/rmux
15•shideneyu•44m ago•7 comments

GitHub confirms breach of 3,800 repos via malicious VSCode extension

https://www.bleepingcomputer.com/news/security/github-confirms-breach-of-3-800-repos-via-maliciou...
843•Timofeibu•20h ago•350 comments

Haskell Foundation 2026 Update

https://discourse.haskell.org/t/haskell-foundation-2026-update/14136
119•azhenley•7h ago•32 comments

Show HN: I reverse engineered Apple's video wallpapers

https://github.com/kageroumado/phosphene
283•kageroumado•10h ago•64 comments

Vivaldi 8.0

https://vivaldi.com/blog/vivaldi-on-desktop-8-0/
127•OuterVale•2h ago•57 comments

New features in GCC 16: Improved error messages and SARIF output

https://developers.redhat.com/articles/2026/04/28/gcc-16-improved-error-messages-sarif-output
81•siteshwar•2d ago•12 comments

The Letter S, by Donald Knuth (1980) [pdf]

https://gwern.net/doc/design/typography/1980-knuth.pdf
170•bambax•10h ago•21 comments

DOS Zone

https://dos.zone/
259•rglover•11h ago•56 comments

Typewise (YC S22) Is Hiring an AI Growth Engineer (Zurich or Remote)

https://www.ycombinator.com/companies/typewise/jobs/HmCzfBK-ai-growth-engineer
1•janisberneker•2h ago

Flipper One Tech Specs

https://docs.flipper.net/one/general/tech-specs
376•gregsadetsky•15h ago•133 comments

Anthropic is expanding to Colossus2. Will use GB200

https://twitter.com/nottombrown/status/2057194829986300375
202•aurareturn•13h ago•188 comments

All the bugs they found

https://andreapivetta.com/posts/all-the-bugs-they-found.html
26•ziggy42•1d ago•4 comments

How fast is N tokens per second really?

https://mikeveerman.github.io/tokenspeed/
417•hexagr•3d ago•82 comments

Archaeologists find Egyptian mummy buried with the 'Iliad'

https://www.openculture.com/2026/05/archaeologists-discover-ancient-egyptian-mummy-buried-with-pa...
137•diodorus•5d ago•94 comments

Simulating Infinity in Conway's Game of Life with Modern C++

https://ryanjk5.github.io/posts/GOLDE/
35•HeliumHydride•2d ago•6 comments

What is a Demand Coop

https://cahootzcoops.com/blog/what-is-a-demand-coop
65•DeonRob•8h ago•63 comments

OpenAI Is Preparing to File for an IPO Soon

https://www.wsj.com/tech/ai/openai-is-preparing-to-file-for-an-ipo-very-soon-0ec95af5
91•louiereederson•17h ago•226 comments

Saying goodbye to asm.js

https://spidermonkey.dev/blog/2026/05/20/saying-goodbye-to-asmjs.html
378•eqrion•22h ago•147 comments

Your Most Improbable Life

https://kevinkelly.substack.com/p/your-most-improbable-life
107•jger15•2d ago•72 comments

Reviving old scanners with an in-browser Linux VM bridged to WebUSB over USB/IP

https://yes-we-scan.app/details
75•gmac•2d ago•27 comments

Show HN: I made a tactical map-based WWII submarine simulator (public beta)

https://silentshark.app/alpha/
45•epaga•2d ago•15 comments

Recreate famous water profiles using supermarket bottled water

https://www.waterdictionary.net
45•smugglerFlynn•2d ago•25 comments

Google’s AI is being manipulated. The search giant is quietly fighting back

https://www.bbc.com/future/article/20260519-google-tackles-attempts-to-hack-its-ai-results
313•tigerlily•23h ago•193 comments

Intuit to lay off over 3k employees to refocus on AI

https://techcrunch.com/2026/05/20/intuit-to-lay-off-over-3000-employees-to-refocus-on-ai/
192•wapasta•9h ago•141 comments

Numexpr: Fast numerical array expression evaluator for Python, NumPy, Pandas

https://github.com/pydata/numexpr
7•tosh•2d ago•0 comments

The Interview That Ships to Production: replacing whiteboards with pull requests

https://www.angellist.com/blog/the-interview-that-ships-to-production
26•asimov4•2d ago•7 comments

Qian Xuesen: The missile genius America lost and China gained (2025)

https://www.usni.org/magazines/naval-history/2025/december/missile-genius-america-lost-and-china-...
176•thnaks•16h ago•93 comments

Why is Inkwell stuck in review

https://www.manton.org/2026/05/19/why-is-inkwell-stuck-in.html
142•speckx•16h ago•46 comments