The famous O3 "GeoGuessr" prompt did not work

https://www.seangoedecke.com/the-o3-geoguessr-prompt-did-not-work/

15•ingve•1h ago

Comments

grebc•15m ago

I wonder whether, in all the sampling, all location metadata was stripped.

mickeyp•14m ago

This test would be a lot more useful if the author used images the models obviously hadn't seen before. Pulling images from Wikipedia? They'll have seen 'em before, and the metadata, and all the pages they were casually linked to.

The premise that the long prompt only made the model think 'a second longer' may have more to do with the fact that it knows about the images. So why think harder if you know the answer?

At no point does the author contemplate that.

vintermann•13m ago

They say they threw in some indoor images, presumably from around where they were.

vintermann•14m ago

Interesting that he reports newer models are worse at geolocation. Sorry if I'm getting paranoid, but I wonder if that's a deliberately nerfed capability.

Gys•12m ago

> I think this shows how easy it is to fool yourself about the quality of prompting. When the model is already pretty good at a task, you can give it a very elaborate prompt without impacting performance. It’ll still be pretty good, except this time it’s good because of what you did.

fontain•8m ago

“It’s also interesting to me that nobody checked this at the time. It took me about six hours of fairly-distracted work and about $15 to construct and run this benchmark. Why didn’t anyone do this when they were writing articles about how good the o3 prompt was?”

Because the meta around AI is not rigorous reporting on the nuance of capabilities but bold claims that are easy to retweet. There is no incentive to say “actually, AI is not good at this”. Nobody checked it because nobody cares.

There are lots of tasks that AI can be useful for but almost all of the headline claims (including Mythos) are exaggerated at best and bunk at worst.

An OpenAI model has disproved a central conjecture in discrete geometry

https://openai.com/index/model-disproves-discrete-geometry-conjecture/

1170•tedsanders•15h ago•856 comments

Show HN: Rmux – A programmable terminal multiplexer with a Playwright-style SDK

https://github.com/helvesec/rmux

15•shideneyu•44m ago•7 comments

GitHub confirms breach of 3,800 repos via malicious VSCode extension

https://www.bleepingcomputer.com/news/security/github-confirms-breach-of-3-800-repos-via-maliciou...

843•Timofeibu•20h ago•350 comments

Haskell Foundation 2026 Update

https://discourse.haskell.org/t/haskell-foundation-2026-update/14136

119•azhenley•7h ago•32 comments

Show HN: I reverse engineered Apple's video wallpapers

https://github.com/kageroumado/phosphene

283•kageroumado•10h ago•64 comments

Vivaldi 8.0

https://vivaldi.com/blog/vivaldi-on-desktop-8-0/

127•OuterVale•2h ago•57 comments

New features in GCC 16: Improved error messages and SARIF output

https://developers.redhat.com/articles/2026/04/28/gcc-16-improved-error-messages-sarif-output

81•siteshwar•2d ago•12 comments

The Letter S, by Donald Knuth (1980) [pdf]

https://gwern.net/doc/design/typography/1980-knuth.pdf

170•bambax•10h ago•21 comments

DOS Zone

https://dos.zone/

259•rglover•11h ago•56 comments

Typewise (YC S22) Is Hiring an AI Growth Engineer (Zurich or Remote)

https://www.ycombinator.com/companies/typewise/jobs/HmCzfBK-ai-growth-engineer

1•janisberneker•2h ago

Flipper One Tech Specs

https://docs.flipper.net/one/general/tech-specs

376•gregsadetsky•15h ago•133 comments

Anthropic is expanding to Colossus2. Will use GB200

https://twitter.com/nottombrown/status/2057194829986300375

202•aurareturn•13h ago•188 comments

All the bugs they found

https://andreapivetta.com/posts/all-the-bugs-they-found.html

26•ziggy42•1d ago•4 comments

How fast is N tokens per second really?

https://mikeveerman.github.io/tokenspeed/

417•hexagr•3d ago•82 comments

Archaeologists find Egyptian mummy buried with the 'Iliad'

https://www.openculture.com/2026/05/archaeologists-discover-ancient-egyptian-mummy-buried-with-pa...

137•diodorus•5d ago•94 comments

Simulating Infinity in Conway's Game of Life with Modern C++

https://ryanjk5.github.io/posts/GOLDE/

35•HeliumHydride•2d ago•6 comments

What is a Demand Coop

https://cahootzcoops.com/blog/what-is-a-demand-coop

65•DeonRob•8h ago•63 comments

OpenAI Is Preparing to File for an IPO Soon

https://www.wsj.com/tech/ai/openai-is-preparing-to-file-for-an-ipo-very-soon-0ec95af5

91•louiereederson•17h ago•226 comments

Saying goodbye to asm.js

https://spidermonkey.dev/blog/2026/05/20/saying-goodbye-to-asmjs.html

378•eqrion•22h ago•147 comments

Your Most Improbable Life

https://kevinkelly.substack.com/p/your-most-improbable-life

107•jger15•2d ago•72 comments

Reviving old scanners with an in-browser Linux VM bridged to WebUSB over USB/IP

https://yes-we-scan.app/details

75•gmac•2d ago•27 comments

Show HN: I made a tactical map-based WWII submarine simulator (public beta)

https://silentshark.app/alpha/

45•epaga•2d ago•15 comments

Recreate famous water profiles using supermarket bottled water

https://www.waterdictionary.net

45•smugglerFlynn•2d ago•25 comments

Google’s AI is being manipulated. The search giant is quietly fighting back

https://www.bbc.com/future/article/20260519-google-tackles-attempts-to-hack-its-ai-results

313•tigerlily•23h ago•193 comments

Intuit to lay off over 3k employees to refocus on AI

https://techcrunch.com/2026/05/20/intuit-to-lay-off-over-3000-employees-to-refocus-on-ai/

192•wapasta•9h ago•141 comments

Numexpr: Fast numerical array expression evaluator for Python, NumPy, Pandas

https://github.com/pydata/numexpr

7•tosh•2d ago•0 comments

The Interview That Ships to Production: replacing whiteboards with pull requests

https://www.angellist.com/blog/the-interview-that-ships-to-production

26•asimov4•2d ago•7 comments

Qian Xuesen: The missile genius America lost and China gained (2025)

https://www.usni.org/magazines/naval-history/2025/december/missile-genius-america-lost-and-china-...

176•thnaks•16h ago•93 comments

Why is Inkwell stuck in review

https://www.manton.org/2026/05/19/why-is-inkwell-stuck-in.html

142•speckx•16h ago•46 comments