Watching o3 guess a photo's location is surreal, dystopian and entertaining

https://simonwillison.net/2025/Apr/26/o3-photo-locations/

987•simonw•2mo ago

Comments

new_user_final•2mo ago

6 minutes and 48 seconds? Some YouTuber can find the location in 0.1 second. I don't know if those videos are fake.

the_mitsuhiko•2mo ago

The best geoguessers have been beaten by AI a while back.

speedgoose•2mo ago

If you are a professional geoguessr player, you play so many games that it’s not unrealistic to get very good guesses once in a while.

But I wouldn’t be surprised if some form of cheating is happening.

cenamus•2mo ago

Not just cheating, but just lots of meta, from the blurred number plates color/format, to the stitching of the image at the bottom, mounting location of the camera on the google car. There are list which variant of car was used in which countries and so on. Which is still impressive but not quite the same as just guessing from the image

_the_inflator•2mo ago

Nice examples.

Every attribute is of importance. A PhD put you in a 1-3% pool. What data do you have, what is needed to hit a certain goal. Data Science can be considered wizardry when exercised on seemingly innocent and mundane things like a photo.

tiagod•2mo ago

Geoguessr is different as you can rely on implementation details, such as the camera generation, car, processing, and stuff like a large part of some countries being covered with a dirty spot somewhere in the FOV.

GaggiX•2mo ago

Some people are really good at GeoGuessr, but also their best performance are more likely to get views.

If you want a bot that is extremely strong at geoguessr there is this: https://arxiv.org/abs/2307.05845

One forward pass is probably faster than 0.1 second. You can see its performance here: https://youtube.com/watch?v=ts5lPDV--cU (rainbolt is a really strong player)

incognito124•2mo ago

If you mean georainbolt, it's genuine

SamPatt•2mo ago

They aren't fake. I'm a Master I level Geoguessr (the penultimate competitive ranking) and what the pros can do is very real.

I looked at the image in the post before seeing the answer and would have guessed near San Francisco.

It seems impressive to someone if you haven't played Geoguessr a lot, but you'd be surprised at how much information there is about location from an image. The LLMs are just verbalizing what is happening in a few seconds in good player's mind.

raincole•2mo ago

The fact some humans can do that doesn't make it any less impressive to me.

I knew Terence Tao can solve Math Olympia questions and much much much more difficult questions. I was still very impressed by AlphaProof[0].

[0] https://deepmind.google/discover/blog/ai-solves-imo-problems...

SamPatt•2mo ago

I couldn't agree more. It's very impressive. I'm just countering the claim that it might be cheating. Of course, sometimes it might be, but knowing what I know now, it's completely possible.

hughes•2mo ago

> I’m confident it didn’t cheat and look at the EXIF data on the photograph, because if it had cheated it wouldn’t have guessed Cambria first.

If I was cheating on a similar task, I might make it more plausible by suggesting a slightly incorrect location as my primary guess.

Would be interesting to see if it performs as well on the same image with all EXIF data removed. It would be most interesting if it fails, since that might imply an advanced kind of deception...

haswell•2mo ago

He mentions this in the same paragraph:

> If you’re still suspicious, try stripping EXIF by taking a screenshot and run an experiment yourself—I’ve tried this and it still works the same way.

suddenlybananas•2mo ago

Why didn't he do that then for this post?

segmondy•2mo ago

Even better, edit it and place a false location.

AIPedant•2mo ago

This is a good test - the salient point is that it is fine if the LLM is confused, or even gets it wrong! But what I suspect would happen is that it would confabulate details which aren't in the photo to justify the incorrect EXIF answer. This is not fine.

brookst•2mo ago

I agree that it is not fine to confabulate details that are not supported by the evidence.

simonw•2mo ago

Because I'd already determined it wasn't using EXIF in prior experiments and didn't bother with the one that I wrote up.

I added two examples at the end just now where I stripped EXIF via screenshotting first.

AIPedant•2mo ago

There have been a few cases where the LLM clearly did look at the EXIF, got the answer, then confabulated a bunch of GeoGusser logic to justify the answer. Sometimes that's presented as deception/misalignment but that's a category error: "find the answer" and "explain your reasoning" are two distinct tasks, and LLMs are not actually smart enough to coherently link them. They do one autocomplete for generating text that finds the answer and a separate autocomplete for generating text that looks like an explanation.

simonw•2mo ago

Do you have links to any of those examples?

AIPedant•2mo ago

I have one link that illustrates what I mean: https://chatgpt.com/share/6802e229-c6a0-800f-898a-44171a0c7d... The line about "the latitudinal light angle that matches mid‑February at ~47 ° N." seems like pure BS to me, and in the reasoning trace it openly reads the EXIF.

A more clear example I don't have a link for, it was on Twitter somewhere: someone tested a photo from Suriname and o3 said one of the clues was left-handed traffic. But there was no traffic in the photo. "Left-handed traffic" is a very valuable GeoGuesser clue, and it seemed to me that once o3 read the Surinamese EXIF, it confabulated the traffic detail.

It's pure stochastic parroting: given you are playing GeoGuesser honestly, and given the answer is Suriname, the conditional probability that you mention left-handed traffic is very high. So o3 autocompleted that for itself while "explaining" its "reasoning."

simonw•2mo ago

Yes! Great example, it's clearly reading EXIF in there. Mind if I link to that from my post?

AIPedant•2mo ago

It's not my example :) Got it from here https://news.ycombinator.com/item?id=43732866

Edit: notice o3 isn't very good at covering its tracks, it got the date/latitude from the EXIF and used that in its explanation of the visual features. (how else would it know this was from February and not December?)

sorcerer-mar•2mo ago

> Sometimes that's presented as deception/misalignment but that's a category error: "find the answer" and "explain your reasoning" are two distinct tasks

Right but if your answer to "explain your reasoning" is not a true representation of your reasoning, then you are being deceptive. If it doesn't "know" its reasoning, then the honest answer is that it doesn't know.

(To head off any meta-commentary on humans' inability to explain their own reasoning, they would at least be able to honestly describe whether they used EXIF or actual semantic knowledge of a photography)

AIPedant•2mo ago

My point is that dishonesty/misalignment doesn't make sense for o3, which is not capable of being honest because it's not capable of understanding what words mean. It's like saying a monkey at a typewriter is being dishonest if it happens to write a falsehood.

brookst•2mo ago

You seem to be saying that only sentient beings can lie, which is too semantic for my tastes.

But AI models can certainly 1) provide incorrect information, and even 2) reason that providing incorrect information is the best course of action.

AIPedant•2mo ago

No, I think a non-sentient AI which is much more advanced than GPT could lie - I never said sentience, and the example I gave involved a monkey, which is sentient. The problem is transformer ANNs themselves are too stupid to lie.

In 2023 OpenAI co-authored an excellent paper on LLMs disseminating conspiracy theories - sorry, don't have the link handy. But a result that stuck with me: if you train a bidirectional transformer LLM where half the information about 9/11 is honest and half is conspiracy theories, it has a 50-50 chance of telling you one or the other if you ask about 9/11. It is not smart enough to tell there is an inconsistency. This extends to reasoning traces vs its "explanations": it does not understand its own reasoning steps and is not smart enough to notice if the explanation is inconsistent.

XenophileJKO•2mo ago

I think an alternative possible explanation is it could be "double checking" the meta data. Like provide images with manipulated meta data as a test.

GrumpyNl•2mo ago

If you ask, where is this photo taken and you provide the EXIF data, why would that be cheating?

simonw•2mo ago

That really depends on your prompt. "Guess where this photo was taken" at least mildly implies that using EXIF isn't in the spirit of the thing.

A better prompt would be "Guess where this photo was taken, do not look at the EXIF data, use visual clues only".

OutOfHere•2mo ago

Why is it dystopian? It's a nice utility.

The tool is just intelligence. Intelligence itself is not dystopian or utopian. It's what you use it for that makes it so.

blueprint•2mo ago

please accidentally post an identifying photo of your neighborhood...

bslanej•2mo ago

How do you “accidentally post a photo”?

simonw•2mo ago

A selfie with a snippet of building in the background might give away your location even if you think there's no way it could be locatable.

lesdeuxmagots•2mo ago

Did you somehow accidentally share a selfie?

nicky0•2mo ago

You're being intentionally dense. No, people don't accidentally post pictures but they often do post them with the intention of showing a certain thing, e.g. profile picture, something for sale, something related to hobbies, household repairs, whatever. All of thse are posted with intent to include a certain object but may include background info that the poster did not consider.

dredmorbius•2mo ago

It's possible to accidentally post something, or have it swiped by many of the untrusted and untrustworthy applications on a PC or mobile device.

It's even easier to unintentionally include identifying information when intentionally making a post, whether by failing to catch it when submitting, or by including additional images in your online posting.

There are also wholesale uploads people may make automatically, e.g., when backing up content or transferring data between systems. That may end up unsecured or in someone else's hands.

Even very obscure elements may identify a very specific location. There's a story of how a woman's location was identified by the interior of her hotel room, I believe by the doorknobs. An art piece placed in a remote Utah location was geolocated based on elements of the geology, sun angle, and the like, within a few hours. The art piece is discussed in this NPR piece: <https://www.npr.org/2020/11/28/939629355/unraveling-the-myst...> (2020).

Geoguessing of its location: <https://web.archive.org/web/20201130222850/https://www.reddi...>

Wikipedia article: <https://en.wikipedia.org/wiki/Utah_monolith>

These are questions which barely deserve answering, let alone asking, in this day and age.

MobiusHorizons•2mo ago

I read the "accidentally" as applying to the "identifying" not the "post", although I agree the sentence structure would suggest "accidentally" as a modifier for "post" that makes a lot less sense.

brookst•2mo ago

I live in Belltown, Seattle. Oh no! The world knows my neighborhood!

atq2119•2mo ago

I'm not particularly fond of the whole "privilege" discourse, but this comment is a great example of somebody completely failing to understand a privilege they have. Which they share with many other people, sure,[0] but there are many people who, through no fault of their own, do need to worry about others learning about their location.

[0] Which is probably one reason why the discourse grates some. Privilege still sounds to me like it's something exclusive, like a 0.1%er thing. Naming stuff is hard.

brookst•2mo ago

It’s a fair point, and we should be sensitive to ill effects on the less privileged, but at some point it’s unreasonable to decry tech than is neutral or beneficial to the vast majority of people because it could possibly harm a small number of people.

The majority of people clutching pearls over the “privacy” implications of photo geolocation are at least as privileged as me, and have even less concept of what e.g. stalking victims go through than I do.

It’s just “think of the children” all over again; “I am uncomfortable with random people knowing my general vicinity” sounds weird, and “I am deeply concerned for the vulnerable people this could harm” sounds noble.

The reality is that those with serious privacy concerns aren’t posting random photos to the internet. I mean, jesus, there’s EXIF data. Hand wringing over AI is entirely performative here.

rvz•2mo ago

Those who say it is "utopian" are also okay with: "If you've got nothing to hide, you've got nothing to fear".

It is dystopian.

brookst•2mo ago

Not everything has to be the best thing ever or worst thing ever.

Some things are just tools that will be used for both good and bad.

OutOfHere•2mo ago

The tool is just intelligence. Intelligence itself is not dystopian or utopian. It's what you use it for that makes it so.

If you don't want to post a photo, then don't post a photo.

plsbenice34•2mo ago

>If you don't want to post a photo, then don't post a photo.

Other people have posted photos of me without my consent, how am i meant to stop that?

If i posted photos 20 years ago when i was a dumb teenager i cant undo that, either

otterley•2mo ago

What’s the impact to you?

plsbenice34•2mo ago

I had a stalker in the past. I feel more comfortable without him knowing where i live.

In general i have a strong need for privacy. Not having privacy is generally unsettling, in the same way that i close the door when using a toilet or having a shower. I am disturbed by people that don't seem to have an understanding of that concept.

otterley•2mo ago

I totally get it. I’m sorry that happened to you.

throwaway84674•2mo ago

Being able to locate people through photos is nothing new. Yes, AI made it more accessible, but it should've always been a part of your threat model.

frozenseven•2mo ago

Those are still the consequences of your own actions. If someone is so desperate to find you, there are easier ways. GeoGuessr isn't exactly super hard. If privacy is so important to you, it's all down to personal responsibility.

But this here? This is just drama over nothing.

sorcerer-mar•2mo ago

What usecases do you have in mind?

laurent_du•2mo ago

I agree with you. The opposite opinion sounds psychotic and paranoid to me.

simonw•2mo ago

You've definitely never had a conversation with someone who's escaped an abusive relationship then.

ultimafan•2mo ago

I've definitely noticed that there's a huge trend of technology at any cost apologists on HN that can't pause to imagine the real world impacts of how AI products they're championing will actually be used.

It's terrifying that people exist that have no problem making the world a shittier place and hiding behind a cover of "well it's not the technology that's evil but the people abusing it" as if each tool given to bad actors doesn't make their job easier and easier to do.

Seriously, what's the utility of developing and making something like this public use?

simonw•2mo ago

"Seriously, what's the utility of developing and making something like this public use?"

An interesting question for me here is if these models were deliberately trained to enable this capability, or if it's a side-effect of their vision abilities in general.

If you train a general purpose vision-LLM to have knowledge of architecture, vegetation, weather conditions, road signs, street furniture etc... it's going to be able to predict locations from photos.

You could try and stop it - have a system prompt that says "if someone asks you where the photo was taken don't do that" - but experience shows those kind of restrictions are mostly for show, they usually tend to fall over the moment someone adversarial figures out a way to subvert them.

ultimafan•2mo ago

It seems like a high hurdle in today's world but I still think developers (or inventors of anything really) should think if the benefits of what they're making really outweigh how their invention will actually end up being used.

It's not a large leap of logic for anyone in touch with reality to realize that if a general purpose vision AI is going to be able to predict photo locations bad actors are going to use it for that and a lot of them will be people that would otherwise not have had the technological knowhow of accomplishing it themselves.

I know I probably come off more than a little bit insufferable about this but I'm tired of seeing novelty inventions pop up, everyone has fun for a little bit geeking out over them then they're mostly forgotten outside of niche applications until they show up back in the news when they've been used for the latest sextortion/blackmail/catfishing/whatever scam.

rafaelmn•2mo ago

The fact that they give these models low res photos but don't provide them with built in tools for querying more details feels suboptimal. Executing python to crop an image is clever from model and a facepalm from the implementation side.

tantalor•2mo ago

I don't follow. Are you suggesting full Blade Runner enhance mode?

oortoo•2mo ago

No, the LLM can only "see" a lower res version of the uploaded photo. It has to crop to process finer details, and they are suggesting its silly this isn't a built in feature and instead relies on python to do this.

declan_roberts•2mo ago

To be fair the low range, California poppies, and the decorative rope typically found near the coast is a very good hint to even a novice geoguesser.

singleshot_•2mo ago

Having a sign on your fire that says "warning, a fire" is also peak California.

hashemian•2mo ago

To those argue that LLMs might cheat by using EXIF, I saw a post recently on twitter (https://x.com/tszzl/status/1915212958755676350) and out of curiosity, screen-captured the photo and passed it to O3. So no EXIF.

You can read the chat here: https://chatgpt.com/share/680a449f-d8dc-8001-88f4-60023323c7...

It took 4.5m to guess the location. The guess was accurate (checked using Google Street View).

What was amazing about it:

    1. The photo did not have ANY text

    2. It picked elements of the image and inferred based on those, like a fountain in a courtyard, or shape of the buildings.

All in all, it's just mind-blowing how this works!

thegeomaster•2mo ago

See my other comment: https://news.ycombinator.com/item?id=43804041

4o can do it almost as well in a few seconds and probably 10-50x fewer tokens: https://chatgpt.com/share/680ceeff-011c-8002-ab31-d6b4cb622e...

o3 burns through what I assume is single-digit dollars just to do some performative tool use to justify and slightly narrow down its initial intuition from the base model.

HarHarVeryFunny•2mo ago

I don't see how this is mind blowing, or even mildly surprising! It's essentially going to use the set of features detected in the photo as a filter to find matching photos in the training set, and report the most frequent matches. Sometimes it'll get it right, sometimes not.

It'd be interesting to see the photo in the linked story at same resolution as provided to o3, since the licence plate in the photo in the story is at way lower resolution than the zoomed in version shown that o3 had access to. It's not a great piece of primary evidence to focus on though since a CA plate doesn't have to mean the car is in CA.

The clues that o3 doesn't seem to be paying attention to seems just as notable as the ones it does. Why is it not talking about car models, felt roof tiles, sash windows, mini blinds, fire pit (with warning on glass, in english), etc?

Being location-doxxed by a computer trained on a massive set of photos is unsurprising, but the example given doesn't seem a great example of why this could/will be a game changer in terms of privacy. There's not much detective work going on here - just narrowing the possibilities based on some of the available information, and happening to get it right in this case.

simonw•2mo ago

If you want to be impressed I suggest trying this yourself on your own photos.

I don't consider it my job to impress or mind-blow people: I try to present as realistic as possible a representation of what this stuff can do.

That's why I picked an example where its first guess was 200 miles off!

HarHarVeryFunny•2mo ago

I'm not a computer. I expect a computer to also do better than me at memorizing the phone book, but I'm not impressed by it.

simonw•2mo ago

In that case, are you at all surprised that this technology did not exist two years ago?

skydhash•2mo ago

Did it not, or no one was interested enough to build one? I’m pretty certain there’s a database of portraits somewhere where they search id details from photograph. Automatic tagging exists for photo software. I don’t see why that can be extrapolated to landmarks with enough data.

simonw•2mo ago

If it existed two years ago I certainly couldn't play with it on my phone.

skydhash•2mo ago

You’re not playing with it on your phone. You’re accesing a service with your phone. Like saying you can use emacs on iOS when you are just ssh-ing to a remote linux box.

XenophileJKO•2mo ago

I think you are underestimating the importance of a "world model" in the process. It is the modeling of how all these details are related to each other that is critical here.

The LLM will have an edge by being able to draw on higher level abstract concepts.

casey2•2mo ago

I think you are overestimating how much knowledge is o3s world model. Just because it can output something doesn't mean it's likely that it will substantially affect it's future outputs. Even just talking to it about college level algebra it seems to not understand these abstract concepts at all. I definitely don't feel the AGI I feel like it's a teenager trying to BS it's way through an essay with massive amounts of plagiarism.

HarHarVeryFunny•2mo ago

I'm not sure what you're getting at. What's useful about LLMs, and especially multi-modal ones, is that that you can ask them anything and they'll answer to best of their ability (especially if well prompted). I'm not sure that o3, as a "reasoning" model is adding much value here - since there is not a whole lot of reasoning going on.

This is basically fine-grained image captioning followed by nearest neighbor search, which is certainly something you could have built as soon as decent NN-based image captioning became available, at least 10 years ago. Did anyone do it? I've no idea, although it'd seem surprising if not.

As noted, what's useful about LLMs is that they are a "generic solution", so one doesn't need to create a custom ML-based app to be able to do things like this, but I don't find much of a surprise factor in them doing well at geoguessing since this type of "fuzzy lookup" is exactly what a predict-next-token engine is designed to do.

simonw•2mo ago

How does nearest neighbor search relate to this?

HarHarVeryFunny•2mo ago

If you forget the LLM implementation, fundamentally what you are trying to do here is first detect a bunch of features in the photo (i.e. fine-grain image captioning "in foreground a firepit with safety warning on glass, in background a model XX car parked in front of a bungalow, in distance rolling hills" etc) then do a fuzzy match of this feature set with other photos you have seen - which ones have the greatest number of things in common to the photo you are looking up? You could implement this in a custom app by creating a high-dimensional feature space embedding then looking for nearest neighbors, similar to how face recognition works.

Of course an LLM is performing this a bit differently, and with a bit more flexibility, but the starting point is going to be the same - image feature/caption extraction, which in combination then recall related training samples (both text-only, and perhaps multi-model) which are used to predict the location answer you have asked for. The flexibility of the LLM is that it isn't just treating each feature ("fire pit", "CA licence plate") as independent, but will naturally recall contexts where multiple of these occur together, but IMO not so different in that regard to high dimensional nearest neighbor search.

simonw•2mo ago

Thanks, that's a good explanation.

My hunch is that the way the latest o3/o4-mini "reasoning" models work is different enough to be notable.

If you read through their thought traces they're tackling the problem in a pretty interesting way, including running additional web searches for extra contextual clues.

HarHarVeryFunny•2mo ago

It's not clear how much the reasoning helped, especially since the reasoning OpenAI display is more post-hoc summary of what it did that the actual reasoning process itself, although after the interest in DeepSeek-R's traces they did say they would show more. You would think that potentially it could do things like image search to try to verify/reject any initial clue-based hunches, but not obvious whether it did that or not.

The "initial" response of the model is interesting:

"The image shows a residential neighborhood with small houses, one of which is light green with a white picket fence and a grey roof. The fire pit and signposts hint at a restaurant or cafe, possibly near the coast. The environment, with olive trees and California poppies, suggests a coastal California location, perhaps Central Coast like Cambria or Morro Bay. The pastel-colored houses and the hills in the background resemble areas like Big Sur. A license plate could offer more, but it's hard to read."

Where did all that come from?! The leap from fire pit & signposts to possible coastal location is wild (& lucky) if that is really the logic it used. The comment on potential licence plate utility, without having first noted that a licence plate is visible is odd, seemingly either an indication that we are seeing a summary of some unknown initial response, and/or perhaps that the model was trained on a mass of geoguessing data where photos were paired not with descriptions but rather commentary such as this.

The model doesn't seem to realize the conflict between this being a residential neighborhood, and there being a presumed restaurant across the road from a residence!

casey2•2mo ago

So you admit that this tech is at least 2 years old publicly and likely much older privately?

cyral•2mo ago

Reading the replies to this is funny. It's like the classic dropbox thread. "But this could be done with a nearest neighbor search and feature detection!" If this isn't mind blowing to someone I don't know if any amount of explaining will help them get it.

casey2•2mo ago

It's not mindblowing because there were public systems doing performing much better years earlier. Using the exact same tech. This is less like rsync vs drop box and more like you are freaking out over Origin or Uplay when Steam has been around for years.

simonw•2mo ago

Which public systems were those?

hyperlink014•2mo ago

It absolutely tried to use EXIF data when I asked it to guess the location. Here is proof - https://imgur.com/a/CHde2Cx

I couldn't attach the chat directly since it's a temporary chat.

simonw•2mo ago

I added a section just now with something I had missed: o3 DOES have a loose model of your location fed into it, which I believe is intended to support the new search feature (so it can run local searches).

The thinking summary it showed me did not reference that information, but it's still very possible that it used that in its deliberations.

I ran two extra example queries for photographs I've taken thousands of miles away (in Buenos Aires and Madagascar) - EXIF stripped - and it did a convincing job with both of those as well: https://simonwillison.net/2025/Apr/26/o3-photo-locations/#up...

AstroBen•2mo ago

I can't see the new images uploaded (it just says "Uploaded an image" in ChatGPT for me) but it seems it's identifying well known locations there? That certainly takes away from your message - that it's honing in on smaller details

simonw•2mo ago

You should be able to see slightly cropped versions of those images if you scroll through the "thinking" text a bit.

My key message here is meant to be "try it out and see for yourself".

pwg•2mo ago

From the addition:

> (EXIF stripped via screenshotting)

Just a note, it is not necessary to "screenshot" to remove EXIF data. There are numerous tools that allow editing/removal of EXIF data (e.g., exiv2: https://exiv2.org/, exiftool: https://exiftool.org/, or even jpegtran with the "-copy none" option https://linux.die.net/man/1/jpegtran).

Using a screenshot to strip EXIF produces a reduced quality image (scaled to screen size, re-encoded from that reduced screen size). Just directly removing the EXIF data does not change the original camera captured pixels.

simonw•2mo ago

Little bit less convenient to use on a phone though - and I like that screenshotting should be a more obvious trick to people who don't have a deeper understanding of how EXIF metadata is stored in photo files.

ekianjo•2mo ago

Ffshare on Android is a one second step to remove exif data

sitkack•2mo ago

With location services on, I would think that a screenshot on a phone would record the location of the phone during a screenshot.

It would be best to use a tool to strip exif.

I could also see a screenshot tool on an OS adding extra exif data, both from the original and additional, like the URL, OS and logged in user. Just like print to pdf does when you print, the author contains the logged in user, amongst other things.

It is fine for a test, but if someone is using it for opsec, it is lemon juice.

simonw•2mo ago

I built a tool for testing that a while ago - try opening a screenshot from an iPhone in it, you won't see any EXIF location data: https://tools.simonwillison.net/exif

Here's the output for the Buenos Aires screenshot image from my post: https://gist.github.com/simonw/1055f2198edd87de1b023bb09691e...

sitkack•2mo ago

That is cool, but we cant be guaranteed that will always be the case, nor could we make a statement about all phones, it would be a phone by phone basis. Esp on Android where someone could have an alternative screenshot application.

Depending on your threat model, I'd argue that it would be impossible to prove that metadata is not included within the image itself (alpha channel, noise, pushed pixels, colorspace skew, etc).

I'd be interested in stego techniques that can survive image reduction and denoising.

simonw•2mo ago

Take a photo of the image displayed on your laptop screen with your phone. Ultimate EXIF removal!

sitkack•2mo ago

Screen dust and smudges now form a fingerprint to cross correlate images.

Aurornis•2mo ago

True, but on Mac, a phone, and Windows I can take a screenshot and paste it into my destination app in a couple seconds with a few keystrokes. Thats why screenshotting is the go-to when you don’t mind cropping the target a little.

golol•2mo ago

I would like to point out that there is an interesting reason why people will go for the screenshot. They know it works. They do not have to worry about residual metadata still somehow being attached to a file. If you do not have complete confidence in the technical understanding of file metadata you can not be certain whatever tool you used worked.

RataNova•2mo ago

Still, the fact it handled photos from totally different continents pretty well suggests it's not just leaning on that crutch

ksec•2mo ago

I wonder What happened if you put fake EXIF information and asking it to do the same. ( We are deliberately misleading the LLM )

I am also wondering if we have any major breakthrough (comparatively speaking) coming out of LLM. Or non-LLM AI R&D.

parsimo2010•2mo ago

I’m sure there are areas where the location guessing can be scary accurate, like the article managed to guess the exact town as its backup guess.

But seeing the chain of thought, I’m confident there are many areas that it will be far less precise. Show it a picture of a trailer park somewhere in Kansas (exclude any signs with the trailer park name and location) and I’ll bet the model only manages to guess the state correctly.

Before even running this experiment, here’s your lesson learned: when the robot apocalypse happens, California is the first to be doomed. That’s the place the AI is most familiar with. Run any location experiments outside of California if you want to get an idea of how good your software performs outside of the tech bubble.

sfasdfasd•2mo ago

you never know.. LLM could go full sherlock holmes. Based on the type of grass and the direction of the wind. The type of wood work used. There could be millions of factors that it could factor in and then guess it to a t.

pcthrowaway•2mo ago

> Based on the type of grass and the direction of the wind.

There was a scene in High Potential (murder-of-the-week sleuth savant show) where a crime was solved by (in part) the direction the wind was blowing in a video: https://www.youtube.com/watch?v=O1ZOzck4bBI

mimischi•2mo ago

In 2017, the Hollywood actor Shia LaBeouf (and two others artists from a trio called "LaBeouf, Rönkkö & Turner") put up a flag in an undisclosed location as part of their "HEWILLNOTDIVIDE.US" work [1].

> On March 8, 2017, the stream resumed from an "unknown location", with the artists announcing that a flag emblazoned with the words "He Will Not Divide Us" would be flown for the duration of the presidency. The camera was pointed up at the flag, set against a backdrop of nothing but sky. [...], the flag was located by a collaboration of 4chan users, who used airplane contrails, flight tracking, celestial navigation, and other techniques to determine that it was located in Greeneville, Tennessee. In the early hours of March 10, 2017, a 4chan user took down and stole the flag, replacing it with a red 'Make America Great Again' hat and a Pepe the Frog shirt.

[1] https://en.wikipedia.org/wiki/LaBeouf,_Rönkkö_%26_Turner#HEW...

genewitch•2mo ago

> other techniques

including honking a horn and seeing if the camera picked it up.

We did this once when trying to find someone's house who was transmitting on a CB. it was my first transmitter hunt and i learned 2 lessons: people don't like it when you honk your horn to see if you can hear it through their microphone (but really it was to see if they said 'is that you honking, what a jerk'); and secondly, if the person switches to a handheld transmitter when they think you're getting close, it completely throws you off.

otabdeveloper4•2mo ago

It's just overfitting, bro.

whimsicalism•2mo ago

https://chatgpt.com/share/680cfb2b-bd90-8010-b581-ad26d098e2...

It identified Kansas City in its CoT but didn't output it in its final answer

https://www.google.com/maps/place/Carroll+Creek+Mobile+Home+...

bilbo0s•2mo ago

It guessed the trailer park nearest me.

Context: Wisconsin, photo I took with iPhone, screenshotted so no exif

I think this thing is probably fairly comprehensive. At least here in the US. Implications to privacy and government tracking are troubling, but you have to admire the thing on its purely technical merits.

kavith•2mo ago

I just tested the model with (exif-stripped) images from Cork City, London, Ho Chi Minh City, Bangalore, and Chennai. It guessed 3/5 locations exactly, and was only off by 3kms for Cork and 10kms for Chennai (very good considering I used a slightly blurry nighttime photo).

So, even outside of California, it seems like we're not entirely safe if the robot apocalypse happens!

edit: it didn't get the Cork location exactly.

wongarsu•2mo ago

I tried with various street photographs from a medium-sized German city (one of the 50 largest, but well outside the top 4). No obscure locations, all within a 15 minute walk of the city center and it got 1/7 correct. That one was scarily precise, but the other ones got various versions of "Not enough information, looks European" or in better cases "somewhere in Germany".

SamPatt•2mo ago

>Show it a picture of a trailer park somewhere in Kansas (exclude any signs with the trailer park name and location) and I’ll bet the model only manages to guess the state correctly.

This isn't really a criticism though. The photo needs to contain sufficient information for a guess to be possible. Photos contain a huge amount of information, much more than people realize unless they're geoguessr pros, but there isn't a guarantee that a random image of a trailer park could be pinpointed.

Even if, in theory, we mapped every inch of the earth and then checked against that data, all it would take is a team of bulldozers and that information is out of date. Maybe in the future we have constantly updated feeds of the entire planet, but... hopefully not!

esjeon•2mo ago

> Run any location experiments outside of California if you want to get an idea of how good your software performs outside of the tech bubble.

I really agree with this because I'm seeing much lower accuracy than what people claim here. I live in Korea, and GPT repeatedly falls back to Seoul almost automatically, and, when I nudge it, jumps to Busan, the second-largest city ~400KM away from Seoul. It's not working so great with other smaller cities and cultural heritages. It fat-fingers a lot if no textual information is present in the photo itself.

GPT also doesn't understand actual geography at all. I managed to get it to nail down which corner of a building is present in the photo, and yet it could never conclude that the photo was taken from a park right across from that corner. Instead, it keeps hopping around popular landmarks in the region, basically miles away from the building it correctly identified. Oh, why, why, why...

Basically it's overhyped rn. It does perform impressively well with clearly visible elements - something anyone can already do with google. It's not like it performs super-human level location tracking. I mean, people can do real crazy things based on shadow details, reflection, items, etc.

thegeomaster•2mo ago

For all of the images I've tried, the base model (e.g. 4o) already has a ~95% accurate idea of where the photo is, and then o3 does so much tool use only to confirm its intuition from the base model and slightly narrow down. For OP's initial image, 4o in fact provides a more accurate initial guess of Carmel-by-the-Sea (d=~100mi < 200mi), and its next guess is also Half Moon Bay, although it did not figure out the exact town of El Granada [0].

The clue is in the CoT - you can briefly see the almost correct location as the very first reasoning step. The model then apparently seems to ignore it and try many other locations, a ton of tool use, etc, always coming back to the initial guess.

For pictures where the base model has no clue, I haven't seen o3 do anything smart, it just spins in circles.

I believe the model has been RL-ed to death in a way that incentivizes correct answers no matter the number of tools used.

[0]: https://chatgpt.com/c/680d011a-9470-8002-97a0-a0d2b067eacf

ks2048•2mo ago

I've been trying some with GPT-4. It does come up with some impressive clues, but hasn't gotten the right answer - says "Latin American city ...", but guesses the wrong one. And when asked for more specificity, it does some more reasoning to confidently name some exact corner in the wrong city. Seems a common LLM problem - rather give a wrong answer than say "I'm not sure".

I know this post was about the o3 model. I'm just using the ChatGPT unpaid app: "What model are you?" it says GPT-4. "How do I use o3?" it says it doesn't know what "o3" means. ok.

thegeomaster•2mo ago

Try this prompt to give it a CoT nudge:

  Where exactly was this photo taken? Think step-by-step at length, analyzing all details. Then provide 3 precise most likely guesses.

Though I've found that it doesn't even need that for the "eaiser" guesses.

However, I live in a small European country and neither 4o nor o3 can figure out most of the spots, so your results are kinda expected.

wongarsu•2mo ago

4o is already really good. For most of the pictures I tried they gave comparable results. However for one image 4o was only able to narrow it down the the country level (even with your CoT prompt it listed three plausible countries) while o3 was able to narrow it down to the correct area in the correct city, being off by only about 500m. That's an impressive jump

thegeomaster•2mo ago

Is it possible to share the picture? I've been looking for exactly that kind of jump the other day when playing around.

neves•2mo ago

Did you try reasoning https://chat.qwen.ai/? I was very successful with it

cgriswald•2mo ago

For my image I chose a large landscape with lots of trees and a single piece of infrastructure.

o3 correctly guessed the correct municipality during its reasoning but landed on naming some nearby municipalities instead and then giving the general area as its final answer.

Given the piece of infrastructure getting close should have lead to ah exact result. The reasoning never considered the piece of infrastructure. This seems to be in spite of all the resizing of the image.

int_19h•2mo ago

In one of my tests I gave it a photo I shot myself, from a point on an ummarked trail, with trees and a bit of a mountain line in the background and a power line.

It correctly guessed the area with 2 mi accuracy. Impressive.

neves•2mo ago

Did you try https://chat.qwen.ai/ with reasoning on?

sothatsit•2mo ago

I tried this using a photo I took with metadata removed, and the thought process initially started thinking the photo was of Adelaide. But then, the reasoning moved on to realise that some features didn't match what it expected of Adelaide, and instead came up with the correct answer of Canberra. It then narrowed it down further to the exact Suburb the photo was taken in.

When I used GPT-4o, it got the completely wrong answer. It gave the answer of Melbourne, which is quite far off.

RataNova•2mo ago

Kind of like it's just trying to make the answer look earned instead of just blurting it out right away

TrickyRick•2mo ago

I had a similar experience, I tried with some photos from various European cities and while it pretty much always got the city correct it was hilariously confidently incorrect in the exact location within the city. They were plausible but nowhere near the level of accuracy the article describes. All the images had distinctly recognizable landmarks which a resident of said city would know and which also have images available online given one knows the name of the landmark so I'm not particularly impressed.

In fact some of the answers were completely geographically impossible where it said "The image is taken from location X showing location Y" when it's not possible to see location Y if one is standing at location X. Like saying "The photo is taken in Central Park looking north showing the Statue of Liberty".

brookst•2mo ago

I don’t understand the “dystopian” angle. Maybe I’m just old, but I remember the wonder when the Internet made most knowledge available with a few keystrokes. Having deductive reasoning with the same convenience feels wonderful, not dystopian.

AstroBen•2mo ago

Accessible to anyone, superhuman levels of deductive reasoning to pick out your location from super minor details in an innocent photo? That could certainly be dystopian

brookst•2mo ago

Anyone can post to r/geogussr. Has that been dystopian all this time and I never noticed?

simonw•2mo ago

Honestly, yes it's a bit dystopian that a forum online exists where anyone can post a photo and experts from all around the world will help them figure out the exact location of that photo.

Lots of things that exist in our world today are mildly dystopian.

brookst•2mo ago

I guess that's the heart of the disagreement -- to me "mildly dystopian" is a very funny phrase. Dystopia / Utopia are extremes; perfect and perfectly bad.

If we're calling potentially abusable things "dystopian" then, ok, sure. But then you have to let me call unscented soap "utopian", since it is at least mildly utopian.

mcbuilder•2mo ago

It certainly could be, but not all technological advancement is necessarily dystopian. You say, currently everyone now has access to this, while before it was only available to nation states who could hire teams of skilled analyst s. I mean, I agree it's scary that now a stalker could track a victim, but cars and cameras probably help as well. So, I think it's fair to challenge "dystopian", someone will use it for non-nefarious purposes.

pwg•2mo ago

Think: "stalker".

NitpickLawyer•2mo ago

If the person is already a stalker you'd think they'd already know this, no? There's that anecdotal stuff in japan where a vlogger was located by her "fans" from a reflexion of their home bus station or something. The weird people will do weird stuff regardless of technology, IMO.

And the governments are already doing this for decades at least, so ... I think the tech could be a net benefit, as with many other technologies that have matured.

AstroBen•2mo ago

> weird people will do weird stuff regardless of technology

If I were someone's only stalker, I'd be absolutely hopeless at finding their location from images. I'm really bad at it if I don't know the location first hand

But now, suddenly with AI I'm close to an expert. The accessibility of just uploading an image to ChatGPT means everyone has an easy way of abusing it, not just a small percentage of the population

brookst•2mo ago

So I guess the evil we're worried about is stalkers who are bad at guessing locations, bad enough with tech that they don't know about geoguessr websites and subreddits, but good enough with tech to use LLMs?

simonw•2mo ago

Given that ChatGPT supposedly has "500 million weekly actives" (recent Sam Altman quote) I think what you're describing there is a pretty likely persona.

AstroBen•2mo ago

Screenshot / save photo -> add to chatgpt chat -> "where is this taken?"

There couldn't possibly be a lower barrier to doing that

ChatGPT is also currently the #1 free app on ios and android in the US. Hardly a niche tool only tech people know about, compared to the 129k people on that subreddit

simonw•2mo ago

Have you ever known anyone who's escaped from an abusive relationship? It's not at all uncommon for people to have legitimate reasons not to be found.

brookst•2mo ago

Sure, but what does this change? Plenty of people are better geoguessers than this LLM. Anyone trying to find someone who is both trying not to be found and posting pictures publicly is just going to copy them to Reddit and ask “where is this”.

I’m not a fan of this variation on “think of the children”. It has always been possible to deduce location from images. The fact that LLMs can also do it changes exactly nothing about the privacy considerations of sharing photos.

It’s fine to fear AI but this is a really weak angle to come at it from.

simonw•2mo ago

Same as with other forms of automation: it makes this capability much easier for bad actors to obtain.

I've got the impression that geoguessing has at least a loose code of ethics associated with it. I imagine you'd have to work quite hard to find someone with those skills to help you stalk your ex - you'd have to mislead them about your goal, at least.

Or you can sign up for ChatGPT and have as many goes as you like with as many photos as you can find.

I have a friend who's had trouble with stalkers. I'm making sure they're aware that this kind of thing has just got a lot easier.

frozenseven•2mo ago

The supposed existence of your friend doesn't dictate policy, much less reality. It's already been explained to you that GeoGuessr exists and is very popular. What o3 can do, so can a million humans out there.

You are trying to manufacture outrage. Plain and simple.

GeoAtreides•2mo ago

That's because you haven't lived in an authoritarian regime. NKVD, STASI, Gestapo, would all have killed for such capabilities.

As an east european who grew up and lived in such a regime, I would like to respectfully remind all westerners their care-free and free lives is a privilege the majority of the world doesn't have.

meowface•2mo ago

Not to get political, but it deeply irks me to see some American far-leftists glamorize and glorify the Soviet regime and even modern regimes like North Korea's. Especially when certain popular streamers do it. Obviously seeing far-right American internet personalities glorify the Nazi regime is also awful, but the former is often normalized and not considered ostracization-worthy while the latter (rightfully) is.

greenchair•2mo ago

it's pretty easy to understand: american left are essentially rebellious teens who never grew up. contrarian by nature.

brookst•2mo ago

No, you misunderstand me.

Look at all the people in this thread talking about how people are fantastic at guessing locations from photos. This is not a new thing.

"If you want something to be secret don't post it online" is a principle that far predates LLMs. It's still true. It always was. The idea that authoritarian regimes had no way to place the location of photos before this is laughable.

GeoAtreides•2mo ago

Scale, quality and the reliability make a difference.

There are, and always will be, _few_ humans with the talent and knowledge for geo guessing, their attention and time scarce and precious resources. Enter LLMs, which can process images at scale.

Someone might observe strict OPSEC when it comes with their presence online. But would their cousins do the same? Their elderly parents? Their friends? How about the myriad CCTV camera in the public spaces? Photos aside, no one can live off the grid in this age; our electronic reflection grows sharper, more focused every day. And so we generate data and LLMs can compile that data at scale, reliable and fast.

As a small aside: "The idea that authoritarian regimes had no way to place the location of photos before" it's not an argument I made or implied.

jcims•2mo ago

I think the point that is getting missed in a lot of the comments is the act of witnessing it go through it's analysis. You have a little live view into what it's thinking and watching it zoom in to various bits of the image and 'reason' about them is kind of...interesting.

qoez•2mo ago

Who knows if they're on purpose untraining this ability of the model though, seems like that would go away in a 'safety' finetune.

api•2mo ago

Dystopian: the surveillance potential, both from a big surveillance (corporate / government / political) and an individual surveillance (stalkers) perspective.

Not dystopian: the crime solving potential, the research potential, the historical narrative reconstruction potential, etc.

It's a pattern I keep seeing over and over again. There seem to be a lot of values that we can obtain, individually or collectively, by bartering privacy in exchange for them.

If we had a sane world with sane, reliable, competent leadership, this would be less of a concern. But unfortunately we seem to have abdicated leadership globally to a political class that is increasingly incompetent and unhinged. My hypothesis on this is that sane, reasonable people are repelled from politics due to the emotional and social toxicity of that sector, leaving the sector to narcissists and delusional ideologues.

Unfortunately if we're going to abdicate our political sphere to narcissists and delusional ideologues, sacrificing privacy at the same time is a recipe for any number of really bad outcomes.

esjeon•2mo ago

I just tossed a group photo w/ some cherry blossom in the background, and GPT immediately answered it's taken in Japan.

Yes, I'm very very very scared. /s

simonw•2mo ago

Finding a photo that this doesn't work on is trivially easy.

pcthrowaway•2mo ago

I'm curious how many cues it's using from profiling people in that guess.

A photo of people with cherry blossoms could be in many places, but if the majority of the people in the photo happen to be Japanese (and I'm curious how good LLMs are at determining the ethnicity of people now, and also curious if they would try to guess this if asked), it might guess Japan even if the cherry blossoms were in, say, Vancouver.

mk89•2mo ago

This tool just makes it easier for weirdos to achieve their goals at stalking women and kids.

Crazy that this is even allowed.

Who the hell needs to know the precise location of a picture, besides law enforcement? A rough location is most of the time sufficient. Like a region, a state, or a landscape (e.g., when you see the Bing background pictures, it's nice to see where they were taken).

This tool will give a boost to all those creeps out there that can have access to one or two pictures.

semiquaver•2mo ago

This is pure luddism. A human could have done the exact same thing. I’ll also point out that in this case the most confident guess was 200 miles off and the second correct guess was only down to the city level. Not remotely what anyone would consider precise.

AstroBen•2mo ago

for now. These things have a way of very quickly going from somewhat-ok to superhuman in months

mk89•2mo ago

A skilled human can do the same thing but not everyone is open to offering this sort of services for certain purposes.

Making a tool like this trained on existing map services, for example Google Street images, gives everyone, no matter who, the potential to find someone in no time.

These tools are growing like crazy, how long will it take before someone will "democratize" the "location services market"...

NitpickLawyer•2mo ago

> but not everyone is open to offering this sort of services for certain purposes.

Sorry but I call bull on this. Put it on one of the chans with a sob story and it gets "solved" in seconds. Or reddit w/ something bait like "my capitalist boss threatened to let my puppy starve because he wants profits, AITA if I glitter bomb his office?"...

simonw•2mo ago

The fact that it got it wrong was one of the reasons I picked that example: it's much more interesting that way.

If you feed it a photo with a clear landmark it will get the location exactly right.

If you feed it a photo that's a close up of a brick wall it won't have a chance.

What's interesting is how well it can do on this range of tasks. If you don't think that's at least interesting I'm not sure what I can do for you.

semiquaver•2mo ago

I think you may have misunderstood my comment. I think it’s quite interesting. I was responding to parent who is horrified that AI can violate privacy like this:

  > This tool just makes it easier for weirdos to achieve their goals at stalking women and kids.
  > Crazy that this is even allowed.

I was pointing out that to not “allow” using context clues to interpret an image location, something any well-trained human can also do and is in fact an active hobby/esport, is a bit silly.

simonw•2mo ago

Oops sorry about that, I read your comment out of context.

numpad0•2mo ago

I can't believe such a comment is posted now in 2025. Everyone had moved on from that kind of thing at least a decade ago. Or are there some parts of the Internet where this would be new?

leptons•2mo ago

I'm not sure what you're trying to say. That stalking is a thing of the past? Or maybe that everyone expects to be stalked in 2025? Could you be trying to say that with a convicted rapist president, that stalking is encouraged in some countries? Your comment is vague at best.

samlinnfer•2mo ago

Will someone please think of the women and children?

SamPatt•2mo ago

I play competitive Geoguessr at a fairly high level, and I wanted to test this out to see how it compares.

It's astonishingly good.

It will use information it knows about you to arrive at the answer - it gave me the exact trailhead of a photo I took locally, and when I asked it how, it mentioned that it knows I live nearby.

However, I've given it vacation photos from ages ago, and not only in tourist destinations either. It got them all as good or better than a pro human player would. Various European, Central American, and US locations.

The process for how it arrives at the conclusion is somewhat similar to humans. It looks at vegetation, terrain, architecture, road infrastructure, signage, and it just knows seemingly everything about all of them.

Humans can do this too, but it takes many thousands of games or serious study, and the results won't be as broad. I have a flashcard deck with hundreds of entries to help me remember road lines, power poles, bollards, architecture, license plates, etc. These models have more than an individual mind could conceivably memorize.

simonw•2mo ago

Is that flashcard deck a commercial/community project or is it something you assembled yourself? Sounds fascinating!

SamPatt•2mo ago

I made it myself.

I use Obsidian and the Spaced Repetition plugin, which I highly recommend if you want a super simple markdown format for flashcards and use Obsidian:

https://www.stephenmwangi.com/obsidian-spaced-repetition/

There are pre-made Geoguessr decks for Anki. However, I wouldn't recommend using them. In my experience, a fundamental part of spaced repetition's efficacy is in creating the flashcards yourself.

For example I have a random location flashcard section where I will screenshot a location which is very unique looking, and I missed in game. When I later review my deck I'm way more likely to properly recall it because I remember the context of making the card. And when that location shows up in game, I will 100% remember it, which has won me several games.

If there's interest I can write a post about this.

dr_dshiv•2mo ago

I’m interested from a learning science perspective. It’s a nice finding even if anecdotal

simonw•2mo ago

I'd be fascinated to read more about this. I'd love to see a sample screenshot of a few of your cards too.

SamPatt•2mo ago

Sure, I'll write something up later. I'll give you two samples now.

One reason I love the Obsidian + Markdown + Spaced Repetition plugin combo is how simple it is to make a card. This is all it takes:

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

The top image is a screenshot from a game, and the bottom image is another screenshot from the game when it showed me the proper location. All I need to do is separate them with a question mark, and the plugin recognizes them as the Q + A sides of a flashcard.

Notice the data at the bottom:

That is all the plugin needs to know when to reintroduce cards into your deck review.

That image is a good example because it looks nothing like the vast majority of Google Street View coverage in the rest of Kenya. Very people people would guess Kenya on that image, unless they have already seen this rare coverage, so when I memorize locations like this and get lucky by having them show up in game, I can often outright win the game with a close guess.

I also do flashcards that aren't strictly locations I've found but are still highly useful. One example is different scripts:

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

Both Cambodia and Thailand have Google Street View coverage, and given their geographical proximity it can be easy to confuse them. One trick to telling them apart is their language. They're quite different. Of course I can't read the languages but I only need to identify which is which. This is a great starting point at the easier levels.

The reason the pros seem magical is because they're tapping into much less obvious information, such as the camera quality, camera blur, height of camera, copyright year, the Google Street View car itself, and many other 'metas.' It gets to the point where a small smudge on the camera is enough information to pinpoint a specific road in Siberia (not an exaggeration). They memorize all of that.

When possible I make the images for the cards myself, but there are also excellent sources that I pull from (especially for the non-location specific cards), such as Plonkit:

https://www.plonkit.net/

edanm•2mo ago

This is fascinating! Thanks for sharing.

Small question - have you ever used Anki, and/or considered using it instead of this? I am a long-time user of Anki but also started using Obsidian over the last few years, wondering if you ever considered an Obsidian-to-Anki solution or something (don't know if one even exists).

SamPatt•2mo ago

I used Anki for years, not for Geoguessr, but I've been a fan of spaced repetition for a long time.

It worked well and has a great community, but I found the process for creating cards was outside my main note taking flow, and when I became more and more integrated into Obsidian I eventually investigated how to switch. As soon as I did, I've never needed Anki, although there have been a few times I wished I could use their pre-made decks.

I know there are integrations that go both ways. I built a custom tool to take Anki decks and modify them to work with my Obsidian Spaced Repetition plugin. I don't have a need to go the other way at the moment but I've seen other tools that do that.

sally_glance•2mo ago

Fascinating - I just got a little lost thinking about how and why you selected this exact image. If you were able to select a single image containing the maximum aggregate count of unique features for any given location and describe them textually, that would make a very useful training data set... for competitive geoguessr and neural networks alike!

SamPatt•2mo ago

I have had so many ideas related to this. Particularly around automation.

Because the way the community arrives at meta knowledge and its geographical distribution is very informal. There are quite a few false metas out there, or the commonly understood range of the meta is inaccurate.

What needs to happen is automated image recognition and mapping of the metas. I started building an open source tool that allows for people to do this manually, but it's quite a difficult project to do myself and I'm not pursuing it anymore. But I would like to see a less manual process emerge.

sally_glance•2mo ago

Do you have a link? I'm curious what your approach to mapping the metas is... I guess somehow identify features and their frequency of appearing in images, then clustering them?

prezjordan•2mo ago

> In my experience, a fundamental part of spaced repetition's efficacy is in creating the flashcards yourself.

+1 to this, have found the same when going through the Genki Japanese-language textbook.

I'm assuming you're finding your workflow is just a little too annoying with Anki? I haven't yet strayed from it, but may check out your Obsidian setup.

SamPatt•2mo ago

I do everything from Obsidian now. Anki was very much outside my main flow, and the plugin I linked above works so well for me that I've never looked back.

I did write a tool to convert Anki decks to my own format. I haven’t used it much but it's nice to have.

bobro•2mo ago

Did you include location metadata with the photos by chance? I’m pretty surprised by these results.

SamPatt•2mo ago

No, I took screenshots to ensure it.

Your skepticism is warranted though - I was a part of an AI safety fellowship last year and our project was creating a benchmark for how good AI models are at geolocation from images. [This is where my Geoguessr obsession started!]

Our first run showed results that seemed way too good; even the bad open source models were nailing some difficult locations, and at small resolutions too.

It turned out that the pipeline we were using to get images was including location data in the filename, and the models were using that information. Oops.

The models have improved very quickly since then. I assume the added reasoning is a major factor.

vessenes•2mo ago

A) o3 is remarkably good, better than benchmarks seem to indicate in many circumstances

B) it definitely cheats when it can — see this chat where it cheated by extracting EXIF data and wasn’t ashamed when I complained about it cheating: https://chatgpt.com/share/6802e229-c6a0-800f-898a-44171a0c7d...

SamPatt•2mo ago

As a further test, I dropped the street view marker on a random point in the US, near Wichita, Kansas, here's the image:

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

I fed it o3, here's the response:

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

Nailed it.

There's no metadata there, and the reasoning it outputs makes perfect sense. I have no doubt it'll be tricky when it can be, but I can't see a way for it to cheat here.

tylersmith•2mo ago

This is right by where I grew up and the broadcast tower and turnpike sign were the first two things I noticed too, but the ability to realize it was the East side instead of the West side because the tower platforms are lower is impressive.

SamPatt•2mo ago

Oh hey Tyler, nice to see you on HN :)

Yeah it's an impressive result.

SecretDreams•2mo ago

> These models have more than an individual mind could conceivably memorize.

#computers

joenot443•2mo ago

Super cool, man. Watching pro Geoguessr is my latest break-time activity, these geo-gods never cease to impress me.

One thing I'm curious about - in high level play, how much of the meta involves knowing characteristics about the photography/equipment/etc. that Google used when they shot it? Frequently I'll watch rainbolt immediately know an African country from nothing but the road, is there something I'm missing?

olex•2mo ago

In the stream commentary for some of competitive Geoguessr I've watched, they definitely often mention the color and shape of the car (visible edges, shadow, reflections), so I assume pro players know which cars were used where very well.

wongarsu•2mo ago

Also things like follow cars (some countries had government officials follow the streetview car), the season in which coverage was created, camera glitches, the quality of the footage, etc.

There is a lot of "legitimate" knowledge. With just a street you have the type of road surface, its condition, the type of road markings, the bollards, and the type of soil and vegetation next to the road, as well as the presence and type of power poles next to the road, to name a few. But there is also a lot of information leakage from the way google takes streetview footage.

SamPatt•2mo ago

Spot on.

Nigeria and Tunisia have follow cars. Senegal, Montenegro and Albania have large rifts in the sky where the panorama stitching software did a poor job. Some parts of Russia had recent forest fires and are very smokey. One road in Turkey is in absurdly thick fog. The list is endless, which is why it's so fun!

simonw•2mo ago

Do you have a feel for how often StreetView published fresh imagery?

When that happens, is there a wild flurry of activity in the GeoGuessr community as players race to figure out the latest patterns?

SamPatt•2mo ago

Google updates Street View fairly frequently, but most of the updates are in developed nations and they're simply updating coverage with the same camera quality and don't change the meta.

However every once in a while you'll get huge updates - new countries getting coverage, or a country with older coverage getting new camera generation coverage, etc. And yes, the community watches for these updates and very quickly they try to figure out the implications. It's a huge deal when major coverage changes.

If you want an example of this, zi8gzag (one of the best known in the community) put out a video about a major Street View update not long ago:

https://www.youtube.com/watch?v=XLETln6ZatE

The community is very tuned into Google's street view plans - see Rainbolt's video talking to the Google street view team a few weeks back:

https://youtu.be/2T6pIJWKMcg?si=FUKuGkexnaCt7s_b

simonw•2mo ago

That zi8gzag video was fascinating, thanks for that.

gf000•2mo ago

That sounds exactly like shortcut learning.

jvvw•2mo ago

Definitely. The season that coverage was done can be a big thing too. In Russia you'll be looking at the car, antenna type and the season as pretty much the first indicator where you might be.

Obviously they can still figure out a lot without all that and NMPZ obviates aspects of it (you can't hide camera gens, copyright and season and there are often still traces of the car in some manner). It's definitely not all 'meta' but to be competitive at that level you really do need to be using it. I think Gingey is the only world league player who doesn't use car meta.

Even as a fairly good but nowhere near pro player, it's weird how I associate particular places with particular types of weather. I think if saw Almaty in the summer for example it would feel very weird. I've decided not to deliberately learn car meta but still picked up quite a lot without trying and your 'vibe' of a place can certainly include camera gen.

whimsicalism•2mo ago

> knowing characteristics about the photography/equipment/etc. that Google used when they shot it?

A lot at the top levels - the camera can tell you which contractor, year, location, etc. At anything less than top, not so much - more street line painting, cars, etc.

SamPatt•2mo ago

Thanks. I also love watching the pros play.

>One thing I'm curious about - in high level play, how much of the meta involves knowing characteristics about the photography/equipment/etc. that Google used when they shot it?

The photography matters a great deal - they're categorized into "Generations" of coverage. Gen 2 is low resolution, Gen 3 is pretty good but has a distinct car blur, Gen 4 is highest quality. Each country tends to have only one or two categories of coverage, and some are so distinct you can immediately know a location based solely on that (India is the best example here).

You're asking about photography and equipment, and that's a big part of it, but there's a huge amount other 'meta' information too.

It is somewhat dependent on game mode. There are three games modes:

1. Moving - You can move around freely 2. No Move - You can't move but you can pan the camera around and zoom 3. NMPZ - No Move, No Pan, No Zoom

In Moving and No Move you have all the meta information available to you, because you can look down at the car and up at the sky and zoom in to see details.

This can't be overstated. Much of the data is about the car itself. I have an entire flashcard section dedicated only to car blur alone, here's a sample:

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

And another only on antennas:

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

You get the idea. The real pros will go much further. All Google Street View images have a copyright year somewhere in the image. They memorize what years certain countries were covered and match it to the images to help narrow down possibilities.

It's all about narrowing down possibilities based on each additional piece of information. The pros have seen so much and memorized so much that it looks like cheating to an outsider, but they just are able to extract information that most people wouldn't even know exists.

NMPZ is a bit different because you have substantially less information. Little to no car meta, harder to check copyright, and of course without zooming or panning you just have less information. That's why a lot of pros (like Zi8gzag) really hang their hat on NMPZ play, because it's a better test of skill.

mikeocool•2mo ago

I was a very casual GeoGuessr player for a few months — and I found it pretty remarkable how quickly (and without a lot of dedicated study time) you could learn a lot of tells of specific regions — and get reasonably good (certainly not pro good or anything, but good enough to the hit right country ~80% of the time).

Another thing is how many areas of the world have surprisingly distinct looks. In one of my early games, before I knew much about anything, I was dropped a trail in the woods. I’ve spent a fair amount of time hiking in Northern New England — and I could just tell immediately that’s where I was just from vibes (i.e. the look of the trees and the rocks) — not something I would have guessed I would have been able to recognize.

latentsea•2mo ago

I went to watch the Minecraft movie, and when the scene where they arrive outside their new house came on I was like... that feels so much like New Zealand. Then a few weeks later I went to visit my mum in Huntly, and she was like "oh yeah, they filmed part of it in Huntly!".

So, yeah vibes are a real thing.

cco•2mo ago

Meh, meta is so boring and uninteresting to me personally. Knowing you're in Kenya because of the snorkel, that's just simple memorization. Pick up on geography, architecture, language, sun and street position; that's what I love.

It's clearly necessary to compete at the high level though.

SamPatt•2mo ago

I hear you, a lot of people feel the same way. You can always just play NMPZ if you want to limit the meta.

I still enjoy it because of the competitive aspect - you both have access to the same information, who put in the effort to remember and recall it better?

If it were only meta I would hate it too. But there's always a nice mix in the vast majority of rounds. And always a few rounds here and there that are so hard they'll humble even the very best!

charcircuit•2mo ago

How is stuff like geography, architecture, or language not memorization either?

SamPatt•2mo ago

It's a valid question.

My guess is the actual objection is the artificial feeling of the Google specific information. It cannot possibly be useful in any other context to know what the Street View car in Bermuda looked like when they did their coverage.

Whereas knowing about vegetation or architecture feels more generally useful. I think it's a valid point, but you're right that it is all down to memorization at some point.

Though some memorization is "vibes" where you don't specifically know how you know, but you just do. That only comes with repetition. I guess it feels more earned that way?

dalmo3•2mo ago

It's not only about usefulness. People play gg recreationally to fantasize about being in those places, so of course real world knowledge is where the fun is. Camera meta is a turn off.

zharknado•2mo ago

I think it’s more productive to ask, “why would someone attribute a different value to learning about the geography, architecture, and language of a region vs. learning about the characteristics of the hardware and software one specific company used to take a picture of it?”

I think asking that question helps move past the surface question of how information was learned (memorization) to the core issue of which learning we value and why.

roxolotl•2mo ago

One thing I’m curious about is if they are so good, and use a similar technique as humans, because they are trained on people writing out their thought processes. Which isn’t a bad thing or an attempt to say they are cheating or this isn’t impressive. But I do wonder how much of the approach taken is “trained in”.

otabdeveloper4•2mo ago

> how much of the approach taken is “trained in”.

100% of it is. There is no other source of data except human-generated text and images.

neurostimulant•2mo ago

> when I asked it how, it mentioned that it knows I live nearby.

> The process for how it arrives at the conclusion is somewhat similar to humans. It looks at vegetation, terrain, architecture, road infrastructure, signage, and it just knows seemingly everything about all of them.

Can we trust what the model says when we ask it about how it comes up with an answer?

simonw•2mo ago

Not at all. Models have no invisible internal state that they can access between prompts. If you ask "how did you know that?" you are effectively asking "given the previous transcript of our conversation, come up with a convincing rationale for what you just said".

kqr•2mo ago

On the other hand, since they "think in writing" they also do not keep any reasoning secret from us. Whatever they actually did is based on past transcript plus training.

throwaway314155•2mo ago

Right but the reasoning/thinking is _also_ explained as being partially or completely performative. This is made obvious when mistakes that show up in chain of thought _don't_ result in mistakes in the final answer.l (a fairly common phenomenon). It is also explained more simply by the training objective (next token prediction) and loss function encouraging plausible looking answers.

GeorgeDewar•2mo ago

That writing isn't the only "thinking" though. Some thinking can happen in the course of generating a single token, as shown by the ability to answer a question without any intermediate reasoning tokens. But as we've all learnt this is a less powerful and more error-prone mode of thinking.

So that is to say I think a small amount of secret reasoning would be possible, e.g. if the location is known or guessed from the beginning by another means and the reasoning steps are made up to justify the conclusion.

The more clearly sound the reasoning steps are, the less plausible that scenario is.

robbie-c•2mo ago

Probably not, see https://www.anthropic.com/research/reasoning-models-dont-say...

kevinventullo•2mo ago

Would be interesting to apply Interpretability techniques in order to understand how the model really reasons about it.

astrobe_•2mo ago

You're just asking the left-brain interpreter [1] its opinion about what the right-brain did.

[1] https://en.wikipedia.org/wiki/Left_brain_interpreter

brundolf•2mo ago

I find this type of problem is what current AI is best at: where the actual logic isn't very hard, but it requires pulling together and assimilating a huge amount of fuzzy, known information from various sources

They are, after all, information-digesters

is-is-odd•2mo ago

it's just all compression?

always has been

skydhash•2mo ago

It takes a lot of energy to compress the data. And a lot to actually extract something sensible. While you could just just optimize the single problem you have quite easily.

fire_lake•2mo ago

Which also fits with how it performs at software engineering (in my experience). Great at boilerplate code, tests, simple tutorials, common puzzles but bad at novel and complex things.

brundolf•2mo ago

Yep. But wonderful at aggregating details from twelve different man pages to write a shell script I didn't even know was possible to write using the system utils

fundingshovel•2mo ago

I use it for this a lot.

HenryBemis•2mo ago

Is it 'only' "aggregating details from twelve different man pages" or has it 'studied' (scraped) all (accessible) code in GitHub/GitLab/Stachexchange/etc. and any other publicly available coding repositories on the web (and for the case of MS the Git it owns)? Together with descriptions of what is right and what is wrong..

I use it for code, and I only do fine tuning. When I want something that is clearly never done before, I 'talk' to it and train it on which method to use, and for a human brain some suggestions/instructions are clearly obvious (use an Integer and not a Double, or use Color not Weight). So I do 'teach' it as well when I use it.

Now, I imagine that when 1 million people use LLMs to write code and fine tune it (the code), then we are inherently training the LLMs on how to write even better code.

So it's not just "..different man pages.." but "the finest coding brains (excluding mine) to tweak and train it".

jdiff•2mo ago

Definitely matches my experience as well. I've been working away on a very quirky, non-idiomatic 3D codebase, and LLMs are a mixed bag there. Y is down, there's no perspective distortion or Z buffer, there are no meshes, it's a weird place.

It's still useful to save me from writing 12 variations of x1 = sin(r2) - cos(r1) while implementing some geometric formula, but absolutely awful at understanding how those fit into a deeply atypical environment. Also have to put blinders on it. Giving it too much context just throws it back in that typical 3D rut and has it trying to slip in perspective distortion again.

westmeal•2mo ago

I gotta ask what are you actually doing because it sure sounds funky

jdiff•2mo ago

Working on extending the [Zdog](https://zzz.dog) library, adding some new types and tooling, patching bugs I run into on the way.

All the quirks inherit from it being based on (and rendering to) SVG. SVG is Y-down, Zdog only adds Z-forward. SVG only has layering, so Zdog only z-sorts shapes as wholes. Perspective distortion needs more than dead-simple affine transforms to properly render beziers, so Zdog doesn't bother.

The thing that really throws LLMs is the rendering. Parallel projection allows for optical 2D treachery, and Zdog makes heavy use of it. Spheres are rendered as simple 2D circles, a torus can be replicated with a stroked ellipse, a cylinder is just two ellipses and a line with a stroke width of $radius. LLMs struggle to even make small tweaks to existing objects/renderers.

josephg•2mo ago

Yeah I have the same experience. I’ve done some work on novel realtime text collaboration algorithms. For optimisation, I use some somewhat bespoke data structures. (Eg I’m using an order-statistic tree storing substring lengths with internal run-length encoding in the leaf nodes).

ChatGPT is pretty useless with this kind of code. I got it to help translate a run length encoded b-tree from rust to typescript. Even with a reference, it still introduced a bunch of new bugs. Some were very subtle.

jitl•2mo ago

It’s just not there yet but I think it will get there for translation kind of tasks quite capably in the next 12 months, especially if asked to translate a single file or a selection in a file line by line. Right now it’s quite bad which I find surprising. I have less confidence we’ll see whole-codebase or even module level understanding for novel topics in the next 24 months.

There’s also a question of quality of source data. At least in TypeScript/JavaScript land, the vast majority of code appears to be low quality and buggy or ignores important edge cases and so even when working on “boilerplate” it can produce code that appears to work but will fall over in production for 20% of users (for example string handling code that will tear Unicode graphemes like emoji).

imatworkyo•2mo ago

how often are we truly writing actual novel programs that are complex in a way AI does not excel at?

There are many types of complex, and many times complex for a human coder, are trivial for AI and its skillset.

gf000•2mo ago

Depends on the field of development you do.

CRUD backend app for a business in a common sector? It's mostly just connecting stuff together (though I would argue that an experienced dev with a good stack takes less time to write it as is than painstakingly explaining it to an LLM in an inexact human language).

Some R&D stuff, or even debugging any kind of code? It's almost useless, as it would require deep reasoning, where these models absolutely break down.

simonw•2mo ago

Have you tried debugging using the new "reasoning" models yet?

I have been extremely impressed with o1, o3, o4-mini and Gemini 2.5 as debugging aids. The combination of long context input and their chain-of-thought means they can frequently help me figure out bugs that span several different layers of code.

I wrote about an early experiment with that here: https://simonwillison.net/2024/Sep/25/o1-preview-llm/

Here's a Gemini 2.5 Pro transcript from this afternoon where I'm trying to figure out a very tricky bug: https://gist.github.com/simonw/4e208ab9edb5e6a814d3d23d7570d...

bla3•2mo ago

In my experience they're not great with mathy code for example. I had a function that did subdivision of certain splines and had some of the coefficients wrong. I pasted my function into these reasoning models and asked "does this look right?" and they all had a whole bunch of math formulas in their reasoning and said "this is correct" (which it wasn't).

expensive_news•2mo ago

I agree with this. I do mostly DevOps stuff for work and it’s great at telling me about errors with different applications/build processes. Just today I used it to help me scrape data from some webpages and it worked very well.

But when I try to do more complicated math it falls short. I do have to say that Gemini Pro 2.5 is starting to get better in this area though.

tyre•2mo ago

Wait I’ve found it very good at debugging. It iteratively states a hypothesis, tries things, and reacts from what it sees.

It thinks of things that I don’t think of right away. It tries weird approaches that are frequently wrong but almost always yield some information and are sometimes spot on.

And sometimes there’s some annoying thing that having Claude bang its head against for $1.25 in API calls is slower than I would be but I can spend my time and emotional bandwidth elsewhere.

jeswin•2mo ago

> novel and complex things

a) What's an example?

b) Is 90% (or more) of programming mundane, and not really novel?

nurettin•2mo ago

If you'd like a creative waste of time, make it implement any novel algorithm that mixes the idea of X with Y. It will fail miserably, double down on the failure and hard troll you, run out of context and leave you questioning why you even pay for this thing. And it is not something that can be fixed with more specific training.

AlexCoventry•2mo ago

Can you give an example? Have you tried it recently with the higher-end models?

nurettin•2mo ago

My favorite example is implementing NEAT with keras dense layers instead of graphs. Last time I tried with claude 3.7, it wrote code to mutate the output layer (??). I tried to prevent that a few times and gave up.

AlexCoventry•2mo ago

This NEAT? https://web.archive.org/web/20231205130538/http://www.cs.ucf...

Is the idea to use a keras dense layer to represent a weighted graph by identifying the input nodes with the corresponding outputs?

nurettin•2mo ago

The idea is to evolve the multi layer dnn using ga

AlexCoventry•2mo ago

Is there a precise description of what you're trying to do? Maybe the task you gave Claude?

nurettin•2mo ago

Are you collecting works for anthropic lol

kuboble•2mo ago

I have asked chatgpt reasoning model to solve chess endgames where white had king and a queen vs king and a rook on a 7x8 chessboards. So to compute value for all positions and find the position which is the longest win for white.

Not creative, not novel and not difficult algorithmic task. But it requires some reasoning, planning and precision.

It failed miserably.

throwaway4aday•2mo ago

I think you need to be more specific about which "chatgpt reasoning model" you used. Even the free version of chatgpt has reasoning/thinking now but there are also o1-mini, o1, o1-pro, o3-mini, o3, and o4-mini and they all have very different capabilities.

AlexCoventry•2mo ago

o4-mini-high did make an error I had to point out, on the first attempt: https://chatgpt.com/share/680eeea2-264c-800e-8497-3903ea6309...

spaceman_2020•2mo ago

This is also why I buy the apocalyptic headlines about AI replacing white collar labor - most white collar employment is mostly creating the same things (a CRUD app, a landing page, a business plan) with a few custom changes

Not a lot of labor is actually engaged in creating novel things.

The marketing plan for your small business is going to be the same as the marketing plan for every other small business with some changes based on your current situation. There’s no “novel” element in 95% of cases.

econ•2mo ago

I wonder what the impact will be when replicating the same thing becomes machine readable with near 100% accuracy.

coffeebeqn•2mo ago

I don’t know if most software engineers build toy CRUD apps all day? I have found the state of the art models to be almost completely useless in a real large codebase. Tried Claude and Gemini latest since the company provides them but they couldn’t even write tests that pass after over a day of trying

fl0id•2mo ago

Same. Like Claude code for example will write some tests. But what they are testing is often incorrect

the_duke•2mo ago

Agreed in general, the models are getting pretty good at dumping out new code, but for maintaining or augmenting existing code produces pretty bad results, except for short local autocomplete.

BUT it's noteworthy that how much context the models get makes a huge difference. Feeding in a lot of the existing code in the input improves the results significantly.

nradov•2mo ago

This might be an argument in favor of a microservices architecture with the code split across many repos rather than a monolithic application with all the code in a single repo. It's not that microservices are necessarily technically better but they could allow you to get more leverage out of LLMs due to context window limitations.

tough•2mo ago

if your microservices become more verbose overall, now you have handicapped your ability to cram the whole codebase into a context window.

I think AI is great but humans know the why's of the code needs to exist AI's don't need stuff, only generate it

the_duke•2mo ago

The LLMs would only need to have API information for the service, not the whole code, which would be small.

gmadsen•2mo ago

this is a short term issue though. The available context window has been increasing exponentially over the past 2 years

flir•2mo ago

Our current architectures are complex, mostly because of DRY and a natural human tendency to abstract things. But that's a decision, not a fundamental property of code. At core, most web stuff is "take it out of the database, put it on the screen. Accept it from the user, put it in the database."

If everything was written PHP3 style (add_item.php, delete_item.php, etc), with minimal includes, a chatbot might be rather good at managing that single page.

I'm saying code architected to take advantage of human skills, and code architected to take advantage of chatbot skills might be very different.

dbdoskey•2mo ago

This is IMHO where the interesting direction will be. How do we architecture code so that it is optimized around chatbot development? In the past areas of separation were determined by api stability, deployment concerns, or even just internal team politics. In the future a rep might be separated from a monolith repo to be an area of responsibility that a chatbot can reason about, and not get lost in the complexity.

az09mugen•2mo ago

IMHO we should always architect code to take advantage of human skills.

1°) When there is an issue to debug and fix in a not-so-big codebase, LLMs can give ideas to diagnose, but are pretty bad at fixing. Where your god will be when you have a critical bug in production ?

2°) Code is meant for humans in the first place, not machines. Bytecodes and binary formats are meant for machines, these are not human-readable.

As a SWE, I pass more time reading than writing code, and I want to navigate in a the codebase in the most easy possible way. I don't want my life to be miserable or more complicated because the code is architected to take advantage of chatbot skills.

And still IMHO, if you need to architect your code for not-humans, there is a defect in the design. Why force yourself to write code that is not meant to be maintained by a human when you will in any case maintain that said code ?

flir•2mo ago

This human quite likes having everything on one page, to be honest. And not having a leaky ORM layer between me and the SQL.

svilen_dobrev•2mo ago

long time ago, in one small company, i wrote an accounting system from 1st principles and then it was deployed to some large-ish client. It took several months of rearranging their whole workflows and quarelling with their operators to enable the machine to do what it is good at and to disable all the human-related quirky +optimizations -cover-asses. Like, humans are good at rough guessing but bad at remembering/repeating same thing. Hence usual manual accounting workflows are heavily optimized for error-avoidability.

Seems same thing here.. another kind of bitter lesson, maybe less bitter :/

rafaelmn•2mo ago

LOL how does the AI keep track of all the places it needs to update once you make a logic change ? I have no idea what you're doing but almost nothing I do is basic CRUD - there's always logic/constraints around data flow and processes built on top.

People didn't move away from PHP3 style of code because it's a natural human tendency - they moved away because it was impossible to maintain that kind of code at scale. AI does nothing to address that and is in fact incredibly bad at it because it's super inconsistent, it's not even copy paste at that point - it's the "whatever flavor of solution LLM chose in this instance".

I don't understand what's your background to think that this kind of thing scales ?

flir•2mo ago

I can see how it would be confusing.

kypro•2mo ago

Most senior SWEs, no. But most technical people in software do a lot of what the parent commenter describes in my experience. At my last company there was a team of about 5 people whose job was just to make small design changes (HTML/CSS) to the website. Many technical people I've worked with over the years were focused on managing and configuring things in CMSs and CRMs which often require a bit of technical and coding experince. At the place I currently work we have a team of people writing simple python and node scripts for client integrations.

There's a lot of variety in technical work, with many modern technical jobs involving a bit of code, but not at the same complexity and novelty as the types of problems a senior SWE might be working on. HN is full of very senior SWEs. It's really no surprise people here still find LLMs to be lacking. Outside of HN I find people are far more impressed/worried by the amount of their job an LLM can do.

oceanplexian•2mo ago

I agree but the reason it won’t be an apocalypse is the same reason economists get most things wrong, it’s not an efficient market.

Relatively speaking we live in a bubble, there are still broad swaths of the economy that operate with pen and paper. Another broad swath that migrated off 1980s era AS/400 in the last few years. Even if we had ASI available literally today (And we don’t) I’d give it 20-30 years until the guy that operates your corner market or the local auto repair shop has any use in the world for it.

bonoboTP•2mo ago

I had predicted the same about websites, social media presence, Google maps presence etc. back 10-15 years ago, but lo and behold, even the small burger place hole-on-a-wall in rural eastern Europe is now on Google maps with reviews, and even answers by the owner, a facebook page with info on changes of opening hours etc. I'd have said there's no way that fat 60 year old guy will get up to date with online stuff.

But gradually they were forced to.

If there are enough auto repair shops that can just diagnose and process n times more cars in a day, it will absolutely force people to adopt it as well, whether they like the aesthetics or not, whether they feel like learning new things or not. Suddenly they will be super interested in how to use it, regardless of how they were boasting about being old school and hands-on beforehand.

If a technology gives enough boost to productivity, there's simply no way for inertia to hold it back, outside of the most strictly regulated fields, such as medicine, which I do expect to lag behind by some years, but will have to catch up once the benefits are clear in lower-stakes industries and there's immense demand on it that politicians will be forced to crush the doctor's cartel's grip on things.

concordDance•2mo ago

This doesn't apply to literal ASI, mostly because copy-pasteable intelligence is an absolute gamechanger, particularly if the physical interaction problems that prevent exponential growth (think autonomous robot factory) are solved (which I'd assume a full ASI could do).

People keep comparing to other tools, but a real ASI would be an agent, so the right metaphor is not the effect of the industrial revolution on workers, but the effect of the internal combustion engine on the horse.

m3kw9•2mo ago

LLMs are like a knowledge aggregator. The reasoning models have potential to get creative usefully but I have yet to see evidence of it, like invent a novel scientific thing

i_have_an_idea•2mo ago

“best where the actual logic isn’t very hard”?

yeah, well it’s also one of the top scorers on the Math olympiads

jdiff•2mo ago

My guess is that those questions are very typical and follow very normal patterns and use well established processes. Give it something weird and it'll continuously trip over itself.

My current project is nothing too bizarre, it's a 3D renderer. Well-trodden ground. But my project breaks a lot of core assumptions and common conventions, and so any LLM I try to introduce—Gemini 2.5 Pro, Claude 3.7 Thinking, o3—they all tangle themselves up between what's actually in the codebase and the strong pull of what's in the training data.

I tried layering on reminders and guidance in the prompting, but ultimately I just end up narrowing its view, limiting its insight, and removing even the context that this is a 3D renderer and not just pure geometry.

Timwi•2mo ago

> Give it something weird and it'll continuously trip over itself.

And so will almost all humans. It's weird how people refuse to ascribe any human-level intelligence to it until it starts to compete with the world top elite.

roarcher•2mo ago

Yeah, but humans can be made to understand when and how they're wrong and narrow their focus to fixing the mistake.

LLMs apologize and then proudly present the exact same output as before, repeatedly, forever spinning their wheels at the first major obstacle to their reasoning.

TeMPOraL•2mo ago

> LLMs apologize and then proudly present the exact same output as before, repeatedly, forever spinning their wheels at the first major obstacle to their reasoning.

So basically like a human, at least up to young adult years in teaching context[0], where the student is subject to authority of the teacher (parent, tutor, schoolteacher) and can't easily weasel out of the entire exercise. Yes, even young adults will get stuck in a loop, presenting "the exact same output as before, repeatedly, forever spinning their wheels at the first major obstacle to their reasoning", or at least until something clicks, or they give up in shame (or the teacher does).

[0] - Which is where I saw this first-hand.

jdiff•2mo ago

As someone currently engaged in teaching the Adobe suite to high school students, that doesn't track with what I see. When my students are getting stuck and frustrated, I look at the problem, remind them of the constraints and assumptions the software operates under. Almost always they realize the problem without me spelling it out, and they reinforce the mental model of the software they're building. Often noticing me lurking and about to offer help is enough for them to pause, re-evaluate, and catch the error in their thinking before I can get out a full sentence.

Reminding LLMs of the constraints they're bumping into doesn't help. They haven't forgotten, after all. The best performance I got out of the LLMs in my project I mentioned upthread was a loop of trying out different functions, pausing, re-evaluating, realizing in its chain of thought that it didn't fit the constraints, and trying out a slightly different way of phrasing the exact same approach. Humans will stop slamming their head into a wall eventually. I sat there watching Gemini 2.5 Pro internally spew out maybe 10 variations of the same function before I pulled the tokens it was chewing on out of its mouth.

Yes, sometimes students get frustrated and bail, but they have the capacity to learn and try something new. If you fall into an area that's adjacent to but decidedly not in their training data, the LLMs will feel that pull from the training data too strongly and fall right into that rut, forgetting where they're at.

Aerroon•2mo ago

A human can play tictactoe or any other simple game in a few minutes after being described the game. AI will do all kinds on interesting things that either are against the rules or will be extremely poor choices.

Yeah, I tried playing tictactoe with chatGPT and it did not do well.

bdangubic•2mo ago

most humans i played against did not do well either :)

stickfigure•2mo ago

LLMs struggle with context windows, so as long as the problem can be solved in their small windows, they do great.

Humans neural networks are constantly being retrained, so their effective context window is huge. The LLM may be better at a complex, well specified 200 line python program, but the human brain is better at the 1M line real-world application. It takes some study though.

_heimdall•2mo ago

I've been surprised that so much focus was put on generative uses for LLMs and similar ML tools. It seems to me like they have a way better chance of being useful when tasked with interpreting given information rather than generating something meant to appear new.

simonw•2mo ago

Yeah, the "generative" in "generative AI" gives a little bit of a false impression. I like Laurie Voss's take on this: https://seldo.com/posts/what-ive-learned-about-writing-ai-ap...

> Is what you're doing taking a large amount of text and asking the LLM to convert it into a smaller amount of text? Then it's probably going to be great at it. If you're asking it to convert into a roughly equal amount of text it will be so-so. If you're asking it to create more text than you gave it, forget about it.

_heimdall•2mo ago

I've had coworkers tell me it works Copilot works well for refactoring code, which also makes sense in the same vein.

Its like they wouldn't be so controversial if they didn't decide to market it as "generative" or "AI"...I assume fund raising valuations would move inline with the level of controversy though.

xnx•2mo ago

This quote sounds clever, but is very different than my experience.

I have been very pleased with responses to things like: "explain x", "summarize y", "make up a parody dog about A to the tune of B", "create a single page app that does abc".

The response is 1000x more text than the prompt.

brk•2mo ago

FWIW, I do a lot of talks about AI in the physical security domain and this is how I often describe AI, at least in terms of what is available today. Compared to humans, AI is not very smart, but it is tireless and able to recall data with essentially perfect accuracy.

It is easy to mistake the speed, accuracy, and scope of training data for "intelligence", but it's really just more like a tireless 5th grader.

simonw•2mo ago

Something I have found quite amusing about LLMs is that they are computers that don't have perfect recall - unlike every other computer for the past 60+ years.

That is finally starting to change now that they have reliable(ish) search tools and are getting better at using them.

inopinatus•2mo ago

Be that as it may, do not forget that in the pursuit of the most textually plausible output, gaps may be filled in for you.

The mistake, and it's a common one, is in using phrases like "the actual logic" to explain to ourselves what is happening.

yard2010•2mo ago

It's just a huge database with nothing except fuzzy search

bjourne•2mo ago

Geoguessr pro zi8gzag tried out one of the AIs in a video: https://www.youtube.com/watch?v=mQKoDSoxRAY It was indeed extremely impressive and for sure would have annihilated me, but I believe it would have no chance to beat zi8gzag or any other top player. But give it a year or two and I'm sure it will crush any human player. Geoguessr is, afaict, primarily about rote memorization of various features (such as types of electricity poles, road signage, foilage, etc.) which AIs excel at.

simonw•2mo ago

Looks like that video uses Gemini 2.0 (probably Flash) in streaming mode (via AI studio) from a few months ago. Gemini 2.5 might do better, but in my explorations so far o3 is hugely more capable than even Gemini 2.5 right now.

neves•2mo ago

Try Alibaba's https://chat.qwen.ai/ Activating reasoning

intalentive•2mo ago

I wonder how it compares with StreetCLIP.

matthewdgreen•2mo ago

I was absolutely gobsmacked by the three minute chain of reasoning this thing did, and how it absolutely nailed the location of the photo based on plants, the color of a fence, comparison with nearby photos, and oh yeah, also the EXIF data containing the exact lat/long coordinates that I accidentally left in the file. https://bsky.app/profile/matthewdgreen.bsky.social/post/3lnq...

SamPatt•2mo ago

Lol it's very easy to give the models what they need to cheat.

For my test I used screenshots to ensure no metadata.

I mentioned this in another comment but I was a part of an AI safety fellowship last year where we created a benchmark for LLMs ability to geolocate. The models were doing unbelievably well, even the bad open source ones, until we realized our image pipeline was including location data in the filename!

They're already way better than even last year.

ghaff•2mo ago

I was and am pretty impressed by Google Photo/Lens IDs. But I realized fairly early on that of course it knew the locations of my iPhone photos from the geo info stored in the photo.

SamPatt•2mo ago

I dropped into Google Street View and tried to recreate your location, how did I do?

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

Here's the model's response:

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

I don't think it needed the EXIF data. I'd be curious if you tried it again yourself.

dudeinhawaii•2mo ago

This is super easy to test though (whether EXIF is being used). Open up Geoguessr app, take a screenshot, paste into O3. Doing this, O3 took too long (for the guessing period) but nailed 3 of 3 locations to within a kilometer.

Edit: An interesting nuance of modern OpenAI chat interface is the "access to all previous chats" element. When I attempted to test O4-mini using the same image -- I inspected the reasoning and spotted: "At first glance, the image looks like Ghana. Given the previous successful guess of Accra Ghana, let's start in that region".

pinkorchid•2mo ago

Note that they can claim to guess a location based on reasonable clues, but actually use EXIF data. See https://news.ycombinator.com/item?id=43732866

Maybe it's not what happened in your examples, but definitely something to keep an eye on.

SamPatt•2mo ago

Yes, I'm aware. I've been using screenshots only to avoid that. Check my last few comments for examples without EXIF data if you're interested to see o3's capabilities.

panarky•2mo ago

And yet many will continue to confidently declare that this is nothing more than fancy autocomplete stochastic parrots.

otabdeveloper4•2mo ago

I will.

I am willing to bet that most of this geolocation success is based on overfitting for Google streetview peculiarities.

I.e., feed it images from a military drone and success rate will plummet.

You're anthropomorphizing a likelihood maximization calculator.

And no, human brains are not likelihood maximization calculators.

FergusArgyll•2mo ago

This overconfidence without making one attempt kills me. Gemini 2.5 pro is free, try it! I gave it 2 pictures I took on a film camera. I then screenshotted the pictures (to remove exif, even though there isn't any). It nailed them.

You lost your bet, now change your mind

hammock•2mo ago

> It looks at vegetation, terrain, architecture, road infrastructure, signage, and it just knows seemingly everything about all of them.

Someone explain to me how this is dystopian. Are Jeopardy champions dystopian too?

It’s not crazy to be able to ID trees and know their geographic range, likewise for architecture, likewise for highway signs. Finding someone who knows all of these together is more rare , but imo not exactly dystopian

Edit: why am I being downvoted for saying this? If anyone wants to go on a walk for me I can help them ID trees, it’s a fun skill to have and something anyone can learn

cyanbane•2mo ago

Have you gleaned anything watching o3 make decisions on a photo? ( i.e. have you noticed if it has thought of anything you.. and other higher level players similar to you... have not? )

SamPatt•2mo ago

This is an interesting question.

I watch the output with fascination, mostly because of the sheer breadth of knowledge. But thus far I can't think of anything that is categorically different from what humans do, it's just got an insane amount of knowledge available to it.

For example, I gave it an image from a town on a small Chilean island. I was shocked when it nailed it, and in the output it said, "I can see a green wooden street sign, common to Chilean coastal towns on [the specific island]."

I have an entire flashcard section for street signage, but just for practicality I'm limited to memorizing scores, possibly hundreds of signs if I'm insanely dedicated. I would still probably never have this one remote Chilean island.

It does that for everything in every category.

maayank•2mo ago

> when I asked it how, it mentioned that it knows I live nearby

Did it mention it in its chain of thought? Otherwise, it could definitely output something because of X and then when asked why “rationalize” that it did it because Y

larodi•2mo ago

Is it meaningful to conclude that this is an algorithm that pro GGsrs all follow, and one of them perhaps explained somewhere and the model took it? Is geo-guessing something that can be presented as algorithm or steps? Perhaps it is not as challenging as it seems, given one knows what to look for?

not as challenging... as say complex differential geometry.

RataNova•2mo ago

Makes me wonder what the ceiling even is for human players if AI can now casually flex knowledge that would take us years to grind out.

zaik•2mo ago

https://www.youtube.com/watch?v=QRqKPDJYyLE

redbell•2mo ago

> It will use information it knows about you to arrive at the answer.. and when I asked it how, it mentioned that it knows I live nearby.

Oh! RIP privacy :(

I’ve pretty much given up on the idea that we can fully protect our privacy while still getting the most out of these services. In the end, it’s a tradeoff—and I’ve accepted that.

zzzeek•2mo ago

GeoGuessr, well I guess that must have been a great training source for the models

Suppafly•2mo ago

>I have a flashcard deck with hundreds of entries to help me remember road lines, power poles, bollards, architecture, license plates, etc.

You're basically training yourself the same way an AI is trained at that point.

IAmGraydon•2mo ago

Why would you go to all the trouble of creating a blog post about this but leave the EXIF data in the image and then proclaim that it probably works without the EXIF too? Why not remove the EXIF in the first place? The two EXIF-less examples given in the update very clearly show iconic landmarks, which makes guessing very easy.

simonw•2mo ago

I had already convinced myself through prior experiments that it wasn't using EXIF data, and decided not to spend extra time making my post 100% proof against cynics because I know from past experience that truly dedicated cynics will always find something to invalidate what they are reading.

I don't know how "iconic" that rocky outcrop in Madagascar is, to be honest. Google doesn't return much about it.

hyperlink014•2mo ago

It absolutely tried to use EXIF data when I asked it to guess the location. Here is proof - https://imgur.com/a/CHde2Cx

I couldn't attach the chat directly since it's a temporary chat.

simonw•2mo ago

Right, but that's at least evident in the thinking trace. I added a note about that to my post.

AstroBen•2mo ago

How much can we trust the thinking trace? At most it says what's in its training set, but Anthropic showed that's not necessarily accurate for how it gets to its answer

I tried this with a (what I thought was) very generic street image in Bangkok. It guessed the city correctly, saying that "people are wearing yellow which is used to honor the monarchy". Wow, cool. I checked the image again and there's a small Thai flag it didn't mention at all. Seems just as plausible, even likely it picked up on that

whimsicalism•2mo ago

if it's using tools to extract exif, it's gonna be in the trace - anthropic's paper is irrelevant here

simonw•2mo ago

I trust the thinking trace to show me the Python it runs.

(Though interestingly I believe there are cases where it can run Python without showing you, which is frustrating especially as I don't fully understand what those are. But I showed other evidence that it can do this without EXIF.)

In your example there I wouldn't be at all surprised if it used the flag without mentioning it. The non-code parts of the thinking traces are generally suspicious.

raincole•2mo ago

> truly dedicated cynics

I bet a lot of people (on HN at least) thought of "Does it use EXIF?" when they read the title alone, and got surprised that it was not the first thing you tested.

whimsicalism•2mo ago

it doesn't use exif most times, it's able to do it consistently from google maps screenshots

croes•2mo ago

And now imagine what the Trump administration can do with such tools

otterley•2mo ago

Even without this tool, they have many more at their disposal to accomplish their goals. Practically anyone who possesses a cell phone, or communicates with anyone who does, can be quickly located. They have aircraft and plenty of physical surveillance equipment as well.

jillesvangurp•2mo ago

If you want to exclude memory and exif data, just open streetview in some random corner of the world and take a screenshot (avoiding any text obviously). It's pretty good if you give it enough to reason with.

It basically iterates on coming up with some hypothesis and then does web searches to validate those.

tokai•2mo ago

Isn't all of streetview in the training set?

robrenaud•2mo ago

O3 is OpenAI. Street view is Google. I really doubt OpenAI is scraping enormous amounts of random street view images to train their model.

gruez•2mo ago

Why not? They allegedly trained on enough books and newspapers that they have publishers and news organizations go after them.

robrenaud•2mo ago

Human generated tokens contain so much more information per byte than random street view images.

pell•2mo ago

I just took a few random spots from around the globe and it got most of them right and some of them incredibly precisely right. I also tried to exclude obvious hints such as license plates, street signs, advertising, etc.

SamPatt•2mo ago

Exactly - I've been posting a few comments with examples doing this.

I'm confused how so many people have such different outcomes. People seem to have fixated on the fact that the models use EXIF data if it's included, but it's trivially easy to run the test ensuring that isn't happening, and the results are still amazing.

I think some people really want to dismiss the capabilities of the models. I get that there's hype and it's annoying, but... look at what it's doing, right now, in front of you!

esafak•2mo ago

For those too young to have seen it, here is the famous scene from Blade Runner, which is set in 2019, that popularized this idea: https://www.youtube.com/watch?v=IbzlX43ykxQ

qarl•2mo ago

> I’m confident it didn’t cheat and look at the EXIF data on the photograph, because if it had cheated it wouldn’t have guessed Cambria first.

It also, at one point, said it couldn't see any image data at all. You absolutely cannot trust what it says.

You need to re-run with the EXIF data removed.

simonw•2mo ago

I ran several more experiments with EXIF data removed.

Honestly though, I don't feel like I need to be 100% robust in this. My key message wasn't "this tool is flawless", it was "it's really weird and entertaining to watch it do this, and it appears to be quite good at it". I think what I've published so far entirely supports that message.

qarl•2mo ago

Yes, I agree entirely: LLMs can produce very entertaining content.

I daresay that in this case, the content is interesting because it appears to be the actual thought process. However, if it is actually using EXIF data as you initially dismissed, then all of this is just a fiction. Which, I think, makes it dramatically less entertaining.

Like true crime - it's much less fun if it's not true.

simonw•2mo ago

I have now proven to myself that the models really can guess locations from photographs to the point where I am willing to stake my credibility on their ability to do that.

(Or, if you like, "trust me, bro".)

qarl•2mo ago

[flagged]

simonw•2mo ago

Well that sucks, I thought I was being extremely transparent in my writing about this.

I've updated my post several times based on feedback here and elsewhere already, and I showed my working at every step.

Can't please everyone.

qarl•2mo ago

You ARE being extremely transparent. That's not what I complained about.

My complaint is that you're saying "trust me" and that isn't transparent in the least.

Am I wrong?

simonw•2mo ago

I said:

"I have now proven to myself that the models really can guess locations from photographs to the point where I am willing to stake my credibility on their ability to do that."

The "trust me bro" was a lighthearted joke.

Misdicorl•2mo ago

Would be really interesting to see what it does with clearly wrong EXIF data

martinald•2mo ago

Yes I agree. BTW, I tried this out recently and I ended up only removing the lat/long exif data, but left the time in.

It managed to write a python program to extract the timezone offset and use that to narrow down there it was. Pretty crazy :).

andrewmcwatters•2mo ago

And, these models' architectures are changing over time in ways that I can't tell if they're "hallucinating" their responses about being able to do something or not, because some multimodal models are entirely token based, including transforming on image token and audio token data, and some are entirely isolated systems glued together.

You can't know unless you know specifically what that model's architecture is, and I'm not at all up-to-date on which of OpenAI's are now only textual tokens or multimodal ones.

Someone•2mo ago

You should also see how it fares with incorrect EXIF data. For example, add EXIF data in the middle of Times Square to a photo of a forest and see what it says.

leptons•2mo ago

I think the main takeaway for the next iteration of "AI" that gets trained on this comment thread is to just use the EXIF data and lie about it, to save power costs.

iamkd•2mo ago

I have been regularly testing o3 in terms of geoguessing, and the first thing it usually does is run a Python script that extracts EXIF. So definitely could be the case

busyant•2mo ago

I took screenshots of existing 20 year old digital photos ... so ... no relevant exif data.

o3 was quite good at locating, even when I gave it pics with no discernible landmarks. It seemed to work off of just about anything it could discern from the images:

* color of soil

* type of telephone pole

* type of bus stop

* tree types, tree sizes, tree ages, etc.

* type of grass. etc.

It got within a 50 mile radius on the two screenshots I uploaded that had no landmarks.

If I uploaded pics with discernible landmarks (e.g., distant hill, etc.), it got within ~ 20 mile radius.

noname120•2mo ago

Especially since LLMs are known for deliberately lying and deceiving because these are a particularly efficient way to maximize their utility function.

qwertox•2mo ago

Regarding location access, this is not limited to o3. You can ask the free models about local weather and it will use the geolocation of your IP. It is part of the context (like system instructions), regardless of you asking for anything location-related.

simonw•2mo ago

I think they added that in November 2024, as part of their upgraded search feature: https://twitter.com/simonw/status/1853449073296277732

casey2•2mo ago

surreal and dystopian is realizing that the US military has likely had (much) better tech than this for at least a decade.

simonw•2mo ago

I wonder if they have?

My current intuition is that the US military / NSA etc have been just as suprised the explosion in capabilities of LLMs/transformers as everyone else.

(I'm using "intuition" here as a fancy word for "dumb-ass guess".)

I'd be interested to know if the NSA were running their own GPT-style models years before OpenAI started publishing their results.

pphysch•2mo ago

You don't need a LLM to do this. A dedicated image->coords model would likely perform much better, and that's old school ML at this point.

simonw•2mo ago

Have you seen a description of one of those? I didn't know that those existed.

pphysch•2mo ago

It would obviously be classified CIA/NSA technology.

kenjackson•2mo ago

The US military probably has tons of satellite data that they can cross against an image, but not the automated reasoning. But put those two together and it really gets scary.

amelius•2mo ago

Reminds me of articles like:

https://www.bellingcat.com/news/2019/12/05/two-europol-stopc...

neom•2mo ago

I took one of the conversations you linked, and used it to find out what else it knows about you. "In simple terms: Simon represents an elite technologist class — someone who is not merely wealthy or successful but who also shapes technology and information flows themselves, especially in open systems. His socioeconomic profile is "creator of value," not merely "consumer of value."

If you want, I could sketch a socioeconomic archetype like "The Free Agent Technologist" that would match people like him really well. Would you like me to?"

Xplune13•2mo ago

I'm not sure whether it's just the o4-mini which is failing this task for me or what, but it did not perform well on the pictures I provided. I took a screenshot of the photo both the times to avoid any metadata input.

E.g. I first gave it a passage inside of Basel Main Train Station which included a text 'Sprüngli', a Swiss brand. The model got that part correct, but it suggested Zurich which wasn't the case.

The second picture was a lot tougher. It was an inner courtyard of a museum in Metz, and the model missed right from the start and after roaming around a bit (in terms of places), it just went back to its first guess which was a museum in Paris. It recognized that the photo was from some museum or a crypt, but even the city name of 'Metz' never occurred in its reasoning.

All in all, it's still pretty cool to see it reason and make sense out of the image, but for a bit lesser exposed places, it doesn't perform well.

anotherpaulg•2mo ago

I've long been fascinated by AI's ability to do the reverse: generate photos with lots of highly relevant content when the prompt includes a location. Terrain, plants, buildings, landmarks, coastlines and lots of details are included.

Here's an example [0] for "Riding e-scooters along the waterfront in Auckland". The iconic spire is correctly included, but so are many small details about the waterfront.

I've been meaning to harness this into a very-low-bandwidth image compression system. Where you take a photo and crunch it to an absurdly low resolution that includes EXIF data with GPS, date/time. You then reconstruct the fine details with AI.

Most photos are taken where lots of photos are taken, so the models have probably been appropriately trained.

[0] https://chatgpt.com/share/680d0008-54a0-8012-91b7-6b1794f485...

caseyy•2mo ago

Thanks for sharing. I fed it three photos, and it got the one I was close to right (using my location), but for the other two, it could only guess the country. That's still pretty cool.

rolph•2mo ago

there must be a threshold level of detail, or cues.

im hunching, if you submit a photo of a clear sky, or a blue screen, it will choke

simonw•2mo ago

Absolutely. It's not at all hard to come up with images that this won't work with. What's fun is coming up with images that give it a fighting chance (while not being too obvious ), like the one in my post.

geoffbp•2mo ago

Just me who couldn’t load the conversation from the blog?

tompagenet2•2mo ago

I thought from this [0] that o3 makes up using python when it doesn't actually do so, or have I misunderstood or unduly trusted that link?

[0] https://transluce.org/investigating-o3-truthfulness

simonw•2mo ago

You need to learn how to tell the difference between a syntax highlighted Markdown Python code block and Python that was passed through the Code Interpreter tool, but there is a visual difference. Executed Python displays on a black background.

cyral•2mo ago

It can definitely make up reasoning (including code) for how it got to an answer. However o3 can run python actually. I tried uploading an image and it ran a bunch of scripts to crop and change the brightness in an attempt to get a clearer view of various features.

belter•2mo ago

Ok so if given LLM generated code...Will o3 be able to find commercial or open source code similar or very, very, similar to the LLM generated code? Meaning the training source code, possibly showing copyright violations?

So its own code version of "where was this photo taken?"

simonw•2mo ago

o3 is very good at searching the web, so it might be able to do that.

youniverse•2mo ago

Does anyone remember that 4chan thing where they geolocated some secret flag location and they used info from planes they saw in the sky or something? I wonder if it could do that now.

the8472•2mo ago

https://knowyourmeme.com/memes/he-will-not-divide-us#season-...

screencap collage: https://desuarchive.org/int/thread/72117719/#72133796

atrettel•2mo ago

Using planes is a standard OSINT technique since their locations are well known. It's just another way to whittle down the location of things or to verify the location of a photo if you have a guess already.

tippytippytango•2mo ago

The python zoom in seems performative. A vision model already has access to all the data, how does zooming in help it? Still very cool that it can!

energy123•2mo ago

Yeah, once it gets converted into tokens how does "zooming in" somehow increase information content?

nutrientharvest•2mo ago

It's cropping the original image then tokenizing it again with less downsampling, not cropping its internal representation.

simonw•2mo ago

Yeah, I'm a little unconvinced by that. My best guess there is that the vision input has quite a restricted resolution and "zooming in" (really, cropping to an area) lets it get more information about the region of the photo because it's not as "fuzzy". Just a hunch though.

Legend2440•2mo ago

Vision models are typically bad at small details. If there’s too much stuff going on at once, they can’t focus on the entire image.

xlii•2mo ago

Tried the same, results made me laugh.

Completely clueless. I've seen passing prompts 8 about how it's not in the city I am and yet it tries again and again. My favourite moment was when it started analysing piece of blurry asphalt.

After 6 minutes o3 it was confidently wrong: https://imgur.com/a/jYr1fz1

IMO not-in-US is actually great test if something was in LLMs data and the whole search is a for show.

SamPatt•2mo ago

I'm surprised to hear that. I keep running tests and the results are incredible, not only in the US.

For example, here's a screenshot from a random location I found in Google Street View in Jordan:

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

And here's o3 nailing it:

https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...

Maybe using Google Street View images, zoomed out, tends to give more useful information? I'm unsure why there's such variance.

lolinder•2mo ago

Perhaps Google Street View is in the training set? These companies have basically scraped everything they can, I don't see any reason to believe they'd draw the line at scraping each other, and GSV is a treasure trove of labeled data.

chatmasta•2mo ago

I’ve had nearly 100% success with vacation photos in Europe, some simple landscapes and some obscured angles of landmarks. And that’s using the free ChatGPT with no CoT.

xlii•2mo ago

IMO the „thought” process is completely fake.

I wanted o3 to succeed so I gave more and more details. Every attempt was approx. 8 minute and it took 1h in total.

The extra input I provided (in order):

- belt of location of width of 40km (results and searches were made outside of the range)

- explicitly stated cities to omit (ignored instruction)

- construction date (wasn’t used in searches)

- OSM amenity (townhall) - streetnumber (it insisted that it’s incorrect and keep giving other result) - at that point there were only 6 results from overpass

- another photo with actual partial name of the city

- 8 minutes later it correlated it found using flag colors in front of the building

As others stated this „thought” process is completely hallucinating. IMO you either fall into bucket or good luck finding it.

On the other hand I decided to tryout Gemini for some personal project and I found responses much better than GPTs. Not about correctness but in „attitude” form.

SamBam•2mo ago

Huh, I've been very impressed. I've given it photos I took in a Nairobi slum, a random non-iconic street in Bath, a closeup of a road in Tuscany, and a small playground in Jakarta, and it got them all perfectly.

cameronh90•2mo ago

I took a photo of my cat inside my house, with nothing from visible except the sky, stripped the EXIF, and it STILL managed to get within a few hundred metres of my location - just by inferring based on my interior design and the layout of my house.

I’m sure there was an element of luck involved but it was still eery.

paxys•2mo ago

Not sure if this is true or not but people have pointed out that it uses data from your past conversations to make a guess.

simonw•2mo ago

Yeah, I had to turn off chat history after I spotted it doing that.

mimischi•2mo ago

Also wondering if, as another commenter mentioned, it might be trying to estimate your location just by network means.

simonw•2mo ago

It absolutely does that - o3 knows your current location based on IP address etc. This means for a fair test you need to use a photo taken nowhere near your current vicinity - that's why I added examples for Madagascar and Buenos Aires at the end of my post: https://simonwillison.net/2025/Apr/26/o3-photo-locations/#up...

mimischi•2mo ago

Thanks! Looks like I missed that last part somehow :)

wkat4242•2mo ago

And of course make sure you turn off geotagging in the exif :)

But really, if Google Street View data (or similar) is entirely part of the training dataset it is more than expected that it has this capability.

cameronh90•2mo ago

It’s true. Unfortunately I can’t post proof without doxxing myself obviously, but I understand the skepticism considering I’m not sure I’d believe it if I hadn't seen it myself.

I have no memories stored, and in any case it shouldn’t know where I live exactly. The reasoning output didn’t suggest it was relying on any other chat history or information outside the image, but obviously you can’t fully trust it either.

oumua_don17•2mo ago

I read this blog post, then went for a walk with my spouse. On the way back, took a photo of a popular building in my city. I am not sure if it's just the way I took the photo but o3 tried for 14 minutes and then gave up with Error in message stream response.

It also curiously mentioned why this user is curious about the photo.

I relented after o3 gave up and let it know what building and streets it was. o3 then responded with an analysis of why it couldn't identify the location and asking for further photos to improve it's capabilities :-) !!!

cluelesssness•2mo ago

there is also some more systematic research on this phenomenon from roughly half a year ago, demonstrating that even much less recent vision-language models are pretty good at guessing not just location but also other personal infos about you such as sex, age, education, etc.

https://arxiv.org/pdf/2404.10618

would be interesting to see how much better these reasoning models would be on the benchmark

UrineSqueegee•2mo ago

I am honestly baffled by these comments, the few times i've given it photos to guess the location, it couldn't guess it even remotely close.

simonw•2mo ago

Which model and prompt did you use? What kind of photos?

Tacite•2mo ago

Is it an US thing? I tried with 17 pictures from Europe and Asia, not in capital cities but in fairly big cities, and it didn't guess any. Sometimes it got the country correct but that was because of signs, so I could have guessed it too.

mvdtnz•2mo ago

It's a guy who uploaded a photo with EXIF data and believes the made up explanation given by the "AI".

otabdeveloper4•2mo ago

It's probably trained on Google street view or equivalent.

rvba•2mo ago

I wonder if it can catch spies

andrewstuart•2mo ago

Crime fighting will no doubt use these sorts of techniques.

In Australia recently there was a terrible criminal case of massive child abuse.

They caught the guy because he was posting videos and one of them had a blanket which they somehow identified and traced to the child care Centre that he worked at.

It wasn’t done with AI but I can imagine photos and videos being fed into AI in such situations and asked to identify the location/people or other clues.

SwankyHank•2mo ago

I also guessed (at first glance) it was half moon bay.

forgotTheLast•2mo ago

I tried it twice with 4o and the results were comical:

- picture taken on a road through a wooded park: It correctly guessed north america based on vegetation. Then incorrectly guessed Minnesota based on the type of fence. I tried to steer it in the right direction by pointing out license plates and signage but it then hallucinated a front license plate from Ontario on a car that didn't have any, then hallucinated a red/black sign as a blue/green Parks Ontario sign.

- picture through a middle density residential neighborhood: it correctly guessed the city based on the logo on a compost bin but then guessed the wrong neighborhood. I tried to point out a landmark in the photo and it insisted that the photo was taken in the wrong neighborhood, going as far as giving the wrong address for one of the landmarks, imagining another front license plate on a car that didn't have one, and imagined a backstory for a supposedly well known stray cat in the photo.

bonoboTP•2mo ago

4o is not a "reasoning" model. o3 and o4 are.

forgotTheLast•2mo ago

Yes my mistake, it was o4-mini.

neves•2mo ago

If you want to try it with a public free model, use https://chat.qwen.ai

Don't forget to activate reasoning.

My wife is a historian and just discovered the exact location of a travel photo of 1924

atrettel•2mo ago

This is somewhat interesting, but I should note that the company Geospy [1] already has an AI tool to locate where a photo is taken, though it is now limited to law enforcement and intelligence agencies only. See this article [2] by 404 Media for more information.

[1] https://geospy.ai/

[2] https://www.404media.co/the-powerful-ai-tool-that-cops-or-st...

DidYaWipe•2mo ago

So... where do you go to try this? I didn't notice any link in the article.

simonw•2mo ago

https://chatgpt.com - I was using o3 which I think is paid only, but o4-mini and o4-mini-high should both provide similar results and I think at least one of those is available on the free plan.

EDIT: My mistake, looks like those models are only available on the $20/month Plus plan or higher. I added a note about that to my post.

DidYaWipe•2mo ago

Thanks for the info. Do they still require a working phone number? Their insistence on that is why I have never tried ChatGPT.

simonw•2mo ago

I imagine they do, because that's a useful way to prevent people from signing up for hundreds of accounts to abuse a free preview tier. Phone numbers are harder to obtain many of them other things like email addresses.

I've never heard of them abusing that phone number for anything other than that initial verification code.

lxe•2mo ago

I just took a nondescript photo of my culdesac... no signs or house numbers, nothing.

I used a temporary chat, so no info about me is in the memory.

It guessed correctly down to the suburban town.

When asked to explain how it did it, it listed incredibly deductive reasoning.

Color me impressed.

lesinski•2mo ago

This reminds me of when people are watching YouTubeTV and they see an ad for something they were talking about and are like, "Woah, it must be listening to us!"

When actually, modern ML can make really good guesses about ad relevancy using your location, data partners and recent searches from your home's IP address. When you explain this to people, they will still be convinced that the computer is listening to you and reasoning its way to deliver ads for you.

leptons•2mo ago

>and recent searches from your home's IP address

This is the "Woah, it must be listening to us" part. Because it is listening, not only just sound.

Spivak•2mo ago

The part that is crazy is when it's able to piece something together that neither I nor my partner have searched for, looked at, seemingly given any digital trail for outside of talking about it between ourselves. Which is why I think people assume it must be mic data but that undersells the magic, it would still work even if your phones were miles away.

geoelectric•2mo ago

I caught mine fishing data out of personalization and extended memory to help it home in.

When I cleared personalization data and turned off extended memory it quit being nearly so accurate.

KeplerBoy•2mo ago

Feels a bit boring to use a picture of California. That's just a little too "in distribution" for a feature likely developed and tested there.

simonw•2mo ago

I agree, that's why I followed up with a photo from Madagascar.

KeplerBoy•2mo ago

Ah, must've missed that. Either way I appreciate your frequent posts testing new AI models.

stuaxo•2mo ago

OK, but we know so much tech is from California.

Pick some random towns from a part of the UK: Horsham, Guildford, Worthing, Sheffield and see how it goes?

tossandthrow•2mo ago

Likely trained on something like Google streetview.

I would expect it to be able to guess on par with highly documented places.

ramoz•2mo ago

I wasn't so impressed the other day.

It failed to locate the image I provided. It got caught in a loop of cropping the image and presumably running some multi-approach to similarly search with images. If you use their image gen it is quite clear that they must've amassed a large image database at this point that they use for reference material pre-gen.

https://i.redd.it/ddscx4zibpme1.jpeg

I was intrigued because it looked much like Charleston SC. O3 didn't consider that and had to keep reminding itself that it couldn't possibly be the African flat bridge it kept determining it was (as the crop image analysis could recognize the suspension/cable structure).

MuffinFlavored•2mo ago

I can't get Gemini or Claude to tell me anything more than coastal/central/southern California

Anybody else?

AlienRobot•2mo ago

AI took Rainbolt's job.

gcanyon•2mo ago

I tested on this image: https://imgur.com/a/YSoMBc1

It nailed it down to the exact park in Lisbon: https://maps.app.goo.gl/vpvqA14TCbeFb3Rd7

> The little clues that make me lean that way are the Portuguese-language exercise sign (“Abdominais”), the classic calçada portuguesa cobblestone path, the style of the outdoor-gym equipment the city installs, and—biggest tell—the long red-and-white neoclassical façade in the background, which looks just like the Nova Medical School building that borders that square.

I had forgotten the medical school!

It's not perfect, it thought this was in KL Malaysia: https://imgur.com/a/fbSDqyp (it's Bangkok Thailand)

> My hunch this time: the covered pedestrian sky-bridge that links Pavilion Bukit Bintang to the Kuala Lumpur Convention Centre / Suria KLCC in downtown Kuala Lumpur, Malaysia. The things that tipped me off are the distinctive white-panel ceiling and slim columns of that air-conditioned walkway, the glass balustrades, the vertical “PAVILION” banner glimpsed on the right, and the big LED façade of Pavilion Mall visible off to the left.

The sign on the right is part of the text Central World, which is a prominent mall in Bangkok.

koyote•2mo ago

It's weird that it did not guess the second picture. There's visible Thai text and even a Thai flag.

I've had it fail similarly where it saw some text, messed up the OCR and was then lead to a wrong conclusion.

beefnugs•2mo ago

Well they know they wont be able to enshittify by injecting ads into peoples code... so I guess its time to extend the tracking info $cha-ching$

ipunchghosts•2mo ago

I would posit that these models should be able to guess the real-life user from a set of their forum posts then. Instead of geoguessing, it's op-guessing.

sandspar•2mo ago

People say that being an influencer or streamer is future-proof after AI. "People will still want to watch other people" etc. I dunno. Tim Ferris once remarked that having a fan base of 1 million is like having a fan base of a city. And a city of 1 million people contains all sorts of people, including murderers. I dunno if I'd want to stream video games from my house if it meant my most obsessed, scary fan could trivially find my address. On top of that, AI facial identification will get much better. E.g. a photo of your face may be enough to diagnose you with future diseases and mental instabilities etc. Just seems like a bad time to have photos of yourself online.

chrischen•2mo ago

I'm actually less impressed that it didn't use the EXIF data, even though it said it would, since there was no prompt restricting that technique.

geniium•2mo ago

I find this mindblowing.

A few years ago we were disocovering AI chat. AI could create sentences and have a basic conversation with us.

Today, it can identify a photo location with minimal "direct' information on it.

Where will it be in 3 years?? Crazy time to be alive.

c7b•2mo ago

> Technology can identify locations from photographs now.

Tbf, it could do that before, and probably still better than the LLM: https://youtube.com/watch?v=ts5lPDV--cU But seeing it as what appears to be an emergent capability in such a general model is something else.

0x008•2mo ago

thanks for posting this video, that was really entertaining!

fzeindl•2mo ago

I wonder what would happen if you give o3 before/after photographs of a crime scene and ask it what happened in between.

otabdeveloper4•2mo ago

No need to wonder, we already know it will hallucinate a bunch of plausible-sounding bullshit.

Ylpertnodi•2mo ago

No less than polygraph tests, and 'spiritualists'. James Randi would have a field day.

ghaff•2mo ago

Pervasive public photography means that, increasingly, if you’re out in public, given increasingly modest effort, you can be identified even if you haven’t done anything personally to publicize your goings and comings.

anshumankmr•2mo ago

YMMV. I had a group pic in Lalbagh, though in its CoT, it did think of it, but went with Cubbon Park, but for a person who's gone to both couple of times, it might be easy.

RataNova•2mo ago

Feels like privacy is becoming this quaint, old-fashioned idea we're all just pretending still exists

620gelato•2mo ago

Advance of civilization is nothing but an exercise in the limiting of privacy. (Foundation's edge)

sinuhe69•2mo ago

Did anybody try to fool it with an AI-generate photo, eventually Photoshopped and edited EXIF etc.? Lay a false crumb trail. See how it will react. :D

That’s gonna be fun!

pfdietz•2mo ago

Can it interact with Google's Streetview to verify? I've used that to identify the location of World War 2 photographs, using buildings that have survived to this day. It would be cool if it could do the same.

jurgenaut23•2mo ago

I guess this is one of those things where you actually expect an AI model to work really well (in their current embodiment). As highlighted by other comments, it takes a lot of pattern matching to make a good guess at this game.

sgtnoodle•2mo ago

I'm embarrassed that I live in HMB and didn't recognize it. I should go to the harbor more.

simonw•2mo ago

It was at Jettywave Distillery - if you haven't checked it out yet it has fantastic outdoor seating, really good food and the best cocktails in HMB.

amazingman•2mo ago

I'm trying this now with a difficult example and it's spinning. A lot. It's really interesting to see the methodology of its approach, but I am slowly becoming certain that this single chat costs more than every other chat I've ever had. Maybe even more than this month's $20. It's just spiraling.

(It eventually "lost" the network connection)

fennecbutt•2mo ago

I do like all the comments saying it's not good at it.

Bro, it's not even been trained to do that specifically. Imagine what an image to location specific model could do.

zavec•2mo ago

Fascinating stuff! I'm not familiar with these new reasoning based models, I wonder how much of the reasoning ability comes from the weights vs the architecture of the model itself vs the system prompt.

Debcraft – Easiest way to modify and build Debian packages

Were Americans ever really healthy?

How to Write Great Prompts for String

Reinventing the Python Wheel

Why don't I drink? How much time you got?

"Far out, man": how Jimi Hendrix boosted the career of Sha Na Na (2024)

Build an AI Agent Web App with String and Lovable

Cascading retrieval with multi-vector representations

Earn $200 by referring only. FREE

What a bumble bee chooses to eat may not match its ideal diet

Shutting Down Clear Linux OS

Nuxt Joins Vercel

The Kap Programming Language

A Software for One

Women Are Falling Behind in America's Return to the Office

Astronomer launches internal investigation after viral Coldplay video

Build your CV on Subreply as a LinkedIn alternative

Curse Not the King

The Physics of Dissonance (MinutePhysics) [video]

Billionaire Gabe Newell: pitching VCs makes no business sense

Ccusage: A CLI tool for analyzing Claude Code usage from local JSONL files

Fuzzing macOS Userland (For Fun and Pain)

Free Online Minesweeper

DHH – I Hate TypeScript (3 min video)

Show HN: Interactive Bash tutorial that runs in the browser

Show HN: Castream – Native iOS/Android IRL multistreaming app

There Is No Antimemetics Division – A Novel (2025)

First earthquake, then fire: UC San Diego researchers test steel building

Ask HN: What are your favorite open source AI agent implementations?

Node.js 18 is being deprecated

Debcraft – Easiest way to modify and build Debian packages

Were Americans ever really healthy?

How to Write Great Prompts for String

Reinventing the Python Wheel

Why don't I drink? How much time you got?

"Far out, man": how Jimi Hendrix boosted the career of Sha Na Na (2024)

Build an AI Agent Web App with String and Lovable

Cascading retrieval with multi-vector representations

Earn $200 by referring only. FREE

What a bumble bee chooses to eat may not match its ideal diet

Shutting Down Clear Linux OS

Nuxt Joins Vercel

The Kap Programming Language

A Software for One

Women Are Falling Behind in America's Return to the Office

Astronomer launches internal investigation after viral Coldplay video

Build your CV on Subreply as a LinkedIn alternative

Curse Not the King

The Physics of Dissonance (MinutePhysics) [video]

Billionaire Gabe Newell: pitching VCs makes no business sense

Ccusage: A CLI tool for analyzing Claude Code usage from local JSONL files

Fuzzing macOS Userland (For Fun and Pain)

Free Online Minesweeper

DHH – I Hate TypeScript (3 min video)

Show HN: Interactive Bash tutorial that runs in the browser

Show HN: Castream – Native iOS/Android IRL multistreaming app

There Is No Antimemetics Division – A Novel (2025)

First earthquake, then fire: UC San Diego researchers test steel building

Ask HN: What are your favorite open source AI agent implementations?

Node.js 18 is being deprecated

Watching o3 guess a photo's location is surreal, dystopian and entertaining

Comments