OPENAI_API_KEY="$(llm keys get openai)" \
uv run https://tools.simonwillison.net/python/openai_image.py \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio"
Code here: https://github.com/simonw/tools/blob/main/python/openai_imag...Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...
I see an opportunity for a new AI test!
"Inspect waldo.png, and give me the pixel location of a raccoon holding a ham radio."
It used python to split the images into smaller sections, and got this:
"Found it. The raccoon holding a ham radio is inside the HAM RADIO booth, standing between two humans.
Pixel location: approximately (662, 500) in the 1402×1122 image.
Bounding box of the raccoon is roughly (645, 475) to (680, 525). You can see its characteristic dark-masked face, pointy ears,
light belly, and it's holding the radio at chest level." (Baked for 2m 23s)
Looks like a human to me. So I prompted: "That is a human, but I understand the confusion. Look elsewhere.".It made much smaller sections and inspected them one by one.
"Found it — the raccoon is sitting on the ground near the trash can, holding a ham radio with a visible speaker/dial face.
Pixel location: approximately (370, 1030) in the 1402×1122 image.
Bounding box of the raccoon + radio is roughly (340, 1005) to (400, 1060). It's in the lower-left region, just right of the
General Store, next to a silver trash can." (Sautéed for 4m 26s)
Still not it, but it was an interesting visual test.It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo Highlander style), while also holding up to scrutiny when you examine any individual, ordinary figure.
(I don't think it's right).
> please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist
and got this. I'm not sure I know what a ham radio looks like though.
https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...
OPENAI_API_KEY="$(llm keys get openai)" \
uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
--quality high --size 3840x2160
https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!I think that image cost 40 cents.
"Found the raccoon holding a ham radio in waldo2.png (3840×2160).
- Raccoon center: roughly (460, 1680)
- Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)
- Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780
It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest. "
Which is correct!I don't think it'll fail like Sora though. gpt-image-1.5 didn't fail.
> Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
Image generation is now pretty much "solved". Video will be next. Perhaps things will turn out the same as chess: in that even though chess was "solved" by IBM's Deep Blue, we still value humans playing chess. We value "hand made" items (clothes, furniture) over the factory made stuff. We appreciate & value human effort more than machines. Do you prefer a hand-written birthday card or an email?
Photographs, videos, and digital media in general, in contrast, are used for much, much more than just socializing.
Feels like now is a bit of a catchup after pretty tepid period that was most of my life.
I think we all know the feeling of getting an image that is ok, but needs a few modifications, and being absolutely unable to get the changes made.
It either keeps coming up with the same image, or gives you a completely new take on the image with fresh problems.
Anyone know if modification of existing images is any better?
Anything better that OpenAI?
Was this an oversight? Or did their new image generation model generate an image that was essentially a copy of an existing image?
There is definitely enough empirical validation that shows image models retain lots of original copies in their weights, despite how much AI boosters think otherwise. That said, it is often images that end up in the training set many times, and I would think it strange for this image to do that.
Regardless, great find.
magick image-l.webp image-r.jpg -compose difference -composite -auto-level -threshold 30% diff.png
It's practically all dark except for a few spots. It's the same image just different size compression whatever. I can't find it in any stock image search, though. Surely it could not have memorized the whole image at that fidelity. Maybe I just didn't search well enough.I have a sideproject where I want to display standup comedies. I thought I could edit standup comedy posters with some AI to fit my design. Gemini straight up refuses to change any image of any standup comedy poster involving a well know human. OpenAI does not care and is happy to edit away
Just for testing, I just tried this https://i.ytimg.com/vi/_KJdP4FLGTo/sddefault.jpg ("Redesign this image in a brutalist graphic design style"). Gemini refuses (api as well as UI), OpenAI does it
minimaxir•1h ago