Better to read that particular story in the context of, "It would be very difficult to make a seed fund that is an index of all avant garde culture making because [whatever]."
That is, it's nice to make a pretty stand-alone image, but without tools to maintain consistency and place them in context you can't make a project that is more than just one image, or one video, or a scattered and disconnected sequence of pieces.
One curious case demoed here in the docs is the grid use case. Nano Banana Pro can also generate grids, but for NBP, adherence to the prompt collapses once you go beyond 4x4 (there's only a finite number of output tokens to correspond to each subimage), so I'm curious that OpenAI started with a 6x6 case, albeit with a test prompt that isn't very nuanced.
https://mordenstar.com/blog/edits-with-nanobanana
In particular, NB Pro successfully assembled a jigsaw puzzle it had never seen before, generated semi-accurate 3D topographical extrapolations, and even swapped a window out for a mirror.
Also: SUPER ANNOYING. It seems every time you give it a modification prompt it erases the whole conversation leading up to the new pic? Like.. all the old edits vanish??
I added "shaky amateur badly composed crappy smartphone photo of ____" to the start of my prompts to make them look more natural.
Counterpoint from someone on the Musk site: https://x.com/flowersslop/status/2001007971292332520
One thing that gpt-image-1 does exceptionally well that Nano Banana (Pro) can't is previz-to-render. This is actually an incredibly useful capability.
The Nano Banana models take the low-fidelity previz elements/stand-ins and unfortunately keep the elements in place without attempting to "upscale" them. The model tries to preserve every mistake and detail verbatim.
Gpt-image-1, on the other hand, understands the layout and blocking of the scene, the pose of human characters, and will literally repair and upscale everything.
Here are a few examples:
- 3D + Posing + Blocking: https://youtu.be/QYVgNNJP6Vc
- Again, but with more set re-use: https://youtu.be/QMyueowqfhg
- Gaussian splats: https://youtu.be/iD999naQq9A
- Gaussians again: https://youtu.be/IxmjzRm1xHI
We need models that can do what gpt-image-1 does above, but with higher quality, better stylistic control, faster speed, and the ability to take style references (e.g. glossy Midjourney images).
Nano Banana team: please grow these capabilities.
Adobe is testing and building some really cool capabilities:
- Relighting scenes: https://youtu.be/YqAAFX1XXY8?si=DG6ODYZXInb0Ckvc&t=211
- Image -> 3D editing: https://youtu.be/BLxFn_BFB5c?si=GJg12gU5gFU9ZpVc&t=185 (payoff is at 3:54)
- Image -> Gaussian -> Gaussian editing: https://youtu.be/z3lHAahgpRk?si=XwSouqEJUFhC44TP&t=285
- 3D -> image with semantic tags: https://youtu.be/z275i_6jDPc?si=2HaatjXOEk3lHeW-&t=443
I'm trying to build the exact same things that they are, except as open source / source available local desktop tools that we can own. Gives me an outlet to write Rust, too.
gpt-image-1: https://imgur.com/a/previz-to-image-gpt-image-1-Jq5M2Mh
nano banana / pro: https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd
gpt-image-1 excels in these cases, despite being stylistically monotone.
I hope that Google, OpenAI, and the various Chinese teams lean in on this visual editing and blocking use case. It's much better than text prompting for a lot of workflows, especially if you need to move the camera and maintain a consistent scene.
While some image editing will be in the form of "remove the object"-style prompts, a lot will be molding images like clay. Grabbing arms and legs and moving them into new poses. Picking up objects and replacing them. Rotating scenes around.
When this gets fast, it's going to be magical. We're already getting close.
This tool is keeping my look the same.
My own profile picture? Can’t edit some public figures. A famous Norman Rockwell painting from 80 years ago? Can’t edit some public figures.
Safety’d into oblivion.
At the least, it's not present in these new images.
(although I get what you mean, not easily since you already trained)
I'm guessing when they get a clean slate we'll have Image 2 instead of 1.5. In LMArena it was immediately apparent it was an OpenAI model based on visuals.
And I say "subtle" - but because that model would always "regenerate" an image when editing, it would introduce more and more of this yellow tint with each tweak or edit. Which has a way of making a "subtle" bias anything but.
They forgot to calibrate the cameras, so everything had a green tint.
Meanwhile all the other teams had a billion Macbeth charts lying around just in case.
What angle is there for second tier models? Could the future for OpenAI be providing a cheaper option when you don't need the best? It seems like that segment would also be dominated by the leading models.
I would imagine the future shakes out as: first class hosted models, hosted uncensored models, local models.
Where is the image that was given along with the prompt? Unless I missed it, it would have been nice to show the attached image.
(Realistically, Seedream 4 is the best at aesthetically pleasing generation, Nano Banana Pro is the best at realism and editing, and Seedream 4.5 is a very strong middleground between the two with great pricing)
gpt-image-1.5 feels like OpenAI doing the bare minimum to keep people from switching to Gemini every time they want an image.
I'm honestly surprised they're still on this post-Sora 2: let the consumer of the API determine their risk appetite. If a copyright holder comes knocking, "the API did it" isn't going to be a defense either way.
They even linked to their Image Playground, where it's also not available...
I updated my local playground to support it, and I'm just handling the 404 on the model gracefully.
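In case it helps anyone else, here's roughly what that fallback looks like in my playground. This is a minimal sketch: the image_generation tool shape and the model IDs are just what the error messages below imply, and the exception classes are from the current openai Python SDK, so treat it as an assumption rather than gospel.

import openai
from openai import OpenAI

client = OpenAI()

def generate_image(prompt: str):
    # Try the new image model first; fall back to gpt-image-1 if the API
    # says the model doesn't exist for this account yet (404) or rejects
    # the value outright (400, like the "Supported values" error below).
    for image_model in ("gpt-image-1.5", "gpt-image-1"):
        try:
            return client.responses.create(
                model="gpt-4.1",  # placeholder driver model, use whatever you normally use
                input=prompt,
                tools=[{"type": "image_generation", "model": image_model}],
            )
        except (openai.NotFoundError, openai.BadRequestError):
            continue  # not rolled out here yet, try the older model
    raise RuntimeError("no image model available")

The SDK already retries 5xx responses like the one below a couple of times by default, so the 500 case mostly takes care of itself.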
It's too bad no OpenAI Engineers (or Marketers?) know that term exists. /s
I do not understand why it's so hard for them to just tell the truth. So many announcements "Available today for Plus/Pro/etc" really means "Sometime this week at best, maybe multiple weeks". I'm not asking for them to roll out faster, just communicate better.
POST "https://api.openai.com/v1/responses": 500 Internal Server Error {
"message": "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_******************* in your message.",
"type": "server_error",
"param": null,
"code": "server_error"
}
Interestingly, if you change the request to use the model foobar, you get an error showing this:
POST "https://api.openai.com/v1/responses": 400 Bad Request {
"message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
"type": "invalid_request_error",
"param": "tools[0].model",
"code": "invalid_value"
}
I really hope everyone is starting to get disillusioned with OpenAI. They're just charging you more and more for what? Shitty images that are easy to sniff out?
In that case, I have a startup for you to invest in. It's a bridge-selling app.
My own main use cases are entirely textual: Programming, Wiki, and Mathematics.
I almost never use image generation for anything. However, it's objectively extremely popular.
This has strong parallels for me to when snapchat filters became super popular. I know lots of people loved editing and filtering pictures but I always left everything as auto mode, in fact I'd turn off a lot of the default beauty filters. It just never appealed to me.
In late stage capitalism you pay for fake photos with someone. You have chat gpt write about how you dated for a summer, and have it end with them leaving for grad school to explain why you aren't together.
Eventually we'll all just pay to live in the matrix. When your credit card is declined you'll be logged out, to awaken in a shared studio apartment. To eat your rations.
But after a while it'll hit a saturation point. The novelty will wear off since everyone has access to it. Who cares if you have a fake photo with a celebrity if everyone knows it's fake?
Question: with copyright and authorship dead wrt AI, how do I make (at least) new content protected?
Anecdotal: I had a hobby of taking photos in a quite rare style, and I lived in a place that you'd see quite a few pictures of. When I asked GPT to generate a picture of that area in that style, it returned a highly modified but recognizable copy of a photo I had published years ago.
(to clarify, OpenAI stops refining the image if a classifier detects your image as potentially violating certain copyrights. Although the gulf in resolution is not caused by that.)
If someone were on vacation and came home to learn that their neighbor had allowed some friends to stay in the empty house, we would often expect some kind of outrage regardless of whether there had been specific damage or wear to the home.
Culturally, people have deeply set ideas about what's theirs, and feel like they deserve some say over how their things are used and by whom. Even those who are very generous and want their things to be widely shared usually want to have some voice in making that come to be.
[I won't bother responding to the rest of your appalling comment]
How do you feel about entities taking your face off of your personal website and plastering it on billboards smiling happily next to their product? What if it’s for a gun? Or condoms? Or a candidate for a party you don’t support? Pick your own example if none of those bother you. I’m sure there are things you do not want to be associated with/don’t want to contribute to.
At the end of the day it’s very gross when we are exploited without our knowledge or permission so rich groups can get richer. I don’t care if my visual work is only partially contributing to some mashed up final image. I don’t want to be a part of it.
Apart from the 'newspaper' anachronism, that's pretty much still my take.
Sorry, but you'll just have to deal with it and get over it.
Air gap. If you don’t want content to be used without your permission, it never leaves your computer. This is the only protection that works.
If you want others to see your content, however, you have to accept some degree of trade off with it being misappropriated. Blatant cases can be addressed the same as they always were, but a model overfitting to your original work poses an interesting question for which I’m not aware of any legal precedents having been set yet.
Question: Now that the steamboats have been invented, how do I keep my clipper business afloat?
Answer: Good riddance to the broken idea of IP, Schumpeter's Gale is around the corner, time for a new business model.
Impressive stuff though, as you can give it a base image + prompt.
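For anyone who hasn't touched the API side, it's a short call: base image plus prompt in, edited image out. Rough sketch below using the Images edit endpoint with gpt-image-1; I'm assuming gpt-image-1.5 slots into the same call once it's actually reachable, which is not guaranteed.

import base64
from openai import OpenAI

client = OpenAI()

# Send a base image plus an edit instruction; the model returns the
# edited image as base64, which we decode and write back to disk.
result = client.images.edit(
    model="gpt-image-1",  # presumably "gpt-image-1.5" once it shows up
    image=open("base.png", "rb"),
    prompt="Swap the window out for a mirror; keep everything else unchanged.",
)

with open("edited.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))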
we have the capability, we just stopped making power more abundant.
It doesn't mention the new model, but it's likely the same or similar.
Now it means whoever has access to uncensored/non-watermarking models can pass off their faked images as real and claim, "Look! There's no watermark, of course, it's not fake!"
Whereas, if none of the image models did watermarking, then people (should) inherently know nothing can be trusted by default.
$ exiftool chatgpt_image.png
...
Actions Software Agent Name : GPT-4o
Actions Digital Source Type : http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgori...
Name : jumbf manifest
Alg : sha256
Hash : (Binary data 32 bytes, use -b option to extract)
Pad : (Binary data 8 bytes, use -b option to extract)
Claim Generator Info Name : ChatGPT
...
I suppose I'm going to have to bite the bullet and actually train an AI detector that works roughly in real time.
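If you just want to flag obvious cases in a pipeline before going that far, the metadata above is trivial to check for. Quick-and-dirty sketch shelling out to exiftool; it only catches an intact C2PA/JUMBF manifest, and a re-encode or screenshot strips that, so it's nowhere near a real detector.

import json
import subprocess

def has_c2pa_manifest(path: str) -> bool:
    # Dump all metadata as JSON and look for the C2PA/JUMBF fields that
    # show up in the exiftool output above (jumbf manifest, Claim Generator).
    out = subprocess.run(
        ["exiftool", "-json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    blob = json.dumps(json.loads(out)[0]).lower()
    return "jumbf" in blob or "claimgenerator" in blob or "claim generator" in blob

print(has_c2pa_manifest("chatgpt_image.png"))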
So, let's simulate that future. Since no one trusts your talent in coding, art, or writing, you wouldn't care to do any of these. But the economy is built on products and services which get their value based on how much human talent and effort is required to produce them.
So, the value of these services and products goes down as demand and trust go down. No one knows or cares who is a good programmer in the team, who is a great thinker and writer, and who is a modern Picasso.
So, the motivation disappears for humans. There are no achievements to target, and there is no way to impress others with your talent. This should lead to a uniform workforce without much difference in talents. Pretty much a robot army.
-------------------
Highest-impact issues to fix
1) Clear copy editing error in a major section header
The section header reads “Precise edits that preserve what matter”—it should almost certainly be “what matters.” This appears both in the table of contents and the body header, so it’s high-visibility.
Why it matters: This is the kind of basic grammar error that undermines trust in the rest of the claims, especially in a product announcement.
Fix: Update the heading and TOC anchor text site-wide.
-------------------
Shouldn't an AI review of all web posts be part of some kind of agentic workflow for the leading AI lab at this point?
[1]: https://chatgpt.com/share/6941c96c-c160-8005-bea6-c809e58591...
POST "https://api.openai.com/v1/responses": 500 Internal Server Error {
"message": "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_******************* in your message.",
"type": "server_error",
"param": null,
"code": "server_error"
}
POST "https://api.openai.com/v1/responses": 400 Bad Request {
"message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
"type": "invalid_request_error",
"param": "tools[0].model",
"code": "invalid_value"
That's still dangerously bad for the use-case they're proposing. We don't need better-looking but completely wrong infographics.
I'd especially say like 100% of amateur political infographics/memes are wrong. ("climate change is caused by 100 companies" for instance)
Noticed it captured a Mega Man Legends vibe...
https://x.com/AgentifySH/status/2001037332770615302
and here it generated a texture map from a 3d character
https://x.com/AgentifySH/status/2001038516067672390/photo/1
However, I'm not sure if these are true UV maps that are accurate, as I don't have the 3D models themselves.
But I tried this in Nano Banana when it first came out, and it couldn't do it.
I can tell you with 100% certainty they are not. For example, Crash doesn't have a backside for his torso. You could definitely make a model that uses these as textures, but you'd really have to force it and a lot of it would be stretched or look weird. If you want to go this approach, it would make a lot more sense to make a model, unwrap it, and use the wireframe UV map as input.
Here's the original Crash model: https://models.spriters-resource.com/pc_computer/crashbandic... , its actual texture is nothing like the generated one, because the real one was designed for efficiency.
Not even one. And no one on the team said anything?
Come on Sam, do better.
https://genai-showdown.specr.net/image-editing
Conclusions
- OpenAI has always had some of the strongest prompt understanding alongside the weakest image fidelity. This update goes some way towards addressing this weakness.
- It's leagues better than gpt-image-1 at making localized edits without altering the entire image's aesthetic, doubling the previous score from 4/12 to 8/12, and it's the only model that legitimately passed the Giraffe prompt.
- It's one of the most steerable models with a 90% compliance rate
Updates to GenAI Showdown
- Added outtakes sections to each model's detailed report in the Text-to-Image category, showcasing notable failures and unexpected behaviors.
- New models have been added including REVE and Flux.2 Dev (a new locally hostable model).
- Finally got around to implementing a weighted scoring mechanism which considers pass/fail, quality, and compliance for a more holistic model evaluation (click pass/fail icon to toggle between scoring methods).
If you just want to compare gpt-image-1, gpt-image-1.5, and NB Pro at the same time:
https://genai-showdown.specr.net/image-editing?models=o4,nbp...
Personal request: could you also advocate for "image previz rendering", which I feel is an extremely compelling use case for these companies to develop? Basically, any 2D/3D compositor that lets you visually block out a scene, then rely on the model to precisely position the set, set pieces, and character poses.
If we got this task onto benchmarks, the companies would absolutely start training their models to perform well at it.
Here are some examples:
gpt-image-1 absolutely excels at this, though you don't have much control over the style and aesthetic:
https://imgur.com/a/previz-to-image-gpt-image-1-Jq5M2Mh
Nano Banana (Pro) fails at this task:
https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd
Flux Kontext, Qwen, etc. have mixed results.
I'm going to re-run these under gpt-image-1.5 and report back.
- Gemini/Nano did a pretty average job, only applying some grey to some of the panels. I tried a few different examples and got similar output.
- GPT did a great job and themed the whole app and made it look great. I think I'd still need a designer to finesse some things though.