That said, I am surprised Seedream 4.0 beat it in these tests.
Google is so weirdly non-integrated.
Seedream 4.0 is somewhat slept on for being 4k at the same cost as nano-banana. It's not as great at perfect 1:1 edits, but its aesthetics are much better and it's significantly more reliable in production for me.
Models with LLM backbones/omni-modal models are not rare anymore, even Qwen Image Edit is out there for open-weights.
OP here. While Seedream did have the edge in adherence, it also tends to introduce slight (but noticeable) color gradation changes. It's not a huge deal for me, but it might be for other people depending on their goals, in which case NanoBanana would be the better choice.
Still, to my eye, AI-generated images feel a bit off when dealing with real-world photographs.
George's hair, for example, looks over the top, or brushed on.
The tree added to the sleeping person on the ground photo... the tree looks plastic or too homogenized.
It's mostly because image model size and required compute for both training and inference have grown faster than self-hosted compute capability for hobbyists. Sure, you can run Flux Kontext locally, but if you have to use a heavily quantized model and wait forever for the generation to actually run, the economics are harder to justify. That's not counting the "you can generate images from ChatGPT for free" factor.
> George's hair, for example, looks over the top, or brushed on.
IMO, the judge was being too generous with the passes for that test. The only one that really passes is Gemini 2.5 Flash Image:
Flux Kontext: In addition to the hair looking too slick, it does not match the VHS-esque color grading of the image.
Qwen-Image-Edit: The hair is too slick and the sharpness/saturation of the face unnecessarily increases.
Seedream 4: Color grading of the entire image changes, which is the case with most of the Seedream 4 edits shown in this post, and why I don't like it.
The economics 1000% do not justify me owning a GPU to do this. I just happen to own one.
Some might critique the prompts and say this or that would have done better, but they were the kind of prompt your dad would type in not knowing how to push the right buttons.
I've been using Nano Banana quite a lot, and I know that it absolutely struggles at exterior architecture and landscaping. Getting it to add or remove things like curbs, walkways, gutters, etc., or asking it to match colors is almost futile.
Still useful comments, as the models mostly overlap.
I think this was fairly predictable, but as engineering improvements keep happening and the prompt adherence rate tightens up we're enjoying a wild era of unleashed creativity.
If you've already got a decent GPU (or were going to get one anyways) then cost isn't really a consideration, it's just that you can already do it. For everyone else, you can probably get by just using things like Google's AI Studio for free.
If I were to make an image editing app, this would be the model I'd choose.
E.g. Gemini 2.5 Flash is given extreme leeway with how much it edits the image and changes the style in "Girl with Pearl Earring" only to have OpenAI gpt-image-1 do a (comparatively) much better job yet still be declared failed after 8 attempts, while having been given fewer attempts than Seedream 4 (passed) and less than half the attempts of OmniGen2 (which still looks way farther off in comparison).
joomla199•7h ago