Generative AI Image Editing Showdown

https://genai-showdown.specr.net/image-editing

194•gaws•7h ago

Comments

joomla199•7h ago

Good effort, somewhat marred by poor prompting. Passing in “the tower in the image is leaning to the right,” for example, is a big mistake. That context is already in the image, and passing that as a prompt will only make the model apt to lean the tower in the result.

minimaxir•6h ago

Everyone is sleeping on Gemini 2.5 Flash Image / Nano Banana. As shown in the OP, it's substantially more powerful than most other models while at the same price-per-image, and due to its text encoder it can handle significantly larger and more nuanced prompts to get exactly what you want. I open-sourced a Python package for generating from it with examples (https://github.com/minimaxir/gemimg) and am currently working on a blog post with even more representative examples. Google also allows generations for free with aspect ratio control in AI Studio: https://aistudio.google.com/prompts/new_chat

That said, I am surprised Seedream 4.0 beat it in these tests.

herval•6h ago

Gemini is great when it gets it right, but in my experience, it sometimes gives you completely unexpected results and won't get it right no matter what. You can see that in some of the examples (eg the Girl with the pearl earring one). I'm constantly surprised by how good Flux is, but the tragedy is most people (me included) will just default to whatever they normally use (chatgpt and gemini, in my case), so it doesn't really matter that it's better

dimitri-vs•5h ago

Agreed, to the point where I built my own UI where I can simultaneously generate three images and see a before/after. Most often only one of three is what I actually wanted.

daemonologist•6h ago

I don't think people are really sleeping on it - nano-banana more or less went viral when it first came out. I'd argue that aside from the capabilities built into ChatGPT (with the Ghibli craze and whatnot) it's the best known image editing model.

minimaxir•4h ago

It's a weird situation where the Gemini mobile app hit #2 on the App Stores because of free Nano Banana, but no one ever talks about it and most disclosed image generations I've seen are still ChatGPT.

ec109685•1h ago

Google photos should just include the feature. It’s kinda buried in Gemini.

Google is so weirdly non-integrated.

cosama•5h ago

I was trying to use gemini 2.5 flash image / nano banana to tidy up a picture of my messy kitchen. It failed horribly on my first attempt. I was quite surprised how much trouble it had with this simple task (similar to cleaning up the street in the post). On my second attempt I had it first analyze the image to point out all the items that clutter the space, and then on a second prompt had it remove all those items. That worked much better, showing how important prompt engineering is.
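
The two-pass approach described above (first ask the model to list the clutter, then ask it to remove exactly those items) could be sketched roughly as follows. This is only an illustration: `ask` and `edit` are hypothetical callables standing in for whatever vision and image-editing API you use, and the prompt wording is an assumption, not what was actually typed.

```python
from typing import Callable, List


def parse_items(listing: str) -> List[str]:
    """Turn a model's line-by-line listing into a clean list of item names."""
    items = []
    for line in listing.splitlines():
        cleaned = line.strip().strip("-• ").strip()
        if cleaned:
            items.append(cleaned)
    return items


def build_removal_prompt(listing: str) -> str:
    """Build the second-pass edit prompt from the first-pass analysis."""
    items = parse_items(listing)
    return (
        "Remove the following items and leave everything else unchanged: "
        + ", ".join(items)
    )


def two_pass_declutter(
    image: bytes,
    ask: Callable[[bytes, str], str],     # hypothetical: image + question -> text
    edit: Callable[[bytes, str], bytes],  # hypothetical: image + prompt -> edited image
) -> bytes:
    # Pass 1: have the model enumerate the clutter it sees.
    listing = ask(image, "List every item that clutters this space, one per line.")
    # Pass 2: feed that explicit list back as the edit instruction.
    return edit(image, build_removal_prompt(listing))
```

The point of the split is that the second prompt names concrete objects instead of relying on the model's own idea of "tidy".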

cpursley•5h ago

Meh, most Google AI products look great on paper but fail in actual real scenarios. And that ranges from their Claude Code clone to their buggy storybook thing which I really wanted to like.

BoorishBears•3h ago

No one is sleeping on nano-banana/Gemini Flash, it's highly over-tuned for editing vs novel generation and maxes out at a pretty low resolution.

Seedream 4.0 is somewhat slept on for being 4k at the same cost as nano-banana. It's not as great at perfect 1:1 edits, but its aesthetics are much better and it's significantly more reliable in production for me.

Models with LLM backbones/omni-modal models are not rare anymore, even Qwen Image Edit is out there for open-weights.

vunderba•3h ago

> That said, I am surprised Seedream 4.0 beat it in these tests.

OP here. While Seedream did have the edge in adherence it also tends to introduce slight (but noticeable) color gradation changes. It's not a huge deal for me, but it might be for other people depending on their goals in which case NanoBanana would be the better choice.

hackthemack•6h ago

I do not use ai image generation much lately. It seemed like there was a burst of activity a year and a half ago with self hosted models and some localhost web guis. But now it seems like it is moving more and more to online hosted models.

Still, to my eye, ai generated images feel a bit off when dealing with real world photographs.

George's hair, for example, looks over the top, or brushed on.

The tree added to the sleeping person on the ground photo... the tree looks plastic or too homogenized.

minimaxir•6h ago

> But now it seems like it is moving more and more to online hosted models.

It's mostly because image model size and required compute for both training and inference have grown faster than self-hosted compute capability for hobbyists. Sure, you can run Flux Kontext locally, but if you have to use a heavily quantized model and wait forever for the generation to actually run, the economics are harder to justify. That's not counting the "you can generate images from ChatGPT for free" factor.

> George's hair, for example, looks over the top, or brushed on.

IMO, the judge was being too generous with the passes for that test. The only one that really passes is Gemini 2.5 Flash Image:

Flux Kontext: In addition to the hair looking too slick, it does not match the VHS-esque color grading of the image.

Qwen-Image-Edit: The hair is too slick and the sharpness/saturation of the face unnecessarily increases.

Seedream 4: Color grading of the entire image changes, which is the case with most of the Seedream 4 edits shown in this post, and why I don't like it.

janalsncm•4h ago

For 99% of my use cases I’ll just use ChatGPT or Gemini due to convenience. But if you want something with a specific style, Flux LoRAs are much better, in which case I’ll boot up the old 4090.

The economics 1000% do not justify me owning a GPU to do this. I just happen to own one.

jimmyl02•6h ago

I think reve (https://reve.com) should be in the running and would be very curious to see the results!

keyle•6h ago

This was fun.

Some might critique the prompts and say this or that would have done better, but they were the kind of prompt your dad would type in not knowing how to push the right buttons.

lxe•6h ago

This is vastly more useful than benchmark charts.

I've been using Nano Banana quite a lot, and I know that it absolutely struggles at exterior architecture and landscaping. Getting it to add or remove things like curbs, walkways, gutters, etc, or to ask to match colors is almost futile.

estetlinus•5h ago

I am trying Qwen Image Edit for turning day photos into night, mostly architecture etc. Most models are struggling, and Nano Banana misses edges and stuff, making the pictures align poorly.

kgwgk•6h ago

Recent discussion: https://news.ycombinator.com/item?id=45708795

jumploops•4h ago

Nit: the link there was `Text-to-Image` while this is `Image Editing`

Still useful comments, as the models mostly overlap

amelius•4h ago

A cat's paw has only 4 fingers.

ChickeNES•2h ago

Not always! https://en.wikipedia.org/wiki/Polydactyl_cat

spookie•9m ago

Doesn't look comfortable. Either way the same happens in humans, doesn't mean it is a good genetic mutation.

lschueller•4h ago

I wonder how much longer those annoying stock photo databases will last. They are great for press photography and such, but stock pics of people in offices for a website are nothing I would buy a minimum 3-month subscription for anymore.

delichon•4h ago

As generative AI eats away at the high-royalty, restrictive-license, consent-evading, stereotype-reinforcing business model of stock photo companies, it will be a challenge to resist the schadenfreude.

roenxi•4h ago

It is fun being one of the elderly who set their standards back in distant 2022. All these demos look incredible compared to SD1, 2 & 3. We've entered a very different era where the models seem to actually understand both the prompt and the image instead of throwing paint at the wall in a statistically interesting manner.

I think this was fairly predictable, but as engineering improvements keep happening and the prompt adherence rate tightens up we're enjoying a wild era of unleashed creativity.

CobrastanJorji•4h ago

I'm pretty sure that "replace the homeless man with a park bench" image was a reference to some TV show making a gentrification joke, but I can't put my finger on it. Anyone recall?

pram•4h ago

Simpsons, the Hank Scorpio episode. The advertisement for the company town shows a beggar slowly fading out and being replaced by a mailbox.

vunderba•3h ago

Yeah, I couldn't help myself on that one! It's a reference to the Cypress Creek promotional video from the Simpsons.

https://www.youtube.com/watch?v=foU9W7AkKSY

ineedasername•3h ago

Kontext is very good. Get yourself a 5060 Ti 16GB and never have to pay for API calls again for this purpose, at least not when you have the time to spare. If you need this sort of editing at the speed of gui-clicking + 10s, then you'll need to pay API tolls, or capex for > 5070/80.

zamadatix•2h ago

You have to REALLY be into AI to do this for generation/API cost reasons (or willing to have this as a hacking project of the month expense). Even ignoring electricity, a 16 GB 5060 Ti is more expensive than 16,000 image generations. Assuming you do one every 15 seconds, that's 240,000 seconds -> more than 2 months of usage at an hour a day of generations.

If you've already got a decent GPU (or were going to get one anyways) then cost isn't really a consideration, it's just that you can already do it. For everyone else, you can probably get by just using things like Google's AI Studio for free.
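
The break-even arithmetic above can be checked with a few lines. This is a minimal sketch using only the figures from the comment (16,000 generations, one every 15 seconds, an hour a day); the function name is made up for illustration, and no actual GPU or per-image prices are assumed.

```python
def days_to_break_even(
    generations: int,
    seconds_per_generation: float,
    hours_per_day: float,
) -> float:
    """Days of use needed to run `generations` images at the given pace."""
    total_seconds = generations * seconds_per_generation
    total_hours = total_seconds / 3600
    return total_hours / hours_per_day


total_seconds = 16_000 * 15  # 240,000 seconds of generation time
days = days_to_break_even(16_000, 15, 1)
print(f"{total_seconds} s -> {days:.1f} days at an hour a day")
```

That works out to roughly 67 days, which matches the "more than 2 months" estimate.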

spookie•21m ago

GPUs are needed for plenty of reasons. I assume plenty have a decent dGPU, even on laptops.

wawayanda•2h ago

This is not the point of this post, but is anyone else getting tired of this front end style that Claude creates? I see it on web apps everywhere and (just like with AI writing and images) I get that funny "is this slop?" feeling

meowface•2h ago

Yes, though it might be GPT-5 UI.

shridharathi•2h ago

Here's a post I wrote on the Replicate blog putting these image editing models head-to-head. Generally, I found Qwen Image Edit to be the cheapest and fastest model that was also quite capable of most image editing tasks.

If I were to make an image editing app, this would be the model I'd choose.

https://replicate.com/blog/compare-image-editing-models

zamadatix•2h ago

I still feel that varying the prompt text, the number of tries, and the strictness, combined with only showing the most-liked result, dilutes most of the value in these tests. It would be better if there was one prompt 8/10 human editors understood and implemented correctly and then every model got 5 generation attempts with that exact prompt on different seeds or something. If it were about "who can create the best image with a given model" then I'd see it more, but most of it seems aimed at preventing that sort of thing and it ends up in an awkward middle zone.

E.g. Gemini 2.5 Flash is given extreme leeway with how much it edits the image and changes the style in "Girl with Pearl Earring" only to have OpenAI gpt-image-1 do a (comparatively) much better job yet still be declared failed after 8 attempts, while having been given fewer attempts than Seedream 4 (passed) and less than half the attempts of OmniGen2 (which still looks way farther off in comparison).
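
A rough sketch of the fixed-protocol idea above: every model gets the same prompt, the same number of attempts, and the same set of seeds, and the score is a pass rate, not a single best pick. Everything here is hypothetical; each `run` callable stands in for "generate with this seed, then judge pass/fail", and how the judging is done is left open.

```python
from typing import Callable, Dict

# Hypothetical signature: (prompt, seed) -> True if the edit passed.
Runner = Callable[[str, int], bool]


def pass_rates(
    models: Dict[str, Runner],
    prompt: str,
    attempts: int = 5,
) -> Dict[str, float]:
    """Give every model the same prompt and seeds; return pass rate per model."""
    seeds = range(attempts)  # identical seeds for every model
    rates = {}
    for name, run in models.items():
        passes = sum(1 for seed in seeds if run(prompt, seed))
        rates[name] = passes / attempts
    return rates
```

Reporting a rate over a fixed number of attempts also avoids the situation where one model is judged after 8 tries and another after many more.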

cttet•55m ago

A "worst image" competition instead of a best-image one may be easy to implement and quite indicative of which model gives the less frustrating experience.
