I was curious which of the three major US AI labs generates images that people like more, so I built ImageDojo.ai.
It shows you two images side-by-side, both generated from the exact same prompt. You vote on which one you like more (you don't see the prompt or which model made each one).
Based on the votes, it calculates Elo ratings for the models, similar to LMSYS Arena for text.
The four models I selected (the original and the new Nano Banana, GPT-Image-1.5, and Grok-Imagine-Image) are all in roughly the same price range ($0.02–$0.06 per image), so we're comparing fairly similar-class models. Please try it out and let me know what you think!
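For readers unfamiliar with how pairwise votes turn into a ranking: here's a minimal Elo update in Python. The K-factor of 32 and the 1500 starting rating are standard textbook defaults, not ImageDojo's actual parameters.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated (r_a, r_b) ratings after one head-to-head vote."""
    e_a = expected_score(r_a, r_b)          # expected score for A
    s_a = 1.0 if a_won else 0.0             # actual score for A
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b

# Two models start at 1500; one vote for A moves them apart by 2*K*0.5.
a, b = update_elo(1500.0, 1500.0, a_won=True)
```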
vunderba•1h ago
https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Ima...
vtail•1h ago
I'm also a bit surprised they have gpt-image-1.5 so high above Nano Banana 2 - my limited testing shows that, at least for the visual styles, people like Nano Banana more.
vunderba•1h ago
For a point of reference, I run a pretty comprehensive image model comparison site heavily weighted in favor of prompt adherence.
https://genai-showdown.specr.net
EDIT: FWIW, I agree with your assessment. OpenAI's models have always been very strong in prompt adherence but visually weak (gpt-image-1 had the famous "piss filter" until they finally pushed out gpt-image-1.5)
vtail•1h ago
Did you manually review all the edit results yourself, or do you have some kind of automated procedure?
vunderba•56m ago
- Takes the platonic set of prompts
- Uses model-specific tuning directives with LLMs to create a batch of prompt variations, so each model gets a diverse set of natural-language expressions to "roll" generations against
But I still have to manually review each of the final images, which is pretty time-consuming. I've tried automating it with VLMs (like Qwen3-VL), but unfortunately they can miss small details and didn't provide as much value as I'd hoped.