I was curious which of the three major US AI labs generates images that people like more, so I built ImageDojo.ai.
It shows you two images side-by-side, both generated from the exact same prompt. You vote on which one you like more (you don't see the prompt or which model made each one).
Based on the votes, it calculates Elo ratings for the models, similar to LMSYS Arena for text.
The four models I selected (the original and the new Nano Banana, GPT-Image-1.5, and Grok-Imagine-Image) are all in roughly the same price range ($0.02–$0.06 per image), so we're comparing fairly similar-class models. Please try it out and let me know what you think!
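For readers unfamiliar with how pairwise votes turn into a ranking: here's a minimal Elo update in Python. The K-factor of 32 and the 1500 starting rating are standard textbook defaults, not ImageDojo's actual parameters.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated (r_a, r_b) ratings after one head-to-head vote."""
    e_a = expected_score(r_a, r_b)          # expected score for A
    s_a = 1.0 if a_won else 0.0             # actual score for A
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b

# Two models start at 1500; one vote for A moves them apart by 2*K*0.5.
a, b = update_elo(1500.0, 1500.0, a_won=True)
```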
vunderba•1h ago
https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Ima...
vtail•1h ago
I'm also a bit surprised they have gpt-image-1.5 so high above Nano Banana 2 - my limited testing shows that, at least for the visual styles, people like Nano Banana more.
vunderba•1h ago
For a point of reference, I run a pretty comprehensive image model comparison site heavily weighted in favor of prompt adherence.
https://genai-showdown.specr.net
EDIT: FWIW, I agree with your assessment. OpenAI's models have always been very strong in prompt adherence but visually weak (gpt-image-1 had the famous "piss filter" until they finally pushed out gpt-image-1.5)
vtail•1h ago
Did you manually review all the edit results yourself, or do you have some kind of automated procedure?
vunderba•56m ago
- Takes the platonic set of prompts
- Uses model-specific tuning directives with LLMs to create a batch of prompt variations, so each model gets a diverse set of natural-language expressions to "roll" generations against
But I still have to manually review each of the final images, which is pretty time-consuming. I've tried automating it with VLMs (like Qwen3-VL), but unfortunately they can miss small details and didn't provide as much value as I'd hoped.