> Pico-Banana-400K serves as a versatile resource for advancing controllable and instruction-aware image editing. Beyond single-step editing, the dataset enables multi-turn, conversational editing and reward-based training paradigms.
I'm happy to see something from Apple, but this seems so low-tech that it could be one of my own local ComfyUI workflows.
vunderba•2h ago
> The pipeline (bottom) shows how diverse OpenImages inputs are edited using Nano-Banana and quality-filtered by Gemini-2.5-Pro, with failed attempts automatically retried.
Pretty interesting. I run a fairly comprehensive image-comparison site for SOTA generative AI in text-to-image and editing. Managing it manually got pretty tiring, so a while back I put together a small program that does something similar: it takes a starting prompt, a list of GenAI models, and a max number of retries.
It generates images with each model, evaluates them using a separate multimodal AI, and then automatically rewrites failed prompts, repeating up to the set limit.
It's not perfect (the nine-pointed star example in particular) - but oftentimes the "recognition aspect" of a multimodal model is superior to its generative capabilities, so you can run it in a sort of REPL until you get the desired outcome.
https://genai-showdown.specr.net/image-editing
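A minimal sketch of that kind of generate/evaluate/rewrite loop, for illustration only. It is not the actual program: `generate`, `evaluate`, and `rewrite` are hypothetical callables you would back with whatever image model and multimodal judge you use, and `Attempt` and `run_until_accepted` are made-up names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    prompt: str      # prompt actually sent to the image model
    image: bytes     # raw image returned by the model
    passed: bool     # verdict from the separate evaluator
    feedback: str    # evaluator's explanation, used to rewrite the prompt


def run_until_accepted(
    goal: str,
    generate: Callable[[str], bytes],                  # hypothetical: prompt -> image
    evaluate: Callable[[str, bytes], Tuple[bool, str]],  # hypothetical: (goal, image) -> (ok, feedback)
    rewrite: Callable[[str, str], str],                # hypothetical: (prompt, feedback) -> new prompt
    max_retries: int = 3,
) -> List[Attempt]:
    """Generate, judge, and rewrite the prompt until it passes or retries run out."""
    attempts: List[Attempt] = []
    prompt = goal
    for attempt_index in range(max_retries + 1):
        image = generate(prompt)
        # Always judge against the original goal, not the rewritten prompt,
        # so prompt rewrites cannot drift away from what was asked for.
        passed, feedback = evaluate(goal, image)
        attempts.append(Attempt(prompt, image, passed, feedback))
        if passed:
            break
        if attempt_index < max_retries:
            prompt = rewrite(prompt, feedback)
    return attempts
```

The function returns every attempt rather than only the final image, so failures that never converge within the limit are still visible afterwards.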
typpilol•1h ago
Or there's another very similar site. But I'm pretty sure it's yours
lukasb•1h ago