(Do we say we software engineered something?)
You CREATED something, and I like to think that making things I love and enjoy, and that others can love and enjoy too, is what makes creating worth it.
No, that simply isn't true. If you actually compare the before and after, you can see it still regenerates all the details in the "unchanged" areas. Texture, lighting, sharpness, even scale: it's all different, even if it stays varyingly close to the original.
Sure, they're cute for casual edits, but it really pains me when people suggest these things are suitable replacements for actual photo editing. Especially with people, or with details outside the training data, there's a lot of nuance that gets lost as the model regenerates them, no matter how you prompt it.
Even if you
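One way to check the regeneration claim above is to pixel-diff the before and after images. A minimal sketch with Pillow (the file names are placeholders); on a truly targeted edit the diff would be black outside the edited region, but with these models it usually isn't:

    from PIL import Image, ImageChops

    before = Image.open("before.png").convert("RGB")
    after = Image.open("after.png").convert("RGB")

    # A size change alone already means the whole frame was resampled.
    if after.size != before.size:
        print(f"size changed: {before.size} -> {after.size}")
        after = after.resize(before.size)

    # Per-pixel absolute difference; getbbox() returns None only if identical.
    diff = ImageChops.difference(before, after)
    bbox = diff.getbbox()
    print("pixel-identical" if bbox is None else f"differences within {bbox}")
    diff.save("diff.png")  # bright pixels mark regenerated detail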
I figured that if you write the text in Google Docs and share the screenshot with banana, it won't make any spelling mistakes.
So a prompt like "can you write my name on this Wimbledon trophy, both images are attached. Use them" will work.
That's on my list of blog-post-worthy things to test, namely rendering text to an image directly in Python and passing both input images to the model for compositing.
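A minimal sketch of that workflow, assuming Pillow plus the google-genai SDK; the model id, font path, and file names here are assumptions for illustration, not anything confirmed above:

    import io
    from PIL import Image, ImageDraw, ImageFont
    from google import genai

    # Render crisp text locally so the model doesn't have to spell it.
    text_img = Image.new("RGB", (800, 200), "white")
    draw = ImageDraw.Draw(text_img)
    font = ImageFont.truetype("DejaVuSans.ttf", 96)  # assumed font path
    draw.text((20, 50), "My Name", fill="black", font=font)

    trophy_img = Image.open("trophy.jpg")  # placeholder input photo

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",  # assumed model id
        contents=[
            "Composite the text from the first image onto the trophy "
            "in the second image. Use both attached images.",
            text_img,
            trophy_img,
        ],
    )

    # Pull the first returned image out of the response parts.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(io.BytesIO(part.inline_data.data)).save("composited.png")

The idea is that the model only has to copy pixels it was given, not generate glyphs from scratch, which is where the spelling errors come from.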
doctorpangloss•47m ago
okay, look at imagen 4 ultra:
https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
Is Imagen thinking?
Let's compare to gemini 2.5 flash image (nano banana):
look carefully at the system prompt here: https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
compare to ideogram, with prompt rewriting: https://ideogram.ai/g/GRuZRTY7TmilGUHnks-Mjg/0
without prompt rewriting: https://ideogram.ai/g/yKV3EwULRKOu6LDCsSvZUg/2
We can do the same exercises with Flux Kontext for editing versus Flash-2.5, if you think that editing is somehow unique in this regard.
Is prompt rewriting "thinking"? My point is, this article can't answer that question without dElViNg into the nuances of what multi-modal models really are.