(Do we say we software engineered something?)
You CREATED something, and I like to think that creating things that I love and enjoy and that others can love and enjoy makes creating things worth it.
No, that simply is not true. If you actually compare the before and after you can see it still regenerates all the details on the "unchanged" aspects. Texture, lighting, sharpness, even scale its all different even if varyingly similar to the original.
Sure they're cute for casual edits but it really pains me people suggesting these things are suitable replacements for actual photo editing. Especially when it comes to people, or details outside their training data theres a lot of nuance that can be lost as it regenerates them no matter how you prompt things.
Even if you
I figured that if you write the text in Google docs and share the screenshot with banana it will not make any spelling mistake.
So, use something like "can you write my name on this Wimbledon trophy, both images are attached. Use them" will work.
That's on my list of blog-post-worthy things to test, namely text rendering to image in Python directly and passing both input images to the model for compositing.
>Nano Banana is terrible at style transfer even with prompt engineering shenanigans
My context: I'm kind of fixated on visualizing my neighborhood as it would have appeared in the 18th century. I've been doing it in Sketchup, and then in Twinmotion, but neither of those produce "photorealistic" images... Twinmotion can get pretty close with a lot of work, but that's easier with modern architecture than it is with the more hand-made, brick-by-brick structures I'm modeling out.
As different AI image generators have emerged, I've tried them all in an effort to add the proverbial rough edges to snapshots of the models I've created, and it was not until Nano Banana that I ever saw anything even remotely workable.
Nano Banana manages to maintain the geometry of the scene, while applying new styles to it. Sometimes I do this with my Twinmotion renders, but what's really been cool to see is how well it takes a drawing, or engraving, or watercolor - and with as simple a prompt as "make this into a photo" it generates phenomenal results.
Similarly to the Paladin/Starbucks/Pirate example in the link though, I find that sometimes I need to misdirect a little bit, because if I'm peppering the prompt with details about the 18th century, I sometimes get a painterly image back. Instead, I'll tell it I want it to look like a photograph of a well preserved historic neighborhood, or a scene from a period film set in the 18th century.
As fantastic as the results can be, I'm not abandoning my manual modeling of these buildings and scenes. However, Nano Banana's interpretation of contemporary illustrations has helped me reshape how I think about some of the assumptions I made in my own models.
I added a CLI to it (using Gemini CLI) and submitted a PR, you can run that like so:
GEMINI_API_KEY="..." \
uv run --with https://github.com/minimaxir/gemimg/archive/d6b9d5bbefa1e2ffc3b09086bc0a3ad70ca4ef22.zip \
python -m gemimg "a racoon holding a hand written sign that says I love trash"
Result in this comment: https://github.com/minimaxir/gemimg/pull/7#issuecomment-3529...- make massive, seemingly random edits to images - adjust image scale - make very small case but pervasive detail changes obvious in an image diff
For instance, I have found that nano-banana will sporadically add a (convincing) fireplace to a room or new garage behind a house. This happens even with explicit "ALL CAPS" instructions not to do so. This happens sporadically, even when the temperature is set to zero, and makes it impossible to build a reliable app.
Has anyone had a better experience?
doctorpangloss•1h ago
okay, look at imagen 4 ultra:
https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
Is Imagen thinking?
Let's compare to gemini 2.5 flash image (nano banana):
look carefully at the system prompt here: https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
compare to ideogram, with prompt rewriting: https://ideogram.ai/g/GRuZRTY7TmilGUHnks-Mjg/0
without prompt rewriting: https://ideogram.ai/g/yKV3EwULRKOu6LDCsSvZUg/2
We can do the same exercises with Flux Kontext for editing versus Flash-2.5, if you think that editing is somehow unique in this regard.
Is prompt rewriting "thinking"? My point is, this article can't answer that question without dElViNg into the nuances of what multi-modal models really are.
gryfft•51m ago
PunchTornado•41m ago