Unusual Capabilities of Nano Banana (Examples)

https://github.com/PicoTrex/Awesome-Nano-Banana-images/blob/main/README_en.md

85•SweetSoftPillow•1h ago

Comments

Animats•1h ago

I have two friends who are excellent professional graphic artists and I hesitate to send them this.

SweetSoftPillow•1h ago

They better learn it today than tomorrow. Even though it's might be painful for some who does not like to learn new tools and explore new horizons.

mitthrowaway2•1h ago

Maybe they're better off switching careers? At some point, your customers aren't going to pay you very much to do something that they've become able to do themselves.

There used to be a job people would do, where they'd go around in the morning and wake people up so they could get to work on time. They were called a "knocker-up". When the alarm clock was invented, these people lose their jobs to other knockers-up with alarm clocks, they lost their jobs to alarm clocks.

non_aligned•54m ago

A lot of technological progress is about moving in the other direction: taking things you can do yourself and having others do it instead.

You can paint your own walls or fix your own plumbing, but people pay others instead. You can cook your food, but you order take-out. It's not hard to sew your own clothes, but...

So no, I don't think it's as simple as that. A lot of people will not want the mental burden of learning a new tool and will have no problem paying someone else to do it. The main thing is that the price structure will change. You won't be able to charge $1,000 for a project that takes you a couple of days. Instead, you will need to charge $20 for stuff you can crank out in 20 minutes with gen AI.

GMoromisato•38m ago

I agree with this. And it's not just about saving time/effort--an artist with an AI tool will always create better images than an amateur, just as an artist with a camera will always produce a better picture than me.

That said, I'm pretty sure the market for professional photographers shrank after the digital camera revolution.

AstroBen•1h ago

I don't know if "learning this tool" is gunna help..

darkamaul•1h ago

This is amazing. Not that long ago, even getting a model to reliably output the same character multiple times was a real challenge. Now we’re seeing this level of composition and consistency. The pace of progress in generative models is wild.

Huge thanks to the author (and the many contributors) as well for gathering so many examples; it’s incredibly useful to see them to better understand the possibilities of the tool.

xnx•1h ago

Amazing model. The only limit is your imagination, and it's only $0.04/image.

Since the page doesn't mention it, this is the Google Gemini Image Generation model: https://ai.google.dev/gemini-api/docs/image-generation

Good collection of examples. Really weird to choose an inappropriate for work one as the second example.

minimaxir•55m ago

[misread]

pdpi•50m ago

I assume OP means the actual post.

The second example under "Case 1: Illustration to Figure" is a panty shot.

vunderba•49m ago

They're referring to Case 1 Illustration to Figure, the anime figurine dressed in a maid outfit in the HN post.

moralestapia•1h ago

Wow, just amazing.

Is this model open? Open weights at least? Can you use it commercially?

SweetSoftPillow•1h ago

This is a Google's Gemini flash 2.5 model with native image output capability. It's fast, relatively cheap and SOTA-quality, and available via API. I think getting this kind of quality in open source models will need some time, probably first from Chinese models and then from BlackForestLabs or Google's open source (Gemma) team.

minimaxir•1h ago

Flux Kontext has similar quality, is open weight, and the outputs can be used commercially, however prompt adherence is good-but-not-as-good.

vunderba•51m ago

Outside of Google Deepmind open sourcing the code and weights of AlphaFold, I don't think they've released any of their GenAI stuff (Imagen, Gemini, Flash 2.5, etc).

The best multimodal models that you can run locally right now are probably Qwen-Edit 20b, and Kontext.Dev.

https://qwenlm.github.io/blog/qwen-image-edit

https://bfl.ai/blog/flux-1-kontext-dev

SweetSoftPillow•44m ago

Google also open sources Gemma LLMs and embedding models, which are quite good at the time of release (SOTA or near-SOTA in the open source field).

vunderba•41m ago

Oh very nice I wasn't aware of that [1] [2]. Adding the links as well.

[1] https://deepmind.google/models/gemma

[2] https://huggingface.co/google/gemma-7b [2]

minimaxir•1h ago

I recently released a Python package for easily generating images with Nano Banana: https://github.com/minimaxir/gemimg

Through that testing, there is one prompt engineering trend that was consistent but controversial: both a) LLM-style prompt engineering with with Markdown-formated lists and b) old-school AI image style quality syntatic sugar such as award-winning and DSLR camera are both extremely effective with Gemini 2.5 Flash Image, due to its text encoder and larger training dataset which can now more accurately discriminate which specific image traits are present in an award-winning image and what traits aren't. I've tried generations both with and without those tricks and the tricks definitely have an impact. Google's developer documentation encourages the latter.

However, taking advantage of the 32k context window (compared to 512 for most other models) can make things interesting. It’s possible to render HTML as an image (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...) and providing highly nuanced JSON can allow for consistent generations. (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...)

vunderba•1h ago

Nano-Banana can produce some astonishing results. I maintain a comparison website for state-of-the-art image models with a very high focus on adherence across a wide variety of text-to-image prompts.

I recently finished putting together an Editing Comparison Showdown counterpart where the focus is still adherence but testing the ability to make localized edits of existing images using pure text prompts. It's currently comparing 6 multimodal models including Nano-Banana, Kontext Max, Qwen 20b, etc.

https://genai-showdown.specr.net/image-editing

Gemini Flash 2.5 leads with a score of 7 out of 12, but Kontext comes in at 5 out of 12 which is especially surprising considering you can run the Dev model of it locally.

destel•59m ago

Some examples are mind blowing. It’s interesting if it can generate web/app designs

AstroBen•34m ago

I just tried it for an app I'm working on.. very bad results

mitthrowaway2•59m ago

I've come to realize that I liked believing that there was something special about the human mental ability to use our mind's eye and visual imagination to picture something, such as how we would look with a different hairstyle. It's uncomfortable seeing that skill reproduced by machinery at the same level as my own imagination, or even better. It makes me feel like my ability to use my imagination is no more remarkable than my ability to hold a coat off the ground like a coat hook would.

m3kw9•54m ago

To be fair, the model's ability came from us generating the training data.

quantummagic•11m ago

To be fair, we're the beneficiaries of nature generating the data we trained on ourselves. Our ability came from being exposed to training in school, and in the world, and from examples from all of human history. Ie. if you locked a child in a dark room for their entire lives, and gave them no education or social interaction, they wouldn't have a very impressive imagination or artistic ability either.

We're reliant on training data too.

FuckButtons•51m ago

I have aphantasia, I’m glad we’re all on a level playing field now.

Revisional_Sin•35m ago

Aphantasia gang!

yoz-y•27m ago

I always thought I had a vivid imagination. But then the aphantasia was mentioned in Hello Internet once, I looked it up, see comments like these and honestly…

I’ve no idea how to even check. According to various tests I believe I have aphantasia. But mostly I’ve got not even a slightest idea on how not having it is supposed to work. I guess this is one of those mysteries when a missing sense cannot be described in any manner.

jmcphers•19m ago

A simple test for aphantasia that I gave my kids when they asked about it is to picture an apple with three blue dots on it. Once you have it, describe where the dots are on the apple.

Without aphantasia, it should be easy to "see" where the dots are since your mind has placed them on the apple somewhere already. Maybe they're in a line, or arranged in a triangle, across the middle or at the top.

Sohcahtoa82•7m ago

After reading your first sentence, I immediately saw an apple with three dots in a triangle pointing downwards on the side. Interestingly, the 3 dots in my image were flat, as if merely superimposed on an image of an apple, rather than actually being on an apple.

How do people with aphantasia answer the question?

micromacrofoot•24m ago

it can only do this because it's been trained on millions of human works

m3kw9•53m ago

The ability to pretty accurately keep the same image from an input is a clear sign of it's improved abilities.

frfl•53m ago

While these are incredibly good, it's sad to think about the unfathomable amount of abuse, spam, disinformation, manipulation and who know what other negatives these advancement are gonna cause. It was one thing when you could spot an AI image, but now and moving forward it's be basically increasingly futile to even try.

Almost all "human" interaction online will be subject to doubt soon enough.

Hard to be cheerful when technology will be a net negative overall even if it benefits some.

signatoremo•32m ago

By your logic email is clearly a net negative, given how much junk it generates - spam, phishing, hate mails, etc. Most of my emails at this point are spams.

frfl•23m ago

If we're talking objectively, yeah by definition if it's a net negative, it's a net negative. But we can both agree in absolute terms the negatives of email are manageable.

Hopefully you understand the sentiment of my original message, without getting into the semantics. AI advancement, like email when it arrived, are gonna turbocharge the negatives. Difference is in the magnitude of the problem. We're dealing with whole different scale we have never seen before.

Re: Most of my emails at this point are spams. - 99% of my emails are not spam. Yet AI spam is everywhere else I look online.

AstroBen•42m ago

The #1 most frustrating part of image models to me has always been their inability to keep the relevant details. Ask to change a hairstyle and you'd get a subtly different person

..guess that's solved now.. overnight. Mindblowing

eig•39m ago

While I think most of the examples are incredible...

...the technical graphics (especially text) is generally wrong. Case 16 is an annotated heart and the anatomy is nonsensical. Case 28 with the tallest buildings has the decent images, but has the wrong names, locations, and years.

SweetSoftPillow•35m ago

Yes, it's Gemini Flash model, meaning it's fast and relatively small and cheap, optimized for performance rather than quality. I would not expect mind-blowing capabilities in fine details from this class of models, but still, even in this regard this model sometimes just surprisingly good.

vunderba•29m ago

Yeah I think some of them are really more proof of concept than anything.

Case 8 Substitute for ControlNet

The two characters in the final image are VERY obviously not in the instructed set of poses.

ChrisArchitect•22m ago

sigh

so many little details off when the instructions are clear and/or the details are there. Brad Pitt jeans? The result are not the same style and missing clear details which should be expected to just translate over.

Another one where the prompt ended with output in a 16:9 ratio. The image isn't in that ratio.

The results are visually something but then still need so much review. Can't trust the model. Can't trust people lazily using it. Someone mentioned something about 'net negative'.

istjohn•5m ago

Yes, almost all of the examples are off in one way or another. The viewpoints don't actually match the arrow directions, for example. And if you actually use the model, you will see that even these examples must be cherry-picked.

flysonic10•19m ago

I added some of these examples into my Nanna Banana image generator: https://nannabanana.ai

istjohn•14m ago

Personally, I'm underwhelmed by this model. I feel like these examples are cherry-picked. Here are some fails I've had:

- Given a face shot in direct sunlight with severe shadows, it would not remove the shadows

- Given an old black and white photo, it would not render the image in vibrant color as if taken with a modern DSLR camera. It will colorize the photo, but only with washed out, tinted colors

- When trying to reproduce the 3 x 3 grid of hair styles, it repeatedly created a 2x3 grid. Finally, it made a 3x3 grid, but one of the nine models was black instead of caucasian.

- It is unable to integrate real images into fabricated imagery. For example, when given an image of a tutu and asked to create an image of a dolphin flying over clouds wearing the tutu, the result looks like a crude photoshop snip and copy/paste job.

plomme•9m ago

This is the first time I really don't understand how people are getting good results. On https://aistudio.google.com with Nano Banana selected (gemini-2.5-flash-image-preview) I get - garbage - results. I'll upload a character reference photo and a scene and ask Gemini to place the character in the scene. What it then does is to simply cut and paste the character into the scene, even if they are completely different in style, colours, etc.

I get far better results using ChatGPT for example. Of course, the character seldom looks anything like the reference, but it looks better than what I could do in paint in two minutes.

Am I using the wrong model, somehow??

epolanski•6m ago

I understand the results are non deterministic but I get absolute garbage too.

Uploaded pics of my (32 years old) wife and we wanted to ask it to give her a fringe/bangs to see how would she look like it either refused "because of safety" and when it complied results were horrible, it was a different person.

The Bill Gates Interview (1994)

Fear and Loathing in America by Hunter S. Thompson

Active Heat Pump Group Buys / Pre-Negotiated Offers (US, Canada, International)

Drug Interaction Checker

SHOW HN: I develop the best Website, Mobile(Android/iOS), AI/ML and many more

Emacs Bankruptcy

Acoustic emissions dose and blood-brain barrier opening with focused ultrasound

Thoughts on How to Disagree

OpenAI's Sam Altman sees a future with a collective 'superintelligence'

Our Diet Is Our Destiny

Statistical ­ Nonsignificance in Empirical Economics [pdf]

PFM-1 Land Mine

Dissecting Batching Effects in GPT Inference

Strawberries in Winter

In Praise of Passivity [pdf]

Show HN: Pbar.io – Distributed progress bars that work in terminals and browsers

SEC Moves to Dismiss Case Against Former Nikola CEO Milton

Show HN: Fast Tor Onion Service vanity address generator

Duke, Michigan.gov, CA.gov Compromised in Large-Scale SEO Attack

The ClickFix Attack That Wasn't: From a Fake AnyDesk Installer to MetaStealer

Architecture by Fashion, Not Fundamentals

I Finished Version v3.0.5

Patela: A basement full of amnesic servers

OpenAI and Microsoft agree key terms in contract renegotiation

Prompt Injection: Some sloppy cheaters who left their evidence all over ArXiv

38C3 – BlinkenCity: Radio-Controlling Street Lamps and Power Plants

Book review: The road to paradox

Free Chrome extension for converting SEC filings to PDFs

Show HN: Masters Tool-Aggregate Practical Toolbox

KVM Forum 2025