I’m the creator of nanobananaapi.dev. I built this because I was frustrated with how most image generation APIs handle text—it’s often garbled or contextually disconnected from the rest of the image, especially when dealing with multilingual layouts like Chinese and English mixed together.
Technical Highlights:
Infrastructure: The entire API is built on Cloudflare Workers. I'm using waitUntil to handle asynchronous tasks like telemetry and image post-processing without blocking the initial response, which keeps the TTFB (Time to First Byte) low even under load.
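The pattern above can be sketched in a few lines. This is a minimal illustration of `waitUntil` (the handler, the telemetry task, and the mock ExecutionContext are all mine, not the production code): respond immediately, and let the platform keep the deferred work alive after the response is sent.

```typescript
// Minimal sketch of the Workers waitUntil pattern: return the response
// right away, defer telemetry so it never blocks the response path.
// This ExecutionContext is a mock for running outside the real runtime.
type ExecutionContext = { waitUntil(p: Promise<unknown>): void };

const telemetryLog: string[] = [];

// Hypothetical deferred task; in a real worker this might POST to an
// analytics endpoint or kick off image post-processing.
async function recordTelemetry(event: string): Promise<void> {
  telemetryLog.push(event);
}

function handleRequest(req: { url: string }, ctx: ExecutionContext): string {
  // Schedule the slow work without awaiting it...
  ctx.waitUntil(recordTelemetry(`hit:${req.url}`));
  // ...and return the body immediately, which is what keeps TTFB low.
  return "ok";
}

// Mock runtime: collect the deferred promises the way the platform would.
const pending: Promise<unknown>[] = [];
const ctx: ExecutionContext = { waitUntil: (p) => pending.push(p) };
const body = handleRequest({ url: "/generate" }, ctx);
```

In the real runtime, the platform awaits everything passed to `waitUntil` after the response has been streamed, so telemetry failures also can't corrupt the response.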
Text Precision: Instead of just relying on the base diffusion model, I’ve implemented a custom pipeline that optimizes the attention maps specifically for text-heavy areas. This ensures that the generated text remains legible and sharp, even in 4K outputs.
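The actual pipeline isn't described in detail here, but the general idea of biasing attention toward text regions can be illustrated with a toy example (the function, the additive boost, and the mask are all illustrative assumptions, not the production method):

```typescript
// Toy illustration only (not the real pipeline): add a bias to raw
// attention scores at positions flagged as text regions, then apply a
// softmax so text tokens receive a larger share of the attention mass.
function boostTextAttention(
  scores: number[],        // raw (pre-softmax) attention scores
  isTextRegion: boolean[], // mask: which positions fall in text areas
  boost = 1.0              // hypothetical additive bias
): number[] {
  const biased = scores.map((s, i) => (isTextRegion[i] ? s + boost : s));
  const max = Math.max(...biased);            // stabilize the softmax
  const exps = biased.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);            // normalized attention weights
}
```

With equal input scores, the text-region position ends up with a strictly larger weight while the output still sums to 1, which is the property a text-focused reweighting needs.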
Consistency: For developers building brand-centric apps, I added support for up to 14 reference images. This uses a weighted fusion approach to maintain character or product consistency across multiple generations.
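As a sketch of what weighted fusion can look like, here is a minimal embedding-space version (the function name, the averaging scheme, and the embedding representation are my assumptions; the production approach isn't public):

```typescript
// Sketch of weighted fusion: combine up to 14 reference-image
// embeddings into a single conditioning vector, with a per-reference
// importance weight. Assumes references are already embedded as vectors.
const MAX_REFERENCES = 14;

function fuseReferences(
  embeddings: number[][], // one embedding vector per reference image
  weights: number[]       // per-reference importance, any positive scale
): number[] {
  if (embeddings.length === 0 || embeddings.length > MAX_REFERENCES) {
    throw new Error(`expected 1..${MAX_REFERENCES} references`);
  }
  const dim = embeddings[0].length;
  const total = weights.reduce((a, b) => a + b, 0);
  const fused = new Array(dim).fill(0);
  for (let i = 0; i < embeddings.length; i++) {
    const w = weights[i] / total; // normalize so weights sum to 1
    for (let d = 0; d < dim; d++) fused[d] += w * embeddings[i][d];
  }
  return fused;
}
```

Normalizing the weights inside the function means callers can pass raw importance scores (say, 3:1 between a hero product shot and a background reference) without worrying about scale.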
The "Why": There are many wrappers out there, but my goal was to provide a "developer-first" experience: no complex tiered subscriptions, just a simple pay-as-you-go REST API that integrates into a modern Next.js or Go stack in minutes.
Current Limitations: It’s not perfect. The model still struggles occasionally with highly complex cursive fonts, and I’m working on reducing outpainting latency.
I’d love to hear your thoughts on the API design or the output quality. I'll be around all day to answer any technical questions!