How it works under the hood:
1. A Retriever agent searches a curated database of real academic diagrams to find structurally similar references 2. A Planner agent reads your text and generates a detailed visual description (layout, components, connections, groupings) 3. A Stylist agent polishes the visual aesthetics without changing content 4. Then it enters an iterative loop: a Visualizer generates the image, and a Critic evaluates it and suggests revisions — this repeats 1-5 times (you choose)
The key insight is that academic diagrams follow conventions — Transformer architectures, GAN pipelines, RLHF frameworks all have recognizable visual patterns. By retrieving relevant references first, the output is much closer to what you'd actually put in a paper vs. generic AI image generation.
Built with: Next.js + FastAPI + Celery, using Gemini 2.5 Flash for planning/critique and Nanobanana Pro/Seedream for image generation.
Try it here: https://paperbanana.online
Some examples it handles well: Transformer architectures, GAN training pipelines, RLHF frameworks, multi-agent systems, encoder-decoder architectures.
Known limitations: - Works best for CS/AI methodology diagrams — not optimized for biology, chemistry, or general scientific illustration - Text rendering in generated images isn't perfect yet — sometimes labels get slightly garbled - The curated reference database is still small (13 examples), expanding it is ongoing work
Would love feedback from anyone who writes papers regularly. What types of diagrams do you struggle with most?