It works by having an AI continually observe and respond to live drawing on a canvas. A vision language model (served via Ollama) interprets what it sees, and that description drives real-time image generation (via StreamDiffusion).
For real-time performance, the project is built in C++ and Python and uses Spout for GPU texture sharing between processes with minimal overhead.
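The observe-and-describe half of that loop can be sketched in a few lines. This is a minimal Python illustration (not the repo's C++ OllamaClient) of building a request for Ollama's `/api/generate` endpoint, which accepts base64-encoded images for vision models; the `llava` model name and the prompt text are assumptions for the example.

```python
import base64
import json

# Default local Ollama endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_vision_request(frame_png: bytes, model: str = "llava") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    Vision models accept base64-encoded images in the "images" field;
    the prompt here is a placeholder for this sketch.
    """
    return {
        "model": model,
        "prompt": "Describe what is being drawn on this canvas.",
        "images": [base64.b64encode(frame_png).decode("ascii")],
        "stream": False,
    }

# In the real loop, the "response" field of the reply would become the
# prompt for StreamDiffusion. Posting the request looks roughly like:
#
#   import urllib.request
#   req = urllib.request.Request(
#       OLLAMA_URL,
#       data=json.dumps(build_vision_request(frame)).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   description = json.loads(urllib.request.urlopen(req).read())["response"]
```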
Reusable components include:
- StreamDiffusionSpoutServer: lightweight Python server for real-time image generation with StreamDiffusion. Designed to interface with any Spout-compatible software, with control instructions sent over OSC.
- OllamaClient: minimal C++ library for interfacing with Ollama vision language models. Includes implementations for openFrameworks and Cinder.
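To show what driving the server over OSC can look like, here is a stdlib-only Python sketch that encodes a basic OSC message and sends it over UDP. The `/prompt` address and port 9000 are hypothetical, not the server's documented interface; in practice a library like python-osc would do this encoding for you.

```python
import socket
import struct

def _osc_string(s: str) -> bytes:
    """OSC strings are null-terminated and padded to a 4-byte boundary."""
    b = s.encode("utf-8") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address: str, *args) -> bytes:
    """Encode a minimal OSC message supporting string and float arguments."""
    tags = ","          # type tag string starts with a comma
    payload = b""
    for a in args:
        if isinstance(a, str):
            tags += "s"
            payload += _osc_string(a)
        elif isinstance(a, float):
            tags += "f"
            payload += struct.pack(">f", a)  # OSC floats are big-endian
        else:
            raise TypeError(f"unsupported OSC arg type: {type(a)!r}")
    return _osc_string(address) + _osc_string(tags) + payload

def send_prompt(text: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Push a new generation prompt to the server (address/port assumed)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(osc_message("/prompt", text), (host, port))
    sock.close()
```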
The "visual autocomplete" concept has been explored in recent papers (e.g., arxiv.org/abs/2508.19254, arxiv.org/abs/2411.17673).
Hopefully these open-source components help others experiment with and advance this direction!