I built a web interface for GLM-Image, Zhipu AI's image-generation model. Unlike standard diffusion models, it uses a hybrid approach (an autoregressive transformer combined with diffusion), which makes it surprisingly good at rendering coherent text and following complex spatial instructions.
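If the hybrid part sounds abstract, the general pattern (this is not GLM-Image's actual code, just the shape of AR + diffusion systems in general) is two stages: an autoregressive transformer first turns the prompt into a discrete "image plan", and a diffusion-style decoder then renders pixels conditioned on that plan. Here's a toy PyTorch sketch of that control flow; every class, shape, and step count is made up purely for illustration.

```python
# Toy illustration of the two-stage hybrid pattern (NOT GLM-Image's real code):
# stage 1 autoregressively predicts a discrete "image plan", stage 2 turns it
# into pixels with an iterative denoising loop. Everything is untrained and
# sized to run on CPU in a second or two.
import torch
import torch.nn as nn

class ToyARPrior(nn.Module):
    """Autoregressive transformer: prompt tokens -> discrete image-plan tokens."""
    def __init__(self, vocab=1024, dim=256, n_plan_tokens=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab)
        self.n_plan_tokens = n_plan_tokens

    @torch.no_grad()
    def generate(self, prompt_ids):
        seq = prompt_ids
        for _ in range(self.n_plan_tokens):          # greedy, one token at a time
            h = self.backbone(self.embed(seq))
            next_tok = self.head(h[:, -1]).argmax(-1, keepdim=True)
            seq = torch.cat([seq, next_tok], dim=1)
        return seq[:, prompt_ids.shape[1]:]          # keep only the plan tokens

class ToyDiffusionDecoder(nn.Module):
    """Denoiser: refines Gaussian noise into an image, conditioned on the plan."""
    def __init__(self, vocab=1024, dim=256, hw=32):
        super().__init__()
        self.cond = nn.Embedding(vocab, dim)
        self.denoise = nn.Linear(dim + 3 * hw * hw, 3 * hw * hw)
        self.hw = hw

    @torch.no_grad()
    def sample(self, plan_tokens, steps=4):
        b = plan_tokens.shape[0]
        x = torch.randn(b, 3 * self.hw * self.hw)    # start from pure noise
        c = self.cond(plan_tokens).mean(dim=1)       # pooled conditioning vector
        for _ in range(steps):                       # crude stand-in for a real noise schedule
            x = x - 0.5 * self.denoise(torch.cat([c, x], dim=-1))
        return x.view(b, 3, self.hw, self.hw)

prompt_ids = torch.randint(0, 1024, (1, 8))          # stand-in for a tokenized prompt
plan = ToyARPrior().generate(prompt_ids)             # stage 1: AR transformer
image = ToyDiffusionDecoder().sample(plan)           # stage 2: diffusion-style decoder
print(image.shape)                                   # torch.Size([1, 3, 32, 32])
```

The real model is obviously far larger and trained end to end, but the "plan first, then denoise" control flow is presumably what helps with legible text and layout.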
Running it Locally: Since the model is open source, you can run it yourself if you have the hardware: roughly 24 GB of VRAM or more (e.g., an RTX 3090/4090 or an A100). I wrote a detailed setup guide here: https://glmimage.online/docs/how-to-install-glm-image
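For a rough feel of what local inference looks like once it's installed: the sketch below assumes the weights ship with a Hugging Face diffusers-compatible pipeline; the model ID, sampler setting, and exact call signature are placeholders, so defer to the guide above for the real commands.

```python
# Minimal local-inference sketch. Model ID and pipeline API are assumptions;
# follow the install guide for the actual repo name and arguments.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "zai-org/GLM-Image",           # placeholder model ID
    torch_dtype=torch.bfloat16,    # bf16 helps fit in ~24 GB of VRAM
    trust_remote_code=True,        # assumes the hybrid pipeline ships as custom code
)
pipe.to("cuda")

result = pipe(
    prompt="A storefront sign that reads 'OPEN 24 HOURS' in neon letters",
    num_inference_steps=30,        # assumed sampler setting, not a tuned value
)
result.images[0].save("glm_image_test.png")
```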
The Cloud Version: For those without the hardware, I wrapped it into this SaaS so you can try it instantly in the browser. It supports native bilingual prompts (English/Chinese).
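On the bilingual point: you can prompt in either language, or mix scripts within one prompt, in the web UI or locally. The prompts below are just illustrations I made up for the example.

```python
# Illustrative bilingual prompts (same pipeline call either way).
prompt_en = "A street food stall at night with a sign that reads 'Best Noodles in Town'"
prompt_zh = "夜晚的街边小吃摊，招牌上写着“天下第一面”"  # night-market stall, sign reads "Best Noodles Under Heaven"
prompt_mixed = "A vintage poster titled 'Shanghai 1930', subtitle 上海滩, art deco style"
```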
I'd love to hear your thoughts on how the typography quality compares to Flux or Ideogram!