Given an image of a person and an image of a garment, the model generates a photorealistic try-on result.
Model specs:
- Operates directly in pixel space (no VAE)
- Supports maskless inference by default
- Trained from scratch
- ~972M parameters, runs on consumer GPUs
- Can run in ~5 seconds on an H100
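For a rough sense of why ~972M parameters fits on consumer GPUs, here is a back-of-the-envelope estimate of the memory needed for the weights alone. This is a sketch, not a measurement: the dtypes listed are assumptions (the precision used for inference isn't stated here), and activation memory, which may be substantial when working in pixel space, is not counted.

```python
# Rough weight-memory estimate for a ~972M-parameter model.
# Assumption: the dtypes below are illustrative only; the actual
# inference precision is not specified in this post.
PARAMS = 972_000_000

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GiB (excludes activations and buffers)."""
    return num_params * bytes_per_param / 1024**3


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {weight_memory_gib(PARAMS, nbytes):.2f} GiB")
```

Under these assumptions the weights take about 1.81 GiB in 16-bit precision and about 3.62 GiB in 32-bit, which leaves headroom for activations on typical consumer cards.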
We built this as a focused alternative to large generalist models: a production-grade model specialized for virtual try-on.
We’re releasing the weights, inference code, and architecture details under an Apache-2.0 license.
Would love to hear your thoughts!