GRAT (Grouping First, Attending Smartly) is a training-free method that speeds up diffusion-based Transformers by exploiting the natural sparsity in their attention maps, with up to 35.8× faster generation for ultra-high-res images (8192×8192). That's an impressive result for something that needs no retraining.
badmonster•8h ago