Hi, I'm Matthew, a freshman at CMU, and I built Ephemeral in 24 hours for TartanHacks 2026.
In short, Ephemeral takes a text prompt, generates a starting image with Nano Banana, and uses a 1.3B-parameter action-conditioned DiT to generate the next frames in real time based on user actions (e.g. WASD).
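To make the action-conditioning idea concrete, here is a minimal sketch (not Ephemeral's actual code) of one way to feed discrete WASD key presses into a frame model as a conditioning vector. The class names, layer sizes, and latent shapes are all hypothetical placeholders standing in for the real 1.3B-parameter DiT.

```python
import torch
import torch.nn as nn

ACTIONS = {"W": 0, "A": 1, "S": 2, "D": 3, "NOOP": 4}

class ActionConditionedFrameModel(nn.Module):
    def __init__(self, frame_dim: int = 256, action_dim: int = 64):
        super().__init__()
        # Embed the discrete key press into a conditioning vector.
        self.action_embed = nn.Embedding(len(ACTIONS), action_dim)
        # Tiny stand-in for the real DiT backbone.
        self.backbone = nn.Sequential(
            nn.Linear(frame_dim + action_dim, 512),
            nn.GELU(),
            nn.Linear(512, frame_dim),
        )

    def forward(self, prev_latent: torch.Tensor, action_id: torch.Tensor) -> torch.Tensor:
        # Concatenate the previous frame's latent with the action embedding,
        # then predict the next frame's latent.
        cond = self.action_embed(action_id)
        return self.backbone(torch.cat([prev_latent, cond], dim=-1))

model = ActionConditionedFrameModel()
latent = torch.randn(1, 256)           # latent of the Nano Banana seed image
for key in ["W", "W", "D"]:            # stream of user key presses
    latent = model(latent, torch.tensor([ACTIONS[key]]))
```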
Some other features:
- Reverse-engineered the Suno client to generate music from the text prompt.
- Multiple users can interact with a "world" at the same time simply by scanning a QR code from their phone. They can then perform actions and see how their worlds evolve in parallel with everyone else's.
- GPU infrastructure powered by Modal (see the sketch after this list).
- Claude auto-generates captions for each world.
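As a rough illustration of the Modal piece, here is a hedged sketch of how frame generation could be served from a Modal GPU container. The function names, GPU type, and placeholder body are assumptions, not Ephemeral's real deployment; only the Modal decorators reflect Modal's public API.

```python
import modal

image = modal.Image.debian_slim().pip_install("torch")
app = modal.App("ephemeral-frames", image=image)

@app.function(gpu="A100")
def generate_frames(prompt: str, actions: list[str]) -> list[bytes]:
    # Placeholder: in a real deployment this would load the action-conditioned
    # DiT and roll out one frame per user action, returning encoded frames
    # to stream back to connected clients.
    frames: list[bytes] = []
    for action in actions:
        frames.append(f"frame for {action}".encode())  # stand-in for real pixels
    return frames

@app.local_entrypoint()
def main():
    # Each .remote() call runs inside a GPU container in Modal's cloud.
    frames = generate_frames.remote("a neon forest at dusk", ["W", "W", "D"])
    print(len(frames), "frames generated")
```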