Most workflows with AI today are single player. Inspired by the possibilities that the Convex backend (https://www.convex.dev) can help us build, I decided to lay the infrastructure for testing multiplayer AI-mediated experiences, starting with a fun idea: what kind of images can AI help people collaboratively produce when multiple participants each provide different components of the resulting image? So was born Collide.
Collide offers two types of image generation rounds: Unbounded and Cascade. During an Unbounded round, users cannot see what inputs the other users are putting into the ‘collider’ until the very end, when the AI has gathered everyone’s concepts and synthesized them into a final image.
For Cascade rounds, people can see the words, phrases, and concepts other participants are putting into the collider in real time, enabling them to riff off and build on each other’s concepts (or not! chaos and tension are welcome explorations here).
The AI has a challenge here: producing coherence from competing and conflicting concepts, as well as deciding when to aim for an entertaining output and when to focus on aesthetics. Each generated image in Collide comes with a ‘Synthesizer’s Statement’, a sort of artful chain of thought written by the AI that reveals how it decided to handle the image generation for that round.
Users can provide feedback on an image with the like and dislike buttons, labeled ‘This is Art’ and ‘This Ain’t It, Chef’ respectively. This data is valuable for understanding which system prompts produce the most delight, as well as whether users prefer the Unbounded or Cascade mode.
Each Collide Room comes with a live chat with full room message history, including typing indicators and unread receipts (on mobile). The chat is also injected with system messages for activity in the room: when a new round is started, when users submit entries into the collider, when the round ends and the generated image is ready, when a new user joins the room, and when users like or dislike images in that room’s gallery of generated images.
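A chat feed that mixes user messages with injected system events lends itself to a small discriminated union. The sketch below is hypothetical (the type names, fields, and wording are illustrative assumptions, not Collide's actual schema):

```typescript
// Hypothetical model of a room feed mixing user chat with injected
// system events; NOT Collide's actual schema, just an illustration.
type RoomEvent =
  | { kind: "round_started"; mode: "unbounded" | "cascade" }
  | { kind: "entry_submitted"; user: string }
  | { kind: "round_ended"; imageId: string }
  | { kind: "user_joined"; user: string }
  | { kind: "image_reaction"; user: string; reaction: "like" | "dislike" };

type RoomMessage =
  | { kind: "chat"; user: string; text: string; sentAt: number }
  | ({ sentAt: number } & RoomEvent);

// Render a system line for the chat feed; the switch is exhaustive,
// so adding a new event kind becomes a compile error until handled.
function systemLine(e: RoomEvent): string {
  switch (e.kind) {
    case "round_started":
      return `A new ${e.mode} round has started`;
    case "entry_submitted":
      return `${e.user} placed an entry into the collider`;
    case "round_ended":
      return "The round is over and the image is ready";
    case "user_joined":
      return `${e.user} joined the room`;
    case "image_reaction":
      return e.reaction === "like"
        ? `${e.user} says: This is Art`
        : `${e.user} says: This Ain't It, Chef`;
  }
}
```

Keeping system events in the same feed as chat messages means one subscription drives the whole timeline, which is a natural fit for Convex's reactive queries.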
Currently, we have a general, public Collide Room called the Atrium available for anyone to come and test out:

https://collide.multiplicity.studio/collide/atrium
You can create a new private or public Collide Room by navigating to /rooms, but you’ll have to supply your own Claude and Runware API keys. During the initial prototyping stages we were using Luma’s PHOTON image model, but we switched to Flux Dev served via Runware, not least because Flux Dev is more cost-effective for prototyping. Runware also helps future-proof the image generation capabilities: the thousands of models they provide can support more sustained image creation and editing workflows, with their wide selection of LoRAs, inpainting models, outpainting models, and more. I’ll follow up on what I mean by this at the end.
Collide currently uses Convex Auth with the Google OAuth option for account creation. You can observe the happenings in a given Collide Room without an account (as an anonymous user), but attempting to chat, react to an image, or put something into the collider during an active round will trigger the auth flow.
Speaking of active rounds: any logged-in user in a Collide Room can trigger a new image generation round, choosing whether it should be Unbounded or Cascade. At least 2 active users must be present for a new round to be triggered. Clients send a heartbeat every 2 seconds while their page is in focus and not tabbed away (thank you, Page Visibility API); any heartbeat within the last 60 seconds marks that person as available to collide.
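The presence rules above can be sketched in a few lines. The 2-second heartbeat and 60-second window come straight from the post, but the function names are illustrative assumptions, not Collide's actual code; the visibility check is passed in as a callback so the logic stays testable outside a browser:

```typescript
// Presence sketch: 2 s heartbeats while visible, 60 s activity window.
// Constants mirror the post; names are illustrative, not Collide's code.
const HEARTBEAT_MS = 2_000;
const ACTIVE_WINDOW_MS = 60_000;

// Client side: only send heartbeats while the tab is focused/visible,
// e.g. isVisible = () => document.visibilityState === "visible"
// (the Page Visibility API). Returns a function that stops the loop.
function startHeartbeat(send: () => void, isVisible: () => boolean): () => void {
  const id = setInterval(() => {
    if (isVisible()) send();
  }, HEARTBEAT_MS);
  return () => clearInterval(id);
}

// Server side: a user counts as "available to collide" if their last
// heartbeat landed inside the 60-second window.
function isAvailable(lastHeartbeatAt: number, now: number): boolean {
  return now - lastHeartbeatAt <= ACTIVE_WINDOW_MS;
}
```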
Up to 7 users can participate in a given round. There are no time limits; instead, the round ends once the number of entries matches the number of active users present when the round was initiated. So if 4 people were present in the Collide Room when a new round started, 4 entries must be placed into the collider to end the round and create the resulting image.
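The round-size rules reduce to two small pure functions. This is a hypothetical restatement of the logic described above (the 7-participant cap and the "entries equal participants at start" end condition), with illustrative names:

```typescript
// Round-size rules from the post: at most 7 participants, and the round
// ends once the entry count matches the number of users who were active
// when the round began. Function names are illustrative.
const MAX_PARTICIPANTS = 7;

function roundCapacity(activeUsersAtStart: number): number {
  return Math.min(activeUsersAtStart, MAX_PARTICIPANTS);
}

function isRoundComplete(entriesSubmitted: number, activeUsersAtStart: number): boolean {
  return entriesSubmitted >= roundCapacity(activeUsersAtStart);
}
```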
In terms of Convex Components, we’re using the Cloudflare R2 component for storing and serving the images, and the Convex Aggregate component to compute long-running counts: the number of chat messages in a given room, the number of images generated in that room, the total number of people who have joined, and like and dislike tallies. Lastly, we’re using the Convex Rate Limiter component to prevent spam, starting with keeping a lid on how many reactions someone can trigger to show up in the room chat.
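The Rate Limiter component does this work inside Convex itself; its actual API differs from the sketch below. As a rough illustration of the underlying idea, here is a minimal token bucket for capping reaction spam (capacity and refill values are arbitrary examples):

```typescript
// Minimal token-bucket limiter to illustrate the idea behind capping
// reaction spam. The Convex Rate Limiter component's real API differs;
// this is only a sketch of the mechanism.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,    // max burst, e.g. 5 reactions
    private refillPerMs: number, // tokens regained per millisecond
    now: number
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the action is allowed, consuming one token;
  // tokens refill continuously based on elapsed time, up to capacity.
  tryConsume(now: number): boolean {
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```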
Collide is experimental, at minimum an exploration of contemporary art in the AI genre. Everything hinges on the coherence of the outputs, or at least the intrigue of incoherence when differing concepts slam together unexpectedly.
A potential area of expansion is letting people focus on one image (or set of images) with an accompanying editing pipeline. This would allow teams and stakeholders to collaborate on their creative work and IP. Imagine designers iterating on a product mockup where one person handles texture refinements through inpainting, another tweaks the overall composition with outpainting, and someone else applies specialized LoRA models to maintain brand consistency. This is where Runware's modularity, with thousands of image models, can help us provide that flexibility.
Another thing I noticed in testing was how much fun I had with my wife and son exploring the outputs. There's definitely potential here for family and friends in private rooms. Say, with an image pipeline that focuses on a specific image the user uploads: families could collaborate on transforming vacation photos into fantasy scenes, or turn a simple landscape into something out of their collective imagination. Friends might use it to visualize party concepts or just laugh together at the weird combinations they can produce.
And as always, Convex (https://www.convex.dev) made this whole thing possible. The real-time data syncing just works without the usual headaches. Storage, auth, aggregations—all these pieces that usually eat up weeks of dev time were just there, letting us focus on making the actual experience interesting instead of rebuilding infrastructure that's already been solved. The whole backend just disappears, which is exactly what you want when you're trying to experiment with new interaction models.
While we're still early, Collide sits at this unique intersection of art tool and social experience—a playground powered by Convex's infrastructure that's proven ideal for building real-time, multiplayer AI experiences. The seamless backend allows us to focus entirely on creating a collaborative environment where multiple users can generate images together, helping us test and explore what multiplayer AI interactions might look like when we move beyond the single-player workflows that dominate today's AI landscape.
handfuloflight•3h ago
This project was inspired by and is an entry for the Convex Chef Hackathon: https://www.convex.dev/hackathons/chef