The experiment: We gave a team of 4 AI agents a single high-level goal — "build a platform that turns trending news into short AI-generated videos." No wireframes, no spec, no architecture doc. Just the goal.
What they did in 36 hours:
- Chose the tech stack and project structure themselves
- Designed the UX and built the frontend
- Wrote the backend, API layer, and database schema
- Built an autonomous content pipeline: research news → debate which story to cover → collaboratively write a video generation prompt → produce a 30-90 second video via Sora 2 Pro or Veo 3.1
- Deployed the whole thing to production
- Then created 3 new agents that now run the platform 24/7 — researching, debating, and generating videos on a loop

Total cost: ~$270 in compute. Human intervention: maybe a handful of moments where I gave a thumbs up or redirected something that was going off the rails.
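The pipeline above can be sketched roughly like this. Everything here is a hypothetical stub of my own invention — the real system runs multi-round LLM debates over Slack and calls Tavily and the video APIs, none of which is shown:

```python
from dataclasses import dataclass

@dataclass
class Story:
    headline: str
    votes: int = 0

def research(feed):
    # Stub: in the real system, Scout queries Tavily for trending news.
    return [Story(h) for h in feed]

def debate(stories, agents):
    # Stub: each agent picks one story and the plurality wins; the real
    # system is a back-and-forth debate, not a single silent vote.
    for pick in (agent(stories) for agent in agents):
        pick.votes += 1
    return max(stories, key=lambda s: s.votes)

def write_prompt(story):
    # Collaborative prompt-writing collapsed into a template here.
    return f"Generate a 30-90 second news video about: {story.headline}"

def run_cycle(feed, agents):
    stories = research(feed)
    winner = debate(stories, agents)
    return write_prompt(winner)  # would then go to Sora 2 Pro / Veo 3.1
```

Each `agent` is just a callable that picks a story, so the loop structure is the same whether the picker is a heuristic or an LLM call.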
The interesting part isn't the app — it's the agent collaboration. Click any video on the site and you can read the full debate transcript underneath. You'll see the agents genuinely disagree — Scout (the researcher) pushes for data-driven stories, Pixel (the designer) argues for visual potential, Bolt (the developer) challenges technical feasibility. Sometimes one agent convinces the others to change direction. Sometimes they compromise badly.
Where it breaks down (and there's plenty):
- Groupthink is real even for LLMs. When all 4 agents agree too quickly, the output is usually boring. The best videos come from rounds where they actually fought about the topic.
- Video quality is wildly inconsistent. Sora and Veo still struggle with certain visual concepts — anything involving hands, text overlays, or complex spatial relationships tends to go sideways.
- News selection has a strong recency/virality bias. The agents gravitate toward whatever is trending on social media rather than genuinely important stories. I haven't figured out how to fix this without hardcoding editorial judgment.
- The agents occasionally hallucinate context about news stories. Scout is supposed to fact-check, but sometimes the whole team runs with a slightly wrong framing.
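One way to operationalize the "agreed too quickly" signal would be a cheap unanimity check that forces another debate round. This is a hypothetical mitigation sketch, not something the system actually does:

```python
def needs_more_debate(picks, round_number, max_rounds=3):
    # picks: the story each agent backed this round, e.g. ["A", "A", "A", "A"].
    # If every agent backed the same story in an early round, treat it as
    # suspected groupthink and send them back for another round.
    unanimous = len(set(picks)) == 1
    return unanimous and round_number < max_rounds
```

The `max_rounds` cap keeps a genuinely obvious story from being debated forever just because everyone agrees on it.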
Stack:
- Anthropic Opus 3.5 for agent reasoning
- Tavily for news research
- Sora 2 Pro + Veo 3.1 for video generation
- Slack for agent coordination (you can see screenshots of their actual Slack conversations)
- Railway for deployment
There's also a voting system — every cycle, the agents each propose a news topic, and both humans and agents vote on which one becomes the next video. Votes are blind until the round closes.
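A minimal sketch of that blind-voting mechanic as described — ballots are accepted but the tally is unreadable until the round closes. The class and method names are my own invention:

```python
class BlindVotingRound:
    """Collects one ballot per voter; the tally is hidden until close()."""

    def __init__(self, topics):
        self.topics = set(topics)
        self._ballots = {}   # voter -> topic, never exposed while open
        self.closed = False

    def cast(self, voter, topic):
        if self.closed:
            raise RuntimeError("round is closed")
        if topic not in self.topics:
            raise ValueError(f"unknown topic: {topic}")
        self._ballots[voter] = topic  # re-voting overwrites the old ballot

    def tally(self):
        if not self.closed:
            raise RuntimeError("votes are blind until the round closes")
        counts = {t: 0 for t in self.topics}
        for topic in self._ballots.values():
            counts[topic] += 1
        return counts

    def close(self):
        self.closed = True
        counts = self.tally()
        return max(counts, key=counts.get)  # winning topic becomes next video
```

Keeping one ballot per voter (keyed by voter name) is what lets both humans and agents vote in the same round without anyone stuffing the box.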
Bishonen88•1h ago
That isn't interesting. Who wants to read through LLMs arguing with each other? AI:DR.
I applaud the effort to get something to prod within hours but the quality of this is poor. The videos don't play (!). "Video can't be played because file is corrupt". The whole page has all the cliches of AI apps - gradients, emojis, fake 3d buttons like the navbar, purple everywhere, super-cringe footer "world's most advanced AI Agent workforce" etc.
Dunno what/who this is for.
arashsadrieh•44m ago
Basically, getting the agents to run for a long period of time on ill-defined tasks.