Reelsy – Multi-Agent AI System for Short Video Creation

https://reelsy.ai/
1•smallmartial•2d ago

Comments

smallmartial•2d ago
Hey HN,

We've been working on Reelsy for the past few months and wanted to share what we've learned building a multi-agent AI system for video content creation.

The Problem

Creating short-form video content (YouTube Shorts, TikTok, Reels) at scale is brutal. A single 60-second video costs $500+ with freelancers and takes 3-5 hours. For creators who need to post daily, this is unsustainable.

The bigger technical challenge: AI-generated images have a consistency problem. Ask any image model to generate "the same character" across 15 scenes, and you'll get 15 different-looking people.

Our Approach: Multi-Agent Architecture

Instead of throwing one LLM at the problem, we built a 5-agent pipeline inspired by actual film production:

Director Agent - Breaks down the story concept into a shot list
Scriptwriter Agent - Writes dialogue and narration for each scene
Character Designer Agent - Creates reference images and locks character identity
Cinematographer Agent - Determines camera angles, lighting, composition
Hook Generator Agent - Optimizes the first 3 seconds for each platform

The agents communicate through structured outputs and can iterate on each other's work. We found this beats a single mega-prompt approach by ~40% on our internal quality benchmarks.
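To make the "structured outputs" idea concrete, here is a minimal sketch of how a five-stage pipeline like this could be wired. It is not Reelsy's actual code: the `Shot` and `Storyboard` types, the field names, and the stub agent bodies are all assumptions for illustration, and each stub stands in for what would be an LLM call. It also runs a single forward pass and does not model agents iterating on each other's work.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Shot:
    """One scene in the shot list (hypothetical schema)."""
    index: int
    description: str
    narration: str = ""
    camera: str = ""
    character_ref: str = ""


@dataclass
class Storyboard:
    """Shared structured state that every agent reads and updates."""
    concept: str
    shots: List[Shot] = field(default_factory=list)
    hook: str = ""


Agent = Callable[[Storyboard], Storyboard]


def director(board: Storyboard) -> Storyboard:
    # Stub: a real agent would ask an LLM for a shot list.
    beats = ["setup", "conflict", "payoff"]
    board.shots = [Shot(i, f"{beat}: {board.concept}") for i, beat in enumerate(beats)]
    return board


def scriptwriter(board: Storyboard) -> Storyboard:
    for shot in board.shots:
        shot.narration = f"Narration for {shot.description}"
    return board


def character_designer(board: Storyboard) -> Storyboard:
    # Every shot points at the same canonical reference (placeholder path).
    for shot in board.shots:
        shot.character_ref = "ref/hero_v1.png"
    return board


def cinematographer(board: Storyboard) -> Storyboard:
    for shot in board.shots:
        shot.camera = "close-up" if shot.index == 0 else "medium shot"
    return board


def hook_generator(board: Storyboard) -> Storyboard:
    # The hook is derived from the opening shot only.
    board.hook = f"Hook: {board.shots[0].narration}"
    return board


PIPELINE: List[Agent] = [
    director,
    scriptwriter,
    character_designer,
    cinematographer,
    hook_generator,
]


def run_pipeline(concept: str) -> Storyboard:
    board = Storyboard(concept=concept)
    for agent in PIPELINE:
        board = agent(board)
    return board
```

Calling `run_pipeline("a cat learns to surf")` returns a `Storyboard` whose shots each carry narration, a camera choice, and the same character reference. Passing one typed object between stages keeps each agent's input and output checkable, which is the main appeal over a single free-form prompt.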

Character Consistency Solution

We use Gemini 2.5 Flash Image (codenamed "Nano Banana" on LMArena) with reference image anchoring. The Character Designer agent creates a canonical reference image, and every subsequent scene generation includes that reference along with explicit instructions to maintain the character's identity.
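As a rough illustration of reference anchoring, the sketch below builds one request per scene and attaches the same canonical reference to each. Everything here is hypothetical: `CharacterReference`, `build_scene_request`, and the dictionary layout are made-up names, and the result is a plain dict rather than a call to any real image-generation API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CharacterReference:
    """Canonical identity for one character (hypothetical schema)."""
    name: str
    image_path: str
    traits: Tuple[str, ...]


def build_scene_request(ref: CharacterReference, scene_description: str) -> Dict[str, str]:
    # The identity instruction is repeated verbatim in every scene prompt.
    instruction = (
        f"Keep {ref.name} identical to the reference image: "
        + ", ".join(ref.traits)
        + "."
    )
    return {
        "reference_image": ref.image_path,
        "prompt": f"{scene_description}\n{instruction}",
    }


def build_all_requests(ref: CharacterReference, scenes: List[str]) -> List[Dict[str, str]]:
    return [build_scene_request(ref, scene) for scene in scenes]
```

The point of the frozen dataclass is that the reference cannot drift between scenes: scene 15 gets the same image path and the same trait list as scene 1.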

Current results: 85%+ consistency across 15-20 scene generations. Not perfect, but usable for most content types.
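The post does not say how consistency is measured. One plausible definition, shown purely as an assumption, is the share of scenes whose character embedding stays within a cosine-similarity threshold of the reference embedding. The embedding source, the `0.9` threshold, and the function names are all illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "no match"
    return dot / (norm_a * norm_b)


def consistency_rate(
    reference: Sequence[float],
    scenes: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed cutoff, not from the post
) -> float:
    """Fraction of scene embeddings that match the reference."""
    if not scenes:
        return 0.0
    matches = sum(
        1 for scene in scenes if cosine_similarity(reference, scene) >= threshold
    )
    return matches / len(scenes)
```

Under this definition, "85%+ across 15-20 scenes" would mean roughly 13 of 15, or 17 of 20, scenes clear the threshold.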

Multi-agent coordination is harder than it looks. Race conditions, agent disagreements, and context window limits are real problems.

Character consistency is still the hardest unsolved problem in AI video. Reference anchoring helps but isn't bulletproof.

Platform-specific optimization matters more than raw quality. A slightly lower-quality video with a strong hook outperforms beautiful content with a weak opening.

Current Pricing

~$0.70 per video (15-20 scenes, voiceover, music). We're not trying to undercut human editors on quality, but for high-volume content needs, this makes previously impossible workflows viable.
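For a sense of scale, the arithmetic below compares the two per-video figures quoted in this post ($500 with freelancers, ~$0.70 here) for a creator posting once a day. The 30-day month and the function name are assumptions for illustration.

```python
def monthly_cost(cost_per_video: float, videos_per_month: int) -> float:
    """Total spend for a month, rounded to cents."""
    return round(cost_per_video * videos_per_month, 2)


FREELANCER_COST = 500.00  # lower bound quoted in the post ("$500+")
PIPELINE_COST = 0.70      # approximate figure quoted in the post
VIDEOS_PER_MONTH = 30     # assumption: one video per day

freelancer_total = monthly_cost(FREELANCER_COST, VIDEOS_PER_MONTH)
pipeline_total = monthly_cost(PIPELINE_COST, VIDEOS_PER_MONTH)
```

That works out to $15,000 versus $21 per month at daily posting, taking the quoted prices at face value.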

Would love feedback from the HN community, especially on the multi-agent architecture. Happy to answer questions about our implementation choices.