I built a small Sora-style video generator as a side experiment

https://saro2.ai
2•kelly99•1h ago

Comments

kelly99•1h ago
Hey HN,

I’ve been spending the past month diving into AI video generation — not just using models, but trying to understand the actual constraints behind them. After prototyping a small Sora-style generator on my own, I started to notice a few deeper patterns about the industry that I wanted to share and get feedback on.

1. AI video tools aren’t limited by “models”

Most of the friction today isn’t about model quality:

region-locked access

invite-only rollouts

heavy watermarking

friction in basic usage

short duration limits

no multi-scene support

pricing that is opaque or unsuitable for small creators

The technology is improving fast — but the accessibility layer hasn’t caught up.

This is why the majority of creators (especially small merchants, indie filmmakers, TikTok sellers, UGC creators) still can’t practically adopt AI video at scale.

2. Multi-scene generation is the “real moat”

Most models can do a single beautiful 2-4 second shot.

But real use cases — ads, storytelling, product demos — need:

shot transitions

visual consistency

character identity retention

stable camera paths

narrative structure

The real challenge is not “make a clip”, but “make a sequence”.

That’s where pipelines, not models, matter.

3. The real bottleneck is temporal coherence

From my experiments, the hardest problems aren’t fancy effects — they’re the boring ones:

slight drift in character identity

physics mismatch between shots

exposure shifts

motion jitter at boundaries

the model choosing a different “interpretation” on each run
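To make one of these concrete, here is a minimal sketch of how an exposure shift at a cut could be flagged. It assumes frames are already decoded to grayscale luminance grids in [0, 1]; the function names and the 0.1 threshold are illustrative choices, not something taken from the prototype.

```python
from statistics import mean

# Assumption: a frame is a grid of luminance values in [0, 1].
Frame = list[list[float]]


def mean_luminance(frame: Frame) -> float:
    """Average brightness of a single frame."""
    return mean(value for row in frame for value in row)


def exposure_shift(prev_shot: list[Frame], next_shot: list[Frame]) -> float:
    """Absolute brightness change between the last frame of one shot
    and the first frame of the next."""
    return abs(mean_luminance(next_shot[0]) - mean_luminance(prev_shot[-1]))


def flag_boundaries(shots: list[list[Frame]], threshold: float = 0.1) -> list[int]:
    """Return indices i where the cut between shot i and shot i+1
    changes mean brightness by more than `threshold`."""
    return [
        i
        for i in range(len(shots) - 1)
        if exposure_shift(shots[i], shots[i + 1]) > threshold
    ]
```

A flagged boundary only says the cut is visibly brighter or darker; it does not say which shot is wrong, so it is more useful as a trigger for regeneration or grading than as a fix by itself.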

There’s no perfect solution yet. Some combination of:

prompt redistribution

style anchors

conditioning

intermediate frames

shot graphs

works “okay”, but there is still a huge open research space.
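As a rough illustration of how a shot graph, a style anchor, and prompt redistribution can fit together, here is a small sketch. Everything in it is hypothetical (the `Shot` and `Scene` classes, the `plan_requests` function, and the request fields); it shows one possible shape for the idea, not how any particular model or my prototype does it.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    shot_id: str
    action: str  # what happens in this shot only
    # Shots whose output should condition this one (e.g. their last frames).
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Scene:
    style_anchor: str     # shared look: lighting, lens, palette
    character_sheet: str  # shared identity description
    shots: list[Shot]


def plan_requests(scene: Scene) -> list[dict]:
    """Return one generation request per shot, in dependency order.

    Each prompt repeats the shared style anchor and character sheet
    (prompt redistribution) and names the shots it is conditioned on.
    """
    done: list[str] = []
    pending = list(scene.shots)
    requests: list[dict] = []
    while pending:
        ready = [s for s in pending if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("cycle or missing dependency in shot graph")
        for shot in ready:
            requests.append({
                "shot_id": shot.shot_id,
                "prompt": f"{scene.style_anchor}. {scene.character_sheet}. {shot.action}",
                "init_frames_from": list(shot.depends_on),
            })
            done.append(shot.shot_id)
            pending.remove(shot)
    return requests
```

The point of the graph is ordering: a shot is only generated once the shots it depends on exist, so their frames are available as conditioning input.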

4. Small creators care less about model elegance — more about “does it work for my product?”

This surprised me.

I talked to some merchants and small creators. What they wanted wasn’t:

“best model”

“highest fidelity”

“latest architecture”

They asked for:

no watermark

9:16 format

product-handheld shots

consistent 20–25s video

don’t make me wait

just give me something I can post today

It’s a very different set of priorities than what model researchers focus on.

5. The infra is the unsung hero

Most public discussions focus on models, but from building my prototype I realized:

async queues

model switching

fallback logic

caching policies

GPU scheduling

latency constraints

matter far more for practical AI video creation than architecture diagrams.

Without good infra, even the best models feel unusable.
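For a sense of what the queue and fallback pieces look like, here is a minimal sketch using Python's `asyncio`. The backends are assumed to be plain async callables that take a prompt and either return a result or raise `BackendError`; that interface, the names, and the timeout are assumptions for illustration, not a real provider API.

```python
import asyncio


class BackendError(Exception):
    """Raised by a backend (or by the fallback chain) when generation fails."""


async def generate_with_fallback(prompt, backends, timeout_s=5.0):
    """Try each (name, async_callable) backend in order and return
    (name, result) from the first one that succeeds within the timeout."""
    errors = []
    for name, call in backends:
        try:
            result = await asyncio.wait_for(call(prompt), timeout=timeout_s)
            return name, result
        except (BackendError, asyncio.TimeoutError) as exc:
            errors.append(f"{name}: {exc!r}")
    raise BackendError("all backends failed: " + "; ".join(errors))


async def worker(queue, backends, results):
    """Pull jobs off the queue forever; record a result or a failure."""
    while True:
        job_id, prompt = await queue.get()
        try:
            results[job_id] = await generate_with_fallback(prompt, backends)
        except BackendError as exc:
            results[job_id] = ("failed", str(exc))
        finally:
            queue.task_done()


async def run_jobs(prompts, backends, concurrency=2):
    """Process all prompts with a fixed number of concurrent workers."""
    queue = asyncio.Queue()
    results = {}
    for job_id, prompt in enumerate(prompts):
        queue.put_nowait((job_id, prompt))
    workers = [
        asyncio.create_task(worker(queue, backends, results))
        for _ in range(concurrency)
    ]
    await queue.join()  # wait until every job has been marked done
    for task in workers:
        task.cancel()   # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

The `concurrency` value stands in for GPU scheduling here: it caps how many generations run at once, and the ordered backend list is the simplest form of model switching.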

A prototype I built while exploring these ideas

As a way to understand these bottlenecks more concretely, I built a small prototype called Saro2.ai — basically an experiment in:

10s cinematic clip generation

25s multi-scene “storyboard” generation

attempts at shot consistency

simple scene → shot graph

a multi-model backend with light scheduling

It requires login (to control compute use), but I’m mainly sharing it as an example of the things I’m testing, not trying to “launch a product”.

Here’s the link if anyone wants to see how it behaves: https://saro2.ai/

What I’m hoping to learn

If you’ve worked on:

temporal modeling

multi-scene pipelines

conditioning

generative video infra

shot consistency strategies

I’d love to hear your perspective.

Especially curious about:

what people think the real frontier is

what “must solve” engineering problems exist before AI video is truly usable

whether multi-scene consistency is solvable with heuristics or requires new architectures

Happy to share more details about the pipeline or what didn’t work.

Thanks for reading — and I’d appreciate any thoughts from people working in (or following) this space.
