Why aren't there any "YouTube competitors?"

https://justinkuiper.substack.com/p/why-arent-there-any-youtube-competitors

2•surprisetalk•2mo ago

Comments

ArtemZ•2mo ago

PeerTube? It is such a great piece of software. I use it locally to host family videos and things I want to save from YouTube.

I feel like it lacks traction because YouTube is so convenient and popular... but at the same time, with more and more ads and the war on ad blockers, things could change.

al_borland•2mo ago

Odd that there was no mention of the legacy competitors that are still around, like Vimeo and DailyMotion. Or new places like Rumble, which was born out of YouTube’s heavy hand around moderation.

chasing0entropy•2mo ago

Doesn't fit the narrative. The article also doesn't mention Skype, WhatsApp, Teams, GoToMeeting, or literally dozens of other video messaging and chat platforms, which, if their data has been retained somewhere out of sight for decades, are an actual goldmine of fresh LLM training data.

benoau•2mo ago

YouTube assembled a catalog of virtually every music video and recorded concert that ever existed. I think anyone trying to do this today would spend all their time dealing with automated DMCA takedowns.

qcnguy•2mo ago

How valuable YouTube really is depends a lot on several things the article didn't discuss:

1. To what extent scaling laws hold.

2. To what extent adding more amateur video data increases quality.

3. To what extent Google can stop other AI firms scraping the best quality stuff.

The OpenAI Whisper model is already nearly perfect, despite its habit of occasionally transcribing silence or noise as "Thanks for watching!". Adding another petabyte of data isn't going to make it better. Training on a gazillion terabytes of private videos probably isn't going to be allowed by the lawyers either, in case the model memorizes something sensitive. And the long tail of public videos nobody ever watches probably doesn't add anything.

Years ago, LLM training already became about securing access to unique, value-adding data, not just throwing more web-crawl data into the mix. That's why the big AI firms all pay PhDs to create transcripts of their reasoning as they solve hard problems and similar. YouTube is probably already tapped out as a source of really useful data, although of course there's a thin sliver of new high-quality content being uploaded all the time that's useful for keeping the knowledge base fresh.

The problem for Google is that it can only stop scraping that happens at scale. If an AI lab uses enough proxies, VMs and the like, it can still grab a steady stream of new videos, and that's hard to stop (short of going full DRM for everything, and maybe not even then). Google can block bulk scrapes of YouTube and is doing so, but that slams the stable door after the horse has bolted.
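A minimal sketch of why per-client blocking mostly catches scraping at scale, assuming a hypothetical fixed-window limit on requests per IP address. The class and function names are illustrative only, not anything YouTube is known to run. The same number of requests spread across more proxies keeps each address under the limit, so more of them get through.

```python
from collections import defaultdict


class PerIpRateLimiter:
    """Hypothetical fixed-window limiter: at most `limit` requests per IP per window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: defaultdict[str, int] = defaultdict(int)  # requests seen per IP this window

    def allow(self, ip: str) -> bool:
        # Reject once this IP has used up its quota for the current window.
        if self.counts[ip] >= self.limit:
            return False
        self.counts[ip] += 1
        return True


def simulate(total_requests: int, num_proxies: int, limit: int) -> int:
    """Spread requests round-robin over proxies; return how many pass in one window."""
    limiter = PerIpRateLimiter(limit)
    allowed = 0
    for i in range(total_requests):
        ip = f"proxy-{i % num_proxies}"  # hypothetical proxy identifiers
        if limiter.allow(ip):
            allowed += 1
    return allowed


if __name__ == "__main__":
    print(simulate(10_000, num_proxies=1, limit=100))    # 100
    print(simulate(10_000, num_proxies=100, limit=100))  # 10000
```

With one address, only 100 of the 10,000 requests pass. With 100 rotating proxies, every request passes, which is the "enough proxies, VMs and the like" scenario above.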
