
Why aren't there any "YouTube competitors?"

https://justinkuiper.substack.com/p/why-arent-there-any-youtube-competitors
2•surprisetalk•2mo ago

Comments

ArtemZ•2mo ago
PeerTube? It is such a great piece of software; I use it locally to host family videos and things I want to save from YouTube.

I feel like it lacks traction because YouTube is so convenient and popular... but at the same time, with more and more ads and the war on ad blockers, things could change.

al_borland•2mo ago
Odd that there was no mention of the legacy competitors that are still around, like Vimeo and DailyMotion. Or new places like Rumble, which was born out of YouTube’s heavy hand around moderation.
chasing0entropy•2mo ago
Doesn't fit the narrative. The article also doesn't mention Skype, WhatsApp, Teams, GoToMeeting, or literally dozens of other video messaging and chat platforms which, if their data has been retained somewhere out of sight for decades, are an actual goldmine of fresh LLM training data.
benoau•2mo ago
YouTube assembled a catalog of virtually every music video and recorded concert that ever existed. I think anyone trying to do this today would spend all their time dealing with automated DMCA takedowns.
qcnguy•2mo ago
How valuable YouTube really is depends a lot on several things the article didn't discuss:

1. To what extent scaling laws hold.

2. To what extent adding more amateur video data increases quality.

3. To what extent Google can stop other AI firms scraping the best quality stuff.

The OpenAI Whisper model is already nearly perfect, despite its habit of occasionally transcribing silence or noise as "Thanks for watching!". Adding another petabyte of data isn't going to make it better. Training on a gazillion terabytes of private videos probably isn't going to be allowed by the lawyers either, in case the model memorizes something sensitive. And the long tail of public videos nobody ever watches probably doesn't add anything.

Years ago, LLM training already became about securing access to unique, value-adding data, not just throwing more web crawl data into the mix. That's why the big AI firms all pay PhDs to write out transcripts of their reasoning as they solve hard problems, and similar. YouTube is probably already tapped out as a source of really useful data, although of course there's a thin sliver of new high-quality content being uploaded all the time that's useful for keeping the knowledge base fresh.

The problem for Google is that it's only possible to stop scraping done at scale. If an AI lab uses enough proxies, VMs, and the like, it can still grab a steady stream of new videos, and that's hard to stop (short of going full DRM for everything, and maybe not even then). Google can block bulk scrapes of YouTube, and is doing so, but that slams the stable door after the horse has bolted.