However it's a great idea, go for it!
A YouTube crawler.
But yeah, I love the idea. It depends on whether big G is OK with someone else doing what they could be doing, but aren't, to increase their ad money.
The trick is to do as little as necessary to provide the most essential end result in isolation.
You have big dreams. Scale them back to this one idea: you are curating YouTube (or other stream source) content.
You cannot possibly process all YouTube content. Consider an interface like Invidious with an alt button that queues the video. From that queue, precache clusters of recommendations, then promote these to the group with similar interests.
Whatever you do, curation is always a prosperous middle layer if you get the right niche interests.
If I were going to think about an internet revolution, it would be a complete change of transmission scheme. WebSockets are vastly superior to HTTP in all regards except that HTTP/1.1 is sessionless. WebSockets also have a processing problem in that they are about 11x faster to send than to receive due to frame header parsing.
So, what I would do is create a new connection scheme with a new protocol that accomplishes only two goals:
1. Lowers transmission latency.
2. Increases security.
To lower latency I would ensure every message is exactly 16384 bytes, which is the maximum record plaintext size in TLS 1.3, regardless of actual content. I would also rethink what information is stored in a frame header, and when it's necessary, to lower parsing effort. Balancing throughput between the maximum send rate and the maximum receive rate would increase bandwidth and lower CPU overhead. That would ultimately lower the cost of network transmissions relative to other local hardware processing efforts.
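To make the fixed-size idea concrete, here is a rough sketch of what that framing could look like. It is only an illustration, not a worked-out protocol: the 2-byte length header, the zero padding, and the names `encode_frames` and `decode_frame` are all assumptions of mine.

```python
import struct

FRAME_SIZE = 16384                      # every frame on the wire is exactly this long
HEADER = struct.Struct("!H")            # hypothetical header: 2-byte big-endian payload length
MAX_PAYLOAD = FRAME_SIZE - HEADER.size  # 16382 bytes of real content per frame


def encode_frames(message: bytes) -> list[bytes]:
    """Split a message into frames that are each exactly FRAME_SIZE bytes."""
    frames = []
    # max(..., 1) makes an empty message still produce one (fully padded) frame.
    for start in range(0, max(len(message), 1), MAX_PAYLOAD):
        chunk = message[start:start + MAX_PAYLOAD]
        padding = b"\x00" * (MAX_PAYLOAD - len(chunk))
        frames.append(HEADER.pack(len(chunk)) + chunk + padding)
    return frames


def decode_frame(frame: bytes) -> bytes:
    """Return the real payload of one frame, dropping the padding."""
    if len(frame) != FRAME_SIZE:
        raise ValueError("frame must be exactly FRAME_SIZE bytes")
    (length,) = HEADER.unpack_from(frame)
    if length > MAX_PAYLOAD:
        raise ValueError("declared payload length exceeds frame capacity")
    return frame[HEADER.size:HEADER.size + length]
```

The receiver always reads 16384 bytes and looks at one fixed-offset field, so there is no variable-length header to parse. The obvious cost is that small messages waste most of the frame.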
To increase security I would move the connection scheme away from a client/server model to a local/remote model, which is the model git uses. I would continue to use TLS, but I would also add an authentication layer based on a scheme of pre-shared keys. That way neither end of a transmission is anonymous: anonymity is sacrificed for increased trust without reliance on certificates. That would increase privacy in ways the web currently struggles with, and it would save on security down the line by removing layers of security abstraction that increase opportunities for exploitation.
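As a sketch of what a pre-shared key layer might look like on top of TLS, here is a simple challenge-response using HMAC-SHA256 from the Python standard library. The message layout, the `peer_id` field, and the function names are assumptions for illustration only, and a real design would need mutual authentication and replay protection on both sides.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Fresh random nonce so an old response cannot simply be replayed."""
    return secrets.token_bytes(32)


def sign_challenge(psk: bytes, challenge: bytes, peer_id: str) -> bytes:
    """Prove knowledge of the pre-shared key without sending the key itself."""
    message = peer_id.encode("utf-8") + b"\x00" + challenge
    return hmac.new(psk, message, hashlib.sha256).digest()


def verify_response(psk: bytes, challenge: bytes, peer_id: str, response: bytes) -> bool:
    """Recompute the expected MAC and compare in constant time."""
    expected = sign_challenge(psk, challenge, peer_id)
    return hmac.compare_digest(expected, response)
```

In use, the remote sends `make_challenge()`, the local end answers with `sign_challenge(psk, challenge, "my-laptop")`, and the remote accepts the connection only if `verify_response` returns `True`. Running the same exchange in the other direction would identify both ends.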
The results are hit and miss right now. I have the feeling that it gets the video context based on the video description and comments, not the video itself.
I toyed around with something for my own use, where I downloaded the subtitles of a video, put the subtitles together into something semi-readable, then ran that through AI to create a properly readable version of the video. This all worked pretty well. That article could then be summarized if wanted. It was a one-time proof of concept. I may make it into an Apple Shortcut once they release macOS 26 and their AI writing tools are in the Shortcuts app. I mostly did this because it was something my dad said he wanted and I wanted to show him he could do it today if he really wanted. It didn't take that long to throw together.
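For anyone curious, the "put the subtitles together into something semi-readable" step can be very small. This is not the code I actually used, just a minimal sketch that assumes SRT or WebVTT style input. The function name `subtitles_to_text` is made up, and the AI rewriting step is left out entirely.

```python
import re

TAG = re.compile(r"<[^>]+>")  # inline markup such as <i>...</i> or <c>...</c>


def subtitles_to_text(raw: str) -> str:
    """Flatten SRT/WebVTT subtitles into one block of plain text."""
    kept = []
    for line in raw.splitlines():
        line = TAG.sub("", line).strip()
        if not line or line.startswith("WEBVTT"):
            continue                      # blank separators and the VTT header
        if "-->" in line:
            continue                      # timing lines
        if line.isdigit():
            continue                      # SRT cue numbers (also drops digit-only captions)
        if kept and kept[-1] == line:
            continue                      # auto-captions often repeat the previous line
        kept.append(line)
    return " ".join(kept)
```

The result has no punctuation fixes or paragraph breaks, which is exactly the part worth handing to the AI pass afterwards.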
99% of YT content is entertainment, at least to some degree. There is virtually no content worth reading that is not already available in other places.
sigwinch•5h ago