One thing that worked decently was to take the frames, put them into a grid, and have the agent look at a single image of all the frames together. It did surprisingly well but missed a lot of subtle details that it couldn’t see.
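For anyone who wants to try the grid approach, here is a minimal sketch that builds an ffmpeg command using its `fps`, `scale`, and `tile` filters. The function name `grid_command` and the defaults (one frame per second, 320 px tiles, 4 columns) are my own assumptions, not what the commenter used.

```python
import math


def grid_command(src, out, n_frames, cols=4, fps=1.0, tile_width=320):
    """Build an ffmpeg argument list that tiles sampled frames into one image.

    All defaults here are illustrative assumptions, not tuned values.
    """
    if n_frames < 1 or cols < 1:
        raise ValueError("n_frames and cols must be positive")
    rows = math.ceil(n_frames / cols)
    # fps= samples frames, scale= shrinks each one, tile= lays them out in a grid.
    vf = f"fps={fps},scale={tile_width}:-1,tile={cols}x{rows}"
    # -frames:v 1 writes a single output image (the first full grid).
    return ["ffmpeg", "-i", src, "-vf", vf, "-frames:v", "1", out]


if __name__ == "__main__":
    print(" ".join(grid_command("clip.mp4", "grid.png", n_frames=10)))
```

Each tile is much smaller than the original frame, which may be one reason subtle details get lost in the grid.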
I also tried various kinds of vision embeddings, motion heat maps, and blur to show motion, but none worked as well, so I ended up just describing it until it got it. I haven’t quite found the right solution yet.
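To make "heat map of motion" concrete, here is a rough sketch of one common way to build one: sum the absolute pixel differences between consecutive grayscale frames. This is an assumption about the general technique, not the commenter's actual code, and `motion_heatmap` is a hypothetical name.

```python
def motion_heatmap(frames):
    """Sum per-pixel absolute differences between consecutive grayscale frames.

    frames: list of equally sized 2D lists of ints (0-255).
    Returns a 2D list of floats normalised to the range 0..1.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    h, w = len(frames[0]), len(frames[0][0])
    heat = [[0] * w for _ in range(h)]
    for prev, cur in zip(frames, frames[1:]):
        for y in range(h):
            for x in range(w):
                heat[y][x] += abs(cur[y][x] - prev[y][x])
    peak = max(max(row) for row in heat)
    if peak == 0:
        # No motion at all: return an all-zero map instead of dividing by zero.
        return [[0.0] * w for _ in range(h)]
    return [[v / peak for v in row] for row in heat]
```

Bright cells mark where things moved, but the map says nothing about what moved or in which order, which may be part of why it helps less than a plain description.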
For me it's measuring different charging speeds at different starting battery capacities and different temperatures, and I thought: what if I just had a video camera pointing at the voltage going in and out? Then I could see the battery percentage increase, and I could have a temperature gun pointed at the phone as well, so I'd know what the temperature of the phone is too, and it could just figure it all out and create charts.
This would make reviewing different charging equipment really easy, as all you really have to do is plug it in, take a video of it, and feed it to the system, and tell other people to do the same thing.
I might very well give this a try!
cortexosmain•3h ago
claude-real-video takes a URL or local file and:
1. Extracts frames at every scene change (not fixed intervals) + a density floor
2. Deduplicates with a sliding-window pixel-diff algorithm (so A-B-A interview cutaways don't re-send the same shot)
3. Transcribes audio (prefers embedded subtitles, falls back to Whisper)
4. Optionally keeps the full soundtrack for audio-capable models
5. Writes a clean MANIFEST.txt you can drop into any LLM chat
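As a rough illustration of step 1, here is one way a "density floor" could work: given scene-change timestamps (for instance from ffmpeg's scene score), add extra timestamps so that no gap exceeds a maximum. This is a sketch under my own assumptions, not the tool's actual code. `apply_density_floor` and `max_gap` are hypothetical names.

```python
def apply_density_floor(scene_times, duration, max_gap):
    """Return sorted timestamps (seconds) with no gap longer than max_gap.

    scene_times: timestamps of detected scene changes.
    duration:    total video length in seconds.
    max_gap:     hypothetical floor parameter; longest allowed stretch without a frame.
    """
    if max_gap <= 0:
        raise ValueError("max_gap must be positive")
    # Always start at 0 and ignore timestamps outside the video.
    points = sorted({0.0, *[t for t in scene_times if 0 <= t <= duration]})
    out = []
    for start, end in zip(points, points[1:] + [duration]):
        out.append(start)
        # Fill long static stretches with evenly spaced extra frames.
        t = start + max_gap
        while t < end:
            out.append(t)
            t += max_gap
    return out
```

With scene changes at 12 s and 15 s in a 60 s clip and `max_gap=20`, this yields frames at 0, 12, 15, 35 and 55 seconds, so a long static slide still gets sampled occasionally.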
A 10-min presentation goes from ~600 fixed-interval frames to 5-15 meaningful keyframes. 90%+ token savings with better comprehension.
The dedup approach (v0.2.0) uses real pixel difference on 16x16 RGB thumbnails against a sliding window of the last N kept frames — inspired by videostil's pixelmatch, but simpler and self-contained.
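A minimal sketch of the sliding-window idea, assuming each thumbnail is already a flat list of 256 `(r, g, b)` tuples (16x16). The per-channel tolerance, window size, and threshold below are illustrative guesses, and the diff metric is my assumption rather than the tool's exact formula.

```python
from collections import deque


def diff_percent(a, b, tol=16):
    """Percentage of pixels where any RGB channel differs by more than tol."""
    if len(a) != len(b):
        raise ValueError("thumbnails must have the same size")
    changed = sum(
        1 for pa, pb in zip(a, b)
        if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tol
    )
    return 100.0 * changed / len(a)


def dedup(thumbs, window=5, threshold=10.0):
    """Return indices of frames that differ from every frame in the window."""
    recent = deque(maxlen=window)  # the last N kept thumbnails
    kept = []
    for i, thumb in enumerate(thumbs):
        # Keep a frame only if it is different enough from ALL recent keeps.
        if all(diff_percent(thumb, old) >= threshold for old in recent):
            kept.append(i)
            recent.append(thumb)
    return kept
```

Comparing only against the previous frame (`window=1`) would keep the second A in an A-B-A sequence. A wider window lets that cutaway back to the same shot be dropped.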
`--report` generates a self-contained HTML showing every keep/drop decision with diff percentages, so you can tune the threshold visually.
pip install claude-real-video && crv "https://youtube.com/watch?v=..." --report
MIT licensed, pure Python + ffmpeg. Happy to answer questions!
ProofHouse•1h ago
AmazingEveryDay•33m ago
garciasn•26m ago
What do you mean that Claude can’t view video? It did it just fine. Or do you mean tool-less?
torhorway•3m ago