anyone have a tl;dr for me on the best way to get the video comprehension stuff going? i use qwen-30b-vl all the time locally as my go-to model because it's just so insanely fast, and i'm curious to mess with the video stuff. the vision comprehension works great and i use it for OCR and classification all the time
moralestapia•38m ago
To me, this qualifies as some sort of ASI already.
visioninmyblood•1m ago
I was using this for video understanding, with inference running on vlm.run infra. It has definitely outperformed Gemini, which is generally much better than OpenAI or Claude on videos. The detailed extraction is pretty good. With agents you can also crop into a segment and do more operations on it. Have to see how the multimodal model space progresses.
https://chat.vlm.run/c/82a33ebb-65f9-40f3-9691-bc674ef28b52
thot_experiment•2d ago