Hey, I am the creator of video2docs and this is my first "big" launch. I am nervous (no, I am fine actually) :D
The idea is quite simple: recently I had to write more and more docs (mostly how-tos and guides for company systems) and got a bit tired of it. I decided it would be cool if I could just record a video of myself clicking through the app (or multiple apps, it does not matter) and then have the video content analysed, even without audio narration.
That is how video2docs was born! I plan to add audio analysis too, for even better quality documentation, but for now, I am happy with how it works even without it.
You can choose from 10 LLM models for video analysis.
You can also choose a documentation style (tutorial, how-to, quickstart...).
And, of course, you can choose whether to include screenshots in the generated Markdown docs. Yay, no need to take screenshots manually! :)
I hope someone else might find this useful. I will continue working on this project!
eevmanu•6h ago
Looks awesome!
Is there anything you can share about the architecture or pipeline you used for it? A high-level overview would be enough.
I’m guessing you’re doing video-to-image, image-to-text, and then text-to-docs, right? Since not all of the models you mentioned are multimodal.
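To make the guess concrete, here is a rough Python sketch of that three-stage flow. It is only an illustration of the guessed pipeline, not how video2docs actually works. Every name in it (`sample_timestamps`, `describe_frames`, `render_markdown`, the `frames/step-NNN.png` paths, the 2-second interval) is made up for the example, and both the frame extraction and the model call are stubbed out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    timestamp_s: float  # where in the recording the frame was sampled
    screenshot: str     # path the extracted frame would be saved to
    description: str    # text produced by the image-to-text stage


def sample_timestamps(duration_s: float, every_s: float = 2.0) -> List[float]:
    """Stage 1 (video-to-image): pick timestamps at a fixed, assumed interval."""
    if duration_s <= 0 or every_s <= 0:
        return []
    stamps = []
    i = 0
    while i * every_s < duration_s:
        stamps.append(float(i * every_s))
        i += 1
    return stamps


def describe_frames(timestamps: List[float],
                    caption: Callable[[str], str]) -> List[Step]:
    """Stage 2 (image-to-text): `caption` stands in for whatever model is used."""
    steps = []
    for n, t in enumerate(timestamps, start=1):
        path = f"frames/step-{n:03d}.png"  # hypothetical naming scheme
        steps.append(Step(t, path, caption(path)))
    return steps


def render_markdown(title: str, steps: List[Step],
                    include_screenshots: bool = True) -> str:
    """Stage 3 (text-to-docs): assemble a numbered how-to in Markdown."""
    lines = [f"# {title}", ""]
    for n, step in enumerate(steps, start=1):
        lines.append(f"{n}. {step.description}")
        if include_screenshots:
            lines.append(f"   ![Step {n}]({step.screenshot})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A fake captioner so the sketch runs without any model or video file.
    fake_caption = lambda path: f"Do whatever is shown in {path}"
    steps = describe_frames(sample_timestamps(5.0), fake_caption)
    print(render_markdown("Example guide", steps))
```

In a real tool, the text-to-docs stage would presumably be another LLM call that merges and rewrites the per-frame descriptions in the chosen style. Here it is plain string formatting to keep the example self-contained.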
alexattt•10h ago