I tried the Google Video Intelligence API and got a $400 bill for analysing 4 videos (4K, about 5 minutes each on average). That's roughly $100 per video, and it doesn't include transcription. I used my GCP startup credits to cover the bill.
So I decided to build my own tool with 3 requirements: it has to transcribe videos, analyse video frames, and do everything locally.
I don't wanna deal with storing my videos in the cloud because of two concerns: privacy and storage cost.
I've been working on it for the last couple of months. There's a source-available version that's free to use (personal use, and commercial use at companies with fewer than 5 people). It's available here: https://github.com/IliasHad/edit-mind, and the project has 1.3k GitHub stars.
Now, I'm building a desktop app with direct NLE integration (Final Cut Pro, DaVinci Resolve, and Adobe Premiere Pro). This includes an editing agent that understands your footage and your editing style.
Preview: https://youtu.be/jcctyfVg_34
Happy to answer questions and hear your feedback.