I decided to build Edit Mind, which started as a simple CLI that transcribed videos using OpenAI Whisper and ran plain text search across the transcripts, nothing fancy.
Then I decided to add frame analysis. I built a Python script that takes the full video, splits it into segments 1 to 2 seconds long, and passes a sampled frame from each segment to another system that recognizes faces, objects, on-screen text, etc.
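The sampling step above can be sketched roughly like this. This is a minimal illustration, not Edit Mind's actual code; the function name and the fixed-step strategy are assumptions.

```python
from typing import List

def sample_timestamps(duration: float, step: float = 1.0) -> List[float]:
    """Return one timestamp per segment of a video.

    `duration` is the video length in seconds; `step` is the segment
    length (1-2 s in the setup described above). A frame grabbed at
    each timestamp would then go to the recognition pipeline
    (faces, objects, on-screen text).
    """
    t, stamps = 0.0, []
    while t < duration:
        stamps.append(round(t, 3))
        t += step
    return stamps

print(sample_timestamps(5.5, 2.0))  # [0.0, 2.0, 4.0]
```

In practice the frame extraction itself would be done with something like ffmpeg or OpenCV seeking to each timestamp.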
After that, I decided to build an Electron desktop app that would manage the UI and have a search and chat feature.
Then I decided to open source it and share it with the community on Reddit. People loved it (https://www.reddit.com/r/selfhosted/comments/1ogis3j/i_built...). Many of them requested Docker integration, so I focused on that next, which turned out to be a great suggestion. (https://www.youtube.com/watch?v=YrVaJ33qmtg&t=12s)
Now, we have three Docker containers: one for the web UI; one for background jobs, media processing, and the local vector integration; and one for the ML service (transcription and frame analysis), a Python process that communicates over WebSocket.
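The three-container layout could look something like the compose sketch below. All service names, paths, and the WebSocket port are illustrative assumptions, not the project's actual configuration.

```yaml
services:
  web:
    build: ./web          # web UI container
    ports:
      - "3000:3000"
  worker:
    build: ./worker       # background jobs, media processing, local vector store
    volumes:
      - media:/data/media
  ml:
    build: ./ml           # Python ML service: Whisper transcription + frame analysis
    # the worker reaches this service over WebSocket, e.g. ws://ml:8765

volumes:
  media:
```

Keeping the ML service in its own container means heavy Python/ML dependencies stay isolated from the web and worker images.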
After posting on X and tagging Twelve Labs, who inspired some of the new features and UI enhancements, I got the opportunity to present the project live in their webinar series (if you want to see Edit Mind in a live demo: https://www.youtube.com/watch?v=k_aesDa3sFw&t=1271s).
With the proof of concept done and all the features I was hoping for in place, it's time to focus on improving code quality: refactoring and applying best practices. Keep in mind that I'm working solo, with help from external contributors.
I would love to get your feedback on the project.