I used the following tech stack to build it: Python (Whisper models), Avalonia UI (frontend), C# (backend), and SQLite (database).
Here are some of the key features:
Speaker Identification: Automatically labels speakers (SPEAKER 01, SPEAKER 02) and lets you replace the labels with real names.
Whisper Model Transcription: Supports 98 languages and multiple audio/video formats.
Real-Time Transcription: Transcribe audio from your device microphone or system speakers, which is ideal for meetings, legal depositions, or court proceedings.
Bulk Transcription: Queue multiple files for batch processing, useful for podcasts or large audio collections.
Proofreading & Editing Tools: Integrated media player and editor for quick corrections or scoping.
AI Models: Choose Intermediate, Advanced, or Expert to balance speed and accuracy.
GPU Acceleration: Boost transcription performance with NVIDIA GPUs.
Export Options: Save transcripts as Word, PDF, or plain text.
We just launched version 2.0.0 on the Microsoft Store: https://apps.microsoft.com/detail/9nqjkq3l649b
Check out the demos: https://ekhos.ai/demos
Benchmarks & hardware requirements: https://ekhos.ai/help-center/hardware-requirements/
Learn more on our website: https://ekhos.ai
What’s next:
Additional on-device AI tools (e.g., summarization) and macOS support
Please give it a try. I’d love to hear your feedback and answer any of your questions.