Built a tool combining MLX Whisper + pyannote for fast local audio transcription with speaker diarization on Apple Silicon.
Key benefits: privacy-first (fully local), hardware-accelerated, automatic speaker identification, multiple output formats (TXT/SRT/JSON).
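To make the SRT output concrete, here is a minimal sketch of rendering speaker-labelled segments as SRT cues. The `Segment` structure, the `[SPEAKER]` prefix, and the function names are assumptions for illustration, not the tool's actual data model or output layout.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span with its speaker label (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, one cue per segment."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            # Speaker prefix is an assumed convention, not part of the SRT format.
            f"[{seg.speaker}] {seg.text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

The same list of segments could be dumped as JSON or joined into plain text for the other two formats.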
Main technical challenge was getting MLX Whisper and pyannote to work together despite their different audio processing; solved with a preprocessing pipeline.
Perfect for interviews, meetings, and podcasts. Supports HuggingFace gated models, with proper error handling.