Paste a script, press record, and it highlights the current word as you speak. If you pause, it waits; if you skip lines, it finds its place again.
Everything runs in the browser: speech recognition (Moonshine ONNX), VAD, and fuzzy script matching.
Demo: https://larsbaunwall.github.io/promptme-ai
Most of the project was initially built using Perplexity Computer, which made for an interesting agentic coding workflow.
Curious what people think about the script alignment approach.
lbaune•1h ago
ASR output arrives in ~600 ms chunks and is messy (filler words, homophones, skipped phrases). A simple substring match breaks immediately.
The current tracker uses:
- inverted token index to find candidate windows
- banded Levenshtein distance for fuzzy matching
- Double Metaphone phonetic normalization
- locality penalties to stay near the current position
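
To make the idea concrete, here is a rough sketch of how those pieces could fit together. It is written in Python for brevity and is not the project's actual code: the function names, the band of 2, and the locality weight of 0.05 are all assumptions picked for illustration. Tokens are compared exactly, so the phonetic normalization step (Double Metaphone) is left out; in a fuller version both the script and the ASR tokens would be normalized before indexing and matching.

```python
from collections import defaultdict


def build_index(script_tokens):
    """Inverted index: token -> list of script positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(script_tokens):
        index[tok].append(pos)
    return index


def banded_levenshtein(a, b, band):
    """Token-level edit distance, filling only cells with |i - j| <= band.

    Returns band + 1 whenever the true distance exceeds the band.
    """
    inf = band + 1
    if abs(len(a) - len(b)) > band:
        return inf
    prev = [j if j <= band else inf for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        if i <= band:
            cur[0] = i
        lo = max(1, i - band)
        hi = min(len(b), i + band)
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev[len(b)], inf)


def locate(script_tokens, index, heard, cursor, band=2, locality_weight=0.05):
    """Return the script position just past the best match for `heard`, or None."""
    n = len(heard)
    # Candidate window starts: line up each heard token with each place
    # the same token occurs in the script.
    starts = set()
    for offset, tok in enumerate(heard):
        for pos in index.get(tok, ()):
            start = pos - offset
            if 0 <= start <= len(script_tokens) - n:
                starts.add(start)
    best = None
    for start in sorted(starts):
        window = script_tokens[start:start + n]
        dist = banded_levenshtein(heard, window, band)
        if dist > band:
            continue  # too different to count as a match
        # Locality penalty: prefer windows close to the current cursor.
        score = dist + locality_weight * abs(start - cursor)
        if best is None or score < best[0]:
            best = (score, start + n)
    return None if best is None else best[1]


script = "the quick brown fox jumps over the lazy dog and the quick red fox sleeps".split()
index = build_index(script)
print(locate(script, index, ["the", "quick", "fox"], cursor=0))
print(locate(script, index, ["the", "quick", "fox"], cursor=9))
```

In this example the heard phrase matches two places in the script equally well (edit distance 1 each), so the locality penalty decides: with the cursor at 0 the first occurrence wins, and with the cursor at 9 the second one does.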
Between ASR updates the UI speculatively advances the cursor based on measured WPM so the highlight moves smoothly.
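
The speculative advance can be sketched in a few lines. Again, this is an illustration rather than the project's code: `speculative_position` and its parameters are hypothetical names, and the `max_lead` cap is an assumption added so the highlight cannot run far ahead of the last confirmed position when the speaker pauses.

```python
def speculative_position(confirmed_index, confirmed_at, now, wpm, max_lead=5):
    """Estimate the current word index between ASR updates.

    confirmed_index: word index from the last successful alignment
    confirmed_at, now: timestamps in seconds
    wpm: measured speaking rate in words per minute
    max_lead: hypothetical cap on how far to run ahead of the confirmed index
    """
    elapsed = max(0.0, now - confirmed_at)
    advance = int(elapsed * wpm / 60.0)
    return confirmed_index + min(advance, max_lead)


# 0.6 s after a confirmed position at word 10, speaking at 150 WPM:
print(speculative_position(10, 0.0, 0.6, 150))   # 11
# After a 10 s silence the estimate stops at the cap:
print(speculative_position(10, 0.0, 10.0, 150))  # 15
```

When the next ASR chunk arrives, the aligned position replaces the estimate, so any drift from the WPM guess is corrected at that point.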
Curious if anyone here has worked on similar real-time alignment problems.