I built an app called AI Talk Coach to analyze audio recordings and give structured feedback on how you speak. It looks at clarity, pacing, filler words, and a few other speaking metrics.
The motivation was simple: I wanted to improve my communication skills, but hiring a coach wasn’t realistic. I came to see that reviewing your own speaking is the best way to improve, and that gave me the idea to automate the feedback loop.
The idea behind the app is a "speaking gym" where I can practice daily and get feedback on what to improve and how.
How it works (high-level):
* audio recorded
* processed + normalized client-side
* sent to backend for transcription
* analysis pipeline evaluates cadence, filler words, clarity, structure (via rules and LLM)
* model generates a breakdown + improvement suggestions
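To make the rules-based part of the analysis step concrete, here is a minimal sketch of how pacing and filler words could be computed from a transcript. It is not the app's actual code: the filler list, the `SpeechMetrics` fields, and the `analyze_transcript` function are all hypothetical names, and it assumes the transcription step returns plain text plus the recording length in seconds.

```python
import re
from dataclasses import dataclass

# Hypothetical filler list for illustration; a real list would need tuning.
FILLERS = ("um", "uh", "like", "you know", "basically")


@dataclass
class SpeechMetrics:
    word_count: int
    words_per_minute: float
    filler_counts: dict[str, int]
    filler_ratio: float  # share of all words that belong to a filler


def analyze_transcript(transcript: str, duration_seconds: float) -> SpeechMetrics:
    """Compute simple pacing and filler-word metrics from a transcript."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")

    # Lowercase and strip punctuation so "Um," and "um" match the same entry.
    words = re.findall(r"[a-z']+", transcript.lower())
    normalized = " ".join(words)

    # Count each filler as a whole word or phrase.
    # Limitation: "like" is counted even when it is used as a verb.
    filler_counts: dict[str, int] = {}
    for filler in FILLERS:
        hits = len(re.findall(r"\b" + re.escape(filler) + r"\b", normalized))
        if hits:
            filler_counts[filler] = hits

    # Multi-word fillers such as "you know" contribute one word per token.
    filler_words = sum(len(f.split()) * n for f, n in filler_counts.items())
    word_count = len(words)

    return SpeechMetrics(
        word_count=word_count,
        words_per_minute=word_count / duration_seconds * 60,
        filler_counts=filler_counts,
        filler_ratio=filler_words / word_count if word_count else 0.0,
    )


if __name__ == "__main__":
    sample = "Um, so I think, like, the plan is, you know, basically done."
    print(analyze_transcript(sample, duration_seconds=6.0))
```

Pure string matching like this is cheap and deterministic, but it cannot tell a filler "like" from a meaningful one, which is presumably where the LLM pass would help.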
Limitations / known issues:
* tone/confidence scoring is still rough
* long recordings (>180s) are not well supported yet
* no video support yet
If anyone has ideas for better speech-analysis heuristics, or thoughts on whether this solves a real need, I’d love feedback.
Happy to answer any technical questions.
xicofigueiredo•35m ago
Training every day with this app is already part of my routine. I'm not sure my speaking has improved in company meetings yet, but my confidence has gone way up.
Amazing app overall