I've been using AI agents quite a bit over the last year or so, and one thing I've noticed is that I'm spending 4-5x more time typing than I did before using them (hand coding...gasp!)
Mind you, I type in Colemak and still manage to reach 140 WPM on my best days, so I'm no slow typist. My first thought was giving Windows' built-in dictation tool a try, but the formatting was awful, and I quickly found how much I missed being able to provide file context to ensure my agent had the best resources possible to fix my problem.
I figured, what better way to build my skills as an 'AI engineer' than undertaking a project in a tech stack I know absolutely nothing about (Electron)? Well, I picked up Windsurf, spent many, many late nights over the course of a couple of weeks coding & QA testing...and I came up with SpeakEasy.
SpeakEasy is a desktop application that integrates with OpenAI's Whisper API to transform your dictated speech into correctly formatted text that agents are more likely to understand. I didn't stop there, though: it was still missing the ability to add file context during the voice-to-text process. So I simply added it.
Another few late nights, some edge-case testing, and voilà...a simple, easy-to-use voice-to-text tool that can provide file context to your AI agent (Windsurf and Claude Code are currently supported, using '@' syntax to add file context).
I know the entire thing was built by AI, but I'm a senior software engineer with 10+ years of experience. I like to think that even though I've lost some of the precise details of what exactly is going on under the hood, the foundational architecture this application stands on will hold strong, and thorough QA testing will help prevent any disastrous bugs!
So if you're like me and tired of typing all day, I'd love for you to consider trying my app. I honestly use it every day, and sharing what I've built with people who can appreciate it makes me incredibly happy. There's a pretty generous free tier (100 transcriptions/month), no sign-up needed.
The only caveat: before using the app, you'll need an OpenAI API key (stored locally and only used for Whisper API calls); otherwise you won't be able to make any transcriptions.
Once you've installed it and gone through the onboarding process, getting started is pretty simple. Just use your toggle hotkey (Ctrl + Shift + Space) or push-to-talk (Ctrl + `) to start a recording. Once you've finished speaking, your speech is transcribed and inserted at your current cursor position, and Enter is pressed automatically (so you can fluidly interact with your AI agent).
The transcribed text is also saved to your clipboard (configurable) in case you weren't quite ready to insert it. To try out file context, just clearly say the file name while you're speaking: the app translates 'file.tsx' to '@file.tsx' and presses Tab. That '@' syntax plus the Tab keypress is what lets files be added to context using only your voice!
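To give a rough idea of the file-mention step, here's a minimal sketch of that kind of string rewrite. This is a hypothetical helper I'm writing out for illustration, not SpeakEasy's actual code, and a real version would also synthesize the Tab keypress after each mention:

```javascript
// Sketch: detect spoken file names in a transcript and rewrite them to
// the '@' mention syntax that agents like Windsurf / Claude Code pick up.
// (Hypothetical helper for illustration, not the app's actual implementation.)
function rewriteFileMentions(transcript, knownFiles) {
  let result = transcript;
  for (const file of knownFiles) {
    // Escape regex metacharacters in the file name ('.' in 'file.tsx', etc.)
    const escaped = file.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Match the bare file name, but skip names already prefixed with '@'
    // so running the rewrite twice doesn't double-prefix them.
    const pattern = new RegExp(`(?<!@)\\b${escaped}\\b`, 'g');
    result = result.replace(pattern, `@${file}`);
  }
  return result;
}

console.log(rewriteFileMentions('please fix the bug in file.tsx', ['file.tsx']));
// → please fix the bug in @file.tsx
```

The lookbehind makes the rewrite idempotent, which matters if the transcript gets post-processed more than once.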
Anyways, I hope someone finds what I've built cool and useful. I'd love to hear any and all feedback if you have it.
Thanks for reading,
FlyingPanda
sigmaprimus•1h ago
Sounds like a pretty neat program. Have you played around with Voice Access, the application included with Windows? I'm using it right now to type this message. Unfortunately I can't use programs that have push-to-talk, as I am paralyzed from the neck down... But believe me when I say it's nice to see someone working on speech-to-text, and any improvement or new application can only make life better for me and people like me. I haven't really done much with the OpenAI offerings, but I do use Gemini CLI, and it is exhausting to the point where my throat gets dry and I have to drink some water if I want to keep working. **Edit**: drinking water involves me calling a care aide to assist me!