I've been using AI agents quite a bit over the last year or so, and one thing I've noticed is that I'm spending 4-5x more time typing than I did before using them (hand coding...gasp!)
Mind you, I type in Colemak and still manage to reach 140 WPM on my best days, so I'm no slow typist. My first thought was giving Windows' built-in dictation tool a try, but the formatting was awful, and I quickly found how much I missed being able to provide file context to ensure my agent had the best resources possible to fix my problem.
I figured, what better way to build my skills as an 'AI engineer' than undertaking a project in a tech stack I know absolutely nothing about (Electron)? Well, I picked up Windsurf, spent many, many late nights over the course of a couple of weeks coding & QA testing...and I came up with SpeakEasy.
SpeakEasy is a desktop application that integrates with OpenAI's Whisper API to transform your dictated speech into correctly formatted text that agents are more likely to understand. I didn't stop there, though: it was still missing the ability to add file context during the voice-to-text process. So I simply added it.
Another few late nights, some edge-case testing, and voilà...a simple, easy-to-use voice-to-text tool that can provide file context to your AI agent (Windsurf and Claude Code are currently supported, using '@' syntax to add file context).
I know the entire thing was built by AI, but I'm a senior software engineer with 10+ years of experience. I like to think that even though I've lost some of the precise details of what exactly is going on under the hood, the foundational architecture this application stands on will hold strong, and thorough QA testing will help prevent any disastrous bugs!
So if you're like me and tired of typing all day, I'd love for you to consider trying my app. I honestly use it every day, and sharing what I've built with people who can appreciate it makes me incredibly happy. There's a pretty generous free tier (100 transcriptions/month), no sign-up needed.
The only caveat: before using the app, you'll need an OpenAI API key (stored locally and only used for Whisper API calls); otherwise you won't be able to make any transcriptions.
Once you've installed it and gone through the onboarding process, getting started is pretty simple. Just use your toggle hotkey (Ctrl + Shift + Space) or push-to-talk (Ctrl + `) to start a recording. Once you've finished speaking, your speech is transcribed and inserted at your current cursor position, and Enter is pressed automatically (so you can fluidly interact with your AI agent).
The transcribed text is also saved to your clipboard (configurable) in case you weren't quite ready to insert it. To try out file context, just clearly say the file name while you're speaking: the app translates 'file.tsx' to '@file.tsx' and presses Tab. That '@' syntax plus the Tab keypress is what lets files be added to context using only your voice!
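To give a rough idea of the file-mention step, here's a minimal sketch of that kind of string rewrite. This is a hypothetical helper I'm writing out for illustration, not SpeakEasy's actual code, and a real version would also synthesize the Tab keypress after each mention:

```javascript
// Sketch: detect spoken file names in a transcript and rewrite them to
// the '@' mention syntax that agents like Windsurf / Claude Code pick up.
// (Hypothetical helper for illustration, not the app's actual implementation.)
function rewriteFileMentions(transcript, knownFiles) {
  let result = transcript;
  for (const file of knownFiles) {
    // Escape regex metacharacters in the file name ('.' in 'file.tsx', etc.)
    const escaped = file.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Match the bare file name, but skip names already prefixed with '@'
    // so running the rewrite twice doesn't double-prefix them.
    const pattern = new RegExp(`(?<!@)\\b${escaped}\\b`, 'g');
    result = result.replace(pattern, `@${file}`);
  }
  return result;
}

console.log(rewriteFileMentions('please fix the bug in file.tsx', ['file.tsx']));
// → please fix the bug in @file.tsx
```

The lookbehind makes the rewrite idempotent, which matters if the transcript gets post-processed more than once.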
Anyways, I hope someone finds what I've built cool and useful. I'd love to hear any and all feedback if you have it.
Thanks for reading,
FlyingPanda
sigmaprimus•1h ago
Sounds like a pretty neat program. Have you played around with Voice Access, the application included with Windows? I'm using it right now to type this message. Unfortunately I can't use programs that have push-to-talk, as I am paralyzed from the neck down... But believe me when I say it's nice to see someone working on speech-to-text, and any improvement or new application can only make life better for me and people like me. I haven't really done much with the OpenAI offerings, but I do use Gemini CLI, and it is exhausting to the point where my throat gets dry and I have to drink some water if I want to keep working. **Edit**: drinking water involves me calling a care aide to assist me!