The development process was probably the most interesting part. I used Gemini 3 Pro Preview and 3 Flash Preview almost exclusively (yes, not Claude). It went from chatting, to writing light specs and decomposing them into tasks, to finally setting up a Ralph-style orchestrator that I would leave running overnight. There were more than a few late nights; making progress this way was exhausting, but still fun. It's not something I ever would have built without AI. The models helped me maintain strict linters, achieve upper-80s test coverage, and write the Cloudflare services, and they walked me through the maze of Apple's sandboxing, certificate provisioning, and signing. End to end, everything was made with Gemini, down to scripting and recording the promo video. The linter became a quality-of-context goad: yes, I really do want explicit type interfaces even though Swift does fine without them, and no, you will not write more than 800 lines of code in a file.
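For the curious, here's a sketch of what those two constraints might look like. I'm assuming SwiftLint here (the post doesn't name the linter), which ships both rules out of the box:

```yaml
# Hypothetical .swiftlint.yml fragment (assuming SwiftLint).
opt_in_rules:
  - explicit_type_interface   # require explicit types even where inference would do

file_length:
  warning: 800                # flag any file that grows past 800 lines
```

Rules like these are blunt, but they keep individual files small enough to fit comfortably in a model's context window, which is the point.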
Pain points: The UI tests are slow, test logs flood context, and the AI hallucinates (who knew). Cloning lib code into tmp/ to answer questions became a habit.
Under the hood, it relies on a sizeable set of technologies. The whole thing feels like Aesop's polite gnat resting on a bull: in other words, built on the shoulders of giants. Georgi Gerganov, Hyeongju Kim, Steve Yegge, Sindre Sorhus, Gwendal Roué, Zorg, the team who made Whisper, and so, so many others: you have my thanks.
The app is built in Swift. It uses Whisper.cpp and ONNX for inference, supports Whisper and Parakeet for speech-to-text, and uses Supertonic for text-to-speech. I got to try a few new-to-me tools: Prek for pre-commit hooks, Tuist for project generation, dvc for model versioning and management, beads for agent work tracking, and more. Gemini converted raw Whisper .pt files to CoreML using PyTorch, and I spent a lot of time experimenting to see how much difference the Apple Neural Engine would make (interestingly, not as much as I expected in my use case, but both modes work). Parakeet is also in there just for kicks (Whisper produces better results).
I originally planned to launch on the Mac App Store, but the reviewers insisted I remove a behavior I felt was central to the app. So instead I decided to distribute it directly, using Cloudflare Workers/R2 for delivery and LemonSqueezy for sales and licensing.
Supertonic's diffusion models are interesting to use; they never read a text exactly the same way twice. If you do decide to try OneSentence, just for fun, turn off the "Refine Punctuation" setting and see what happens when you have it read a sentence with lots of exclamation marks. My boys got a kick out of it.
I set the defaults for myself. Transcription defaults to Whisper Large V2, which is relatively old, large, and slow, but I found its transcription quality excellent.
I am offering this primarily as a one-time purchase, but there is also a cheap subscription option if people prefer it. I know there are other product options, other models, and even macOS's own functionality. Choice is good.
This has been my evenings-and-weekends project for the last several months, and as a daily driver it gives me a lot of value.
You can check it out here: https://onesentence.app/ Use the promo code I3MDE1MQ to get 40% off in the next two weeks.
I'll be around to answer any questions.
Cheers!