LoudReader is what came out of it - an iOS app that reads essays, articles, and books aloud, fully on-device. No account, no network after install.
Getting the model to read a sentence was the easy part. Making it not feel like a demo was the rest: streaming synthesis so playback starts before the sentence finishes, porting misaki to Swift because I could only find Python releases, and thermal monitoring and throttling strategy, which was a tough one as well. Runs well on iPhone 14 Pro (what I have) and newer. Tested on my mom's iPhone 12 Pro and it chokes sometimes, so I ported KittenTTS as a lighter fallback for older devices. The whole project took around 2-3 months of weekends with Claude Code and Codex.
Smooth TTS was the hard part, but the app around it grew larger than I expected: EPUB/PDF import, Gutenberg browsing, a saved-articles queue, multi-week reading campaigns. Happy to dig into any of it in the comments.
PDFs, especially academic papers and scanned docs, still annoy me. I built an OCR flow that handles regular documents, but scientific papers with two-column layouts, equations, and fine print are still messy. Curious if anyone here has shipped PDF extraction on mobile that actually handles this well.
This was my first time designing a user-facing product - I'm more of a deep-engineering person, so any feedback is welcome too. I'll post a write-up on the biggest hurdles in the comments as well.
If you've ever tried to listen to something long on a plane, you get why this exists.
mowmiatlas•1h ago
Streaming was the worst one. Kokoro doesn't expose a streaming interface as far as I could find: you hand it a chunk of text and it gives you back the full audio for that chunk. For a reading app you can't wait for a whole paragraph before playback starts, so the whole streaming layer had to be built on top. I didn't want to pre-process the book and then serve full audio; I wanted it to be interactive.
The basic shape: chunk into sentence-sized windows, render in the background, queue rendered chunks for playback, keep a small pre-render lookahead so playback never starves but the phone isn't speculatively rendering an entire chapter it might throw away on a skip.
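The lookahead part of that shape can be sketched roughly like this (a minimal illustration with made-up names, not LoudReader's actual code): chunks inside a small window ahead of the playhead get rendered, everything else waits, so a skip throws away at most a few chunks of speculative work.

```swift
import Foundation

// Hypothetical sketch of a pre-render lookahead window. Only chunks in
// [playhead, playhead + lookahead) are eligible for rendering, so playback
// never starves but the phone isn't rendering an entire chapter it might
// throw away on a skip.
struct RenderQueue {
    private var rendered: [Int: Data] = [:]  // chunk index -> rendered audio
    let lookahead: Int

    init(lookahead: Int = 3) { self.lookahead = lookahead }

    // True when a chunk is inside the lookahead window and not yet rendered.
    func shouldRender(chunk i: Int, playhead: Int) -> Bool {
        i >= playhead && i < playhead + lookahead && rendered[i] == nil
    }

    mutating func store(_ audio: Data, for i: Int) { rendered[i] = audio }
    func audio(for i: Int) -> Data? { rendered[i] }

    // Drop chunks behind the playhead so memory stays bounded.
    mutating func evict(before playhead: Int) {
        rendered = rendered.filter { $0.key >= playhead }
    }
}
```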
Sentence chunking was its own fight. Too long and the model returns null and playback stops. Too short (four or five words at a time) and naturalness suffers, because the model uses context within a sentence to decide intonation; chopped chunks sound like a bad GPS voice. I had to find the goldilocks window where the model is happy and the result still sounds good, and handle long-sentence edge cases by splitting on secondary punctuation and stitching the audio back together without audible seams.
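As a rough sketch of that two-pass chunking (illustrative names and thresholds, not the app's real code): split on sentence terminators first, then break any over-long sentence on secondary punctuation so the model never sees a chunk past some maximum.

```swift
import Foundation

// Hypothetical two-pass chunker: sentences first, then secondary-punctuation
// splits for sentences longer than `maxChars`. Thresholds are placeholders.
func chunkForSynthesis(_ text: String, maxChars: Int = 200, minChars: Int = 30) -> [String] {
    // Pass 1: split on sentence terminators, keeping the terminator.
    var sentences: [String] = []
    var current = ""
    for ch in text {
        current.append(ch)
        if ".!?".contains(ch) {
            sentences.append(current.trimmingCharacters(in: .whitespaces))
            current = ""
        }
    }
    let tail = current.trimmingCharacters(in: .whitespaces)
    if !tail.isEmpty { sentences.append(tail) }

    // Pass 2: break over-long sentences on commas/semicolons/colons;
    // the playback side can stitch these back without an audible seam.
    var chunks: [String] = []
    for sentence in sentences {
        if sentence.count <= maxChars {
            chunks.append(sentence)
            continue
        }
        var piece = ""
        for ch in sentence {
            piece.append(ch)
            if ",;:".contains(ch) && piece.count >= minChars {
                chunks.append(piece.trimmingCharacters(in: .whitespaces))
                piece = ""
            }
        }
        let rest = piece.trimmingCharacters(in: .whitespaces)
        if !rest.isEmpty { chunks.append(rest) }
    }
    return chunks
}
```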
For battery life there's cruise mode. When the screen is off and the next several sentences are already rendered and cached, the app swaps the whole synthesis/playback pipeline for a much lighter sequential AAC player working off hardware-decoded audio files.
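The lightweight player side of cruise mode is conceptually just sequential file playback; a minimal sketch with AVQueuePlayer (file paths are illustrative):

```swift
import AVFoundation

// Minimal sketch of a cruise-mode style player: cached M4A chunks played
// back to back. AAC decode happens in hardware, so the neural engine and
// the synthesis pipeline stay asleep.
let cachedChunks = [
    URL(fileURLWithPath: "/tmp/cache/chunk001.m4a"),  // placeholder paths
    URL(fileURLWithPath: "/tmp/cache/chunk002.m4a"),
]
let player = AVQueuePlayer(items: cachedChunks.map { AVPlayerItem(url: $0) })
player.play()
```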
When the phone's on a charger, a background task pre-renders a chapter or two of upcoming audio and writes it to disk as M4A. That way, by the time you're actually reading, cruise mode has a cache to play from and the neural engine never has to wake up for long stretches. The system decides when to actually run the task, so it piggybacks on the phone's usual overnight charging window.
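The charger-gated scheduling maps naturally onto BGTaskScheduler; a sketch under assumed names (the task identifier and function are hypothetical):

```swift
import BackgroundTasks

// Sketch of scheduling a charger-only pre-render task. The system decides
// when to actually run it, which in practice tends to be the overnight
// charging window. Identifier is a placeholder.
func schedulePreRender() {
    let request = BGProcessingTaskRequest(identifier: "com.example.loudreader.prerender")
    request.requiresExternalPower = true        // only while on the charger
    request.requiresNetworkConnectivity = false // fully on-device
    try? BGTaskScheduler.shared.submit(request)
}
```

The handler registered for that identifier at launch would render a chapter or two to M4A and call `setTaskCompleted(success:)` when done.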
The Neural Engine was a disappointment. I was hoping to get Kokoro onto the ANE for the latency/efficiency win, given that it already runs quite well on CPU, but it uses ops that CoreML doesn't route to the Neural Engine, so it falls back to GPU/CPU. The weird part: forcing .cpuAndNeuralEngine is actually slower than .cpuAndGPU on this model, probably partitioning cost from unsupported ops bouncing between compute units, but I don't fully understand why. If anyone on CoreML has a principled explanation I'd love to hear it.
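For anyone who hasn't poked at this, the compute-unit choice is a one-line config on model load (model class name is a placeholder for the compiled CoreML model):

```swift
import CoreML

// Compute-unit selection on model load. For this model, .cpuAndGPU
// turned out faster than .cpuAndNeuralEngine, likely because unsupported
// ops force the graph to be partitioned across units.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU   // not .cpuAndNeuralEngine, not .all
// let model = try KokoroModel(configuration: config)  // placeholder class name
```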
iPhone 12 mini and lower, and simulators, are cursed. They seem to run Kokoro successfully, i.e. no error, inference completes, but the result is pure crackling/screeching gibberish audio. Same model, same weights, same code path. KittenTTS runs fine on the exact same hardware AND the Xcode simulator. I still don't know what's going on here; curious if anyone's seen similar.
KittenTTS was easy. Ported it as a fallback for older devices and published a minimal iOS example repo while I was at it: https://github.com/pepinu/KittenTTS-iOS if you just want to see how to get a neural TTS model running on iPhone without the full app machinery around it.
Before the iPhone optimization work got far enough along, Kokoro only ran in real time on a MacBook, so I was literally putting a laptop on the passenger seat for long drives just to have something read to me. Very inconvenient, but it made me commit to getting the phone path right. The current build isn't really tested on Mac; maybe in the future.
On the LLM tooling question up front: YES, I used Claude Code and Codex throughout. I might be too much into tokenmaxxing, though, since I'd run several sessions in tandem for bug hunting and several more for review, to get a wisdom-of-the-crowds effect of sorts.