Show HN: I built an on-device TTS app because I ran out of audiobooks on a flight

https://loudreader.io
2•mowmiatlas•2h ago
I didn't want to upload my own material to a third-party cloud service or use mobile data for voice synthesis, and I kept running out of podcast queue on flights, so I spent a while trying to get Kokoro running on my iPhone.

LoudReader is what came out of it - an iOS app that reads essays, articles, and books aloud, fully on-device. No account, no network after install.

Getting the model to run once and read a sentence was the easy part. Making it not feel like a demo was the rest: streaming synthesis so playback starts before the sentence finishes, porting misaki to Swift because I could only find Python releases, and thermal monitoring and strategy, which was a tough one as well. Runs well on iPhone 14 Pro (what I have) and newer. Tested on my mom's iPhone 12 Pro and it chokes sometimes, so I ported KittenTTS as a lighter fallback for older devices. The whole project took around 2-3 months of weekends with Claude Code and Codex.

Smooth TTS was the hard part, but the app around it grew larger than I expected, with EPUB/PDF import, Gutenberg browsing, a saved-articles queue, and multi-week reading campaigns. Happy to dig into any of it in the comments.

PDFs, especially academic papers and scanned docs, still annoy me. I built an OCR flow that handles regular documents, but scientific papers with two-column layouts, equations, and fine print are still messy. Curious if anyone here has shipped PDF extraction on mobile that actually handles this well.

This was my first time designing a user-facing product. I'm more of a deep-engineering person, so any feedback is welcome too. I'll post a write-up on the biggest hurdles in the comments as well.

If you've ever tried to listen to something long on a plane, you get why this exists.

Comments

mowmiatlas•1h ago
Extra context since the post got long. A few things that ate more time than I expected:

Streaming was the worst one. Kokoro doesn't expose a streaming interface as far as I could find: you hand it a chunk of text, and it gives you back the full audio for that chunk. For a reading app you can't wait for a whole paragraph before playback starts, so the whole streaming layer had to be built on top. I didn't want to process the whole book and then serve the full audio; I wanted it to be interactive.

The basic shape: chunk into sentence-sized windows, render in the background, queue rendered chunks for playback, keep a small pre-render lookahead so playback never starves but the phone isn't speculatively rendering an entire chapter it might throw away on a skip.
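To make that shape concrete, here is a minimal Python sketch of a bounded pre-render lookahead. It is not the app's code (the app is Swift), and `fake_synthesize`, `stream_chunks`, and the `lookahead` parameter are hypothetical names used only for illustration. A bounded queue makes the background renderer block once it is a couple of chunks ahead of playback.

```python
import queue
import threading


def fake_synthesize(text):
    """Stand-in for a real TTS call: returns placeholder 'audio' for one chunk."""
    return f"<audio:{text}>"


def stream_chunks(chunks, synthesize, lookahead=2):
    """Yield rendered audio in order while a background worker renders ahead.

    At most `lookahead` finished chunks wait in the queue (the worker may hold
    one more that it is trying to enqueue), so rendering never runs far ahead
    of playback.
    """
    rendered = queue.Queue(maxsize=lookahead)  # put() blocks when the queue is full
    done = object()  # sentinel marking the end of the stream

    def worker():
        for text in chunks:
            rendered.put(synthesize(text))
        rendered.put(done)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        audio = rendered.get()  # blocks until the next chunk is ready
        if audio is done:
            return
        yield audio


if __name__ == "__main__":
    for audio in stream_chunks(["One.", "Two.", "Three."], fake_synthesize):
        print(audio)
```

The sketch leaves out skip handling: a real player would also need a way to cancel the worker and drop queued audio when the listener jumps elsewhere.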

Sentence chunking was its own fight. Too long and the model returns null and playback stops. Too short (four or five words at a time) and the naturalness diminishes, because the model uses context within a sentence to decide intonation. Chopped chunks sound like a bad GPS voice. I had to find the Goldilocks window where the model is happy and the result still sounds good, and handle long-sentence edge cases by splitting on secondary punctuation and stitching the audio back together without audible seams.
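A rough Python sketch of that chunking idea follows, assuming a word-count limit stands in for whatever the model's real length limit is. The `max_words` value, the regexes, and the function names are illustrative assumptions, not the app's actual rules. Sentences under the limit pass through whole; longer ones are split at commas, semicolons, and colons, and the clauses are packed greedily so pieces do not end up needlessly short.

```python
import re

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # whitespace after ., ! or ?
SECONDARY = re.compile(r"(?<=[,;:])\s+")     # whitespace after , ; or :


def split_long(sentence, max_words):
    """Break one over-long sentence at secondary punctuation.

    Clauses are packed greedily up to max_words. A single clause that is
    longer than max_words is left as-is (a real app would need a fallback).
    """
    pieces, current = [], ""
    for clause in SECONDARY.split(sentence):
        candidate = f"{current} {clause}".strip()
        if current and len(candidate.split()) > max_words:
            pieces.append(current)
            current = clause
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text, max_words=25):
    """Split text into sentence-sized chunks of roughly max_words or fewer."""
    chunks = []
    for sentence in SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        if len(sentence.split()) <= max_words:
            chunks.append(sentence)
        else:
            chunks.extend(split_long(sentence, max_words))
    return chunks


if __name__ == "__main__":
    sample = "Short one. alpha beta gamma, delta epsilon zeta, eta theta iota."
    for chunk in chunk_text(sample, max_words=6):
        print(chunk)
```

Seamless stitching of the resulting audio is a separate problem that this sketch does not touch.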

For battery life there's cruise mode. When the screen is off and the next several sentences are already rendered and cached, the app swaps the whole synthesis/playback pipeline for a much lighter sequential AAC player that plays hardware-decoded audio files.
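The switch itself comes down to a small decision, sketched here in Python. The threshold of five cached sentences and the names `choose_pipeline` and `min_cached` are assumptions made for illustration; the post only says "the next several sentences".

```python
def choose_pipeline(screen_on, cached_sentences_ahead, min_cached=5):
    """Pick the playback path for the next stretch of audio.

    Returns "cruise" (play pre-rendered files only) when the screen is off
    and enough upcoming sentences are already cached, otherwise "live"
    (full synthesis plus playback pipeline).
    """
    if not screen_on and cached_sentences_ahead >= min_cached:
        return "cruise"
    return "live"


if __name__ == "__main__":
    print(choose_pipeline(screen_on=False, cached_sentences_ahead=8))
    print(choose_pipeline(screen_on=True, cached_sentences_ahead=8))
```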

When the phone's on a charger, a background task pre-renders a chapter or two of upcoming audio and writes it to disk as M4A. That way, by the time you're actually reading, cruise mode has a cache to play from and the neural engine never has to wake up for long stretches. The system decides when to actually run the task, so it piggybacks on the phone's usual overnight charging window.

The Neural Engine was a disappointment. I was hoping to get Kokoro onto the ANE for the latency/efficiency win, seeing it works quite well on CPU, but it uses ops that CoreML doesn't route to the Neural Engine, so it falls back to GPU/CPU. The weird part: forcing .cpuAndNeuralEngine is actually slower than .cpuAndGPU on this model, probably partitioning cost from unsupported ops bouncing between compute units, but I don't fully understand why. If anyone on CoreML has a principled explanation I'd love to hear it.

iPhone 12 mini and lower, plus simulators, are cursed. They seem to run Kokoro successfully (no error, inference completes), but the result is pure crackling/screeching gibberish audio. Same model, same weights, same code path. KittenTTS runs fine on the exact same hardware AND the Xcode simulator. I still don't know what's going on here; curious if anyone's seen similar.

KittenTTS was easy. Ported it as a fallback for older devices and published a minimal iOS example repo while I was at it: https://github.com/pepinu/KittenTTS-iOS if you just want to see how to get a neural TTS model running on iPhone without the full app machinery around it.

Before I got the iPhone optimization work far enough along, Kokoro already ran in real time on a MacBook, so I was literally putting a laptop on the passenger seat for long drives just to have something read to me. Very inconvenient, but it made me commit to getting the phone path right. The current build isn't really tested on Mac; maybe in the future.

On the LLM tooling question up front: YES, I used Claude Code and Codex throughout. I might be too much into tokenmaxxing though, since I'd run several sessions in tandem for bug hunting and several more for review to get a wisdom of the crowd of sorts.
