Every TTS browser extension I've tried fails on Kindle Cloud Reader. The reason: Amazon renders text using custom font subsets where glyph IDs don't map to standard Unicode code points. You
select text, copy it, and get garbage, so the DOM gives an extension nothing readable to work with.
CastReader solves this by intercepting KindleModuleManager to capture font and token data, decoding glyph mappings from the binary font tables, and then running Tesseract.js
OCR locally in an offscreen document to calibrate the decoder. The final text comes from glyph decoding rather than from the OCR output, so it's accurate enough for word-level highlight sync.
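To make the calibration idea concrete, here is a minimal sketch of one way OCR output could be used to label glyph IDs. It is not CastReader's actual code: the names OcrSample, GlyphTable, calibrate and decode are made up for illustration, and it assumes each glyph stands for exactly one character and that OCR is run on short samples whose glyph runs are already known.

```typescript
// Hypothetical shapes; none of these names come from Kindle or CastReader.
interface OcrSample {
  glyphIds: number[]; // glyph run for one rendered word
  ocrText: string;    // what OCR read for that same word
}

type GlyphTable = { [glyphId: number]: string };

// Build a glyph -> character table by majority vote over the samples.
function calibrate(samples: OcrSample[]): GlyphTable {
  const votes: { [glyphId: number]: { [ch: string]: number } } = {};
  for (const sample of samples) {
    // split("") is enough for this sketch; it would break astral characters.
    const chars = sample.ocrText.split("");
    // Skip samples where OCR and glyph counts disagree (ligatures, misreads).
    if (chars.length !== sample.glyphIds.length) continue;
    sample.glyphIds.forEach((id, i) => {
      const tally = votes[id] || (votes[id] = {});
      tally[chars[i]] = (tally[chars[i]] || 0) + 1;
    });
  }
  const table: GlyphTable = {};
  for (const key of Object.keys(votes)) {
    const tally = votes[Number(key)];
    let best = "";
    let bestCount = 0;
    for (const ch of Object.keys(tally)) {
      if (tally[ch] > bestCount) {
        best = ch;
        bestCount = tally[ch];
      }
    }
    table[Number(key)] = best;
  }
  return table;
}

// Decode a glyph run with the calibrated table; unknown glyphs become U+FFFD.
function decode(glyphIds: number[], table: GlyphTable): string {
  return glyphIds.map((id) => (id in table ? table[id] : "\uFFFD")).join("");
}
```

The point of voting is that a single OCR misread does not poison the table: once a glyph ID is labeled, every later occurrence decodes deterministically, which is why the final text can be more reliable than raw OCR.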
WeRead (the largest Chinese reading platform) has a similar problem: it renders everything on canvas, so there is no text in the DOM to read. CastReader uses a main-world content script injected at
document_start to intercept fetch responses containing chapter data before the page consumes them.
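A rough sketch of the fetch-wrapping pattern, under stated assumptions: interceptFetch, isChapterUrl and onChapter are hypothetical names, the URL test is a placeholder (WeRead's real endpoints are not shown here), and the fetch signature is simplified to string URLs even though the real fetch also accepts Request and URL objects.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface ResponseLike {
  clone(): ResponseLike;
  text(): Promise<string>;
}
type FetchLike = (url: string, init?: unknown) => Promise<ResponseLike>;

// Wrap a fetch function so matching responses are copied out for the reader.
function interceptFetch(
  original: FetchLike,
  isChapterUrl: (url: string) => boolean,
  onChapter: (url: string, body: string) => void
): FetchLike {
  return (url, init) =>
    original(url, init).then((response) => {
      if (isChapterUrl(url)) {
        // Read a clone so the page can still consume the original body.
        response
          .clone()
          .text()
          .then((body) => onChapter(url, body))
          .catch(() => {
            // Ignore read failures; the page's own request must not break.
          });
      }
      // Always hand the untouched response back to the page.
      return response;
    });
}
```

In a main-world script this would be installed by reassigning window.fetch to the wrapped version before any page script runs, which is why document_start matters. Cloning the response is the important detail: a response body can only be read once, so reading the original would starve the page.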
For normal websites, there's a three-tier extraction pipeline: 15+ site-specific extractors (Notion, Google Docs, ChatGPT, Claude, arXiv, etc.), a learned CSS selector rule
system, and a universal visible-text-block algorithm that fuses ideas from Readability.js, Boilerpipe, and jusText. The universal tier uses container scoring with text density, link density
scaling, stop-word classification, and progressive retry with flag degradation.
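The scoring ideas in the universal tier can be sketched roughly like this. It is an illustration only: the weights, the tiny stop-word list, the flag names and the Block shape are all assumptions of mine, not the values CastReader uses.

```typescript
// Hypothetical summary of one candidate container.
interface Block {
  text: string;     // visible text of the container
  linkText: string; // the part of that text that sits inside links
  tagCount: number; // number of descendant elements
}

// Tiny illustrative list; a real classifier would use per-language lists.
const STOP_WORDS = ["the", "and", "of", "to", "a", "in", "is", "that", "it", "for"];

function linkDensity(b: Block): number {
  return b.text.length === 0 ? 0 : b.linkText.length / b.text.length;
}

function stopWordRatio(text: string): number {
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return 0;
  const hits = words.filter((w) => STOP_WORDS.indexOf(w) !== -1).length;
  return hits / words.length;
}

function scoreBlock(b: Block): number {
  const textDensity = b.text.length / Math.max(1, b.tagCount);
  // Link-heavy blocks (menus, related-post lists) are scaled toward zero;
  // running prose, which is rich in stop words, gets a boost.
  return textDensity * (1 - linkDensity(b)) * (0.5 + stopWordRatio(b.text));
}

// Hypothetical strictness flags, loosened one at a time on each retry.
interface ExtractFlags {
  stripUnlikely: boolean;
  weightClasses: boolean;
}

function extractWithRetry(run: (flags: ExtractFlags) => string, minLength: number): string {
  const attempts: ExtractFlags[] = [
    { stripUnlikely: true, weightClasses: true },
    { stripUnlikely: false, weightClasses: true },
    { stripUnlikely: false, weightClasses: false },
  ];
  let best = "";
  for (const flags of attempts) {
    const out = run(flags);
    if (out.length >= minLength) return out; // good enough, stop degrading
    if (out.length > best.length) best = out;
  }
  return best; // nothing reached the threshold; return the longest attempt
}
```

The retry loop mirrors the general approach Readability.js takes: start strict, and only relax filters when the strict pass returns too little text.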
TTS runs through Kokoro, an open model supporting 40+ languages. Audio plays directly in the content script, so highlight sync reads currentTime directly, with no
message passing and no offscreen document relay adding delay.
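For illustration, the lookup from playback position to highlighted word could look like the sketch below. It assumes the TTS response includes per-word start and end times in seconds, which is my assumption rather than something stated above; WordTiming and activeWordIndex are hypothetical names.

```typescript
// Hypothetical per-word timing, in seconds from the start of the audio.
interface WordTiming {
  word: string;
  start: number;
  end: number;
}

// Binary search for the last word whose start time is <= currentTime.
// Returns -1 if playback has not reached the first word yet.
// Expects timings to be sorted by start time.
function activeWordIndex(timings: WordTiming[], currentTime: number): number {
  let lo = 0;
  let hi = timings.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (timings[mid].start <= currentTime) {
      found = mid; // candidate; look for a later word that also qualifies
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

A content script would call this with the audio element's currentTime on every animation frame or timeupdate event and move the highlight whenever the returned index changes. Because the audio element lives in the same script, that read is a plain property access.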
Limitations I should be honest about: the voice library is small (Kokoro only, no premium neural voices), no mobile support, extraction still fails on some complex
layouts (there's a manual content selector fallback), and the TTS server is something I run myself, so uptime isn't guaranteed.
Completely free. No signup, no usage limits, no premium tier. Chrome and Edge.