Architecture: two-stage pipeline. Stage 1 is speech recognition via Whisper (whisper-rs, 7 model variants, DTW timestamps) or Qwen3-ASR. I quantized the Qwen3-ASR model myself and wrote the inference pipeline in pure Rust. It handles accented speech and dialects better than Whisper in my testing, likely because of broader training data. Silero VAD pre-filters audio before either engine runs.
Stage 2 is text polish via candle (HuggingFace's Rust ML framework). Available models: Phi 4 Mini (2.5 GB), Ministral 3B/14B, Qwen 3 4B/8B. All Q4_K_M GGUF. Metal on macOS, CUDA on Windows.
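To make the two-stage flow concrete, here's a minimal sketch in Rust. All trait and type names are hypothetical illustrations, not Sumi's actual API; the real engines (whisper-rs/Qwen3-ASR and candle) would sit behind these traits.

```rust
// Hypothetical sketch of the two-stage pipeline. Trait and type names
// are illustrative, not Sumi's actual API.

/// Stage 1: speech-to-text (Whisper or Qwen3-ASR in the real app).
trait SttEngine {
    fn transcribe(&self, audio: &[f32]) -> String;
}

/// Stage 2: LLM text polish (candle-backed in the real app).
trait Polisher {
    fn polish(&self, raw: &str, prompt: &str) -> String;
}

struct Pipeline<S: SttEngine, P: Polisher> {
    stt: S,
    polisher: P,
}

impl<S: SttEngine, P: Polisher> Pipeline<S, P> {
    fn run(&self, audio: &[f32], prompt: &str) -> String {
        // In the real app, Silero VAD drops non-speech audio
        // before the STT engine ever sees it.
        let raw = self.stt.transcribe(audio);
        self.polisher.polish(&raw, prompt)
    }
}
```

Keeping both stages behind traits is also what makes the BYOK cloud option cheap to support: a cloud STT or polish backend is just another implementation.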
The polish step does context detection: reads the active app and URL (NSWorkspace + osascript on Mac, GetForegroundWindow on Windows) and selects a prompt accordingly. You can define custom rules keyed on app name, bundle ID, or URL regex.
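Rule matching might look roughly like this. Names are hypothetical, and where the real app matches URLs by regex, the sketch uses a substring check so it stays dependency-free:

```rust
/// A hypothetical prompt-selection rule keyed on app name, bundle ID,
/// or a URL pattern. (The real app uses a URL regex; a substring match
/// here keeps the sketch free of the `regex` dependency.)
struct PromptRule {
    app_name: Option<String>,
    bundle_id: Option<String>,
    url_pattern: Option<String>,
    prompt: String,
}

/// The foreground context read via NSWorkspace + osascript on macOS
/// or GetForegroundWindow on Windows.
struct AppContext {
    app_name: String,
    bundle_id: String,
    url: Option<String>,
}

/// First matching rule wins; unset fields match anything.
/// Falls back to a default prompt when no rule matches.
fn select_prompt<'a>(rules: &'a [PromptRule], ctx: &AppContext, default: &'a str) -> &'a str {
    for rule in rules {
        let app_ok = rule.app_name.as_deref().map_or(true, |n| n == ctx.app_name);
        let bundle_ok = rule.bundle_id.as_deref().map_or(true, |b| b == ctx.bundle_id);
        let url_ok = rule.url_pattern.as_deref().map_or(true, |p| {
            ctx.url.as_deref().map_or(false, |u| u.contains(p))
        });
        if app_ok && bundle_ok && url_ok {
            return &rule.prompt;
        }
    }
    default
}
```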
Other things:

- Meeting mode: background transcription to SQLite. Start before a call, stop when done.
- Edit by Voice: select text, speak an instruction ("translate to English", "make this shorter"), and the LLM rewrites it in place.
- Two local STT engines with 100+ languages and automatic code-switching.
- Optional BYOK cloud: STT via Groq/OpenAI/Deepgram/Azure, polish via OpenRouter/Groq/Gemini/SambaNova.
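Edit by Voice mostly reduces to building one rewrite prompt from the selection and the spoken instruction. A minimal sketch; the function name and template are illustrative, not Sumi's actual prompt:

```rust
/// Build an LLM rewrite prompt from a spoken instruction and the
/// currently selected text. Hypothetical template for illustration.
fn edit_by_voice_prompt(instruction: &str, selection: &str) -> String {
    format!(
        "Apply this instruction to the text and return only the rewritten text.\n\
         Instruction: {instruction}\n\
         Text:\n{selection}"
    )
}
```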
I built this because the existing tools (Wispr Flow, SuperWhisper) are cloud-only for AI processing and subscription-based. I wanted local inference for both stages, custom prompt rules per app, and source code I could actually read.
Rust, GPLv3.
Website: https://sumivoice.com/en/?utm_source=hackernews&utm_medium=forum&utm_campaign=launch_2026q1&utm_content=show_hn
Source: https://github.com/alan890104/sumi