It's context-aware: it reads your screen, documents, and active app to understand what you're working on. You can ask about PDFs, reply to emails, create calendar events, and use web search, all by voice.
It supports Gemma 4 and Qwen 3.5 for text generation, plus multiple STT backends (Parakeet, Whisper, Qwen3-ASR).
Examples:
- Gemma 4 in action: https://www.youtube.com/watch?v=OgfI-3YjEVU
- query a PDF document: https://www.youtube.com/watch?v=ggaDhut7FnU
- reply to an email: https://www.youtube.com/watch?v=QFnHXMBp1gA
- and the usual voice dictation (with optional polishing)
I currently use it a lot with Claude Code, Obsidian, and Apple Notes, or just to read papers.
Code: https://github.com/Saladino93/hitokudraft/tree/litert
Binary download: https://hitoku.me/draft/ (free with code HITOKUHN2026)
I am looking for feedback. My goal is to do AI research while interfacing with clients, and I thought this would be a nice little experiment to iterate/fail quickly on.
P.S. (if anyone has tips about this)
The current Gemma 4 implementation (with small models) has some problems:
- It hallucinates easily with long contexts, so I have to reset the session often. I've tuned some parameters, but I still need to find a sweet spot.
- Gemma 4 with LiteRT is currently fast compared to the MLX implementation of Qwen 3.5 (roughly 3x faster on my machine when dealing with images). But that speed comes at the price of memory spikes. I believe this is because LiteRT's WebGPU backend can allocate significantly more GPU memory than the model weights alone (I saw 38 GB of memory used for the ~4 GB E4B model!). I guess we need to wait for Google on this one.
- App size: because there is no official Swift package from Google yet, I have to bundle some files (the LiteRT dylibs), which adds ~98 MB over the previous MLX-only version (the app goes from ~50 MB to ~150 MB).
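For the first issue above, the workaround is resetting the session before the context gets long enough to degrade. A minimal sketch of that idea, as a wrapper that tracks an estimated token budget and clears history when it's exceeded; the session API (`reset`, `generate`) and the 4-chars-per-token heuristic are illustrative assumptions, not the actual LiteRT interface:

```python
class BoundedChat:
    """Wrap a chat session and reset it before the context grows too long."""

    def __init__(self, session, token_budget=2048):
        self.session = session          # hypothetical backend with reset()/generate()
        self.token_budget = token_budget
        self.used = 0                   # rough running token count

    def _estimate_tokens(self, text):
        # Crude heuristic: ~4 characters per token for English text.
        return max(1, len(text) // 4)

    def ask(self, prompt):
        cost = self._estimate_tokens(prompt)
        if self.used + cost > self.token_budget:
            self.session.reset()        # drop history before hallucinations set in
            self.used = 0
        self.used += cost
        reply = self.session.generate(prompt)
        self.used += self._estimate_tokens(reply)
        return reply
```

The "sweet spot" mentioned above would then be the `token_budget` value: too low and you lose useful context on every reset, too high and long-context hallucination returns.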
If any of this bothers you: use Qwen 3.5 instead (pure MLX), or wait for the upstream fixes from Google :)
Otherwise, in the mid-term I plan to switch to a potentially slower, but safer, MLX version of Gemma 4 (hopefully this weekend).