Tried to benchmark Google's new on-device dictation model and basically couldn't

https://www.getonit.ai/eloquent-review

1•telenardo•1h ago

Comments

telenardo•1h ago

I tried to benchmark Google’s new on-device dictation app (Eloquent) and basically couldn’t. It drops about half of my dictations.

Background: Google shipped a new fully‑local dictation app yesterday with new proprietary models, so I was excited to benchmark it against the leading open models (Qwen3‑ASR, NVIDIA Parakeet V3, etc.).

I have a harness that drives a dictation app by playing an audio file through a virtual input device and captures the app’s pasted output, so I can compare different apps on the same clips. I also have ~1,500 manually corrected clips from my daily engineering work.

What happened: I couldn’t get a clean eval, because ~half of dictations come back missing a large number of words. A clip with 20+ words routinely returns just 5-10 words. I assumed my harness was broken, so I used the app manually, speaking slowly and clearly into the mic. Same thing: roughly half the time, I only get a small fraction of what I actually said.
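For anyone who wants to check for the same thing, here is a minimal sketch of how a dropped dictation could be flagged by word count alone. The names `coverage` and `is_truncated` and the 0.6 threshold are my assumptions for illustration, not part of any existing harness.

```python
def coverage(reference: str, hypothesis: str) -> float:
    """Ratio of words returned by the app to words in the corrected reference."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return len(hypothesis.split()) / len(ref_words)


def is_truncated(reference: str, hypothesis: str, min_coverage: float = 0.6) -> bool:
    """Flag a transcript that returned far fewer words than were spoken.

    min_coverage is an assumed threshold; tune it on your own clips.
    """
    return coverage(reference, hypothesis) < min_coverage
```

A 20-word clip that comes back as 5-10 words has a coverage of 0.25-0.5, so it would be flagged at this threshold. Word count alone says nothing about whether the returned words are correct, so it only separates "dropped" from "complete enough to score".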

When Eloquent did return a complete transcript (15 of 50 tests), its accuracy was actually competitive: ~24% WER vs ~21% for Qwen3-ASR on the same clips. The problem isn't the recognition. It's that for most dictations, you don't get your words back at all!
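For reference, WER here means word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch of that standard definition follows; the lowercase-and-split normalization is an assumption, and real evals usually also strip punctuation.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Every missing word counts as a deletion, so a transcript that returns only half of what was said scores at least 50% WER no matter how accurate the returned words are. That is why the complete transcripts have to be scored separately.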

My theory: The transcriber is a chat‑style AI model, and chat models sometimes reply about your audio instead of transcribing it.

To test this, I ran Gemma 3n (Google's open model from the same family) directly on the same clips, bypassing the Eloquent app. On 11 of 44 attempts it responded with something like “I’m sorry, I can’t transcribe this,” instead of producing a transcript. Gemma had the same ~60% word error rate as Eloquent. My guess is that Eloquent’s model has the same issue and the app just hides it.
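If you want to count these replies automatically, a rough sketch is below. The phrase patterns are assumptions based on the one example quoted above, not an exhaustive list, and `looks_like_refusal` and `refusal_rate` are hypothetical helper names.

```python
import re

# Assumed patterns for "the model replied about the audio instead of transcribing it".
REFUSAL_PATTERNS = [
    r"\bi'?m sorry\b",
    r"\bi can(?:'?t|not) transcribe\b",
    r"\b(?:unable|not able) to transcribe\b",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def looks_like_refusal(output: str) -> bool:
    """True if the model output matches one of the assumed refusal phrases."""
    normalized = output.replace("\u2019", "'")  # treat curly apostrophes as straight
    return _REFUSAL_RE.search(normalized) is not None


def refusal_rate(outputs: list[str]) -> float:
    """Fraction of outputs that look like a refusal rather than a transcript."""
    if not outputs:
        return 0.0
    return sum(looks_like_refusal(o) for o in outputs) / len(outputs)
```

11 of 44 works out to a 25% refusal rate. Phrase matching will misfire on a clip where the speaker really says "I'm sorry", so it is worth spot-checking flagged outputs against the reference transcript.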

Has anyone been able to get good results with this app? Or are others seeing this issue?

Disclosure: I build a competitive local dictation app, so not a neutral party!
