Anam Cara-3: Why we think AI needs a face

20•grayne•1h ago
Hey HN, we're Ben and Caoimhe, cofounders of Anam. We built a service for interactive avatars and just shipped our latest model, cara-3. Try it at anam.ai, no sign-up required, or build with it at lab.anam.ai or anam.ai/cookbook.

Some context on why we're working on this: faces carry emotional signal that text and voice don't. Almost half the human brain is devoted to visual processing, and reading faces is one of the first things we learn as babies. A face is also a more accessible medium. Anam started, in part, from Ben watching his gran struggle with her iPad and thinking there should be a face she could just talk to.

cara-3 uses a two-stage pipeline: a diffusion transformer converts audio to motion embeddings (head position, eye gaze, lip shape, expression), then a rendering model applies those to a reference image to produce video frames. Separating motion from rendering means we can animate any face without retraining. The two models run in sequence within ~70ms time-to-first-frame on an H200, so we can run many concurrent avatar sessions on a single GPU.
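A minimal sketch of that split, for illustration only. This is not Anam's code: `Motion`, `audio_to_motion`, `render_frame` and `stream_frames` are hypothetical names, and both stages are toy stand-ins for the real models. The point it shows is that stage 1 never sees the face, so the same motion stream can drive any reference image.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence


@dataclass(frozen=True)
class Motion:
    """Identity-free motion embedding for one frame (hypothetical fields)."""
    head_yaw: float
    gaze_x: float
    mouth_open: float


def audio_to_motion(audio_chunk: Sequence[float]) -> Motion:
    """Stage 1 stand-in: derive motion from audio only, never from the face."""
    energy = sum(abs(s) for s in audio_chunk) / max(len(audio_chunk), 1)
    return Motion(head_yaw=0.0, gaze_x=0.0, mouth_open=min(energy, 1.0))


def render_frame(reference_image: str, motion: Motion) -> str:
    """Stage 2 stand-in: apply a motion embedding to any reference image."""
    return f"{reference_image}|mouth={motion.mouth_open:.2f}"


def stream_frames(reference_image: str,
                  audio_chunks: Iterable[Sequence[float]]) -> Iterator[str]:
    """Run the two stages in sequence, yielding each frame as soon as it is ready."""
    for chunk in audio_chunks:
        yield render_frame(reference_image, audio_to_motion(chunk))


chunks = [[0.0, 0.0], [0.5, -0.5]]
frames_a = list(stream_frames("face_a.png", chunks))
frames_b = list(stream_frames("face_b.png", chunks))
```

Swapping `face_a.png` for `face_b.png` changes only the renderer input, which is the toy equivalent of animating a new face without retraining.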

The core of audio-to-motion is flow matching, but we found off-the-shelf formulations weren't stable enough for this task, so we developed a novel variant. We also built our own training data pipeline (and recently open-sourced the backbone: Metaxy) because existing frameworks made it hard to iterate without rerunning expensive steps.
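For readers new to flow matching, here is a sketch of the textbook formulation with a straight-line path from noise to data. It is explicitly not the variant described above, whose details are not given here, and it leaves out the audio conditioning a real model would need. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def fm_training_pair(x0: Vector, x1: Vector, t: float) -> Tuple[Vector, Vector]:
    """Return (x_t, target velocity) on the straight path from noise x0 to data x1."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target


def fm_loss(predicted: Vector, target: Vector) -> float:
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int = 10) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


noise, data = [0.0, 0.0], [1.0, 2.0]
x_t, target = fm_training_pair(noise, data, t=0.25)

# An oracle that already knows the true velocity stands in for a trained network.
sample = euler_sample(lambda x, t: target, noise)
```

Training would minimise `fm_loss` between a network's prediction at `(x_t, t)` and `target`; at inference the learned velocity field replaces the oracle, and fewer Euler steps trade quality for latency.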

We commissioned an independent blind evaluation comparing interactive avatars from Anam with HeyGen, Tavus and D-ID. Hundreds of participants played 20 Questions with the different offerings and cara-3 scored highest on every metric (p < 0.001), 24% above the closest competitor on average. What surprised us most: responsiveness correlated with overall experience (Spearman 0.697) far more than visual quality (0.473). In interactive settings, how fast you respond matters more than how good you look.
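For reference, a Spearman coefficient is the Pearson correlation of the ranks of two variables. The sketch below computes it in plain Python; the ratings are made up for illustration and are not data from the evaluation.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up participant ratings, one entry per participant.
responsiveness = [1, 2, 3, 4, 5]
overall = [2, 1, 4, 3, 5]
rho = spearman(responsiveness, overall)  # approximately 0.8
```

Because it works on ranks, the coefficient only measures how consistently one rating rises with the other, not whether the relationship is linear.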

Ask us anything!

Comments

peanut_merchant•1h ago
One of the backend developers at Anam here. One of the hardest parts of developing this has been monitoring and analytics.

Most off-the-shelf solutions and existing platforms skew heavily towards the normal HTTP web service world. However, the bulk of our interactions happen over WebRTC in long-running sessions, where the existing solutions for in-depth metrics and monitoring are much less mature and less well documented.

Currently we're using InfluxDB, Prometheus, Grafana and some hand-rolled monitoring code alongside the stats that WebRTC itself offers. Would be interested to know how anyone out there is monitoring conversational flows and WebRTC traffic.
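As a rough illustration of what hand-rolled code in this space can look like (a sketch, not Anam's implementation): the snippet below turns two consecutive snapshots of the standard WebRTC `inbound-rtp` stats fields `packetsLost`, `packetsReceived` and `jitter` into lines in the Prometheus text exposition format. The metric names and the `session` label are assumptions.

```python
from typing import Dict, List

Snapshot = Dict[str, float]


def interval_loss_fraction(prev: Snapshot, curr: Snapshot) -> float:
    """Fraction of packets lost between two cumulative inbound-rtp snapshots."""
    # packetsLost can decrease when late or duplicate packets arrive, so clamp at 0.
    lost = max(curr["packetsLost"] - prev["packetsLost"], 0)
    received = curr["packetsReceived"] - prev["packetsReceived"]
    expected = lost + received
    return lost / expected if expected > 0 else 0.0


def to_prometheus_lines(session_id: str, prev: Snapshot, curr: Snapshot) -> List[str]:
    """Format per-session gauges in the Prometheus text exposition format."""
    label = f'{{session="{session_id}"}}'
    loss = interval_loss_fraction(prev, curr)
    return [
        f"webrtc_interval_loss_fraction{label} {loss:.4f}",
        f"webrtc_jitter_seconds{label} {curr['jitter']:.4f}",
    ]


prev = {"packetsLost": 10, "packetsReceived": 990, "jitter": 0.010}
curr = {"packetsLost": 15, "packetsReceived": 1085, "jitter": 0.012}
lines = to_prometheus_lines("abc", prev, curr)
```

A per-session label is convenient for debugging a single long-running call but creates one time series per session, so aggregating into histograms before export may scale better.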

iogbole•58m ago
Really interesting architecture choice separating motion from rendering. That feels like the right abstraction boundary if you want identity generalisation without retraining.

The latency numbers are what stood out to me though. ~70ms time-to-first-frame is genuinely impressive for an interactive loop. In real conversations, responsiveness dominates perceived realism way more than visual fidelity, so that correlation result makes intuitive sense.

Curious how robust the audio-to-motion mapping is under messy real-world input (overlapping speech, accents, background noise, etc.). Does the flow-matching variant help mostly with stability during training, or also temporal consistency during inference?

grayne•52m ago
Full technical blog here: https://anam.ai/blog/cara-3-interactive-avatars
