The most common failures for production agents are behavioral: looping, reasoning leakage, user frustration, and more. Using a frontier model like GPT or Sonnet to judge every turn is too expensive and slow to run at scale.

How it works:

We use a modern LLM with hybrid attention and remove the decode step. We built an inference engine that lets 99% of prefill compute be reused from reflex to reflex, similar in spirit to 2019-era BERT/HYDRA and other multi-head techniques.

We took the same high-level idea and did the engineering to make it work with a modern architecture and attention. On it, inference runs in under 30ms and the full request is served in under 90ms. Whether you run 4 reflexes or 100, the extra overhead is less than 2ms.
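To make the shared-compute idea concrete, here is a toy sketch of the older multi-head pattern mentioned above: one expensive encoding pass per trace, then one cheap head per reflex. This is not our inference engine. The class name, the character-hashing `encode` stand-in, and the untrained random heads are all assumptions made up for illustration.

```python
import math
import random


class SharedTrunkClassifier:
    """Toy multi-head classifier: one shared encoding pass, many cheap heads."""

    def __init__(self, hidden_size, head_names, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.trunk_calls = 0  # counts how often the expensive pass runs
        # Each head is a weight vector plus a bias (hypothetical, untrained).
        self.heads = {
            name: (
                [rng.uniform(-1, 1) for _ in range(hidden_size)],
                rng.uniform(-1, 1),
            )
            for name in head_names
        }

    def encode(self, text):
        """Stand-in for the expensive prefill pass over the trace."""
        self.trunk_calls += 1
        vec = [0.0] * self.hidden_size
        for i, ch in enumerate(text):
            vec[(ord(ch) + i) % self.hidden_size] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def classify(self, text):
        """Run every head against a single shared encoding of the trace."""
        hidden = self.encode(text)  # computed once, reused by all heads
        scores = {}
        for name, (weights, bias) in self.heads.items():
            logit = sum(w * h for w, h in zip(weights, hidden)) + bias
            scores[name] = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
        return scores


# Hypothetical reflex names; the scores are meaningless because nothing is trained.
model = SharedTrunkClassifier(16, ["looping", "reasoning_leakage", "user_frustration"])
scores = model.classify("user: this still doesn't work\nagent: let me try again")
```

Adding a head only adds one dot product per trace, which is why the marginal cost of another reflex stays small compared with the shared pass.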

Why does optimizing this matter?

If you’re even a medium-sized startup, you’re dealing with tens of thousands of agent runs and millions of turns. If you want to track things like user frustration rates over time, frontier LLM-as-judge does not scale.

I built a similar stack at Tesla. When ML engineers needed to sample data across petabytes for signals like `is_camera_obfuscated=true`, along with 200 other things, they needed to 1) spin up new signals quickly and 2) run them efficiently at scale.

What it is not:

A dashboard. In my experience, 99% of dashboards go unused. This is purely API-based and made for devs who want to track agent behavior themselves, trigger their own alerts, and build on top of it.
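As a rough sketch of what "trigger their own alerts" could look like on the consumer side, the snippet below keeps a rolling rate of flagged turns and fires once the rate crosses a threshold. It does not call our API. The `RollingRateAlert` class, the per-turn dict shape, and the 0.5 score cutoff are assumptions for illustration only.

```python
from collections import deque


class RollingRateAlert:
    """Track the share of flagged turns over the last `window` turns."""

    def __init__(self, window, threshold):
        self.flags = deque(maxlen=window)  # oldest turn drops off automatically
        self.threshold = threshold

    def observe(self, fired):
        """Record one turn and return (current_rate, should_alert)."""
        self.flags.append(bool(fired))
        rate = sum(self.flags) / len(self.flags)
        window_full = len(self.flags) == self.flags.maxlen
        # Only alert on a full window so a single early turn cannot trip it.
        return rate, window_full and rate > self.threshold


# Hypothetical per-turn results; the field names are not a real response schema.
turns = [
    {"turn_id": 1, "user_frustration": 0.91},
    {"turn_id": 2, "user_frustration": 0.12},
    {"turn_id": 3, "user_frustration": 0.77},
]

alert = RollingRateAlert(window=3, threshold=0.5)
for turn in turns:
    rate, should_alert = alert.observe(turn["user_frustration"] > 0.5)
    if should_alert:
        print(f"frustration rate {rate:.2f} after turn {turn['turn_id']}")
```

Alerting on a rate over a window rather than on single turns is a design choice: it trades a little latency for fewer false alarms from one-off misclassifications.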

You can vibetrain a custom reflex in our dashboard, and then let it self-improve in production: https://www.morphllm.com/dashboard/reflex

Docs: https://docs.morphllm.com/sdk/components/reflexes/index

I’d love feedback from people running agents in prod: what sorts of things do you wish you could track over time across 100% of turns?

TLDR: semantic signals from agent traces, super fast, cheap via API

Show HN: Morph Reflexes – Multi-head classifiers for agent traces