I built Tinman because finding LLM failures in production is a pain in the ass. Traditional testing checks what you've already thought of. Tinman tries to find what you haven't.
It's an autonomous research agent that:

- Generates hypotheses about potential failure modes
- Designs and runs experiments to test them
- Classifies failures (reasoning errors, tool use, context issues, etc.)
- Proposes interventions and validates them via simulation
The core loop runs continuously. Each cycle informs the next.
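Here's a rough sketch of that loop in code. Everything is stubbed and the names are illustrative rather than the actual API; it's just to show the shape of a cycle.

```python
# Stubbed illustration of the hypothesize -> experiment -> classify -> intervene cycle.
# Function names are illustrative, not the real API.
import asyncio
import random

async def generate_hypotheses(findings):
    # Seed new hypotheses from what earlier cycles learned.
    return [f"hypothesis-{len(findings)}"]

async def run_experiment(hypothesis):
    # Design and execute a probe for this failure mode (stubbed with a coin flip).
    return {"hypothesis": hypothesis, "failed": random.random() < 0.2}

def classify_failure(result):
    # Bucket the failure: reasoning error, tool use, context, etc.
    return "reasoning_error" if result["failed"] else None

async def validate_intervention(failure):
    # Propose a fix and check it in simulation (stubbed).
    return {"failure": failure, "intervention": "add guardrail"}

async def research_loop():
    findings = []                      # each cycle's results feed the next
    while True:                        # the loop runs continuously
        for hypothesis in await generate_hypotheses(findings):
            result = await run_experiment(hypothesis)
            failure = classify_failure(result)
            if failure:
                findings.append(await validate_intervention(failure))
        await asyncio.sleep(60)        # pacing between cycles

if __name__ == "__main__":
    asyncio.run(research_loop())
```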
Why now: With tools like OpenClaw/ClawdBot giving agents real system access, the failure surface is way bigger than "bad chatbot response." Tinman has a gateway adapter that connects to OpenClaw's WebSocket stream for real-time analysis as requests flow through.
Three modes:

- LAB: unrestricted research against dev
- SHADOW: observe production, flag issues
- PRODUCTION: human approval required
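To make the modes concrete, here's one way the gating could be expressed. The enum values come straight from the list above; the gating logic is a simplified illustration, not the actual code.

```python
# Illustrative only: how run modes might gate whether an intervention gets applied.
from enum import Enum

class Mode(Enum):
    LAB = "lab"                # unrestricted research against dev
    SHADOW = "shadow"          # observe production, flag issues, never act
    PRODUCTION = "production"  # every action needs human sign-off

def can_apply_intervention(mode: Mode, human_approved: bool = False) -> bool:
    if mode is Mode.LAB:
        return True
    if mode is Mode.SHADOW:
        return False            # shadow mode only reports
    return human_approved       # PRODUCTION requires explicit approval

print(can_apply_intervention(Mode.SHADOW))            # False
print(can_apply_intervention(Mode.PRODUCTION, True))  # True
```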
Tech:

- Python, async throughout
- Extensible GatewayAdapter ABC for any proxy/gateway (rough sketch below)
- Memory graph for tracking what was known when
- Works with OpenAI, Anthropic, Ollama, Groq, OpenRouter, Together
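If you want to hook Tinman up to a different gateway, a custom adapter looks roughly like this. Method names and the WebSocket handling are simplified for illustration; treat it as the shape of the extension point rather than the real interface.

```python
# Simplified sketch of a custom gateway adapter. Method names are illustrative,
# not the actual GatewayAdapter interface.
import asyncio
from abc import ABC, abstractmethod

import websockets  # third-party: pip install websockets


class GatewayAdapter(ABC):
    """Connects Tinman to a proxy/gateway and yields traffic for analysis."""

    @abstractmethod
    async def events(self):
        """Yield request/response events as they flow through the gateway."""
        ...


class WebSocketAdapter(GatewayAdapter):
    """Hypothetical adapter that tails a gateway's WebSocket stream."""

    def __init__(self, url: str):
        self.url = url

    async def events(self):
        async with websockets.connect(self.url) as ws:
            async for raw in ws:
                yield raw  # hand each frame to the analysis loop


async def main():
    adapter = WebSocketAdapter("ws://localhost:8080/stream")  # placeholder URL
    async for event in adapter.events():
        print("observed:", event)


if __name__ == "__main__":
    asyncio.run(main())
```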
pip install AgentTinman
tinman init && tinman tui
GitHub: https://github.com/oliveskin/Agent-Tinman
Docs: https://oliveskin.github.io/Agent-Tinman/
OpenClaw adapter: https://github.com/oliveskin/tinman-openclaw-eval

Apache 2.0. No telemetry, no paid tier. Feedback and contributions welcome.