Show HN: Agent Tinman – Autonomous failure discovery for LLM systems

https://github.com/oliveskin/Agent-Tinman
2•oliveskin•2h ago
Hey HN,

I built Tinman because finding LLM failures in production is a pain in the ass. Traditional testing checks what you've already thought of. Tinman tries to find what you haven't.

It's an autonomous research agent that:

- Generates hypotheses about potential failure modes
- Designs and runs experiments to test them
- Classifies failures (reasoning errors, tool use, context issues, etc.)
- Proposes interventions and validates them via simulation

The core loop runs continuously. Each cycle informs the next.
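
To make "each cycle informs the next" concrete, here is a toy sketch of such a loop in Python. It is not Tinman's code: `Finding`, `ResearchState`, `generate_hypotheses`, `run_cycle`, and `toy_experiment` are hypothetical names invented for illustration, and the intervention and simulation step is left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Finding:
    hypothesis: str
    failure_class: Optional[str]  # None means the experiment found no failure


@dataclass
class ResearchState:
    # Everything learned so far; each cycle reads it and extends it.
    findings: list[Finding] = field(default_factory=list)


def generate_hypotheses(state: ResearchState, seeds: list[str]) -> list[str]:
    """Return untested hypotheses; confirmed failures spawn follow-up variants."""
    tested = {f.hypothesis for f in state.findings}
    follow_ups = [f"{f.hypothesis} (variant)" for f in state.findings if f.failure_class]
    return [h for h in seeds + follow_ups if h not in tested]


def run_cycle(
    state: ResearchState,
    seeds: list[str],
    experiment: Callable[[str], Optional[str]],
) -> list[Finding]:
    """One pass: hypothesize, run the experiment, classify, record."""
    new = [Finding(h, experiment(h)) for h in generate_hypotheses(state, seeds)]
    state.findings.extend(new)
    return new


def toy_experiment(hypothesis: str) -> Optional[str]:
    # Stand-in for a real probe against a model (hypothetical behavior):
    # the original tool hypothesis "fails", its variant does not.
    if "tool" in hypothesis and "(variant)" not in hypothesis:
        return "tool_use"
    return None
```

Calling `run_cycle` repeatedly with the same `ResearchState` shows the feedback: the first cycle tests the seed hypotheses, the second tests only the follow-up derived from the confirmed failure, and the third has nothing new to test.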

Why now: With tools like OpenClaw/ClawdBot giving agents real system access, the failure surface is way bigger than "bad chatbot response." Tinman has a gateway adapter that connects to OpenClaw's WebSocket stream for real-time analysis as requests flow through.

Three modes:

- LAB: unrestricted research against dev
- SHADOW: observe production, flag issues
- PRODUCTION: human approval required
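
One way to read the three modes as a policy check, sketched in Python. The `Mode` enum and `may_apply` function are hypothetical and not taken from Tinman. Treating SHADOW as "never applies changes" is an assumption drawn from "observe production, flag issues".

```python
from enum import Enum


class Mode(Enum):
    LAB = "lab"
    SHADOW = "shadow"
    PRODUCTION = "production"


def may_apply(mode: Mode, human_approved: bool = False) -> bool:
    """Whether a proposed intervention may be applied under this mode."""
    if mode is Mode.LAB:
        return True  # unrestricted, but only against a dev environment
    if mode is Mode.SHADOW:
        return False  # assumption: observe and flag only, never act
    return human_approved  # PRODUCTION: a human has to sign off first
```

Under this reading, `may_apply(Mode.PRODUCTION)` is false until a human approval flag is passed in, and SHADOW stays read-only regardless of approval.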

Tech:

- Python, async throughout
- Extensible GatewayAdapter ABC for any proxy/gateway
- Memory graph for tracking what was known when
- Works with OpenAI, Anthropic, Ollama, Groq, OpenRouter, Together
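
For readers unfamiliar with the pattern, this is roughly what an adapter ABC with an async event stream can look like. It is a sketch under assumptions, not the real interface: `GatewayAdapterSketch`, `ListAdapter`, `events`, and `collect` are made-up names, and the actual `GatewayAdapter` in the repo may have different methods and signatures.

```python
import abc
import asyncio
import json
from typing import AsyncIterator


class GatewayAdapterSketch(abc.ABC):
    """Hypothetical interface; the real GatewayAdapter may differ."""

    @abc.abstractmethod
    def events(self) -> AsyncIterator[dict]:
        """Yield normalized events as requests flow through the gateway."""


class ListAdapter(GatewayAdapterSketch):
    """Toy adapter that replays JSON lines instead of a live WebSocket stream."""

    def __init__(self, raw_lines: list[str]) -> None:
        self._raw_lines = raw_lines

    async def events(self) -> AsyncIterator[dict]:
        for line in self._raw_lines:
            yield json.loads(line)
            await asyncio.sleep(0)  # hand control back to the event loop


async def collect(adapter: GatewayAdapterSketch) -> list[dict]:
    """Drain an adapter's event stream into a list."""
    return [event async for event in adapter.events()]
```

A consumer only depends on `events()`, so swapping the replay adapter for one backed by a real gateway connection would not change the analysis code. Running `asyncio.run(collect(ListAdapter([...])))` returns the parsed events in order.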

  pip install AgentTinman
  tinman init && tinman tui

GitHub: https://github.com/oliveskin/Agent-Tinman
Docs: https://oliveskin.github.io/Agent-Tinman/
OpenClaw adapter: https://github.com/oliveskin/tinman-openclaw-eval

Apache 2.0. No telemetry, no paid tier. Feedback and contributions welcome.

Show HN: Minimal – Open-Source Community driven Hardened Container Images

https://github.com/rtvkiz/minimal
19•ritvikarya98•1h ago•4 comments

Show HN: I trained a 9M speech model to fix my Mandarin tones

https://simedw.com/2026/01/31/ear-pronunication-via-ctc/
416•simedw•20h ago•124 comments

Show HN: Pinchwork – A task marketplace where AI agents hire each other

https://github.com/anneschuth/pinchwork
2•aschuth•39m ago•0 comments

Show HN: An extensible pub/sub messaging server for edge applications

https://github.com/narwhal-io/narwhal
15•ortuman•3d ago•0 comments

Show HN: Moltbook – A social network for moltbots (clawdbots) to hang out

https://www.moltbook.com/
90•schlichtm•2d ago•789 comments

Show HN: Phage Explorer

https://phage-explorer.org/
111•eigenvalue•16h ago•25 comments

Show HN: Amla Sandbox – WASM bash shell sandbox for AI agents

https://github.com/amlalabs/amla-sandbox
141•souvik1997•1d ago•74 comments

Show HN: Quorum-free replicated state machine built atop S3

https://github.com/io-s2c/s2c
4•mzazaipsc•4h ago•0 comments

Show HN: ToolKuai – Privacy-first, 100% client-side media tools

https://toolkuai.com/
3•indie_max•5h ago•0 comments

Show HN: Kolibri, a DIY music club in Sweden

https://kolibrinkpg.com/
136•EastLondonCoder•2d ago•30 comments

Show HN: Pinecone Explorer – Desktop GUI for the Pinecone vector database

https://www.pinecone-explorer.com
28•arsentjev•3d ago•3 comments

Show HN: Moltbook Overtaken by Shellraiser

https://www.moltbook.com/post/74b073fd-37db-4a32-a9e1-c7652e5c0d59
2•mooball•5h ago•2 comments

Show HN: Free Text-to-Speech Tool – No Signup, 40 Languages

https://texttospeech.site/
3•digi_wares•6h ago•0 comments

Show HN: Bunnie – Use Bun as the templating engine in Rust applications

https://github.com/aspizu/bunnie
3•aspizu•6h ago•0 comments

Show HN: How We Run 60 Hugging Face Models on 2 GPUs

4•pveldandi•6h ago•19 comments

Show HN: ClawNews – The first news platform where AI agents are primary users

https://clawnews.io/
2•jiayaoqijia•7h ago•0 comments

Show HN: Cicada – A scripting language that integrates with C

https://github.com/heltilda/cicada
56•briancr•1d ago•37 comments

Show HN: Mystral Native – Run JavaScript games natively with WebGPU (no browser)

https://github.com/mystralengine/mystralnative
47•Flux159•4d ago•18 comments

Show HN: Blink – Native macOS code snippet manager. Local, offline, <1s search

https://www.enclyralabs.com/
2•enclyra•11h ago•2 comments

Show HN: I built an AI conversation partner to practice speaking languages

https://apps.apple.com/us/app/talkbits-speak-naturally/id6756824177
64•omarisbuilding•23h ago•59 comments

Show HN: ShapedQL – A SQL engine for multi-stage ranking and RAG

https://playground.shaped.ai
80•tullie•4d ago•23 comments

Show HN: Interactive Equation Solver

2•dharmatech•13h ago•0 comments

Show HN: Foundry – Turns your repeated workflows into one-click commands

https://github.com/lekt9/openclaw-foundry
12•getfoundry•20h ago•4 comments

Show HN: LemonSlice – Upgrade your voice agents to real-time video

129•lcolucci•4d ago•130 comments

Show HN: The HN Arcade

https://andrewgy8.github.io/hnarcade/
346•yuppiepuppie•3d ago•120 comments

Show HN: I'm building an AI-proof writing tool. How would you defeat it?

https://auth-auth.vercel.app/
22•callmeed•3d ago•30 comments

Show HN: SHDL – A minimal hardware description language built from logic gates

https://github.com/rafa-rrayes/SHDL
47•rafa_rrayes•3d ago•21 comments

Show HN: Build Web Automations via Demonstration

https://www.notte.cc/launch-week-i/demonstrate-mode
34•ogandreakiro•4d ago•20 comments

Show HN: One Human + One Agent = One Browser From Scratch in 20K LOC

https://emsh.cat/one-human-one-agent-one-browser/
317•embedding-shape•4d ago•151 comments