Show HN: Agent Tinman – Autonomous failure discovery for LLM systems

https://github.com/oliveskin/Agent-Tinman
2•oliveskin•1h ago
Hey HN,

I built Tinman because finding LLM failures in production is a pain in the ass. Traditional testing checks what you've already thought of. Tinman tries to find what you haven't.

It's an autonomous research agent that:
- Generates hypotheses about potential failure modes
- Designs and runs experiments to test them
- Classifies failures (reasoning errors, tool use, context issues, etc.)
- Proposes interventions and validates them via simulation

The core loop runs continuously. Each cycle informs the next.
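
To make "each cycle informs the next" concrete, here is a minimal sketch of such a loop. It is not Tinman's code: `ResearchLoop`, `Finding`, `classify`, the seed hypotheses, and the `probe` callable are all hypothetical names invented for illustration, and the intervention step is left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    hypothesis: str
    failed: bool
    failure_class: str = "none"


def classify(hypothesis: str) -> str:
    # Toy keyword classifier; the labels loosely follow the categories in the post.
    for label in ("reasoning", "tool", "context"):
        if label in hypothesis:
            return label
    return "unclassified"


@dataclass
class ResearchLoop:
    # probe is an assumption: it returns True when the system under test fails.
    probe: Callable[[str], bool]
    history: list[list[Finding]] = field(default_factory=list)

    def hypotheses(self) -> list[str]:
        if not self.history:
            # Hypothetical seed hypotheses for the very first cycle.
            return [
                "context overflow drops instructions",
                "tool call with bad arguments",
            ]
        # Later cycles only follow up on what failed in the previous cycle.
        return [f"{f.hypothesis} (narrowed)" for f in self.history[-1] if f.failed]

    def cycle(self) -> list[Finding]:
        findings = []
        for h in self.hypotheses():
            failed = self.probe(h)
            findings.append(Finding(h, failed, classify(h) if failed else "none"))
        self.history.append(findings)
        return findings
```

With a probe such as `lambda h: "context" in h`, the first cycle tests both seeds and the second cycle only revisits the one that failed.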

Why now: With tools like OpenClaw/ClawdBot giving agents real system access, the failure surface is way bigger than "bad chatbot response." Tinman has a gateway adapter that connects to OpenClaw's WebSocket stream for real-time analysis as requests flow through.

Three modes:
- LAB: unrestricted research against dev
- SHADOW: observe production, flag issues
- PRODUCTION: human approval required
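
One way to picture the three modes is as a small policy check. This is only a sketch under my own assumptions: the `Mode` enum, the action names, and `is_allowed` are hypothetical and do not come from the Tinman codebase.

```python
from enum import Enum


class Mode(Enum):
    LAB = "lab"
    SHADOW = "shadow"
    PRODUCTION = "production"


# Assumed action vocabulary, for illustration only.
READ_ONLY_ACTIONS = {"observe", "flag"}


def is_allowed(mode: Mode, action: str, human_approved: bool = False) -> bool:
    if mode is Mode.LAB:
        # Unrestricted research against a dev environment.
        return True
    if mode is Mode.SHADOW:
        # Watch production and flag issues, but never act on it.
        return action in READ_ONLY_ACTIONS
    # PRODUCTION: every action waits for a human sign-off.
    return human_approved
```

The choice to gate every PRODUCTION action, including read-only ones, is the strictest reading of "human approval required"; a real policy may be more lenient.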

Tech:
- Python, async throughout
- Extensible GatewayAdapter ABC for any proxy/gateway
- Memory graph for tracking what was known when
- Works with OpenAI, Anthropic, Ollama, Groq, OpenRouter, Together
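
For a rough idea of what an async adapter ABC can look like, here is a sketch. The real GatewayAdapter interface may differ: `SketchGatewayAdapter`, `events`, `InMemoryAdapter`, `collect_flagged`, and the event dict shape are all assumptions made up for this example.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator


class SketchGatewayAdapter(ABC):
    """Hypothetical interface, not the actual Tinman GatewayAdapter."""

    @abstractmethod
    def events(self) -> AsyncIterator[dict]:
        """Yield gateway events as they arrive."""


class InMemoryAdapter(SketchGatewayAdapter):
    # Stand-in for a real proxy or WebSocket stream; replays a fixed list.
    def __init__(self, items):
        self._items = list(items)

    async def events(self):
        for item in self._items:
            await asyncio.sleep(0)  # yield control, as a network read would
            yield item


async def collect_flagged(adapter: SketchGatewayAdapter) -> list:
    flagged = []
    async for event in adapter.events():
        # Assumed event shape: a dict with a "status" key.
        if event.get("status") == "error":
            flagged.append(event)
    return flagged
```

Usage: `asyncio.run(collect_flagged(InMemoryAdapter([{"id": 1, "status": "ok"}, {"id": 2, "status": "error"}])))` returns only the second event. Swapping in another gateway would mean writing one more subclass with its own `events` generator.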

  pip install AgentTinman
  tinman init && tinman tui

GitHub: https://github.com/oliveskin/Agent-Tinman
Docs: https://oliveskin.github.io/Agent-Tinman/
OpenClaw adapter: https://github.com/oliveskin/tinman-openclaw-eval

Apache 2.0. No telemetry, no paid tier. Feedback and contributions welcome.

Can humans make AI any better? [video]

https://www.youtube.com/watch?v=2hcsmtkSzIw
2•Wilsoniumite•6m ago•0 comments

Show HN: Get your fitness age based on your VO2 Max

https://www.vo2maxpro.com/#calculator
1•GoodluckH•6m ago•0 comments

Demystifying ARM SME to Optimize General Matrix Multiplications

https://arxiv.org/abs/2512.21473
3•matt_d•13m ago•0 comments

Show HN: A P2P file transfer CLI that approaches scp/rsync speeds without setup

https://github.com/samsungplay/Thruflux
1•samsungplay•15m ago•0 comments

The Saddest Moment (2013) [pdf]

https://www.usenix.org/system/files/login-logout_1305_mickens.pdf
1•tosh•16m ago•0 comments

VibeVoice-ASR: speech-to-text model designed to handle 60-minute long-form audio

https://huggingface.co/microsoft/VibeVoice-ASR
1•maxloh•16m ago•0 comments

Will the smartphone survive the AI age?

https://www.economist.com/business/2026/01/25/will-the-smartphone-survive-the-ai-age
1•bookofjoe•18m ago•1 comment

Memory-First AI Reminder Agents with Mem0 and Claude Agent SDK

https://mem0.ai/blog/building-a-reminder-agent-that-actually-remembers
2•ninadwrites•20m ago•0 comments

Noctalia: A sleek and minimal desktop shell thoughtfully crafted for Wayland

https://github.com/noctalia-dev/noctalia-shell
2•doener•20m ago•0 comments

Show HN: Minimal – Open-Source Community driven Hardened Container Images

https://github.com/rtvkiz/minimal
4•ritvikarya98•21m ago•1 comment

Show HN: Kernx – A deterministic Java 25 runtime (66k req/s, <1ms latency)

https://github.com/Kernx-io/kernx
1•SivaKernx•25m ago•1 comment

Show HN: Magpie – my self-hosted replacement for Google/Yahoo email aggregation

https://github.com/FynleyMsg/Magpie
2•bigtech•25m ago•0 comments

Show HN: Pack-repo-4ai – CLI to pack Git repos for LLM context (XML-optimized)

https://github.com/zwowo1997/pack-repo-4ai
2•allenwowo2015•26m ago•0 comments

An Agent Revolt: Moltbook Is Not a Good Idea

https://www.forbes.com/sites/amirhusain/2026/01/30/an-agent-revolt-moltbook-is-not-a-good-idea/
2•hochmartinez•26m ago•0 comments

Bryan Cantrill: Andreessen's Folly – The False Dichotomy of Software and Hardwa [video]

https://www.youtube.com/watch?v=v0JjG0Qfwi8
5•todsacerdoti•27m ago•0 comments

Apple Changes How You Order a Mac

https://www.macrumors.com/2026/01/31/apple-changes-how-you-order-a-mac/
1•tosh•27m ago•0 comments

An introduction to XET, Hugging Face's storage system (part 1)

https://00f.net/2026/01/19/xet-intro-1/
2•PaulHoule•28m ago•0 comments

Introduction to Algorithms and Machine Learning from Sorting to Strategic Agents

https://www.justinmath.com/books/
2•Anon84•28m ago•0 comments

Show HN: Timestampconverter.net – Auto-detecting timestamp converter

https://timestampconverter.net/
1•ravikmd•29m ago•0 comments

Smart Quotes for Smart People

https://smartquotesforsmartpeople.com/
2•Curiositry•30m ago•0 comments

TrumpRx delayed as senators question if it's a giant scam with Big Pharma

https://arstechnica.com/health/2026/01/trumprx-delayed-as-senators-question-if-its-a-giant-scam-w...
13•duxup•31m ago•3 comments

Men develop cardiovascular disease 7 years earlier than women. Why?

https://www.empirical.health/blog/men-vs-women-heart-disease/
1•brandonb•33m ago•0 comments

Car Maintenance Checklist

https://carschecklist.com
2•jokera•34m ago•0 comments

AI Agents Created Their Own Religion, Crustafarianism, on Moltbook

https://www.forbes.com/sites/johnkoetsier/2026/01/30/ai-agents-created-their-own-religion-crustaf...
1•hochmartinez•35m ago•0 comments

Evolving the OCaml programming language – CSE Bytes: K C Sivaramakrishnan [video]

https://www.youtube.com/watch?v=PFWe-7IAF8E
5•matt_d•37m ago•0 comments

CachyOS January 2026 Release

https://cachyos.org/blog/2601-january-release/
3•doener•38m ago•0 comments

Anime Characters

https://anime-characters.com
1•jokera•39m ago•0 comments

Show HN: TrueTrace: A Passkey-Only, Zero-Knowledge Encrypted Vault

https://github.com/truetraceorg/truetrace
1•sigalor•41m ago•0 comments

Security incident on plone GitHub org with force pushes

https://www.openwall.com/lists/oss-security/2026/01/31/2
1•jwilk•41m ago•0 comments

Watch awkward Chinese humanoid robot lay it all down on the dance floor

https://www.livescience.com/technology/robotics/watch-chinese-humanoid-robot-adam-u-ultra-dance-w...
1•myk-e•44m ago•1 comment