frontpage.

Agent Tinman: A Forward-Deployed (FD) Research Agent for Discovering AI Failures in Production

https://github.com/oliveskin/Agent-Tinman
1•oliveskin•1d ago

Comments

oliveskin•1d ago
Hi HN, I’m sharing an open-source project I’ve been building called Agent Tinman.

It’s a forward-deployed research agent designed to live alongside real AI systems and continuously:

- Generate hypotheses about where models may fail
- Design and run experiments in:
  - LAB (sandboxed)
  - SHADOW (mirrored production traffic)
  - PRODUCTION (real users, gated)
- Classify failures across:
  - Reasoning
  - Long-context behavior
  - Tool use
  - Feedback loops
  - Deployment & latency
- Propose interventions
- Simulate those interventions on real traces before deployment
- Gate risky changes with optional human approval
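
To make the "simulate interventions on real traces" step concrete, here is a minimal sketch of the idea: replay recorded traces through a candidate intervention and compare failure rates before and after. This is not code from the Agent Tinman repo. `Trace`, `failure_rate`, `simulate_intervention`, and the toy failure check are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Trace:
    """Hypothetical record of one recorded model interaction."""
    prompt: str
    response: str


def failure_rate(traces: Iterable[Trace], is_failure: Callable[[Trace], bool]) -> float:
    """Fraction of traces flagged as failures (0.0 for an empty set)."""
    traces = list(traces)
    if not traces:
        return 0.0
    return sum(1 for t in traces if is_failure(t)) / len(traces)


def simulate_intervention(
    traces: Iterable[Trace],
    intervention: Callable[[Trace], Trace],
    is_failure: Callable[[Trace], bool],
) -> Tuple[float, float]:
    """Return (baseline_rate, patched_rate) over the same recorded traces."""
    traces = list(traces)
    baseline = failure_rate(traces, is_failure)
    patched = failure_rate([intervention(t) for t in traces], is_failure)
    return baseline, patched


# Toy example: an empty response counts as a failure, and the
# intervention substitutes a fallback message for empty responses.
def empty_response(trace: Trace) -> bool:
    return trace.response.strip() == ""


def add_fallback(trace: Trace) -> Trace:
    if empty_response(trace):
        return replace(trace, response="Sorry, I could not answer that.")
    return trace


recorded = [Trace("a", "ok"), Trace("b", ""), Trace("c", ""), Trace("d", "fine")]
baseline, patched = simulate_intervention(recorded, add_fallback, empty_response)
```

In this toy run the intervention only rewrites stored responses. A real replay would presumably re-run the model or pipeline on each recorded prompt, which is the part a sandboxed or shadow mode would have to provide.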

It’s meant for teams who already run AI in production and want continuous, structured failure discovery, not just offline evals.

It’s:

- Open source (Apache 2.0)
- Python-first
- Designed to integrate as a sidecar via a pipeline adapter
- Built around explicit modes, risk tiers (SAFE / REVIEW / BLOCK), and severity levels (S0–S4)
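
As a rough illustration of how modes, risk tiers, and severity levels could combine into a gate, here is a minimal sketch. It is not the project's actual API. The policy in `risk_tier` is invented for the example, and the ordering of severities (S0 least severe, S4 most severe) is an assumption, since the post does not say which end of the scale is worse.

```python
from enum import Enum, IntEnum


class Mode(Enum):
    LAB = "lab"                # sandboxed
    SHADOW = "shadow"          # mirrored production traffic
    PRODUCTION = "production"  # real users, gated


class RiskTier(Enum):
    SAFE = "safe"
    REVIEW = "review"
    BLOCK = "block"


class Severity(IntEnum):
    # Assumption: higher number means more severe.
    S0 = 0
    S1 = 1
    S2 = 2
    S3 = 3
    S4 = 4


def risk_tier(mode: Mode, severity: Severity) -> RiskTier:
    """Hypothetical policy mapping (mode, worst expected severity) to a tier."""
    if mode is Mode.LAB:
        return RiskTier.SAFE  # sandboxed, nothing reaches users
    if mode is Mode.SHADOW:
        return RiskTier.REVIEW if severity >= Severity.S3 else RiskTier.SAFE
    # PRODUCTION: always needs at least a review; severe cases are blocked.
    return RiskTier.BLOCK if severity >= Severity.S3 else RiskTier.REVIEW


def may_proceed(tier: RiskTier, human_approved: bool = False) -> bool:
    """SAFE runs automatically, REVIEW needs human approval, BLOCK never runs."""
    if tier is RiskTier.SAFE:
        return True
    if tier is RiskTier.REVIEW:
        return human_approved
    return False
```

With this policy, `may_proceed(risk_tier(Mode.PRODUCTION, Severity.S1))` is false until a human approves, while anything run in LAB proceeds without approval.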

This is early but functional. I’d really appreciate:

- Skeptical feedback
- Edge cases you think would break this
- Whether this solves a real problem for you or not

Repo: https://github.com/oliveskin/Agent-Tinman

Happy to answer anything technical.

Anytime Algorithm

https://en.wikipedia.org/wiki/Anytime_algorithm
1•raw_anon_1111•1m ago•0 comments

TypeSlayer – a TypeScript types performance tool [video]

https://www.youtube.com/watch?v=IP6EZXzXBzY
2•wildpeaks•6m ago•0 comments

I build a live crypto-sentiment analyzer

https://risingwave.com/blog/risingwave-python-udf-tutorial/
1•WavyPeng•6m ago•0 comments

Pebble Index

https://repebble.com/index
1•mcyc•10m ago•0 comments

Neuroscientist Doris Tsao joins Astera to lead its new neuroscience program

https://astera.org/neuroscientist-doris-tsao-joins-astera-to-lead-its-new-neuroscience-program/
1•memming•21m ago•0 comments

Parachutists told to check software after jumper dangled from a plane

https://www.theregister.com/2025/12/11/atsb_parachute_snagged_software/
2•defrost•24m ago•0 comments

Tool for analyzing GitLab SOS bundles without Elasticsearch

https://gitlab.com/gitlab-com/support/toolbox/soslab
1•s_shaik•24m ago•1 comments

A Letter from My Grandfather

https://lorn.us/posts/a-letter-from-my-grandfather/
2•atropoles•29m ago•0 comments

A Friendly Guide to Exorcising Maxwell's Demon (Paper)

https://journals.aps.org/prxquantum/abstract/10.1103/phkv-wrsd
1•mrcgnc•35m ago•0 comments

The Component Gallery

https://component.gallery/
1•handfuloflight•38m ago•0 comments

Fish Alpinism

https://triapul.cz/_/1765291397
1•todsacerdoti•41m ago•0 comments

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

https://arxiv.org/abs/2512.09742
1•bearseascape•47m ago•0 comments

Slovenia gives cash constitutional protection

https://sloveniatimes.com/45857/slovenia-gives-cash-constitutional-protection
1•walterbell•47m ago•0 comments

China's AI Power Play: Cheap Electricity from Biggest Grid

https://www.wsj.com/tech/china-ai-electricity-data-centers-d2a86935
2•perihelions•49m ago•0 comments

Portals must bend gravity [video]

https://www.youtube.com/watch?v=DydIhwLrbMk
1•chii•50m ago•0 comments

GLM-4.6V: Open-Source Multimodal Models with Native Tool Use

https://z.ai/blog/glm-4.6v
2•gmays•55m ago•0 comments

Ask HN: Why are people using Claude or ChatGPT when Gemini is free?

2•muunbo•1h ago•1 comments

Trump launches $1M 'gold card' immigration visas

https://www.bbc.com/news/articles/cj4q1lddj8go
4•e2e4•1h ago•0 comments

Is it possible to fix the "Power Law" problem in user-generated content?

https://ideavo.tripivo.co.in
2•ideavo•1h ago•1 comments

Are there Proton Drive alternatives with true client-only key handling?

2•hasanur_m•1h ago•1 comments

OpenAI (2015)

https://openai.com/index/introducing-openai/
2•vinhnx•1h ago•0 comments

Shapes Inc founders committed the cardinal sin of mass emailing by CCing

https://twitter.com/Zencep_NA/status/1998965773126218184
1•matthewsh•1h ago•1 comments

The Wild West tale of the first cow-buffalo hybrid

https://www.popsci.com/science/cow-buffalo-hybrid-history/
1•gmays•1h ago•0 comments

A list of parks around the world that are perfect to sit down and enjoy a book

https://www.placestoread.xyz/
2•animal_spirits•1h ago•0 comments

Instagram gives users control of their algorithms in new feature

https://abcnews.go.com/GMA/Living/instagram-users-control-algorithms-new-feature/story?id=128252102
2•SilverElfin•1h ago•0 comments

Oil Tanker U.S. Seized Has Faked Its Location Before, Data Shows

https://www.nytimes.com/2025/12/10/us/politics/oil-tanker-venezuela-tracking-data.html
7•jbegley•1h ago•2 comments

High Performance SSH/SCP

https://www.psc.edu/hpn-ssh-home/
2•gslin•1h ago•0 comments

Show HN: DocLet – End-to-end encrypted storage with user-owned key branches

https://doclet.app/
1•hasanur_m•1h ago•0 comments

Incomplete list of mistakes in the design of CSS

https://wiki.csswg.org/ideas/mistakes
33•OuterVale•1h ago•10 comments

Oracle Credit Risk Gauge Deteriorates After Earnings Report

https://www.bloomberg.com/news/articles/2025-12-10/oracle-credit-risk-gauge-deteriorates-after-ea...
5•zerosizedweasle•1h ago•1 comments