frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: Flakestorm – Chaos engineering for AI agents (local-first, open source)

6•frankhumarang•1mo ago
Hi everyone,

I’ve been working on an open-source tool called Flakestorm to test the reliability of AI agents before they hit production.

Most agent testing today focuses on eval scores or happy-path prompts. In practice, agents tend to fail in more mundane ways: typos, tone shifts, long context, malformed input, or simple prompt injections — especially when running on smaller or local models. Flakestorm applies chaos-engineering ideas to agents. Instead of testing one prompt, it takes a “golden prompt”, generates adversarial mutations (semantic variations, noise, injections, encoding edge cases), runs them against your agent, and produces a robustness score plus a detailed HTML report showing what broke.

Key points: Local-first (uses Ollama for mutation generation)

Tested with Qwen / Gemma / other small models Works against HTTP agents, LangChain chains, or Python callables No cloud or API keys required This started as a way to debug my own agents after seeing them behave unpredictably under real user input. I’m still early and trying to understand how useful this is outside my own workflow.

I’d really appreciate feedback on: Whether this overlaps with how you test agents today Failure modes you’ve seen that aren’t covered Whether “chaos testing for agents” is a useful framing, or if this should be thought of differently Repo: https://github.com/flakestorm/flakestorm Docs are admittedly long.

Thanks for taking a look.

A Night Without the Nerds – Claude Opus 4.6, Field-Tested

https://konfuzio.com/en/a-night-without-the-nerds-claude-opus-4-6-in-the-field-test/
1•konfuzio•2m ago•0 comments

Could ionospheric disturbances influence earthquakes?

https://www.kyoto-u.ac.jp/en/research-news/2026-02-06-0
1•geox•3m ago•0 comments

SpaceX's next astronaut launch for NASA is officially on for Feb. 11 as FAA clea

https://www.space.com/space-exploration/launches-spacecraft/spacexs-next-astronaut-launch-for-nas...
1•bookmtn•4m ago•0 comments

Show HN: One-click AI employee with its own cloud desktop

https://cloudbot-ai.com
1•fainir•7m ago•0 comments

Show HN: Poddley – Search podcasts by who's speaking

https://poddley.com
1•onesandofgrain•7m ago•0 comments

Same Surface, Different Weight

https://www.robpanico.com/articles/display/?entry_short=same-surface-different-weight
1•retrocog•10m ago•0 comments

The Rise of Spec Driven Development

https://www.dbreunig.com/2026/02/06/the-rise-of-spec-driven-development.html
2•Brajeshwar•14m ago•0 comments

The first good Raspberry Pi Laptop

https://www.jeffgeerling.com/blog/2026/the-first-good-raspberry-pi-laptop/
3•Brajeshwar•14m ago•0 comments

Seas to Rise Around the World – But Not in Greenland

https://e360.yale.edu/digest/greenland-sea-levels-fall
2•Brajeshwar•14m ago•0 comments

Will Future Generations Think We're Gross?

https://chillphysicsenjoyer.substack.com/p/will-future-generations-think-were
1•crescit_eundo•18m ago•0 comments

State Department will delete Xitter posts from before Trump returned to office

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
2•righthand•21m ago•1 comments

Show HN: Verifiable server roundtrip demo for a decision interruption system

https://github.com/veeduzyl-hue/decision-assistant-roundtrip-demo
1•veeduzyl•22m ago•0 comments

Impl Rust – Avro IDL Tool in Rust via Antlr

https://www.youtube.com/watch?v=vmKvw73V394
1•todsacerdoti•22m ago•0 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
3•vinhnx•23m ago•0 comments

minikeyvalue

https://github.com/commaai/minikeyvalue/tree/prod
3•tosh•27m ago•0 comments

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

https://github.com/eval-exec/neomacs
1•evalexec•32m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
2•ShinyaKoyano•36m ago•1 comments

How I grow my X presence?

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
2•m00dy•38m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•38m ago•0 comments

What if you just did a startup instead?

https://alexaraki.substack.com/p/what-if-you-just-did-a-startup
5•okaywriting•45m ago•0 comments

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
2•todsacerdoti•48m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•48m ago•0 comments

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•49m ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•50m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•50m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•51m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
4•pseudolus•51m ago•2 comments

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•56m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
2•bkls•56m ago•1 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•57m ago•0 comments