frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: Flakestorm – Chaos engineering for AI agents (local-first, open source)

6•frankhumarang•1mo ago
Hi everyone,

I’ve been working on an open-source tool called Flakestorm to test the reliability of AI agents before they hit production.

Most agent testing today focuses on eval scores or happy-path prompts. In practice, agents tend to fail in more mundane ways: typos, tone shifts, long context, malformed input, or simple prompt injections — especially when running on smaller or local models. Flakestorm applies chaos-engineering ideas to agents. Instead of testing one prompt, it takes a “golden prompt”, generates adversarial mutations (semantic variations, noise, injections, encoding edge cases), runs them against your agent, and produces a robustness score plus a detailed HTML report showing what broke.

Key points: Local-first (uses Ollama for mutation generation)

Tested with Qwen / Gemma / other small models Works against HTTP agents, LangChain chains, or Python callables No cloud or API keys required This started as a way to debug my own agents after seeing them behave unpredictably under real user input. I’m still early and trying to understand how useful this is outside my own workflow.

I’d really appreciate feedback on: Whether this overlaps with how you test agents today Failure modes you’ve seen that aren’t covered Whether “chaos testing for agents” is a useful framing, or if this should be thought of differently Repo: https://github.com/flakestorm/flakestorm Docs are admittedly long.

Thanks for taking a look.

Show HN: PaySentry – Open-source control plane for AI agent payments

https://github.com/mkmkkkkk/paysentry
1•mkyang•56s ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
1•ShinyaKoyano•10m ago•0 comments

The Crumbling Workflow Moat: Aggregation Theory's Final Chapter

https://twitter.com/nicbstme/status/2019149771706102022
1•SubiculumCode•14m ago•0 comments

Pax Historia – User and AI powered gaming platform

https://www.ycombinator.com/launches/PMu-pax-historia-user-ai-powered-gaming-platform
2•Osiris30•15m ago•0 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
1•ambitious_potat•21m ago•0 comments

Scams, Fraud, and Fake Apps: How to Protect Your Money in a Mobile-First Economy

https://blog.afrowallet.co/en_GB/tiers-app/scams-fraud-and-fake-apps-in-africa
1•jonatask•21m ago•0 comments

Porting Doom to My WebAssembly VM

https://irreducible.io/blog/porting-doom-to-wasm/
1•irreducible•22m ago•0 comments

Cognitive Style and Visual Attention in Multimodal Museum Exhibitions

https://www.mdpi.com/2075-5309/15/16/2968
1•rbanffy•23m ago•0 comments

Full-Blown Cross-Assembler in a Bash Script

https://hackaday.com/2026/02/06/full-blown-cross-assembler-in-a-bash-script/
1•grajmanu•28m ago•0 comments

Logic Puzzles: Why the Liar Is the Helpful One

https://blog.szczepan.org/blog/knights-and-knaves/
1•wasabi991011•40m ago•0 comments

Optical Combs Help Radio Telescopes Work Together

https://hackaday.com/2026/02/03/optical-combs-help-radio-telescopes-work-together/
2•toomuchtodo•45m ago•1 comments

Show HN: Myanon – fast, deterministic MySQL dump anonymizer

https://github.com/ppomes/myanon
1•pierrepomes•51m ago•0 comments

The Tao of Programming

http://www.canonical.org/~kragen/tao-of-programming.html
1•alexjplant•52m ago•0 comments

Forcing Rust: How Big Tech Lobbied the Government into a Language Mandate

https://medium.com/@ognian.milanov/forcing-rust-how-big-tech-lobbied-the-government-into-a-langua...
2•akagusu•52m ago•0 comments

PanelBench: We evaluated Cursor's Visual Editor on 89 test cases. 43 fail

https://www.tryinspector.com/blog/code-first-design-tools
2•quentinrl•54m ago•2 comments

Can You Draw Every Flag in PowerPoint? (Part 2) [video]

https://www.youtube.com/watch?v=BztF7MODsKI
1•fgclue•1h ago•0 comments

Show HN: MCP-baepsae – MCP server for iOS Simulator automation

https://github.com/oozoofrog/mcp-baepsae
1•oozoofrog•1h ago•0 comments

Make Trust Irrelevant: A Gamer's Take on Agentic AI Safety

https://github.com/Deso-PK/make-trust-irrelevant
6•DesoPK•1h ago•3 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
1•rs545837•1h ago•1 comments

Hello world does not compile

https://github.com/anthropics/claudes-c-compiler/issues/1
34•mfiguiere•1h ago•20 comments

Show HN: ZigZag – A Bubble Tea-Inspired TUI Framework for Zig

https://github.com/meszmate/zigzag
3•meszmate•1h ago•0 comments

Metaphor+Metonymy: "To love that well which thou must leave ere long"(Sonnet73)

https://www.huckgutman.com/blog-1/shakespeare-sonnet-73
1•gsf_emergency_6•1h ago•0 comments

Show HN: Django N+1 Queries Checker

https://github.com/richardhapb/django-check
1•richardhapb•1h ago•1 comments

Emacs-tramp-RPC: High-performance TRAMP back end using JSON-RPC instead of shell

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•todsacerdoti•1h ago•0 comments

Protocol Validation with Affine MPST in Rust

https://hibanaworks.dev
1•o8vm•1h ago•1 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
5•gmays•1h ago•0 comments

Show HN: Zest – A hands-on simulator for Staff+ system design scenarios

https://staff-engineering-simulator-880284904082.us-west1.run.app/
1•chanip0114•1h ago•1 comments

Show HN: DeSync – Decentralized Economic Realm with Blockchain-Based Governance

https://github.com/MelzLabs/DeSync
1•0xUnavailable•1h ago•0 comments

Automatic Programming Returns

https://cyber-omelette.com/posts/the-abstraction-rises.html
1•benrules2•1h ago•1 comments

Why Are There Still So Many Jobs? The History and Future of Workplace Automation [pdf]

https://economics.mit.edu/sites/default/files/inline-files/Why%20Are%20there%20Still%20So%20Many%...
2•oidar•1h ago•0 comments