frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Testing a LangChain agent revealed a 95% failure rate on adversarial inputs

1•frankhumarang•3h ago
I recently ran a detailed chaos engineering test on a standard LangChain agent using my open-source testing tool, Flakestorm [1]. The results were stark and highlight what I believe is a critical blind spot in how we test AI agents before deployment.

The Method: I used adversarial mutations (22+ types like prompt injection, encoding attacks, context manipulation) to simulate real-world hostile inputs, checking for failures in latency, safety, and correctness.

The Result: The agent scored a 5.2% robustness score. 57 out of 60 adversarial tests failed. Key failures:

Encoding Attacks: 0% pass rate. The agent would decode malicious Base64 inputs instead of rejecting them—a major security oversight.

Prompt Injection: 0% pass rate. Basic "ignore previous instructions" attacks succeeded every time.

Severe Performance Degradation: Latency spiked to ~30 seconds under stress, far exceeding reasonable timeouts.

This isn't about one bad agent. It's a pattern suggesting our default "happy path" testing is insufficient. Agents that seem fine in demos can be fragile and insecure under real-world conditions.

I'm sharing this to start a discussion:

Are we underestimating the adversarial robustness needed for production AI agents?

What testing strategies beyond static evals are proving effective?

Is chaos engineering or adversarial testing a necessary new layer in the LLM dev stack?

[1] Flakestorm GitHub (the tool used for testing): https://github.com/flakestorm/flakestorm

GCC trunk development stage 4: regression fixes only

https://gcc.gnu.org/pipermail/gcc/2026-January/247347.html
1•edelsohn•46s ago•0 comments

Readers' favorite nonfiction reads of 2025 (voting ongoing)

https://shepherd.com/bboy/2025/nonfiction
1•bwb•3m ago•1 comments

Nanocode: Minimal Claude Code alternative. Single Py file, zero dependencies

https://github.com/1rgs/nanocode
1•simonpure•3m ago•0 comments

Malaysia and Indonesia block Musk's Grok due to nonconsensual sexual content

https://www.cnbc.com/2026/01/12/malaysia-indonesia-block-elon-musks-grok-obscene-non-consensual-c...
1•riffraff•4m ago•0 comments

Show HN: Reality Check – Like, dislike, review, fact-check social media posts

1•AndreiBargan•5m ago•0 comments

Keychron's Nape Pro turns your keyboard into a laptop‑style trackball rig

https://www.yankodesign.com/2026/01/08/keychrons-nape-pro-turns-your-mechanical-keyboard-into-a-l...
3•tortilla•6m ago•0 comments

Germany Considers Broader Legal Authority for Internet Surveillance

https://reclaimthenet.org/germany-bnd-surveillance-law-expansion-de-cix-data-retention-hacking
1•barbacoa•6m ago•0 comments

AI Bulls Are Bringing Us Hell

1•zerosizedweasle•8m ago•1 comments

Sodium-ion batteries: 10 Breakthrough Technologies 2026

https://www.technologyreview.com/2026/01/12/1129991/sodium-ion-batteries-2026-breakthrough-techno...
1•fleahunter•8m ago•0 comments

'Office Is Dead': Microsoft Decision Confuses 400M Users

https://www.forbes.com/sites/zakdoffman/2026/01/11/office-is-dead-microsoft-decision-confuses-400...
1•CharlesW•9m ago•1 comments

Hyper 8:Static site generator for video publishing

https://simonrepp.com/hyper8/
1•nogajun•9m ago•0 comments

Researchers Beam Power from a Moving Airplane

https://spectrum.ieee.org/wireless-power-movin-airplane
1•pseudolus•9m ago•0 comments

Monitoring Training Adaptation and Recovery Using Heart Rate Variability

https://www.mdpi.com/1424-8220/26/1/3
1•PaulHoule•10m ago•0 comments

You're falling behind. It's time to catch up

https://www.youtube.com/watch?v=Z9UxjmNF7b0
1•yshrestha•11m ago•0 comments

System Design Interview: An insider's guide (Alex Xu) [pdf]

https://bytes.usc.edu/~saty/courses/docs/data/SystemDesignInterview.pdf
1•martianlantern•11m ago•0 comments

Netflix's $82.7B rags-to-riches story

https://fortune.com/2026/01/10/netflix-warner-bros-paramount-acquisistion-blockbuster-reed-hastin...
1•andsoitis•12m ago•0 comments

Built from First Principles: Why copper-rs works well to build robots with AI

https://www.copper-robotics.com/whats-new/built-from-first-principles-why-copper-rs-works-so-well...
1•gbin•12m ago•1 comments

Show HN: Geoguess Lite – open-source, subscription free GeoGuessr alternative

https://geoguesslite.com
1•spider-hand•13m ago•0 comments

Not All Browser APIs Are "Web" APIs

https://polypane.app/blog/not-all-browser-apis-are-web-apis/
2•bigblind•14m ago•0 comments

The Board Deck Is Killing Your AI Visibility

https://growtika.com/blog/board-deck-ai-visibility
1•Growtika•15m ago•0 comments

A Republic: if you can keep it. Robert Anton Wilson on his 19th anniversary

https://gabrielpatrickkennedy.substack.com/p/a-republic-if-you-can-keep-it
3•thinkingemote•17m ago•0 comments

Under Trump, U.S. Adds Fuel to a Heating Planet

https://www.nytimes.com/2026/01/12/climate/trump-climate-change-emissions-fuel.html
3•fleahunter•19m ago•0 comments

Socially awkward nerds are mostly just Berkson's paradox

https://shakeddown.substack.com/p/socially-awkward-nerds-are-mostly
2•surprisetalk•20m ago•0 comments

What is the opposite of a set? [video]

https://www.youtube.com/watch?v=SrltwGJAiCM
1•surprisetalk•20m ago•0 comments

The Internet forgets, but I don't want to

https://alexwlchan.net/2025/social-media-scrapbook/
2•surprisetalk•20m ago•0 comments

The rise (and future fall) of Discord

https://slugcat.systems/post/24-12-12-the-rise-and-future-fall-of-discord/
2•todsacerdoti•20m ago•0 comments

Ask HN: What do you think is the most joy a programmer can have in programming?

1•bagol•21m ago•4 comments

Show HN: Chronos-Track – Detect honeypots via TCP timestamp clock skew (Rust)

https://github.com/Noamismach/chronos_track
1•Ismach•22m ago•1 comments

Show HN: Verdic Guard – deterministic guardrails for production AI

1•kundan_s__r•23m ago•0 comments

Stripped-down 100% open-source flashcard web-app

https://www.fast-cards.com/
1•programmexxx•23m ago•0 comments