Show HN: A benchmark for SAST exploit chain and evasion detection

https://github.com/TheAuditorTool/sast-benchmark
2•ThailandJohn•2h ago
MAKE HACKERNEWS SHOWCASE POST AND SUBMIT IT 10pm MORNING SILICON VALLEY...

Show HN: A benchmark for SAST exploit chain and evasion detection

Traditional SAST benchmarks are great at measuring simple source-to-sink taint flows, but real-world attacks have moved past that. I spent some time building a benchmark suite to test the things that current static analysis tools structurally struggle to see.

Design Principles

- Test cases written from security knowledge, not from knowledge of any specific SAST engine's detection capabilities.
- No vulnerability hints in source code -- the CSV answer key is the ONLY ground truth. No comments, no CWE references, no category names in filenames or function names.
- 50/50 TP/TN balance prevents classifier gaming -- a tool that flags everything scores 0%, not 100%.
- Category-averaged scoring prevents large categories from dominating small ones.
- Minimum 25 TP + 25 TN per category ensures statistical significance (Youden's J per-case swing ≤ 4%).
- Tool-agnostic SARIF-based scoring -- any SAST tool that exports SARIF 2.1.0 can be scored.
- 1 file = 1 test case for the baseline language benchmarks (standalone functions with no cross-file dependencies), while the Chain Detection tests explicitly use multi-file application structures.
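To make the scoring principles concrete, here is a minimal Python sketch. It assumes the per-category score is Youden's J (TPR minus FPR), as the per-case swing remark suggests, and that the overall score is the unweighted mean across categories. The tuple layout of the answer key, the case IDs, and the function names are hypothetical; they are not taken from the repo's CSV or scripts.

```python
from collections import defaultdict


def youden_j(tp, fp, tn, fn):
    """Youden's J = TPR - FPR (equivalently sensitivity + specificity - 1)."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr - fpr


def category_averaged_score(answer_key, flagged):
    """Score a tool against an answer key.

    answer_key: iterable of (case_id, category, is_vulnerable) tuples
                (hypothetical layout, not the repo's actual CSV schema).
    flagged:    set of case_ids the tool reported as vulnerable.
    Returns (overall, per_category); overall is the unweighted mean of the
    per-category J values, so a large category cannot dominate a small one.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for case_id, category, is_vulnerable in answer_key:
        reported = case_id in flagged
        if is_vulnerable:
            counts[category]["tp" if reported else "fn"] += 1
        else:
            counts[category]["fp" if reported else "tn"] += 1
    if not counts:
        raise ValueError("answer key is empty")
    per_category = {cat: youden_j(**c) for cat, c in counts.items()}
    overall = sum(per_category.values()) / len(per_category)
    return overall, per_category


def make_category(name, n):
    """Build n vulnerable cases followed by n safe cases for one category."""
    return [(f"{name}-{i}", name, i < n) for i in range(2 * n)]


# Two made-up categories at the stated minimum size: 25 TP + 25 TN each.
answer_key = make_category("sqli", 25) + make_category("cmdi", 25)
all_ids = {case_id for case_id, _, _ in answer_key}
vulnerable_ids = {case_id for case_id, _, vuln in answer_key if vuln}

flag_everything, _ = category_averaged_score(answer_key, all_ids)
perfect, _ = category_averaged_score(answer_key, vulnerable_ids)
_, one_miss = category_averaged_score(answer_key, vulnerable_ids - {"sqli-0"})
print(flag_everything, perfect, one_miss["sqli"])
```

Under these assumptions, a tool that flags every case gets TPR = FPR = 1 and therefore a score of 0, while missing a single vulnerable case in a 25 + 25 category moves that category's J by 1/25 = 4%, which matches the swing bound in the list above.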

It focuses heavily on two main areas:

- Chain Detection: 500 test cases that measure if a tool can correlate multiple low-severity findings across different files into a compound exploit path.
- Adversarial Evasion: Tests to see if a tool can detect intentional concealment, like payloads hidden inside invisible Unicode characters or visual deception using Bidi overrides.
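For readers unfamiliar with the evasion category, the following Python sketch shows the kind of signal involved: it scans source text for Unicode Bidi control characters and other invisible format characters. It illustrates the general idea only; it is not how the benchmark's test cases are built or how any particular SAST tool detects them, and the function name and the returned tuple layout are made up for this example.

```python
import unicodedata

# Explicit Bidi embedding, override, and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_hidden_characters(source):
    """Return (line, column, codepoint, kind) for each suspicious character.

    Bidi controls are reported as "bidi-control"; any other character in the
    Unicode "Cf" (format) category, such as zero-width spaces and joiners,
    is reported as "invisible-format". Lines and columns are 1-based.
    """
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                kind = "bidi-control"
            elif unicodedata.category(ch) == "Cf":
                kind = "invisible-format"
            else:
                continue
            findings.append((line_no, col, f"U+{ord(ch):04X}", kind))
    return findings


# A right-to-left override hidden in a string literal, then a zero-width space.
sample = 'if role == "admin\u202e":\n    x = "a\u200bb"\n'
hidden = find_hidden_characters(sample)
for finding in hidden:
    print(finding)
```

A check like this only shows that concealment characters are present. Deciding whether they hide a payload or are legitimate (for example, in right-to-left language strings) takes more context.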

Since there was no public ground truth for Go, Rust, Bash, PHP, and Ruby, I also built baseline vulnerability benchmarks for those languages as part of the suite, bringing the total to over 7,700 test cases.

Building ground truth at this scale as a solo developer is a massive undertaking, and right now I have a serious echo chamber problem. I am the student taking the exam, the master designing it, and the professor grading my own homework. It sucks, and I know I have blind spots in my test designs.

I am releasing this openly because imperfect ground truth that invites correction is more valuable than no ground truth at all. If you work in AppSec, build SAST engines, or just enjoy breaking logic, I would love your scrutiny. Finding my misclassifications and edge cases will make this infinitely more valuable for everyone.

Repo link: https://github.com/TheAuditorTool/sast-benchmark // ThailandJohn. TheAuditorTool Maintainer.

Comments

ThailandJohn•1h ago
lol... that's a bit embarrassing, I copy-pasted my memo note too... oh well. It doesn't change much lol, it was supposed to end up here anyhow and now it did xD <3
