Show HN: We post-trained a model that pen tests instead of refusing your code

https://www.argusred.com/cli

9•dk189•2h ago

I'm Dimitrios at Cosine. Quick orientation first: the read-only scan is free and you can run it right now, so that's the part to try. The pen-test mode is gated behind written authorisation, because it's live offensive testing against real systems. I'll explain that below; it's not a paywall thing.

The reason this exists: most "AI security" tools wrap a general model, so they inherit its refusals. Point one at a real offensive task and it hedges or declines, because the base model was trained to. We went the other way and post-trained our own model for offensive security, so it does the work instead of apologising for it. It's our model, not a wrapper.

Under the hood it's a multi-agent swarm: an orchestrator splits the job across subagents running in parallel, each owning a slice, then synthesises one report. That's what gets a polyglot microservice repo done in one pass.
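For readers who want the fan-out pattern spelled out, here is a minimal sketch in Go. It is not the product's code: `Finding`, `Subagent` and `Orchestrate` are hypothetical names, and the stub subagent stands in for a real model-driven one. It only shows the shape described above: split the job into slices, run one subagent per slice in parallel, then merge everything into a single ordered report.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Finding is a hypothetical report entry tied to a file and line.
type Finding struct {
	File string
	Line int
	Note string
}

// Subagent is a hypothetical worker that audits one slice of the job.
type Subagent func(slice string) []Finding

// Orchestrate runs one subagent per slice in parallel and merges the
// results into a single report sorted by file, then by line.
func Orchestrate(slices []string, agent Subagent) []Finding {
	results := make([][]Finding, len(slices))
	var wg sync.WaitGroup
	for i, s := range slices {
		wg.Add(1)
		go func(i int, s string) {
			defer wg.Done()
			results[i] = agent(s) // each goroutine writes only its own index
		}(i, s)
	}
	wg.Wait()

	var report []Finding
	for _, r := range results {
		report = append(report, r...)
	}
	sort.Slice(report, func(a, b int) bool {
		if report[a].File != report[b].File {
			return report[a].File < report[b].File
		}
		return report[a].Line < report[b].Line
	})
	return report
}

func main() {
	// Stub subagent: a real one would drive a model over the slice.
	stub := func(slice string) []Finding {
		return []Finding{{File: slice + "/main.go", Line: 1, Note: "placeholder finding"}}
	}
	for _, f := range Orchestrate([]string{"frontend", "ledger", "auth"}, stub) {
		fmt.Printf("%s:%d %s\n", f.File, f.Line, f.Note)
	}
}
```

Writing each result into its own slot of a pre-sized slice avoids a mutex, and sorting at the end keeps the merged report stable regardless of which subagent finishes first.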

The fair objection to a model that doesn't refuse, pointed at your code: how is that not reckless? I think refusals are the wrong layer to put safety in. A model that refuses is both useless (won't do the job) and unsafe (you're trusting a probability distribution to hold a hard line). So we don't ask the model to behave; we enforce it in the harness. A runtime guard written in Go intercepts every tool call before it runs. In scan mode it hard-blocks every mutating tool and any non-read-only shell command. The model can decide whatever it wants; the guard won't let it write. In pen-test mode the same guard pins the agent's network scope to the targets you authorised; it can't reach anything else. Safety is deterministic and sits below the model, not inside it.

Two modes, one CLI:

- Security Scan - read-only audit of a local codebase, every finding tied to a file and line. Free, runnable today.

- Pen Test - the swarm attacks systems you authorise and hands back the request it sent and the response your code gave. Gated behind written authorisation.

Demo target, and to be straight about it: Bank of Anthos, Google's open-source reference bank. It's a known app with some intentionally soft bits, which is why I picked it: you can reproduce the run instead of trusting a screenshot. The scan found an integer overflow in the transfer path that would let you forge an account balance, plus the usual injection/auth/secrets classes.

It's a closed binary from Cosine (brew/curl/winget) and it runs locally. Run it behind a firewall and `tcpdump` exactly what it does before you trust it on anything real. Install is free; the scan runs on a $20 Cosine subscription; pen test is scoped per engagement.

I'll be in the thread all day. The harness-vs-refusals design is the part I most want torn apart - tell me where it breaks.

Comments

applfanboysbgon•1h ago

> Don't post generated text or AI-edited text. HN is for conversation between humans.

add-sub-mul-div•1h ago

Also, don't have an account here solely to spam your own projects and post nothing else.

ivanmontillam•30m ago

> I'll be in the thread all day.

Yeah, now that's flagged.
