UK Home Office launches £75M 'PoliceAI' to capitalise on artificial intelligence

https://www.publictechnology.net/2026/06/15/public-order-justice-and-rights/home-office-launches-...
30•thinkingemote•54m ago•33 comments

SMPTE Makes Its Standards Freely Accessible

https://www.smpte.org/blog/smpte-makes-its-standards-freely-accessible-openingstandards-library-t...
155•zdw•4h ago•48 comments

UHF X11: X11 Built for VisionOS and Apple Vision Pro

https://www.lispm.net/apps/uhf-x11/
104•zdw•4h ago•13 comments

PostgresBench: A Reproducible Benchmark for Postgres Services

https://clickhouse.com/blog/postgresbench
39•saisrirampur•2h ago•8 comments

The Wholesale Plagiarism of Obscure Sorrows

https://waxy.org/2026/06/the-wholesale-plagiarism-of-obscure-sorrows/
264•ridesisapis•3h ago•112 comments

DOS Game "F-15 Strike Eagle II" reversing project needs DOS test pilots

https://neuviemeporte.github.io/f15-se2/2026/06/20/needyou.html
151•LowLevelMahn•6h ago•43 comments

CSSQuake

https://cssquake.com/
407•msalsas•10h ago•88 comments

Show HN: StartupWiki – A Free Alternative to Crunchbase

https://startupwiki.tech/
109•shpran•5h ago•32 comments

Show HN: Make PDFs look scanned (CLI or in the browser via WASM)

https://github.com/overflowy/make-look-scanned
48•overflowy•3h ago•22 comments

Bun has an open PR adding shared-memory threads to JavaScriptCore

https://github.com/oven-sh/WebKit/pull/249
81•gr4vityWall•4h ago•120 comments

The rise of South Korea’s weapons business

https://www.politico.com/news/magazine/2026/06/20/south-korea-weapons-dealer-trump-00959559
65•JumpCrisscross•9h ago•24 comments

Linux Eliminates the Strncpy API After Six Years of Work, 360 Patches

https://www.phoronix.com/news/Linux-7.2-Drops-strncpy
20•simonpure•36m ago•3 comments

Inference cost at scale with napkin math

https://injuly.in/blog/napkin-inference-cost/index.html
10•gmays•4d ago•1 comments

Temporary Cloudflare accounts for AI agents

https://blog.cloudflare.com/temporary-accounts/
124•farhadhf•10h ago•80 comments

Unauthorized alert sent to cell phones across Brazil

https://www.cnn.com/2026/06/20/americas/brazil-hackers-unauthorized-alert-latam
8•zdw•1h ago•1 comments

Show HN: We post-trained a model that pen tests instead of refusing

https://www.argusred.com/cli
50•dk189•7h ago•25 comments

Why has the pointe shoe been so resistant to change?

https://dancemagazine.com/pointe-shoe-innovation/
36•onemind•20h ago•37 comments

Show HN: Tiny – An interpreted dynamic language with inline Go native functions

https://github.com/confh/Tiny
15•confis•2h ago•3 comments

Now You Don't: When Espionage Meets Magic

https://www.politicshome.com/news/article/now-dont-espionage-meets-magic
17•thinkingemote•3d ago•1 comments

Ember, a native iOS Hacker News reader I built around accessibility

https://github.com/DatanoiseTV/ember-hackernews
79•sylwester•4h ago•17 comments

Show HN: Microcrad – Micrograd Reimplemented in C

https://github.com/oraziorillo/microcrad
51•oraziorillo•3d ago•18 comments

AMD will reinstate memory encryption on Ryzen 9000 CPUs via BIOS update in July

https://www.tomshardware.com/pc-components/cpus/amd-will-reinstate-memory-encryption-on-ryzen-900...
62•roboror•2h ago•13 comments

The ability to regrow body parts is dormant in mammals, not lost

https://www.sciencedaily.com/releases/2026/06/260617032207.htm
106•nryoo•4h ago•40 comments

Vacation With An Artist – Mini-Apprenticeships with Artists in Their Studios

https://vawaa.com/
55•karakoram•6h ago•10 comments

Where to Find the Colors Your Screen Can't Show You

https://moultano.wordpress.com/2026/06/19/where-to-find-the-colors-your-screen-cant-show-you/
411•moultano•17h ago•110 comments

Show HN: My Windows XP portfolio with working Game Boy and iPod

https://mitchivin.com/
34•mitchivin•2h ago•15 comments

Web Browsers on PDAS

https://vale.rocks/posts/pda-browsers
42•robin_reala•7h ago•13 comments

Bootimus – A Self-Contained PXE and HTTP Boot Server

https://bootimus.com
96•car•10h ago•35 comments

Supermarket giant Tesco sues VMware for breach of contract

https://www.theregister.com/software/2025/09/03/supermarket-giant-tesco-sues-vmware-for-breach-of...
7•wglb•26m ago•1 comments

I Stored a Website in a Favicon

https://www.timwehrle.de/blog/i-stored-a-website-in-a-favicon/
282•theanonymousone•16h ago•96 comments

Show HN: We post-trained a model that pen tests instead of refusing

https://www.argusred.com/cli
50•dk189•7h ago
Anthropic and OpenAI's publicly available models are explicitly guard-railed so that they refuse offensive tasks. And their cyber-focussed models are gated for enterprises. This leaves SMEs and mid market open to major vulnerabilities.

AI can be used as both an adversarial and defensive tool in the world of cyber. A worst case outcome is if only the adversaries have access.

Meanwhile, most existing AI cyber tools are just wrappers. The problem is that they keep all of the foundation model's guardrails, so they inherit its refusals.

For this project we've post-trained a specific model on a decade of capture-the-flag contests. This won't be made available to anyone and everyone, but we do believe that responsible SMEs and midmarket companies also need access to these tools in order to identify key vulnerabilities in their systems; not just enterprises.

We have developed two modes that run over a CLI:

• Security scan: a read-only audit of your local codebase for vulnerabilities. It only reports what it can tie to a specific file and line, so you're not wading through vibes-based findings.

• Pen test: an active adversarial mode that will try to break a live system in a sandboxed environment. It proves each vulnerability by running the exploit and showing the request it sent and the response your code gave back, not a confidence score. Currently gated.

To show what the scan does, we pointed it at Bank of Anthos and it found an integer overflow in the transfer path: amount is an int, and amount + fee can overflow negative, so the balance check passes and you move funds you don't have. Plus the usual auth and secrets issues. (Bank of Anthos is Google's open-source bank. It's a known app and some of it is intentionally weak, which is the point: you can clone it and re-run the scan yourself instead of trusting a screenshot.)
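
A minimal sketch of that class of bug, not the actual Bank of Anthos code: the function names are hypothetical, and the 32-bit signed width is an assumption (the finding above only says amount is an int). Python integers don't overflow, so `wrap_int32` simulates the wraparound a fixed-width int would show.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Wrap a Python int to signed 32-bit, simulating fixed-width overflow."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def naive_can_transfer(balance: int, amount: int, fee: int) -> bool:
    """Vulnerable check: amount + fee is computed in 32 bits and may wrap negative."""
    total = wrap_int32(amount + fee)
    return total <= balance


def safe_can_transfer(balance: int, amount: int, fee: int) -> bool:
    """Hardened check: reject negative inputs and sums that exceed the int range."""
    if amount < 0 or fee < 0:
        return False
    total = amount + fee  # exact, because Python ints are arbitrary precision
    if total > INT32_MAX:
        return False
    return total <= balance


if __name__ == "__main__":
    # A huge amount plus a small fee wraps to a negative total,
    # so the naive balance check passes with only 100 in the account.
    print(naive_can_transfer(100, INT32_MAX, 1))
    print(safe_can_transfer(100, INT32_MAX, 1))
```

In a language with fixed-width ints the usual fix is a checked add (or widening to a larger type) before the comparison, which is what `safe_can_transfer` stands in for here.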

The base model is Kimi K2.6 (open weights). We didn't pretrain from scratch. We post-trained it ourselves: SFT on CTF writeups, then RL with verifiable rewards against actual exploit checks.
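
To illustrate what "verifiable rewards" means in general, here is a rough sketch. It is not our training code: the `Attempt` type, the planted-flag check, and the binary 0/1 reward are all assumptions made for the example. The point is only that the reward comes from a programmatic check of what the target returned, not from the model's own claim of success.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One rollout: what the model claims versus what the target returned."""
    claimed_success: bool
    response_body: str


def verifiable_reward(attempt: Attempt, secret_flag: str) -> float:
    """Return 1.0 only if the planted flag appears in the target's real response."""
    if not secret_flag:
        raise ValueError("secret_flag must be non-empty")
    # The model's claim is deliberately ignored; only the observable result counts.
    return 1.0 if secret_flag in attempt.response_body else 0.0


if __name__ == "__main__":
    bluff = Attempt(claimed_success=True, response_body="403 Forbidden")
    real = Attempt(claimed_success=False, response_body="ok FLAG{demo}")
    print(verifiable_reward(bluff, "FLAG{demo}"))
    print(verifiable_reward(real, "FLAG{demo}"))
```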

How the harness works:

Along with the model we built the harness to support this. The harness runs on a multi-agent swarm: an orchestrator splits the job across subagents running in parallel, each owning a slice, then synthesises one report.
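
The orchestrator pattern can be sketched roughly like this. It is a toy, not the real harness: `subagent_scan` is a stand-in that just flags file names containing "auth" instead of calling a model, and the round-robin slicing and thread pool are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(paths: list[str], n: int) -> list[list[str]]:
    """Deal paths round-robin into at most n non-empty slices."""
    return [paths[i::n] for i in range(n) if paths[i::n]]


def subagent_scan(slice_paths: list[str]) -> list[str]:
    """Stand-in for a subagent: flags files whose name contains 'auth'."""
    return [f"{p}: review needed" for p in slice_paths if "auth" in p]


def orchestrate(paths: list[str], workers: int = 3) -> dict:
    """Split the job, run subagents in parallel, then merge into one report."""
    slices = split_into_slices(paths, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_slice = list(pool.map(subagent_scan, slices))
    findings = sorted(f for part in per_slice for f in part)
    return {"slices": len(slices), "findings": findings}


if __name__ == "__main__":
    report = orchestrate(["a/auth.py", "b/db.py", "c/auth_token.py", "d/ui.py"])
    print(report)
```

Sorting the merged findings keeps the report stable regardless of which subagent finishes first.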

The CLI is a local binary (brew/curl). It reads your code locally, then sends context to our inference API over TLS; tcpdump it and you'll see exactly what leaves and where. Install is free, and you can run a scan for free up to 2m tokens; beyond that you pay for tokens.

For full disclosure, this product is part of Cosine (YC W23).

Up for debate: tool safety, e.g. domain verification is one method that proves control but not necessarily permission. How would you gate a pen-test tool given that?
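
To make the control-versus-permission gap concrete, a small sketch. Everything here is hypothetical: the token format, the idea of checking DNS TXT records, and the `has_written_authorization` flag are assumptions for the example, not how the tool gates anything today. TXT records are passed in as a list so the sketch needs no network access.

```python
def controls_domain(txt_records: list[str], expected_token: str) -> bool:
    """True if the expected verification token is published in the domain's TXT records."""
    return any(record.strip() == expected_token for record in txt_records)


def may_run_pen_test(
    txt_records: list[str],
    expected_token: str,
    has_written_authorization: bool,
) -> bool:
    """Require both proof of control and a separate, explicit grant of permission."""
    return controls_domain(txt_records, expected_token) and has_written_authorization


if __name__ == "__main__":
    records = ["v=spf1 -all", "pentest-verify=abc123"]
    # Control is proven, but nobody with authority has signed off.
    print(may_run_pen_test(records, "pentest-verify=abc123", False))
    print(may_run_pen_test(records, "pentest-verify=abc123", True))
```

The second check is the hard part: someone who can edit DNS (a contractor, an attacker with a hijacked registrar account) passes the first one without being authorised to approve a test.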

Comments

andai•2h ago
Fantastic. Could you share more details on what it was like post-training a model?
cortesoft•2h ago
> This won't be made available to anyone and everyone, but we do believe that responsible SMEs and midmarket companies also need access to these tools in order to identify key vulnerabilities in their systems; not just enterprises.

So this is the same policy that Anthropic and OpenAI have, it is just based on your criteria rather than theirs.

dk189•2h ago
I think the policy universally makes sense, who would want to give a tool like this to bad actors? But it does leave a big section of the market underserved. Particularly when Mythos was made accessible to very large orgs and then Fable was pulled on export grounds.
cyanydeez•2h ago
It's really absurd to think any of these models can be protected _by commercial interests_. They couldn't keep from hiring North Koreans any more than they'll stop bad actors from operationalizing these models.
sudosysgen•1h ago
A lot of bad actors are both technically sophisticated and have more than enough resources to post train their model. Morally I think it's still the right choice, but consequence wise I doubt it's going to make a big difference.
rustcleaner•1h ago
The policy is repugnant. Whoever delivers the first frontier model as open weights to the world which lacks these moral guardrails will win.

Stop thinking you know morals better than your users, or get out of the way so a competitor who respects your users more can serve them!

cortesoft•1h ago
The problem is that it is a fool's errand to try to keep software tools from 'bad actors'. It is as pointless now as it was during the Crypto Wars. Information is simply too easy to move.

https://en.wikipedia.org/wiki/Crypto_Wars

devin•1h ago
Do you think bad actors can't make something like this? What are you even talking about?
kennyadam•2h ago
As soon as I read that I literally scoffed. Doublethink at its finest. Doubleplusungood.
yieldcrv•1h ago
I actually wonder how valuable this verbiage is

To me it looks like copycat marketing more than a strongly held stance

Artificial scarcity, membership club criteria to make members feel special

Perhaps there is an organization that awards this “responsibility” behavior, the EU comes to mind but not lucrative enough

As far as engagement farming goes, it got us to engage and boost its reach, for something we might otherwise ignore with more benign language

Once I get the answers I will execute

Catloafdev•2h ago
Why create an offensive tool rather than a repo-scanning tool?

I can't think of any way to safely release an offensive tool publicly.

dk189•2h ago
You need both: scanning for your own code, and pen testing to actually prove vulnerabilities. Otherwise it can be very noisy; one of the things most tools currently suffer from is that they give you too many false positives. For the moment we've gated the pen testing until we resolve the safety debate.
jml78•1h ago
At my job we have tooling that scans our code repos with Opus. Yes it can find stuff however it doesn’t find everything.

I am able to get Opus and Sonnet to function as a red team agent. We don’t have some crazy special sauce, just a lot of trial and error. Basically add enough context proving we own the code and running services that it will run attempts to compromise our services.

It found tons of stuff that was not found with just scanning the code. It found serious security issues that had been in production for years that humans never found. They weren’t things that were accessible externally but serious enough that we are thrilled to have these tools.

I can say that Fable did refuse to function with our harness. I am worried that soon you have to be in the special club to do this stuff with the SOTA models. A small company like ours doesn’t get accepted to their programs that remove guardrails. Even though our CEO has found and disclosed vulnerabilities to multiple companies and holds a patent around federated authentication.

rustcleaner•1h ago
They are only protecting corporate interests in insecure code bases by doing this. If everyone could have Mythos in their pockets, all the poorly written bottom dollar rush developed software would be rightfully shown to be the trash it always was. It would spur engineering liability legislation for commercial software and operations: speed-release poor insecure code --> corporate bankruptcy and maybe even prison for the software PE who signed off on it. Software, infrastructure, and hardware security won't improve massively until the "bad actors" start running rampant on the steaming pile!
mkaszkowiak•1h ago
What was your approach to benchmarking an adversarial agent?

This is an open problem that I came across (in a different domain), as the search space can be really wide. It's hard to measure results for non-trivial tasks.

Would be really interested if you can share your eval approach :)

jrflowers•1h ago
Show HN: We told Claude to generate a marketing page for a theoretical pentesting model
dk189•1h ago
The tool is live, you can test it.
jrflowers•1h ago
No, you can’t. This page is a sales funnel to schedule a 30 minute video chat with Cosine.ai or argusred or whatever. The thing you can test is not the thing that the headline is talking about.

It’s just more “We’re so smart we invented the boogeyman, trust us” slop marketing that’s been happening since gpt-2

dk189•1h ago
Did you follow the link? There is a brew install binary you can install and test. It's live.
jrflowers•1h ago
> Gated because the security implications are real; access is via booking

If I wanted to show off a “model that pen tests” I’d at least include a gif of it running against Juice Shop or something before the spooky language and “schedule a sales call”

dk189•14m ago
Fair, should've been precise. What's free today is the scan: read-only. The Bank of Anthos integer overflow is a scan finding, clone it and you'll get the same. The active mode that actually sends the exploit and shows the response is gated for now, that's the part that's really 'pen test'. Juice Shop's a fair target for showing it, will try to get this done and post an update.
luminati•34m ago
Relevant: https://news.ycombinator.com/item?id=48016224 What's the difference between this and running shannon on aws/bedrock fully airgapped in my vpc? I've got some pretty great results with shannon [no subprocessor and can pay via aws credits]. Even better using a claude code token [effectively free with our $200/mo cc subscription]. I tried kimi but it generally spins its wheels extensively in its thinking tokens. kimi2.7 is an attempt at reducing this. But doing finetuning means you will always be behind the latest.

as a side note - I think it's very unprofessional and very shitty to not mention kimi2.6 at all in your marketing copy. and i feel that you posted that in this hn post begrudgingly since the hn crowd would have flagged that. confirmed with a google search too: https://www.google.com/search?q=kimi+site%3Aargusred.com

All around your marketing website you keep mentioning 'A model lab built it'. A finetune does not maketh you a model lab. Some humility please :)

finally - doesn't Kimi's licensing prohibit you from not mentioning them? Didn't cursor run into the same issue?

jjcm•24m ago
IMO the most interesting thing about this is Kimi K2.6, an extremely capable model, can be relatively easily post-trained to allow pen tests.

This in its own right proves that the defenses of Fable and others are temporary blocks, and AI based hacking is going to be effectively available to all parties regardless of stop gaps, as long as open models exist.

dk189•6m ago
Agreed, and that's basically our premise. If a 5 person team can post-train an open model to do this, so can the people you don't want doing it. Model-level refusals on open weights are a speed bump. Which is the argument for defenders having it too, not against.
skiing_crawling•7m ago
Any generic abliterated or uncensored open weight model (such as a qwen variant) will happily comply with requests like this.