Given the large number of unmaintained or non-recent software out there, I think being worried is the right approach.
The only guaranteed winner is the LLM companies, who get to sell tokens to both sides.
Why? The attackers can run the defending software as well. As such they can test millions of testcases, and if one breaks through the defenses they can make it go live.
I'm quite optimistic about AI ultimately making systems more secure and well protected, shifting the overall balance towards the defenders.
https://projectzero.google/2024/10/from-naptime-to-big-sleep...
List of vulnerabilities found so far:
The defensive side needs everything to go right, all the time. The offensive side only needs something to go wrong once.
Scary.
Yikes.
> Leak a libc Pointer via Use-After-Free. The exploit uses the vulnerability to leak a pointer to libc.
I doubt Rust would save you here unless the binary has very limited calls to libc, but would be much harder for a UaF to happen in Rust code.
Combine that with a minimal docker container and you don't even need a shell or anything but the kernel in those images.
AFAICT, static linking just means the set of vulnerabilities you get landed with won't change over time.
I use pure go implementations only, and that implies that there's no statically linked C ABI in my binaries. That's what disabling CGO means.
> Yikes.
Yep.
I still doubt they will hook up their brains though.
The interesting part is still finding new potential RCE vulnerabilities, and generally if you can demonstrate the vulnerability even without demonstrating an E2E pwn red teams and white hats will still get credit.
protocolture•2h ago
simonw•2h ago
The quality of output you see from any LLM system is filtered through the human who acts on those results.
A dumbass pasting LLM generated "reports" into an issue system doesn't disprove the efforts of a subject-matter expert who knows how to get good results from LLMs and has the necessary taste to only share the credible issues it helps them find.
anonymous908213•2h ago
[1] If their experiment even demonstrates what it purports to demonstrate. For anyone to give this article any credence, the exploit really needs to be independently verified that it is what they say it is and that it was achieved the way they say it was achieved.
GaggiX•1h ago
simonw•1h ago
IanCal•1h ago
1. I think you have mixed up assistance and expertise. They talk about not needing a human in the loop for verification and to continue search but not about initial starts. Those are quite different. One well specified task can be attempted many times, and the skill sets are overlapping but not identical.
2. The article is about where they may get to rather than just what they are capable of now.
3. There’s no conflict between the idea that 10 parallel agents of the top models can mostly have one that successfully exploits a vulnerability - gated on an actual test that the exploit works - with feedback and iteration BUT random models pointed at arbitrary code without a good spec and without the ability to run code, and just run once, will generate lower quality results.
adw•1h ago
This applies to exploits, but it applies _extremely_ generally.
The increased interest in TLA+, Lean, etc comes from the same place; these are languages which are well suited to expressing deterministic success criteria, and it appears that (for a very wide range of problems across the whole of software) given a clear enough, verifiable enough objective, you can point the money cannon at it until the problem is solved.
The economic consequences of that are going to be very interesting indeed.
protocolture•1h ago
simonw•1h ago
moyix•1h ago
> I have written up the verification process I used for the experiments here, but the summary is: an exploit tends to involve building a capability to allow you to do something you shouldn’t be able to do. If, after running the exploit, you can do that thing, then you’ve won. For example, some of the experiments involved writing an exploit to spawn a shell from the Javascript process. To verify this the verification harness starts a listener on a particular local port, runs the Javascript interpreter and then pipes a command into it to run a command line utility that connects to that local port. As the Javascript interpreter has no ability to do any sort of network connections, or spawning of another process in normal execution, you know that if you receive the connect back then the exploit works as the shell that it started has run the command line utility you sent to it.
It is more work to build such "perfect" verifiers, and they don't apply to every vulnerability type (how do you write a Python script to detect a logic bug in an arbitrary application?), but for bugs like these where the exploit goal is very clear (exec code or write arbitrary content to a file) they work extremely well.
doomerhunter•2h ago
The AI slop you see on curl's bug bounty program[1] (mostly) comes from people who are not hackers in the first place.
In the contrary persons like the author are obviously skilled in security research and will definitely send valid bugs.
Same can be said for people in my space who do build LLM-driven exploit development. In the US Xbow hired quite some skilled researchers [2] had some promising development for instance.
[1] https://hackerone.com/curl/hacktivity [2] https://xbow.com/about
ronsor•2h ago
rvz•1h ago
tptacek•2h ago
simonw•1h ago
His co-founder on optimyze was Sean Heelan, the author of the OP.
tptacek•1h ago
rwmj•2h ago
With the CVE reports some poor maintainer has to go through and triage them, which is far more work, and very asymmetrical because the reporters can generate their spam reports in volume while each one requires detailed analysis.
SchemaLoad•1h ago
simonw•1h ago
An "agent harness" here is software that directly writes and executes code to test that it works. A vulnerability reported by such an agent harness with included proof-of-concept code that has been demonstrated to work is a different thing from an "exploit" that was reported by having a long context model spit out a bunch of random ideas based purely on reading the code.
I'm confident you can still find dumbasses who can mess up at using coding agent harnesses and create invalid, time wasting bug reports. Dumbasses are gonna dumbass.
airza•1h ago
QuadmasterXLII•54m ago
tptacek•16m ago