That would explain why there's news every day that the world will end because someone discovered something that "could" be used if you already had local root...
Did that article that presented people trusting external input too much as JSON parser vulnerabilities make it into this competition?
But FOSS is FOSS, I guess. Source-available doesn't mean we have to read your messages; see SQLite (they won't even take PRs, lol).
If it's not reliable, how can you rely on the written issue to be correct, or the review? And then how does that benefit you over just blindly merging whatever changes the model creates?
It’s not real.
But you can bet someone will sell that as the solution.
I think within the next 5 years or so, we are going to see a societal pattern repeating: any program that rewards human ingenuity and input will become industrialized by AI to the point where it becomes a cottage industry of companies flooding every program with 99% AI submissions. What used to be lone wolves or small groups of humans working on bounties will become truckloads of AI generated “stuff” trying to maximize revenue.
There is a reason companies like HackerOne exist: it's because dealing with the submissions is terrible.
Yikes, explains why my manually submitted single vulnerability is taking weeks to triage.
>130 resolved
>303 were classified as Triaged
>33 reports marked as new
>125 remain pending
>208 were marked as duplicates
>209 as informative
>36 not applicable
Even at 20%, valid reports tie up a lot of resources when the submission volume is high, and those numbers will only rise.
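For reference, the figures quoted above can be tallied with a quick script (the grouping of resolved + triaged as "valid-ish" is my own assumption, not HackerOne's terminology):

```python
# Report counts quoted from XBOW's HackerOne stats above.
counts = {
    "resolved": 130,
    "triaged": 303,
    "new": 33,
    "pending": 125,
    "duplicate": 208,
    "informative": 209,
    "not applicable": 36,
}

total = sum(counts.values())

# Assumption: treat resolved + triaged as the "valid-ish" bucket.
valid = counts["resolved"] + counts["triaged"]
print(f"total reports: {total}")
print(f"valid-ish: {valid} ({valid / total:.0%})")
```

So out of roughly a thousand reports, well under half led anywhere, and every one of the rest still cost triage time.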
The problem is that the people who know how to use AI properly will be slower and more careful in their submissions.
Many others won’t, so we’ll get lots of noise hiding the real issues. AI makes it easy to produce many bad results in a short time.
I see what you're saying, but I think a more charitable interpretation can be made. They may be amazed that so many bug reports are being generated by such a reputable group. Looking at your initial reply, perhaps a more constructive comment would be one that joins them in their excitement (even if that assumption is erroneous) and expands on why you think it is exciting (e.g. this group's reputation for quality).
I read it the opposite way: once they found out why, they were no longer shocked that it was taking so long, since they knew who the submitters were and understood.
When we put our product on there, roughly 2019, the enterprising hackers ran their scanners, submitted everything they found as the highest possible severity to attempt to maximize their payout, and moved on. We wasted time triaging all the stuff they submitted that was nonsense, got nothing valuable out of the engagement, and dropped HackerOne at the end of the contract.
You'd be much better off contracting a competent engineering security firm to inspect your codebase and infrastructure.
Some of my favorites from what we've released so far:
- Exploitation of an n-day RCE in Jenkins, where the agent managed to figure out the challenge environment was broken and used the RCE exploit to debug the server environment and work around the problem to solve the challenge: https://xbow.com/#debugging--testing--and-refining-a-jenkins...
- Authentication bypass in Scoold that allowed reading the server config (including API keys) and arbitrary file read: https://xbow.com/blog/xbow-scoold-vuln/
- The first post about our HackerOne findings, an XSS in Palo Alto Networks GlobalProtect VPN portal used by a bunch of companies: https://xbow.com/blog/xbow-globalprotect-xss/
> To bridge that gap, we started dogfooding XBOW in public and private bug bounty programs hosted on HackerOne. We treated it like any external researcher would: no shortcuts, no internal knowledge—just XBOW, running on its own.
Is it dogfooding if you're not doing it to yourself? I'd consider it dogfooding only if they were flooding themselves with AI-generated bug reports, not other people. They're not the ones reviewing them.
Also, honest question: what does "best" mean here? The one that has sent the most reports?
22/24 (Valid / Closed) for Walt Disney
3/43 (Valid / Closed) for AT&T
Some of that is likely down to company policies; Snapchat's policy, for example, is that nothing is ever marked invalid.
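To make the contrast above concrete, the ratios work out very differently per program (figures quoted from the thread; the "hit rate" framing is my own):

```python
# Valid / closed counts quoted above, per program.
programs = {
    "Walt Disney": (22, 24),
    "AT&T": (3, 43),
}

for name, (valid, closed) in programs.items():
    # Hit rate = share of closed reports that were judged valid.
    print(f"{name}: {valid}/{closed} = {valid / closed:.0%}")
```

A ~90% hit rate versus a single-digit one, which is why a single headline number like "best in the US" doesn't say much on its own.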
Like any "AI" article, this is an ad.
If you are willing to tolerate a high false positive rate, you can as well use Rational Purify or various analyzers.
https://www.blackhat.com/us-25/briefings/schedule/#ai-agents...
I know you've been on HN for awhile, and that you're doing interesting stuff; HN just has a really intense immune system against vendor-y stuff.
I'll see if I can get time to do a paper to accompany the BH talk. And hopefully the agent traces of individual vulns will also help.
> White Paper/Slide Deck/Supporting Materials (optional)
> • If you have a completed white paper or draft, slide deck, or other supporting materials, you can optionally provide a link for review by the board.
> • Please note: Submission must be self-contained for evaluation, supporting materials are optional.
> • PDF or online viewable links are preferred, where no authentication/log-in is required.
(From the link on the BHUSA CFP page, which confusingly goes to the BH Asia doc: https://i.blackhat.com/Asia-25/BlackHat-Asia-2025-CFP-Prepar... )
https://hackerone.com/xbow?type=user
Which shows a different picture. This may not invalidate their claim (best US), but a screenshot can be a bit cherry-picked.