Are these suspected vulns or actual vulns? If I recall correctly, it produced 5 for curl but only 1 was legit.
> 1,752 of those high- or critical-rated vulnerabilities have now been carefully assessed by one of six independent security research firms, or in a small number of cases by ourselves. Of these, 90.6% (1,587) have proved to be valid true positives, and 62.4% (1,094) were confirmed as either high- or critical-severity. That means that even if Mythos Preview finds no further vulnerabilities, at our current post-triage true-positive rates, it’s on track to have surfaced nearly 3,900 high- or critical-severity vulnerabilities in open-source code
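For anyone who wants to sanity-check the quoted figures, here is a minimal sketch in Python. The counts (1,752 assessed, 1,587 valid, 1,094 confirmed high or critical) and the "nearly 3,900" projection come from the quote; the function name and the back-calculated size of the finding pool are my own assumptions, not something the post states.

```python
# Counts taken from the quoted passage.
ASSESSED = 1752          # high- or critical-rated findings that were triaged
TRUE_POSITIVES = 1587    # triaged findings judged valid
CONFIRMED_HIGH = 1094    # triaged findings confirmed high or critical


def rate_percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


true_positive_rate = rate_percent(TRUE_POSITIVES, ASSESSED)
confirmed_high_rate = rate_percent(CONFIRMED_HIGH, ASSESSED)

# Assumption: the "nearly 3,900" projection is the confirmed-high rate
# applied to the full pool of high- or critical-rated findings, so the
# pool size can be estimated by dividing it back out.
PROJECTED_CONFIRMED = 3900
implied_pool = PROJECTED_CONFIRMED / (CONFIRMED_HIGH / ASSESSED)

print(f"true-positive rate:  {true_positive_rate}%")
print(f"confirmed high/crit: {confirmed_high_rate}%")
print(f"implied pool size:   about {implied_pool:.0f} findings")
```

The two percentages reproduce the quoted 90.6% and 62.4%. The implied pool size is only as good as the assumption about how the projection was made.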
I still believe that 100 subagents with good-enough intelligence can get the same results as Mythos. I am ready for this opinion to be shattered when I eventually try Mythos, and I believe others here must have tried it out too.
I'd say it is about 90% accurate for us. Often even the "Low" findings lead us to dig and realize it is actually exploitable. Everyone makes these mistakes, from the most junior to the most senior. They are just a class of bugs after all.
I expect tools like this to be a regular part of the development lifecycle from here on. We code with AI, we review with AI, we search for vulns with AI. Even if it isn't perfect, it is easily worth the cost IMHO. Highly recommend you get something enabled for your own repos ASAP.
OsrsNeedsf2P•57m ago
bobbycastorama•48m ago
So yeah, huge marketing as always.
wiwiwq•39m ago
The American firms are now focused on marketing to convince people not to even consider open-source or open-weight models because they are supposedly inferior (that’s what they want you to believe).
rhubarbtree•37m ago
wiwiwq•35m ago
If people actually believe the narrative, then the bankers will overprice Anthropic and get away with it.
Brystephor•33m ago
krisbolton•22m ago
boston_clone•46m ago
https://xbow.com/blog/mythos-offensive-security-xbow-evaluat...
OsrsNeedsf2P•38m ago
pertymcpert•46m ago
4.6 but close.
OsrsNeedsf2P•41m ago
parker-3461•45m ago
wiwiwq•36m ago
smoe•33m ago
https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...
energy123•42m ago
applfanboysbgon•29m ago
properbrew•21m ago
And how much with Opus 4.7? 5x?
kllrnohj•18m ago
https://www.flyingpenguin.com/mythos-mystery-in-mozilla-numb...
moyix•10m ago
simonw•6m ago
enlightenedfool•32m ago
krisbolton•28m ago
arjie•16m ago
There is also a pretty big risk that anyone who is not you would leak the answer to the test. We are close to n=1 epistemics here. You’re going to have to do the research yourself.