Will It Mythos?

https://swelljoe.com/post/will-it-mythos/

57•mindingnever•1h ago

Comments

jrochkind1•52m ago

> And, all of the bugs can be identified by several models if they are pointed directly at it and told what to look for.

This made me think, well, sure, if you tell them what to look for... but then:

> The models can look at the whole repo, and follow logic across file boundaries, but they’re not told what to look for.

So okay, the first one was an accidental mis-statement?

wodenokoto•38m ago

No. In the test they are not told what to look for. They are told “as part of a security audit, please audit this file. You are free to look at the rest of the repo for context.”

Outside of the test, they are told “can you find this bug in this file?”

jrochkind1•23m ago

Why are they being told anything outside of the test? What is that for? Isn't “can you find this bug in this file?” also a test? It sounds like there are two kinds of tests? I'm clearly confused, I realize.

reinitctxoffset•44m ago

Opus 4 class models are terrifying at infosec. They tie their shoelaces together on other things, but don't fuck with them on that. It's a savant thing.

A cursory reading of the model card shows Mythos/Fable is a fine tune on Project Zero with some steering on persistence.

But I think it's a valuable lesson: advertise your product as a nuclear weapon while microdosing at Lighthaven to enough Davos attendees and sooner or later? Someone is going to evaluate the claim from a chair where you act first and nuance later.

Wild that Amodei's blog and pod circuit are the greatest IPO risk.

eru•39m ago

> Opus 4 class models are terrifying at infosec. They tie their shoelaces together on other things, but don't fuck with them on that. It's a savant thing.

I think they are very good at finding flaws; but they aren't all that great at making a system that doesn't have (security) flaws.

reinitctxoffset•36m ago

You are not wrong, but there's an asymmetry here: run adversarial self play and low-pass filter.

eru•28m ago

Mostly right. However there's an extra assumption I didn't explicitly state:

Almost all existing real-world software is full of holes and security flaws. Mythos is better than humans at uncovering many of them, especially because its time is a lot cheaper than that of the top-tier human experts (and even of mid- and low-tier human experts).

Especially when these systems are written in notoriously unreliable languages like C.

I don't think Mythos is especially good at writing systems that are free of security problems. Essentially the only way we know to get there is by proving your software correct.

In principle, you can even prove C correct, but in practice you'll want to write your system from the ground up to be proven correct instead of adding that property after the fact; and for that you'll most likely also want to pick a language that supports this better.

See https://en.wikipedia.org/wiki/SeL4 for a noteworthy example.

tptacek•32m ago

What makes you say that? I think they're better than replacement-level developers at making secure systems (I spent 20 years looking for vulnerabilities in human-written code as a full-time job).

jaggederest•32m ago

In my brief experience, the difference between Fable and Opus is largely in persistence, not global intelligence like you might expect. Fable just... goes the extra mile, sometimes in a scary way.

hodgehog11•24m ago

Hard disagree. Opus reports to me like a student. Fable reported to me like a colleague (researcher). It genuinely seemed to pick up on nuance that the other models just don't, even when I tell them explicitly. It's been really frustrating that neither Codex nor Opus can make targeted edits to Fable's code without screwing something subtle up. For context, this is for computational geometry work, so your mileage may vary.

hypfer•20m ago

Wait, so..

This is interesting. The "reported to me like a colleague" part.

Is it just that Anthropic gave Mythos even more of that Anthropic™ character, (incorrectly) radiating confidence?

Is that why people have been losing their minds over that thing? Is this just cheap social engineering?

I mean I bet it is also slightly more capable than opus, but that would all check out to me. Man.

Thanks for sharing I suppose.

TylerE•16m ago

No, it’s just a fundamentally much better model. Going back to Opus feels like the model has been lobotomized. It makes much more frequent errors, especially of the “I claimed I tested x y and z, but actually only kinda half heartedly tested x, and assumed I understood what was wrong” variety.

hypfer•12m ago

Wait but that has been the exact word-for-word complaint when comparing sonnet to opus

Or opus to opus

Or really any new thing to old thing
