The tight feedback loop that makes Mythos effective on software partially breaks on hardware: no sandbox, no instant PoC iteration. But this is actually where an LLM could shine. Fusing disparate technical sources is what these models are unusually good at, and hardware security is bottlenecked by how few humans can hold all the context in their heads at once. Spectre, Meltdown, Downfall, Zenbleed, and LVI were all found by tiny teams of specialists. A model that can tirelessly cross-reference an ISA manual against errata against an RTL description is the tool that class of research has been waiting for.
The consequences are asymmetric in the attacker's favor. You cannot patch a logic flaw baked into a fab mask. The best you usually get is a microcode mitigation that costs performance, or an OS-level band-aid. For deep flaws the only real fix is a new silicon revision. Deployed hardware lives for a decade or more. A Chrome zero-day is patched by Tuesday. A zero-day in a 2021 Xeon is essentially permanent.
The defender coordination model does not carry over either. Glasswing works for software because Microsoft, Google, and Apple can push fixes to billions of devices in weeks. There is no Patch Tuesday for an Arm core already shipped in two billion phones. The vendors who would need to participate in a hardware equivalent — Intel, AMD, Arm, Qualcomm, Apple, TSMC, Samsung — have historically been far more secretive about internals than software vendors. The open collaborative disclosure model is much harder to replicate at that layer. And hardware implementation flaws are closer to strategic weapons than software bugs: usable against an adversary's entire installed base with no meaningful patching option.
One nuance: architecture review from published specs is the easier half of the hardware problem, and the half most exposed to current model capabilities. Finding logical flaws in a spec — race conditions in cache coherency, ambiguous memory ordering, speculation leaks — is something a capable model plausibly can do today. What it cannot do from a desk is characterize analog behavior, physical side channels, fault injection, or bugs that only manifest at specific process corners. Those need silicon in a lab. But pair a Mythos-class model with a well-equipped hardware security lab and you can compress PhD-years of work into weeks.
Constitutional training works against obvious malicious requests. But models can be induced to assist with offensive work when requesters segment the task and frame components as legitimate engineering questions. An end-to-end "help me exploit this chip" request gets refused. A hundred individually defensible questions about memory ordering, speculative corner cases, and cache timing variability may not. Humans fall for the same pattern. Models are not immune, which is likely part of why Anthropic is being as cautious as they are.
Software is where AI operates end-to-end autonomously today. Hardware is where the same capability produces more durable consequences, fewer remediation options, and a weaker disclosure ecosystem to absorb the shock. Whether anyone is seriously thinking about a hardware-focused Glasswing equivalent is something I have not seen addressed. Curious whether people here working in hardware security or silicon red-teaming think this is overstated, understated, or already being quietly worked on.