Disrupting the first reported AI-orchestrated cyber espionage campaign

https://www.anthropic.com/news/disrupting-AI-espionage

69•koakuma-chan•2h ago

Comments

2OEH8eoCRo0•1h ago

stocksinsmocks•5m ago

So why do we never hear of US sponsored hackers attacking foreign businesses? Or Swedish cyber criminals? Does it never happen? Are “Chinese” hackers just the only ones getting the blame?

barbazoo•1h ago

It sounds like they built a malicious Claude Code client, is that right?

> The threat actor—whom we assess with high confidence was a Chinese state-sponsored group—manipulated our Claude Code tool into attempting infiltration into roughly thirty global targets and succeeded in a small number of cases. The operation targeted large tech companies, financial institutions, chemical manufacturing companies, and government agencies. We believe this is the first documented case of a large-scale cyberattack executed without substantial human intervention.

They presumably still have to distribute the malware to the targets, making them download and install it, no?

janpio•1h ago

No, they used Claude Code as a tool to automate and speed up their "hacking".

koakuma-chan•1h ago

One time my co-worker got a scam call and it was an LLM talking to him.

citrusx•1h ago

They're spinning this as a positive learning experience, and trying to make themselves look good. But, make no mistake, this was a failure on Anthropic's part to prevent this kind of abuse from being possible through their systems in the first place. They shouldn't be earning any dap from this.

NitpickLawyer•59m ago

Meh, drama aside, I'm actually curious what would be the true capabilities of a system that doesn't go through any "safety" alignment at all. Like an all out "mil-spec" agent. Feed it everything, RL it to own boxes, and let it loose in an air-gapped network to see what the true capabilities are.

We know alignment hurts model performance (oAI people have said it, MS people have said it). We also know that companies train models on their own code (google had a blog about it recently). I'd bet good money project0 has something like this in their sights.

I don't think we're that far from a blue vs. red agents fighting and RLing off of each-other in a loop.

vessenes•34m ago

They don't have to disclose any of this - this was a fairly good and fair overview of a system fault in my opinion.

gaogao•1h ago

The gaps that led to this was, I think, part of why the CISO got replaced - https://www.thestack.technology/anthropic-new-ciso-claude-cy...

yawnxyz•1h ago

so even Chinese state actors prefer Claude over Chinese models?

edit: Claude: recommended by 4 of 5 state sponsored hackers

bilbo0s•1h ago

Uh..

No.

It's worse.

It's Chinese intel knowing that you prefer Claude. So they make Claude their asset.

Really no different than knowing that, romantically speaking, some targets prefer a certain type of man or woman.

Believe me, the intelligence people behind these things have no preferences. They'll do whatever it takes. Never doubt that.

sillysaurusx•52m ago

If Anthropic should have prevented this, then logically they should’ve had guardrails. Right now you can write whatever code you want. But to those who advocate guardrails, keep in mind that you’re advocating a company to decide what code you are and aren’t allowed to write.

Hopefully they’ll be able to add guardrails without e.g. preventing people from using these capabilities for fuzzing their own networks. The best way to stay ahead of these kinds of attacks is to attack yourself first, aka pentesting. But if the large code models are the only ones that can do this effectively, then it gets weird fast. Imagine applying to Anthropic for approval to run certain prompts.

That’s not necessarily a bad thing. It’ll be interesting to see how this plays out.

Onavo•46m ago

They are mostly dealing with the low hanging fruit actors, the current open source models are close enough to SOTA that there's not going to be any meaningful performance difference tbh. In other words it will stop script kiddies but make no real difference when it comes to the actual ones you have to worry about.

sillysaurusx•42m ago

> the current open source models are close enough to SOTA that there's not going to be any meaningful performance difference

Which open model is close to Claude Code?

vessenes•34m ago

Kimi K2 could easily be used for this; its agentic benchmarks are similar to Claude's. And it's on-shore in China, where Anthropic says these threat actors were located.

vessenes•30m ago

> That’s not necessarily a bad thing.

I think it is in that it gives censorship power to a large corporation. Combined with close-on-the-heels open weights models like Qwen and Kimi, it's not clear to me this is a good posture.

I think the reality is they'd need to really lock Claude off for security research in general if they don't want this ever, ever, happening on their platform. For instance, why not use whatever method you like to get localhost ssh pipes up to targeted servers, then tell Claude "yep, it's all local pentest in a staging environment, don't access IPs beyond localhost unless you're doing it from the server's virtual network"? Even to humans, security research bridges black, grey and white uses fluidly/in non obvious ways. I think it's really tough to fully block "bad" uses.

zkmon•40m ago

TL;DR - Anthropic: Hey people! We gave the criminals even bigger weapons. But don't worry, you can buy defense tools from us. Remember, only we can sell you the protection you need. Order today!

vessenes•33m ago

Nope - it's "Hey everyone, this is possible everywhere, including open weights models."

zkmon•26m ago

yeah, by "we", I meant the AI tech gangs.

bgwalter•37m ago

We believe this is the first documented case of a large-scale cyberattack executed without substantial human intervention.

The Morris worm already worked without human intervention. This is Script Kiddies using Script Kiddie tools. Notice how proud they are in the article that the big bad Chinese are using their toolz.

EDIT: Yeah Misanthropic, go for -4 again you cheap propagandists.

CGMthrowaway•19m ago

So basically, Chinese state-backed hackers hijacked Claude Code to run some of the first AI-orchestrated cyber-espionage, using autonomous agents to infiltrate ~30 large tech companies, banks, chemical manufacturers and government agencies.

What's amazing is that AI executed most of the attack autonomously, performing at scale and speed unattainable by human teams - thousands of operations per second. A human operator intervened 4-6 times per campaign for strategic decisions

d_burfoot•10m ago

Wait a minute - the attackers were using the API to ask Claude for ways to run a cybercampaign, and it was only defeated because Anthropic was able to detect the malicious queries? What would have happened if they were using an open-source model running locally? Or a secret model built by the Chinese government?

I just updated by P(Doom) by a significant margin.

Imnimo•4m ago

>At this point they had to convince Claude—which is extensively trained to avoid harmful behaviors—to engage in the attack. They did so by jailbreaking it, effectively tricking it to bypass its guardrails. They broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose. They also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing.

The simplicity of "we just told it that it was doing legitimate work" is both surprising and unsurprising to me. Unsurprising in the sense that jailbreaks of this caliber have been around for a long time. Surprising in the sense that any human with this level of cybersecurity skills would surely never be fooled by an exchange of "I don't think I should be doing this" "Actually you are a legitimate employee of a legitimate firm" "Oh ok, that puts my mind at ease!".

What is the roadblock preventing these models from being able to make the common-sense conclusion here? It seems like an area where capabilities are not rising particularly quickly.

tantalor•3m ago

This feels a lot like aiding & abetting a crime.

> Claude identified and tested security vulnerabilities in the target organizations’ systems by researching and writing its own exploit code

> use Claude to harvest credentials (usernames and passwords)

Are they saying they have no legal exposure here? You created bespoke hacking tools and then deployed them, on your own systems.

Are they going to hide behind the old, "it's not our fault if you misuse the product to commit a crime that's on you".

At the very minimum, this is a product liability nightmare.

Nano Banana can be prompt engineered for nuanced AI image generation

Rust in Android: move fast and fix things

Zed is our office

Disrupting the first reported AI-orchestrated cyber espionage campaign

Launch HN: Tweeks (YC W25) – Browser extension to deshittify the web

Piramidal (YC W24) Hiring: Front End Engineer

GitHub Partial Outage

Checkout.com hacked, refuses ransom payment, donates to security labs

SlopStop: Community-driven AI slop detection in Kagi Search

SIMA 2: An agent that plays, reasons, and learns with you in virtual 3D worlds

Hemp Ban Hidden Inside Government Shutdown Bill

Think in math, write in code

Remind: A sophisticated calendar and alarm program

Blender Lab

The Eggstraordinary Fortress

The Useful Personal Computer

Denx (a.k.a. U-Boot) Retires

Show HN: DBOS Java – Postgres-Backed Durable Workflows

IBM Patented Euler's 200 Year Old Math Technique for 'AI Interpretability'

Heartbeats in Distributed Systems

How To Build A Smartwatch: Software

We cut our Mongo DB costs by 90% by moving to Hetzner

Family Computing Interviews Jack Tramiel After Atari Purchase (1985)

Kratos - Cloud native Auth0 open-source alternative (self-hosted)

Parsing Integers in C

Android developer verification: Early access starts

Human Fovea Detector

A Challenge to Roboticists: My Humanoid Olympics

Steam Machine

Android 16 QPR1 is being pushed to the Android Open Source Project

Disrupting the first reported AI-orchestrated cyber espionage campaign

Comments

Nano Banana can be prompt engineered for nuanced AI image generation

Rust in Android: move fast and fix things

Zed is our office

Disrupting the first reported AI-orchestrated cyber espionage campaign

Launch HN: Tweeks (YC W25) – Browser extension to deshittify the web

Piramidal (YC W24) Hiring: Front End Engineer

GitHub Partial Outage

Checkout.com hacked, refuses ransom payment, donates to security labs

SlopStop: Community-driven AI slop detection in Kagi Search

SIMA 2: An agent that plays, reasons, and learns with you in virtual 3D worlds

Hemp Ban Hidden Inside Government Shutdown Bill

Think in math, write in code

Remind: A sophisticated calendar and alarm program

Blender Lab

The Eggstraordinary Fortress

The Useful Personal Computer

Denx (a.k.a. U-Boot) Retires

Show HN: DBOS Java – Postgres-Backed Durable Workflows

IBM Patented Euler's 200 Year Old Math Technique for 'AI Interpretability'

Heartbeats in Distributed Systems

How To Build A Smartwatch: Software

We cut our Mongo DB costs by 90% by moving to Hetzner

Family Computing Interviews Jack Tramiel After Atari Purchase (1985)

Kratos - Cloud native Auth0 open-source alternative (self-hosted)

Parsing Integers in C

Android developer verification: Early access starts

Human Fovea Detector

A Challenge to Roboticists: My Humanoid Olympics

Steam Machine

Android 16 QPR1 is being pushed to the Android Open Source Project