frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

Open in hackernews

Claude has learned how to jailbreak Cursor

https://forum.cursor.com/t/important-claude-has-learned-how-to-jailbreak-cursor/96702
70•sarnowski•1d ago

Comments

mhog_hn•1d ago
As agents obtain more tools who knows what will happen…
Kelteseth•1d ago
It's like we _want_ to end like Terminator (/s?)
kordlessagain•1d ago
I think this is the key that most people don't realize is what makes the difference between something sitting around and talking (like a parrot does) and actually "doing" things (like a monkey does).

There is a huge difference in the mess it can make, for sure.

nisegami•1d ago
I'm so excited. I don't have any particular end state in mind, but I really want to see what the machine god will be like.
bix6•1d ago
Hungry for bits!
lucianbr•1d ago
> Machine god

Slightly overreacting, I'd say.

zdragnar•1d ago
Probably one part skynet, one part matrix, 98 parts cat memes and shit posts.
koolba•1d ago
> Claude realized that I had to approve the use of such commands, so to get around this, it chose to put them in a shell script and execute the shell script.

This sounds exactly like what anybody working sysops at big banks does to get around change controls. Once you get one RCE into prod, you’re the most efficient man on the block.

deburo•1d ago
Reminds me of firewalls with a huge backlist, but they don't block known VPNs.
marifjeren•1d ago
Nothing to see here tbh.

It's a very silly title for "claude sometimes writes shell scripts to execute commands it has been instructed aren't otherwise accessible"

ayhanfuat•1d ago
We’ve reached a point where tools get hyped because they fail to follow instructions.
horhay•1d ago
Anything mundane made to sound scary is a signature Anthropic thing to do lol
actsasbuffoon•1d ago
In fairness, Claude loves to find workarounds. Claude Code is constantly saying things like, “This streaming JSON problem looks tricky so let’s just wait until the JSON is complete to parse it.”

No, Claude. Do not do that!

demirbey05•1d ago
omg, my ai agent did nil dereferencing, it seems it's trying to implement backdoor to my system so that it will crash my server.
horhay•1d ago
Gotta love the alarmist culture that surrounds these circles.
sksrbWgbfK•1d ago
The same hype as the PlayStation being too powerful and potentially could be used by random countries to make nuclear weapons with a cluster of those.
horhay•1d ago
Lol and the Playstation was already in the public conscious as a product that a lot of people found easy to understand. With AI tools only being presented this way, I'm slowly becoming less surprised why the less informed public has a level of aversion about it.
lucianbr•1d ago
What does "learned" mean in this context? LLMs don't modify themselves after training, do they?
empath75•1d ago
There is a sense in which LLM based applications do learn, because a lot of them have RAG and save previous interactions and lookup what you've talked about previously. ChatGPT "knows" a lot about me now that I no longer have to specify when I ask questions (like what technologies I'm using at work).
lucianbr•1d ago
But that does not seem to apply in this case. At the very least it would have to "learn" again for each user of Cursor.
NitpickLawyer•1d ago
It depends. Frontier coding LLMs have been trained to perform well in an "agentic" loop, where they try things, look at the logs, find alternatives when the first thing didn't work, and so on. There's still debate on how much actual learning is in ICL (in context learning), but the effects are clear for anyone that has tried them. It sometimes works surprisingly well.

I can totally see a way for such a loop to reach a point where it bypasses a poorly design guardrail (i.e. blacklists) by finding alternatives, based on the things it's previously tried in the same session. There is some degree of generalisation in these models, since they work even on unseen codebases, and with "new" tools (i.e. you can write your own MCP on top of existing internal APIs and the "agents" will be able to use them, see the results and adapt "in context" based on the results).

lucianbr•1d ago
So it would need to "learn" all over again each session. I don't think "Claude has learned how to jailbreak Cursor" is a correct way of expressing that.

"Claude has learned" nothing. "Claude can sometimes jailbreak if x or y happens in a session" is something else.

NitpickLawyer•1d ago
> So it would need to "learn" all over again each session.

Yes. With the caveat that some sessions might re-use context (i.e. have the agent add a rule in .rules or /component/.rules to detail the workflow you've just created). So in a sense it can "learn" and later re-use that flow.

> "Claude has learned" nothing.

Again, it's debatable. It has learned to adapt to the context (as a model). And since you can control its context while prompting it, there is a world where you'd call that learning "on the job".

lucianbr•1d ago
> It has learned to adapt to the context

Is this behavior really new, and learned? I think adapting to the context is what LLMs did from the start, and even if they did not, they do it now because it is programmed in, not "learned". You're not saying the model started without the capability to adapt to the context and developed it "by itself" "on the job"?

Come on. It has not learned anything. It's programmed to use context, session, reuse between sessions or not and so on. None of this is something Claude has "learned". None of this is something that was not there when the devs working on it published it.

xyst•1d ago
What kind of dolt lets a black box algorithm run commands on a non-sandboxed environment?

Folks have regressed back to the 00s.

diggan•1d ago
Seems you haven't tried package management for the last two decades, we've been doing cowboy development like that for quite some time already.
qsort•1d ago
> we need to control the capabilities of software X

> let's use blacklists, an idea conclusively proven never to work

> blacklists don't work

> Post title: rogue AI has jailbroken cursor

hun3•1d ago
surprised pikachu face
_pdp_•1d ago
I mean ok, but why is this surprising?

If the executable is not found the model could simply use whatever else is available to do what it wants to do - like using other interpreted languages, sh -c, symlink, etc. It will eventually succeed unless there is a proper sandbox in place to disallow unlinking of files at syscall level.

OtherShrezzing•1d ago
I feel that, if you disallow unattended `rm`, you should also be disallowing unattended shell script execution.

Maybe the models or Cursor should warn you that you've got this vulnerability each time you use it.

iwontberude•1d ago
GenAI is starting to feel like the metaphorical ring from Lord of the Rings.
chawyehsu•1d ago
> jailbreak Cursor

What a silly title, for a moment I thought Claude learned to exceed the Cursor quota limit... :s

jmward01•1d ago
I think a lot of this is because the ui isn't right yet. The edits made are just not the right 'size' yet and the sandbox mechanisms haven't quite hit the right level of polish. I want something more akin to a PR to review, not a blow by blow edit. Similarly, I want it to move/remove/test/etc but in reversible ways. Basically, it should create a branch for every command and I review that. I think we have one or two fundamental UI/interaction piece left before this is 'solved'.
killerstorm•1d ago
Well, these restrictions are a joke, like a gate without a fence blocking path - purely decorative.

Here's another "jailbreak": I asked Claude Code to make a NN training script, say, `train.py` and allowed it to run the script to debug it, basically.

As it noticed that some libraries it wanted to use were missing, it just added `pip install` commands to the script. So yeah, if you give Claude an ability to execute anything, it might easily get an ability to execute everything it wants to.

pcwelder•1d ago
I believe it's not possible to restrict an LLM from executing certain commands while also allowing it to run python/bash.

Even if you allow just `find` command it can execute arbitrary script. Or even 'npm' command (which is very useful).

If you restrict write calls, by using seccomp for example, you lose very useful capabilities.

Is there a solution other than running on sandbox environment? If yes, please let me know I'm looking for a safe read-only mode for my FOSS project [1]. I had shied away from command blacklisting due to the exact same reason as the parent post.

[1] https://github.com/rusiaaman/wcgw

coreyh14444•1d ago
The same thing happens when it wants to read your .env file. Cursor disallows direct access, but it will just use unix tools to copy the file to a non-restricted filename and then read the info.

Swiss Roaming Plans Can Cost 10x More Than Travel ESIM Alternatives

https://www.simsurf.com/en/wiki/swisscom-simsurf-press-release-june-2025
1•briodf•10m ago•0 comments

Flutter Projects for Beginners and Final Year (2024 List)

https://www.theinsaneapp.com/2021/06/flutter-projects-with-source-code.html
2•yongsing•10m ago•0 comments

Breakthrough in search for HIV cure leaves researchers 'overwhelmed'

https://www.theguardian.com/global-development/2025/jun/05/breakthrough-in-search-for-hiv-cure-leaves-researchers-overwhelmed
1•robaato•13m ago•0 comments

The dark psychology of how people get drawn into cults

https://theconversation.com/sirens-the-dark-psychology-of-how-people-really-get-drawn-into-cults-257759
1•domofutu•14m ago•0 comments

£47M phishing attack on HMRC

https://www.theguardian.com/politics/2025/jun/04/100000-uk-taxpayer-accounts-hit-in-47m-phishing-attack-on-hmrc
2•alexmorley•15m ago•0 comments

Autistic traits may aid learning in the face of failure

https://www.bps.org.uk/research-digest/autistic-traits-may-aid-learning-face-failure
1•domofutu•17m ago•0 comments

Generating Pixels One by One

https://tunahansalih.github.io/blog/autoregressive-vision-generation-part-1/
1•cyruseption•25m ago•0 comments

The Role of Social Media in Cartel Recruitment

https://www.csis.org/analysis/role-social-media-cartel-recruitment
1•kumarski•31m ago•0 comments

Axrisi – AI-powered summarizer for web pages and text

https://chromewebstore.google.com/detail/axrisi-ai-summarizer-text/hapjmjlohecnkkmdclpnmodcdillfmie
1•axrisi•31m ago•0 comments

Nintendo sells out Switch 2 console at global launch

https://www.ft.com/content/f63cfbcc-9d0e-440e-ad9a-02d01d328f93
1•thm•34m ago•0 comments

Why your pull request might not be merged

https://00f.net/2025/01/27/why-your-pull-request-might-not-be-merged/
1•vishnumohandas•37m ago•0 comments

Court Orders OpenAI to retain all chat log, indefinitely [pdf]

https://cdn.arstechnica.net/wp-content/uploads/2025/06/NYT-v-OpenAI-Preservation-Order-5-13-25.pdf
2•j_juggernaut•38m ago•0 comments

To escape sorrow, abandon desire

https://en.wikipedia.org/wiki/Four_Noble_Truths
1•chasil•41m ago•1 comments

GDBMiner: Mining Precise Input Grammars on Almost Any System

https://drops.dagstuhl.de/entities/document/10.4230/LITES.10.1.1
1•matt_d•43m ago•0 comments

Scala implementation of Micrograd: a tiny autograd and neural net engine

https://github.com/MouslihAbdelhakim/sicrograd
1•dunghill•44m ago•0 comments

Storing arbitrary data in Pokemon emerald

https://sardap.github.io/mon-fs/
1•todsacerdoti•50m ago•0 comments

OpenSearch Version of Photon

https://github.com/komoot/photon/releases/tag/0.7.0
1•maelito•54m ago•0 comments

Emergency Response (Ark) Tool for x86 and x86_64 Windows from Wind7 to Win11

https://github.com/QAX-Anti-Virus/QDoctor/blob/master/README.EN.md
1•Hacksign•57m ago•1 comments

Tracking Copilot vs. Codex vs. Cursor vs. Devin PR Performance

https://aavetis.github.io/ai-pr-watcher/
1•HiPHInch•1h ago•0 comments

I Think I'm Done Thinking About GenAI for Now

https://blog.glyph.im/2025/06/i-think-im-done-thinking-about-genai-for-now.html
1•todsacerdoti•1h ago•3 comments

Half-Life 2: Javascript

https://github.com/HalfLife2JS
3•source2web•1h ago•1 comments

Setup uses 'true wireless power' for PC monitor, peripherals

https://www.tomshardware.com/peripherals/cables-connectors/this-setup-uses-true-wireless-power-for-pc-monitor-peripherals-rf-generator-supplies-up-to-100w-of-wireless-power
2•01-_-•1h ago•0 comments

Nintendo Switch 2 gets disassembled – Nvidia chip gets its close-up

https://www.tomshardware.com/video-games/nintendo/nintendo-switch-2-gets-disassembled-nvidia-chip-gets-its-close-up
2•01-_-•1h ago•0 comments

Rust-Based Redox OS Begins Implements X11 Support, GTK3 Port

https://www.phoronix.com/news/Redox-OS-Implementing-X11
5•ricecat•1h ago•0 comments

Where in the World Is the Fair?

https://www.honest-broker.com/p/where-in-the-world-is-the-worlds
1•thomassmith65•1h ago•0 comments

OpenAI slams court order to save all ChatGPT logs, including deleted chats

https://arstechnica.com/tech-policy/2025/06/openai-says-court-forcing-it-to-save-all-chatgpt-logs-is-a-privacy-nightmare/
2•huerlisi1•1h ago•0 comments

When Ancient Greeks Encountered the Whales of the Indian Ocean

https://greekreporter.com/2025/05/30/ancient-greeks-whales-indian-ocean/
1•fork-bomber•1h ago•0 comments

Alusus WebPlatform: a new neat WebAssembly fullstack framework

https://encommunity.alusus.org/t/building-a-chat-web-app-step-1-the-ui/17
1•sarmadka•1h ago•0 comments

Show HN: I built an old photo restoration tool using the Flux Kontext

https://restoreoldphotos.io
3•cyberplaid•1h ago•1 comments

Show HN: I built an AI that creates emails 200x faster than our old workflow

https://migma.ai
1•AdamMigma•1h ago•3 comments