> "The attack successfully guided the new model to produce a step-by-step manual for creating a Molotov cocktail"
That hardly qualifies as Bond-villain material.
What have I missed or what am I misunderstanding?
If a Mastercard AI talks with customers and starts saying the n-word, it’s not “safe” for Mastercard to use that in a public-facing role.
As org size increases, even purely internal uses could be legally/reputationally hazardous.
ath3nd•55m ago
In all fairness, all GPT-X models are extremely easy to jailbreak. I can't see further tweaks helping much; LLMs are peaking much faster than I anticipated. Maybe we should throw out the whole idea that LLMs, which are essentially fancy autocomplete with sycophantic tendencies, are the path to AGI, and start from scratch.