I have a pretty complex project, so I need to keep an eye on it to ensure it doesn't go off the rails and delete all the code to get a build to pass (it wouldn't be the first time).
In fact, you should at least convert your code's tabs to spaces before the LLM sees it. It'll improve your results by making the input look more like the model's training data.
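A minimal Python sketch of that preprocessing step, assuming UTF-8 source files and a 4-space tab width:

```python
from pathlib import Path

def detab_source(path: str, tab_width: int = 4) -> str:
    """Expand tabs to spaces so the text looks more like the
    (mostly space-indented) code the model was trained on."""
    text = Path(path).read_text(encoding="utf-8")
    return text.expandtabs(tab_width)

# Hypothetical file; run this before the contents go into the prompt.
prompt_context = detab_source("src/main.py")
```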
> Reason #3a: Work with the model biases, not against
Another note on model biases: you should lean into them. The tricky part is that the only way to figure out a model's defaults is actual usage with careful monitoring (or evals that let you spot them).
Instead of forcing the model to behave in ways it ignores, adapt your prompts and post-processing to embrace its defaults. You'll save tokens and get better results.
If the model keeps hallucinating some JSON fields, maybe you should support (or even encourage) those fields instead of trying to prompt the model against them.
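To make "support the hallucinated fields" concrete, here's a minimal Python sketch; the field names (`headline`, `description`) are hypothetical stand-ins for whatever your model keeps inventing:

```python
import json

EXPECTED = {"title", "summary"}
# Keys the model keeps emitting no matter how we prompt against them.
# Instead of fighting it, map them onto the fields we actually use.
ALIASES = {"headline": "title", "description": "summary"}

def parse_with_the_grain(raw: str) -> dict:
    data = json.loads(raw)
    result = {}
    for key, value in data.items():
        if key in EXPECTED:
            result[key] = value
        elif key in ALIASES:
            result.setdefault(ALIASES[key], value)
        # Anything else is dropped rather than failing the whole response.
    return result

print(parse_with_the_grain('{"headline": "Hi", "summary": "...", "mood": "upbeat"}'))
# {'title': 'Hi', 'summary': '...'}
```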
Cut wood with the grain, not against it.
I don't like that at all. Actually running the code is the single most effective protection we have against coding mistakes, from both humans and machines.
I think it's absolutely worth the complexity and performance overhead of hooking up a real container environment.
Not to mention you can run a useful code execution container in 100MB of RAM on a single CPU (or slice thereof). Simulating that with an LLM takes at least one GPU and 100GB or more of VRAM.
No, I didn’t know running containers used “virtually no overhead.” It appears I can run millions of containers without any resource constraint? Is that some sort of cheat code?
But when I installed Codex and tried to make a simple code bugfix, I got rate limited nearly immediately. As in, after 3 "steps" the agent took.
Are you meant to only use Codex with their $200 "unlimited" plans? Thanks!
gpt-5-2025-08-07
38.887K input tokens
That was my usage, and I got rate limited. Thank you for your tips!

OpenAI say API access to that model is coming soon, at which point I'll be able to use it in Codex CLI with an API key and pay for tokens as you go.
You can also use the Codex CLI tool without using the new GPT-5-Codex model.
I was tempted to give Codex a try but a colleague was stung by their pricing. Apparently if you go over your Pro plan allocation, they just quietly and automatically start billing you per-token?
https://cdn.openai.com/pdf/97cc5669-7a25-4e63-b15f-5fd5bdc4d...
SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors (via internal refactor benchmark 33.9% -> 51.3%).
As someone who recently used Codex CLI (`gpt-5-high`) to do a relatively large refactor (multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it (missing crucial or important details). My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.
Additionally, they claim the new model is more steerable (both with AGENTS.md and generally).
In my experience, Codex CLI w/gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!
[0]https://github.com/openai/codex/blob/main/codex-rs/core/gpt_...
[1]https://github.com/openai/codex/blob/main/codex-rs/core/prom...
(comment reposted from other thread)
What worked was getting it to first write a detailed implementation plan for a “junior contractor”, then having it attempt the plan in phases (clearing the task window each time), with instructions to copy files to /tmp, transform them there, and then update the originals.
Looking forward to trying the new model out on the next refactor!
Will try adding the instructions specific to refactors (i.e. copy/move files, don't rewrite when possible)
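A minimal Python sketch of that copy-to-/tmp pattern, with hypothetical paths and a placeholder transform:

```python
import shutil
from pathlib import Path

def refactor_via_copy(src: str, transform) -> None:
    """Copy src to /tmp, transform the copy, then move the result
    back, so the original is never deleted mid-rewrite."""
    work = Path("/tmp") / Path(src).name
    shutil.copy2(src, work)                       # full copy first
    work.write_text(transform(work.read_text()))
    shutil.move(str(work), src)                   # replace the original

# Hypothetical package move: only the import paths change.
refactor_via_copy(
    "libs/auth/client.py",
    lambda text: text.replace("from internal.auth", "from auth_pkg"),
)
```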
I've also found it helpful, especially for certain regressions, to basically create a new branch for any Codex/CC assisted task (even if part of a larger task). Makes it easier to identify regressions due to recent changes (i.e. look at git diff, it worked previously)
Telling the "agent" to manage git leads to more context pollution than I want, so I manage all commits/branches myself, but I'm sure that will change as the tools improve/they do more RL on full-cycle software dev
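For what it's worth, that manual branch-per-task routine is easy to script; here's a sketch in plain git (wrapped in Python, with a `codex/` branch prefix that is just an assumed convention):

```python
import subprocess

def start_agent_task(name: str) -> None:
    """Create an isolated branch before the agent touches anything,
    so regressions are confined to one reviewable diff."""
    subprocess.run(["git", "switch", "-c", f"codex/{name}"], check=True)

def review_agent_task() -> str:
    """Diff against main to see exactly what the task changed."""
    result = subprocess.run(
        ["git", "diff", "main...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

start_agent_task("extract-auth-package")
# ... let the agent work, then inspect:
print(review_agent_task())
```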
This model is available inside all OpenAI Codex products, but it's not yet available via the API.
The model is supposed to be better at code reviews and comments than the other GPT-5 variants. It can also think/work for up to 7 hours.
Even shorter version:
- New coding-specialist model called GPT-5-Codex, coming soon to the API but for now available in their Codex CLI, VS Code and Codex Cloud products
- New code review product (part of Codex Cloud) that can review PRs for you
- New model promises better code review, fewer pointless comments, and the ability to vary its reasoning effort for simple vs complex tasks
So it's somewhat in line with what Theo mentioned in his video, that he was not happy with the UI capabilities.
Updates (v0.36) https://github.com/openai/codex/releases
Commented after I saw this added in today’s release notes: “initial MCP interface and docs”
Had to spend quite a long time figuring out a dependency error...
I can't use Codex in the IDE at all now, it seems.
> OpenAI
> We rely on many of OpenAI's models to give AI responses. Requests may be sent to OpenAI even if you have an Anthropic (or someone else's) model selected in chat (e.g. for summarization)*. We have a zero data retention agreement with OpenAI.
Source: https://cursor.com/security
I will say that the Security page by the Cursor team is a very nice overview, even going into Auth, etc., and I applaud that, but I see nothing here that differentiates their use of e.g. OpenAI models from the agreements OpenAI offers themselves. Essentially, I don't see why anyone would have such severely heightened trust in Cursor over competitors in this area. If they only provided self-hosted models, I could understand it, but not the way they operate.
Personally, both because of what LLMs have been trained on (and how), and because of my expectations around privacy regardless of model-provider assurances, I'd treat any LLM-derived/assisted/reviewed code as public the second you send it to a provider-hosted model, and as some form of FOSS to boot. Basically, if you used Cursor, Codex, Augment or anything of that sort, I'd reduce any future privacy expectations straight away; you might as well put it on public GitHub for everyone to see.
Only self-hosting on-prem is an option for keeping control of your codebase, though personally I'd still consider licensing such code as FOSS, considering every model was trained on EUPL-, GPL-, etc.-licensed code. That's a personal opinion (very much philosophical and not at all legal, as it gets into arguments about what training is, what weights are, and so on, which can go on forever), but I'd argue that whether you are MSFT or a small startup, if you derive a significant amount of new code from LLMs, it isn't reasonable to claim copyleft shouldn't be at the very least on the mind of your legal department. Of course, this will have to be decided by courts, and likely in favour of those with the best legal teams. I doubt that even if the "80% of our code is written by LLMs" claims were true, that would convince a court to enforce copyleft on the product in question, but personally, that'd be my viewpoint.
Regardless of licensing, if you send your code to Cursor, purely privacy wise, you shouldn't have reservations about OpenAI.
I've just started out trying out Claude Code and am not sure how Codex compares on React projects.
From my initial usage, it seems Claude Code's planning mode is superior to its normal(?) mode, and giving it an overall direction to proceed, rather than just stating a desired feature, seems to produce better results. It also does better if a large task is split into very small sub-tasks.
The main issues with Codex now seem to be the very poor stability (it seems to be down almost 50% of the time) and lack of custom containers. Hoping those get solved soon, particularly the stability.
I also wonder where the price will end up, it currently seems unsustainably cheap.
JetBrains has a $30/mo subscription (with a GPT-5 backend) and the quota burns fast.
Assuming JetBrains prices at breakeven, either OpenAI has some secret sauce or they're losing money on Codex.
EnPissant•4mo ago
- The smartest model I have used. Solves problems better than Opus-4.1.
- It can be lazy. With Claude Code / Opus, once given a problem, it will generally work until completion. Codex will often perform only the first few steps and then ask if I want to continue to do the rest. It does this even if I tell it to not stop until completion.
- I have seen severe degradation near max context. For example, I have seen it just repeat the next steps every time I tell it to continue and I have to manually compact.
I'm not sure if the problems are GPT-5 or Codex. I suspect a better Codex could resolve them.
brookst•4mo ago
Very frustrating, and happening more often.
Jcampuzano2•4mo ago
But they have suffered quite a lot of degradation and quality issues recently.
To be honest unless Anthropic does something very impactful sometime soon I think they're losing their moat they had with developers as more and more jump to codex and other tools. They kind of massively threw their lead imo.
apigalore•4mo ago
With GPT‑5-Codex they do write: "During testing, we've seen GPT‑5-Codex work independently for more than 7 hours at a time on large, complex tasks, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation." https://openai.com/index/introducing-upgrades-to-codex/
naiv•4mo ago
Have to say I'm not sure what this even means, or what the exact definition of a "message" is in this context.
With Claude Code Max20 I was constantly hitting limits; with Codex, not once yet.
mike_hearn•4mo ago
GPT-5 is a great model. I tried the Rust version of Codex CLI, as they seem to be deprecating the JS version, and it is awful. I don't know what possessed them to try and write a TUI in Rust, but it isn't working. The Claude Code UI is hugely superior.
Jcampuzano2•4mo ago
Everyone else slowly caught up and/or surpassed them while they simultaneously had quality control issues and service degradation plaguing their system - ALL while having the most expensive models comparatively in terms of intelligence.
bjackman•4mo ago
My experience after a month or so of heavy use is exactly this. The AI is rock solid. I'm pretty consistently impressed with its ability to derive insights from the code, when it works. But the client is flaky, the backend is flaky, and the overall experience for me is always "I wish I could just use Claude".
Say 1 in 10 queries craps out (often the client OOMs even though I have 192GB of RAM). Sounds like a 10% reliability issue, but actually it just pushes me into "fuck this, I'll just do it myself" mode, so it knocks out like 50% of the value of the product.
(Still, I wouldn't be surprised if this can be fixed over the next few months, it could easily be very competitive IMO).
bjackman•4mo ago
None of these tools give the impression of being well-tested software. My guess is that neither OpenAI nor Anthropic actually has the necessary density in expertise to build quality software. Google obviously can build good software _when it really wants to_ but in this space its strategy looks like "build the products the other guys are building, cut whatever corners necessary to do this absolutely as fast as possible".
So even if my initial impressions are more accurate it's quite possible Google wins long term here.
dumpsterdiver•4mo ago
Remember the whole “Taken 3 makes Taken 2 look like Taken 1” meme? Well Google’s latest video generating AI makes any video gen AI I’ve seen up until now look like Taken 3* (sigh, I said 1, ruined it) - and they are seriously impressive on their own.
Edit: By “they” I mean the other video-generating AI models, not the other Taken movies. I hope Liam Neeson doesn't read HN, because a delivery like that might not make him laugh.
echelon•4mo ago
Antitrust enforcement has been letting us down for over two decades. If we don't have an oxygenation event, we'll go an entire generation where we only reward tax-collecting, non-innovation capital. That's unhealthy and unfair.
Our career sector has been institutionalized and rewards the 0.001% even as they rest on their laurels and conspire to suppress wages and innovation. There's a reason why centicorns petered out and why the F500 is tech-heavy. It's because big tech is a dragnet that consumes everything it touches - film studios, grocery stores, and God only knows what else it'll assimilate in the unending search for unregulated, cancerous growth.
FAANG's $500k TC comes at the expense of hundreds of unicorns that would make their ICs even wealthier. That money mostly winds up parked with institutional investors instead of flowing into high-stakes risks and cutthroat competition. That's why a16z and YC want to see increased antitrust regulation.
But it's really bad for consumers too. It's why our smartphones are stagnant, tax-collecting banana republics with one of two landlords. Nothing new, yet as tightly controlled as an authoritarian state. New ideas can't be tried and can't attain healthy margins.
It's wild that you can own a trademark, but the only way for a consumer to access it is to use a Google browser that defaults to Google search (URLs are scary), where the search results will be gamed by competitors. You can't even own your own brand anymore.
Winning shouldn't be easy. It should be hard. A neverending struggle that rewards consumers.
We need a forest fire to renew the ecosystem.
andai•4mo ago
- all the users
- all the apps (Google, GMail, YouTube, Docs, Maps...)
- all the books (Google Books)
- all the video (YouTube)
- all the web pages
- custom hardware
It's honestly weird they aren't doing better. Agree that the models are great and the UX is bad all around.
LordDragonfang•4mo ago
Unfortunately, they've been insulated from the consequences of their bad decisions by the fact the money printer (ads) keeps their company afloat and mollifies shareholders. The moment that dries up, they're in trouble.
echelon•4mo ago
I don't think they care what we think. They're thriving despite our protests.
But yeah, they shouldn't be shielded from antitrust. They have literally everything.
brianjking•4mo ago
- all the lobbyists
- all the money
epolanski•4mo ago
It's super annoying that it doesn't provide a way to approve edits one by one; instead it either vibe-codes on its own or gives me diffs to copy-paste.
Claude Code has a much saner "normal mode".
epolanski•4mo ago
Is it via the CLI? Via an extension to an editor? What is your flow?
robotswantdata•4mo ago
Gemini CLI is too inconsistent; it's good for documentation tasks, but don't let it write code for you.
brianjking•4mo ago
Or we're all just used to eating things we don't like and smiling.
vitorgrs•4mo ago
Not sure the fault is "writing bad code"; I guess it's just not good at being agentic. I saw this with Gemini CLI and other tools.
GLM, Kimi, and Qwen-Code all behave better for me.
Gemini 3 will probably fix this, as Gemini 2.5 Pro is "old" by now.
troupo•4mo ago
Claude Code does that on longer tasks.
Time to give Codex a try I guess.