GPT-5-Codex

https://openai.com/index/introducing-upgrades-to-codex/

28•meetpateltech•1h ago

Comments

incomingpain•1h ago

Still waiting on codex cli to support lm studio.

Tiberium•1h ago

Only a 1.7% upgrade on SWE-Bench compared to GPT-5, but 33.9% vs 51.3% on their internal code refactoring benchmark. This seems like an Opus 4.1-like upgrade, which is nice to see and means they're serious about Codex.

alvis•1h ago

It's interesting to see this quote: `for the bottom 10% of user turns sorted by model-generated tokens (including hidden reasoning and final output), GPT‑5-Codex uses 93.7% fewer tokens than GPT‑5`

It sounds like it can handle simple tasks much more correctly. That's impressive to me. Today's coding agents tend to pretend they're working hard by generating lots of unnecessary code. Hope it's true.

jumploops•46m ago

Interesting, the new model's prompt is ~half the size (10KB vs. 23KB) of the previous prompt[0][1].

SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors (via internal refactor benchmark 33.9% -> 51.3%).

As someone who recently used Codex CLI (`gpt-5-high`) to do a relatively large refactor (multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it (missing crucial details). My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.

Additionally, they claim the new model is more steerable (both with AGENTS.md and generally). In my experience, Codex CLI w/gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!

[0]https://github.com/openai/codex/blob/main/codex-rs/core/gpt_...

[1]https://github.com/openai/codex/blob/main/codex-rs/core/prom...

pants2•30m ago

Interestingly, "more steerable" can sometimes be a bad thing, as it will tend to follow your prompt to the letter even if that's against your interests. It requires better prompting and generally knowing what you're doing - might be worse for vibe-coders and better for experienced SWEs.

htrp•22m ago

I think they're indexing here for professional work (people in the VSCode terminal).

tedsanders•17m ago

> SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors

SWE-bench is a great eval, but it's very narrow. Two models can have the same SWE-bench scores but very different user experiences.

Here's a nice thread on X about the things that SWE-bench doesn't measure:

https://x.com/brhydon/status/1953648884309536958

Topfi•8m ago

Purely anecdotal and subjective, but it does seem to do refactors with precise step-by-step guidance a bit faster (comparing GPT-5 Thinking (Medium) and GPT-5 Codex (Medium)), though adherence to prompts seems roughly equivalent between the two as of now. That said, I really feel they should consider a broader naming convention.

New Claude Sonnet 3.7 was a bit of a blunder, but overall, Anthropic has their marketing in tight order compared to OpenAI. Claude Code, Sonnet, Opus, those are great, clear differentiating names.

Codex, meanwhile, can mean anything from a service for code reviews with GitHub integration to a series of dedicated models going back to 2021.

Also, while I do enjoy the ChatGPT app integration for quick on-the-go work made easier with a Clicks keyboard, I am getting more annoyed by the drift between Codex VSCode, the Codex website and Codex in the ChatGPT mobile app. The website has a very helpful Ask button, which can also be used to launch subtasks via prompts written by the model, but no such button is present in the VSCode plugin, even though you can launch subtasks from the VSCode plugin if you have used Ask via the website first. Meanwhile, the iOS app has no Ask button and no subtask support, and neither the app nor the VSCode plugin shows remote work done beyond abbreviations, whereas the web page does show everything. Then there are the differences between local and remote via VSCode and the CLI, ... To people not using Codex, this must sound insane and barely understandable, but it seems that is the outcome of spreading yourself across so many fields: CLI, dedicated models, VSCode plugin, mobile app, code review, web page. Some, like Anthropic, only work on one or two, others, like Augment, on three, but no one else does that much, for better and worse.

I like using Codex, but it is a mess with such massive potential that it needs a dedicated team lead whose only focus is to untangle this mess before adding more features. Alternatively, maybe interview a few power users about their actual day-to-day experience, those who aren't just in one part but are using multiple or all parts of Codex. There is a lot of insight to be gained from someone who has an overview of the entire product stack, I think. Sending out a questionnaire to top users would be a good start; I'd definitely answer.
