
GPT-5-Codex

https://openai.com/index/introducing-upgrades-to-codex/
124•meetpateltech•4h ago

Comments

incomingpain•4h ago
Still waiting on Codex CLI to support LM Studio.
NitpickLawyer•1h ago
Isn't the LM Studio API OpenAI-compatible? Codex CLI already supports third-party models: you edit the config file, and you can add many model providers.
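If it helps anyone: a minimal sketch of such a provider entry, assuming Codex CLI's `~/.codex/config.toml` layout and LM Studio's default local port (worth checking both against current docs):

    # ~/.codex/config.toml -- point Codex CLI at a local OpenAI-compatible server
    model = "qwen2.5-coder-32b-instruct"   # whatever model LM Studio is serving
    model_provider = "lmstudio"

    [model_providers.lmstudio]
    name = "LM Studio"
    base_url = "http://localhost:1234/v1"  # LM Studio's default local endpoint
    wire_api = "chat"                      # use the OpenAI Chat Completions wire format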
Tiberium•4h ago
Only a 1.7% upgrade on SWE-bench compared to GPT-5, but 33.9% vs. 51.3% on their internal code-refactoring benchmark. This seems like an Opus 4.1-like upgrade, which is nice to see and means they're serious about Codex.
alvis•4h ago
It's interesting to see this quote: `for the bottom 10% of user turns sorted by model-generated tokens (including hidden reasoning and final output), GPT‑5-Codex uses 93.7% fewer tokens than GPT‑5`

It sounds like it can handle simple tasks far more efficiently, which is impressive to me. Today's coding agents tend to pretend they're working hard by generating lots of unnecessary code. Hope it's true.

bn-l•2h ago
This is my issue with GPT-5. If you use low or medium reasoning effort, it's garbage. If you use high, it'll think for up to five minutes on something dead simple.
srcreigh•28m ago
Can you be more specific about what type of code you're talking about, and what makes it garbage?

I'm happy with medium reasoning. My projects have been in Go, TypeScript, React, Dockerfiles, stuff like that. The code almost always works; it's usually not "clean code", though.

jumploops•3h ago
Interesting, the new model's prompt is ~half the size (10KB vs. 23KB) of the previous prompt[0][1].

SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors (via internal refactor benchmark 33.9% -> 51.3%).

As someone who recently used Codex CLI (`gpt-5-high`) to do a relatively large refactor (multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it (missing crucial details). My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.

Additionally, they claim the new model is more steerable (both with AGENTS.md and generally). In my experience, Codex CLI w/gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!

[0]https://github.com/openai/codex/blob/main/codex-rs/core/gpt_...

[1]https://github.com/openai/codex/blob/main/codex-rs/core/prom...

pants2•3h ago
Interestingly, "more steerable" can sometimes be a bad thing, as it will tend to follow your prompt to the letter even if that's against your interests. It requires better prompting and generally knowing what you're doing - might be worse for vibe-coders and better for experienced SWEs.
htrp•3h ago
I think they're indexing here on professional work (people in the VSCode terminal).
jumploops•43m ago
Yes, given a similarly sparse prompt, Claude Code seems to perform "better" because it eagerly does things you don't necessarily know to ask

GPT-5 may underwhelm with the same sparse prompt, as it seems to do exactly what's asked, not more

You can still "fully vibe" with GPT-5, but the pattern works better in two steps:

1. Plan (iterate on high-level spec/PRD, split into actions)

2. Build (work through plans)

Splitting the context here is important, as any LLM will perform worse as the context gets more polluted.
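As a purely illustrative version of that two-step flow (the prompts and the PLAN.md file are my own example, not an official workflow):

    # Session 1: plan only -- ask for a spec, not code
    codex "Survey the repo and write PLAN.md: a numbered list of small, independent changes"

    # Session 2, started fresh so planning chatter doesn't pollute the build context
    codex "Implement step 1 of PLAN.md and nothing else"

Each `codex "<prompt>"` invocation starts a new session, which is what gives the build step a clean context.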

tedsanders•3h ago
> SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors

SWE-bench is a great eval, but it's very narrow. Two models can have the same SWE-bench scores but very different user experiences.

Here's a nice thread on X about the things that SWE-bench doesn't measure:

https://x.com/brhydon/status/1953648884309536958

dwaltrip•1h ago
So annoying that you can't read replies without an account nowadays.
Tiberium•51m ago
Use Nitter, the main instance works but there are a lot of other instances as well.

https://nitter.net/brhydon/status/1953648884309536958

siva7•1h ago
So you're all saying that suddenly Codex CLI with GPT-5-Codex is better than Claude Code? Hard to believe.
jumploops•47m ago
Not suddenly, it's been better since GPT-5 launched.

Prompting is different, but in a good way.

With Claude Code, you can use less prompting, and Claude will get token-happy and expand on your request. Great for greenfield/vibing, bad for iterating on existing projects.

With Codex CLI, GPT-5 seems to handle instructions much more precisely. It won't just go off on its own and do a bunch of work; it will do what you ask.

I've found that being more specific up-front gets better results with GPT-5, whereas with Claude, being more specific doesn't necessarily stop the eagerness of its output.

As with all LLMs, comparisons here are a bit apples to oranges, so to clarify: my experience is primarily with TypeScript and Rust codebases.

srcreigh•37m ago
Codex CLI of course will sometimes do the wrong thing, or sometimes do something extra that you didn't intend for it to do.

It seems about half my sessions quickly become "why did you do that? rip __ out and just do ___". Then again, most of the other sessions involve Codex correctly inferring what I wanted without having to be so specific.

Topfi•3h ago
One major improvement I noticed today, even before I saw the announcement, is that the model is far more reliable about using the Task Completion interface to communicate which stage of the prompt is being implemented. Previously this was only shown sparingly (especially in the first few weeks), and when it was, it didn't properly tick off tasks, simply jumping from the first task to completion at the end. Now it works very reliably. If I didn't know better, though, I would have suspected this was merely the result of a system prompt change; given how solid GPT-5's prompt adherence has been in my experience, it should have been fixable without a tuned model. Nevertheless, I like this improvement (arguably a fix of a previously broken feature).

Beyond that, purely anecdotal and subjective, but this model does seem to do extensive refactors with semi precise step-by-step guidance a bit faster (comparing GPT-5 Thinking (Medium) and GPT-5 Codex (Medium)), though adherence to prompts seems roughly equivalent between the two as of now. In any case, I really feel they should consider a more nuanced naming convention.

Claude 3.7 Sonnet was a bit of a naming blunder, but overall, Anthropic has their marketing in tight order compared to OpenAI. Claude Code, Sonnet, Opus: those are great, clearly differentiated names.

Codex meanwhile can mean anything from a service for code reviews with Github integration to a series of dedicated models going back to 2021.

Also, while I do enjoy the ChatGPT app integration for quick on-the-go work (made easier with a Clicks keyboard), I am getting more annoyed by the drift between Codex in VSCode, the Codex website, and Codex in the ChatGPT mobile app. The website has a very helpful Ask button, which can also be used to launch subtasks via prompts written by the model, but no such button is present in the VSCode plugin, even though subtasks can be launched from the VSCode plugin if you have used Ask via the website first. Meanwhile, the iOS app has no Ask button and no subtask support, and neither the app nor the VSCode plugin shows remote work beyond abbreviations, whereas the web page shows everything. Then there are the differences between local and remote via VSCode and the CLI, ... To people not using Codex this must sound insane and barely understandable, but it seems to be the outcome of spreading yourself across so many surfaces: CLI, dedicated models, VSCode plugin, mobile app, code review, web page. Some, like Anthropic, only work on one or two; others, like Augment, on three; no one else does this much, for better and worse.

I like using Codex, but it is a mess with massive potential, one that needs a dedicated team lead whose only focus is untangling it before more features are added. Alternatively, interview a few power users about their actual day-to-day experience, those who aren't just in one part of Codex but use several or all of them. There is a lot of insight to be gained from someone who has an overview of the entire product stack, I think. Sending out a questionnaire to top users would be a good start; I'd definitely answer.

brador•2h ago
Take off the guardrails and let humanity thrive.

It is inevitable.

ianbutler•2h ago
I just want the codex models in the API, I won’t touch them until then.

And before someone says it, I do happen to have my own codex like environment complete with development containers, browser, github integration, etc.

And I'm happy to pay a mint for access to the best models.

greyb•1h ago
They've said it's coming:

>For developers using Codex CLI via API key, we plan to make GPT‑5-Codex available in the API soon.

ianbutler•1h ago
I saw that, but "soon" doesn't inspire confidence, and it's easy to overlook if they don't follow through. They didn't with the previous Codex model.
robotswantdata•2h ago
Codex in the IDE just works; very impressed with the quality. If you tried it a while back and didn't like it, try it again via the VSCode extension. Generous usage is included with Plus.

Ditched my Claude Code Max sub for the $200 ChatGPT Pro plan. So much faster, and I haven't hit any limits yet.

steinvakt2•1h ago
I'm using Cursor with the $20 plan and hit rate limits after 15 days (so I'm paying extra for the rest of the month). What do you recommend I do?
robotswantdata•1h ago
You could get two plus accounts? Or maybe a business account with two seats?

The $200 Pro plan feels like good value, personally.

poszlem•1h ago
Wait, what? They now allow a Claude Code-style subscription instead of the API too?
robotswantdata•1h ago
Yes, and it has been for at least a month. Download the VSCode extension and sign in with ChatGPT.
Tiberium•1h ago
Yes, just do "codex login" and it'll use your ChatGPT subscription.
j45•1h ago
Has anyone hit any programming usage limits with the ChatGPT Pro account?
robotswantdata•48m ago
None yet; it feels unlimited. Huge repos, too.
trilogic•58m ago
It's so good that it needs advertising, LOL. (Anthropic enters the chat)... Jokes apart, I guess it is good (I can't afford to try it). (Meta enters the chat)... DeepSeek R2 enters the chat and drops the mic. Anthropic, Meta, OpenAI, and Google leave the chat and call Trump for another dinner.

No, really, jokes apart: ChatGPT and Trump are awesome; it all works and keeps getting better by the day.

laidoffamazon•44m ago
I agree because I don’t want to get fired or get audited
stopachka•42m ago
Very impressive. I've been working on a shared background presence animation, and have been testing out Claude and Codex. (By shared presence, I mean imagine a page's background changing based on where everyone's cursor is)

Both were struggling yesterday, with Claude being a bit ahead. Their biggest problems came with being "creative" (their solutions were pretty "stock"), and they had trouble making the simulation.

Tried the same problem on Codex today. The design it came up with still felt a bit lackluster, but it did _a lot_ better on the simulation.
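For anyone trying to picture the setup: a minimal sketch of one way such a shared-presence page could work, in TypeScript. The WebSocket endpoint and the message shape are illustrative assumptions, not the poster's actual code:

    // presence.ts -- broadcast each user's cursor over a WebSocket and tint
    // the page background from the average position.
    const ws = new WebSocket("ws://localhost:8080/presence"); // assumed endpoint
    const cursors = new Map<string, { x: number; y: number }>();

    // Track our cursor, normalized to [0, 1], and send at most once per frame.
    let pending: { x: number; y: number } | null = null;
    window.addEventListener("mousemove", (e) => {
      pending = { x: e.clientX / innerWidth, y: e.clientY / innerHeight };
    });
    (function tick() {
      if (pending && ws.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(pending));
        pending = null;
      }
      requestAnimationFrame(tick);
    })();

    // Fold everyone's positions into a single background color.
    ws.onmessage = (ev) => {
      const { id, x, y } = JSON.parse(ev.data); // server is assumed to tag senders
      cursors.set(id, { x, y });
      let sx = 0, sy = 0;
      for (const c of cursors.values()) { sx += c.x; sy += c.y; }
      const n = cursors.size;
      document.body.style.background =
        `hsl(${(sx / n) * 360}deg 60% ${30 + (sy / n) * 40}%)`;
    };

The hard-to-make-"creative" part is that last fold: anything smarter than an average, e.g. a field simulation driven by each cursor, is where the models reportedly struggled.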

simianwords•38m ago
OpenAI is starting its new era of specialized models. I guess they gave up on the monolithic-model approach.
simianwords•33m ago
The code review thing might be my favorite UX for AI based development. Largely stays out of your way and provides good comments.

I'm imagining it navigating the codebase and modifying tests: adding new cases, or breaking things by changing a few lines of code to see whether the tests notice. That would actually verify whether the tests make real assertions and are useful (a sketch of the idea follows below).

Thorough reviewing like this probably benefits me the most - more than AI assisted development.
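What's being described is essentially mutation testing. A minimal sketch of the loop in TypeScript, where the target path, the operator flips, and the `npm test` command are all hypothetical stand-ins (real tools such as Stryker mutate the AST instead of raw text):

    // mutation-check.ts -- flip a tiny piece of the source, re-run the tests,
    // and see whether they notice.
    import { execSync } from "node:child_process";
    import { readFileSync, writeFileSync } from "node:fs";

    const target = "src/price.ts"; // hypothetical file under test
    const original = readFileSync(target, "utf8");

    const mutations: Array<[string, string]> = [
      ["===", "!=="], // invert an equality check
      [">", ">="],    // introduce an off-by-one at a boundary
      ["+", "-"],     // flip an arithmetic operator
    ];

    for (const [from, to] of mutations) {
      if (!original.includes(from)) continue;
      writeFileSync(target, original.replace(from, to)); // mutate first occurrence
      try {
        execSync("npm test", { stdio: "ignore" });       // exit 0 -> mutant survived
        console.log(`survived: ${from} -> ${to} (no assertion caught this)`);
      } catch {
        console.log(`killed:   ${from} -> ${to} (tests caught the change)`);
      } finally {
        writeFileSync(target, original);                 // always restore the file
      }
    }

A surviving mutant is exactly the signal described above: a test suite that still passes after the code's behavior changed.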

klipklop•19m ago
From my observation over the past two weeks, Claude Code is getting dramatically worse, with super low usage quotas, while OpenAI Codex is getting great and has a very generous usage quota in comparison.

For people who haven't tried it in, say, ~1 month: give Codex CLI a try.

king_magic•3m ago
[delayed]

Hosting a website on a disposable vape

https://bogdanthegeek.github.io/blog/projects/vapeserver/
425•BogdanTheGeek•3h ago•319 comments

Addendum to GPT-5 system card: GPT-5-Codex

https://openai.com/index/gpt-5-system-card-addendum-gpt-5-codex/
112•wertyk•2h ago•67 comments

Wanted to spy on my dog, ended up spying on TP-Link

https://kennedn.com/blog/posts/tapo/
227•kennedn•5h ago•67 comments

React is winning by default and slowing innovation

https://www.lorenstew.art/blog/react-won-by-default/
107•dbushell•3h ago•115 comments

macOS Tahoe

https://www.apple.com/os/macos/
142•Wingy•4h ago•162 comments

PayPal to support Ethereum and Bitcoin

https://newsroom.paypal-corp.com/2025-09-15-PayPal-Ushers-in-a-New-Era-of-Peer-to-Peer-Payments,-...
292•DocFeind•7h ago•248 comments

Deaths are projected to exceed births in 2031

https://www.cbo.gov/publication/61390
23•johntfella•1h ago•11 comments

Launch HN: Trigger.dev (YC W23) – Open-source platform to build reliable AI apps

112•eallam•6h ago•44 comments

GPT-5-Codex

https://openai.com/index/introducing-upgrades-to-codex/
124•meetpateltech•4h ago•35 comments

Scryer Prolog Meetup 2025

https://hsd-pbsa.de/veranstaltung/scryer-prolog-meetup-2025/
16•aarroyoc•1h ago•1 comment

How big a solar battery do I need to store all my home's electricity?

https://shkspr.mobi/blog/2025/09/how-big-a-solar-battery-do-i-need-to-store-all-my-homes-electric...
211•FromTheArchives•8h ago•319 comments

CubeSats are fascinating learning tools for space

https://www.jeffgeerling.com/blog/2025/cubesats-are-fascinating-learning-tools-space
146•warrenm•7h ago•62 comments

Boring work needs tension

https://iaziz786.com/blog/boring-work-needs-tension/
77•iaziz786•5h ago•45 comments

How to self-host a web font from Google Fonts

https://blog.velocifyer.com/Posts/3,0,0,2025-8-13,+how+to+self+host+a+font+from+google+fonts.html
95•Velocifyer•6h ago•88 comments

GuitarPie: Electric Guitar Fretboard Pie Menus

https://andreasfender.com/publications.php
13•DonHopkins•6h ago•1 comment

How People Use ChatGPT [pdf]

https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-p...
13•nycdatasci•2h ago•0 comments

When Your Father Is a Magician, What Do You Believe?

https://thereader.mitpress.mit.edu/when-your-father-is-a-magician-what-do-you-believe/
8•pseudolus•3d ago•0 comments

Removing newlines in FASTA file increases ZSTD compression ratio by 10x

https://log.bede.im/2025/09/12/zstandard-long-range-genomes.html
219•bede•3d ago•83 comments

Turgot Map of Paris

https://en.wikipedia.org/wiki/Turgot_map_of_Paris
26•Michelangelo11•2d ago•6 comments

RustGPT: A pure-Rust transformer LLM built from scratch

https://github.com/tekaratzas/RustGPT
318•amazonhut•11h ago•158 comments

Asciinema CLI 3.0 rewritten in Rust, adds live streaming, upgrades file format

https://blog.asciinema.org/post/three-point-o/
252•ku1ik•5h ago•49 comments

The Mac App Flea Market

https://blog.jim-nielsen.com/2025/mac-app-flea-market/
301•ingve•14h ago•120 comments

Folks, we have the best π

https://lcamtuf.substack.com/p/folks-we-have-the-best
296•fratellobigio•14h ago•81 comments

Show HN: AI-powered web service combining FastAPI, Pydantic-AI, and MCP servers

https://github.com/Aherontas/Pycon_Greece_2025_Presentation_Agents
27•Aherontas•1d ago•6 comments

Researchers revive the pinhole camera for next-gen infrared imaging

https://phys.org/news/2025-09-revive-pinhole-camera-gen-infrared.html
27•wglb•3d ago•1 comment

A string formatting library in 65 lines of C++

https://riki.house/fmt
38•PaulHoule•5h ago•14 comments

Self-Assembly Gets Automated in Reverse of 'Game of Life'

https://www.quantamagazine.org/self-assembly-gets-automated-in-reverse-of-game-of-life-20250910/
39•kjhughes•3d ago•7 comments

Show HN: Blocks – Dream work apps and AI agents in minutes

https://blocks.diy
4•shelly_•1h ago•0 comments

California’s Alo Slebir unofficially broke the big wave surfing world record

https://www.sfgate.com/sports/article/alo-slebir-mavericks-big-wave-surf-record-21041864.php
45•danielmorozoff•2d ago•35 comments

GPT‑5-Codex and upgrades to Codex

https://simonwillison.net/2025/Sep/15/gpt-5-codex/
12•amrrs•2h ago•0 comments