Ask HN: Claude Opus 4.5 vs. GPT 5.1 Codex Max for coding. Worth the upgrade?

5•terabytest•1mo ago

I’m using gpt-5.1-codex-max comfortably for coding and hitting the weekly limit sometimes (but a few extra credits usually cover it).

I’ve heard Opus 4.5 might be better for coding. SWE-bench shows an 8% improvement but I'm having a hard time guessing what kind of effect that maps to in reality. For those who’ve switched, what changes have you seen, and how has it affected your work? Is the $100/month upgrade worth it?

Comments

chaidhat•1mo ago

You must have immense patience to daily drive codex. To be honest, I’ve observed better code quality from codex (in terms of separation of concerns, high cohesion, loose coupling, etc.) but Opus has great quality at roughly 1/3rd of the speed. Maybe try it on Cursor, then decide if you want to switch. I’m curious: have you tried gemini pro 3, and do you think it deserves the hype?

dalmo3•1mo ago

I'm using Opus through Cursor, but not a heavy user so price is "the same".

Opus is so good I can actually give it a task and move my attention somewhere else. So although the model itself is much slower my general workflow is faster and less frustrating.

sourdoughness•1mo ago

Using Opus 4.5 through VScode/CoPilot gives so much better results than anything else I’ve tried that I kept paying even when they briefly raised it to a 3x token rate.

I really like the interaction flows better than Gemini 3 or Codex, though I can’t quite quantify why. The amount of explanation/supporting material in Opus’s output feels just right to me.

djinnrutger•1mo ago

I have been using VSCode / CoPilot with Opus 4.5 and it has been working the best of any of the systems I have tried. Very happy with it so far. I never really got good results with GPT-5.1, though 5.2 seems better, but not by a lot, so I will stick with Opus 4.5 for now.

muzani•1mo ago

I'm fine with just Copilot.

Opus 4.5 has excellent tool use, meaning it can jump in and out of a broad undocumented codebase better. It can evaluate what the code is trying to do. It's perfect for PRs: it caught things like people submitting code that looks right but ended up running a poorly documented/incomplete method.

GPT codex just messes up a lot for me. Whatever I'm doing with it, it's not working. The plain GPT-5.2 is good overall, but it confidently makes mistakes and tells you that it's done.

If you have an excellent codebase, GPT 5.2 might actually work better. If you're not sure what you're doing or are using AI to find out how things work, then Opus 4.5 is great.

The Claude models are also very much behind in terms of UI and visuals.

Take note that a lot of the benchmarks are on Python. What I'm finding is all the major ones make mistakes, but they make mistakes differently. OpenAI and Anthropic tend to mimic one another for some reason, while Grok and Gemini tend to give very different answers.

otekengineering•1mo ago

I'm impressed with Opus 4.5. It's been useful working on firmware projects where earlier models were of negative value.

Here's an example of a one-shot output; the only change I made was a Replace All of 'battlezone' -> 'battleclone':

"build a clone of the classic arcade game battlezone using SVG graphics that are calculated on the fly for the required vector wireframe graphics"

https://omnispect.dev/battleclone00.html
