GPT‑5.4 Mini and Nano

https://openai.com/index/introducing-gpt-5-4-mini-and-nano

83•meetpateltech•1h ago

Comments

machinecontrol•1h ago

What's the practical advantage of using a mini or nano model versus the standard GPT model?

aavci•1h ago

Cheaper. Every month or so I visit the models used and check whether they can be replaced by the cheapest and smallest model possible for the same task. Some people do fine tuning to achieve this too.

powera•1h ago

I've been waiting for this update.

For many "simple" LLM tasks, GPT-5-mini was sufficient 99% of the time. Hopefully these models will do even more and closer to 100% accuracy.

The prices are up 2-4x compared to GPT-5-mini and nano. Were those models just loss leaders, or are these substantially larger/better?

HugoDias•1h ago

For us, it was also pretty good, but the performance decreased recently, that forced us to migrate to haiku-4.5. More expensive but much more reliable (when anthropic up, of course).

throwaway911282•1h ago

they dont change the model weights (no frontier lab does). if you have evals and all prompts, tool calls the same, I'm curious how you are saying performance decreased..

HugoDias•1h ago

According to their benchmarks, GPT 5.4 Nano > GPT-5-mini in most areas, but I'm noticing models are getting more expensive and not actually getting cheaper?

GPT 5 mini: Input $0.25 / Output $2.00

GPT 5 nano: Input: $0.05 / Output $0.40

GPT 5.4 mini: Input $0.75 / Output $4.50

GPT 5.4 nano: Input $0.20 / Output $1.25

simianwords•1h ago

models are getting costlier but by performance getting cheaper. perhaps they don't see a point supporting really low performance models?

HugoDias•1h ago

I would be curious to know if from the enterprise / API consumption perspective, these low-performance models aren't the most used ones. At least it matches our current scenario when it comes to tokens in / tokens out. I'd totally buy the price increase if these are becoming more efficient though, consuming less tokens.

karmasimida•10m ago

Those are bigger models. The serving isn’t going to be cheaper.

Why expect cheaper then? The performance is also better

ryao•1h ago

I will be impressed when they release the weights for these and older models as open source. Until then, this is not that interesting.

simianwords•1h ago

why isn't nano available in codex? could be used for ingesting huge amount of logs and other such things

BoumTAC•1h ago

To me, mini releases matter much more and better reflect the real progress than SOTA models.

The frontier models have become so good that it's getting almost impossible to notice meaningful differences between them.

Meanwhile, when a smaller / less powerful model releases a new version, the jump in quality is often massive, to the point where we can now use them 100% of the time in many cases.

And since they're also getting dramatically cheaper, it's becoming increasingly compelling to actually run these models in real-life applications.

pzo•1h ago

they do are cheaper than SOTA but not getting dramatically cheaper but actually the opposite - GPT 5.4 mini is around ~3x more expensive than GPT 5.0 mini.

Similarly gemini 3.1 flash lite got more expensive than gemini 2.5 flash lite.

BoumTAC•55m ago

But they are getting dramatically better.

What's the point of a crazy cheap model if it's shit ?

I code most of the time with haiku 4.5 because it's so good. It's cheaper for me than buying a 23€ subscription from Anthropic.

philipkglass•38m ago

The crazy cheap models may be adequate for a task, and low cost matters with volume. I need to label millions of images to determine if they're sexually suggestive (this includes but is not limited to nudity). The Gemini 2.0 Flash Lite model is inexpensive and good at this. Gemini 2.5 Flash Lite is also good at this, but not noticeably better, and it costs more. When 2.0 gets retired this June my costs are going up.

brikym•48m ago

If you're doing something common then maybe there are no differences with SOTA. But I've noticed a few. GPT 5.4 isn't as good at UI work in svelte. Gemini tends to go off and implement stuff even if I prompt it to discuss but it's pretty good at UI code. Claude tends to find out less about my code base than GPT and it abuses the any type in typescript.

cbg0•58m ago

Based on the SWE-Bench it seems like 5.4 mini high is ~= GPT 5.4 low in terms of accuracy and price but the latency for mini is considerably higher at 254 seconds vs 171 seconds for GPT5.4. Probably a good option to run at lower effort levels to keep costs down for simpler tasks. Long context performance is also not great.

beklein•55m ago

As a big Codex user, with many smaller requests, this one is the highlight: "In Codex, GPT‑5.4 mini is available across the Codex app, CLI, IDE extension and web. It uses only 30% of the GPT‑5.4 quota, letting developers quickly handle simpler coding tasks in Codex for about one-third the cost." + Subagents support will be huge.

hyperbovine•43m ago

Having to invoke `/model` according to my perceived complexity of the request is a bit of a deal breaker though.

serf•39m ago

you use profiles for that [0], or in the case of a more capable tool (like opencode) they're more confusing referred to as 'agents'[1] , which may or may not coordinate subagents..

So, in opencode you'd make a "PR Meister" and "King of Git Commits" agent that was forced to use 5.4mini or whatever, and whenever it fell down to using that agent it'd do so through the preferred model.

For example, I use the spark models to orchestrate abunch of sub-agents that may or may not use larger models, thus I get sub-agents and concurrency spun up very fast in places where domain depth matter less.

[0]: https://developers.openai.com/codex/config-advanced#profiles [1]: https://opencode.ai/docs/agents/

miltonlost•51m ago

Does it still help drive people to psychosis and murder and suicide? Where's the benchmark for that?

system2•50m ago

I am feeling the version fatigue. I cannot deal with their incremental bs versions.

yomismoaqui•46m ago

Not comparing with equivalent models from Anthropic or Google, interesting...

Tiberium•33m ago

They did actually compare them in the tweet, see https://x.com/OpenAI/status/2033953592424731072

Direct image: https://pbs.twimg.com/media/HDoN4PhasAAinj_?format=png&name=...

casey2•46m ago

I googled all the testimonial names and they are all linked-in mouthpieces.

Tiberium•39m ago

I checked the current speed over the API, and so far I'm very impressed. Of course models are usually not as loaded on the release day, but right now:

- Older GPT-5 Mini is about 55-60 tokens/s on API normally, 115-120 t/s when used with service_tier="priority" (2x cost).

- GPT-5.4 Mini averages about 180-190 t/s on API. Priority does nothing for it currently.

- GPT-5.4 Nano is at about 200 t/s.

To put this into perspective, Gemini 3 Flash is about 130 t/s on Gemini API and about 120 t/s on Vertex.

This is raw tokens/s for all models, it doesn't exclude reasoning tokens, but I ran models with none/minimal effort where supported.

And quick price comparisons:

- Claude: Opus 4.6 is $5/$25, Sonnet 4.6 is $3/$15, Haiku 4.5 is $1/$5

- GPT: 5.4 is $2.5/$15 ($5/$22.5 for >200K context), 5.4 Mini is $0.75/$4.5, 5.4 Nano is $0.2/$1.25

- Gemini: 3.1 Pro is $2/$12 ($3/$18 for >200K context), 3 Flash is $0.5/$3, 3.1 Flash Lite is $0.25/$1.5

coder543•4m ago

[delayed]

6thbit•37m ago

Looking at the long context benchmark results for these, sounds like they are best fit for also mini-sized context windows.

Is there any harness with an easy way to pick a model for a subagent based on the required context size the subagent may need?

bananamogul•23m ago

They could call them something like “sonnet” and “haiki” maybe.

reconnecting•23m ago

All three ChatGPT models (Instant, Thinking, and Pro) have a new knowledge cutoff of August 2025.

Seriously?

zild3d•13m ago

whats surprising about that? most of the minor version updates from all the labs are post training updates / not changing knowledge cutoff

dpoloncsak•6m ago

Do you find the results vary based on whether it uses RAG to hit the internet vs the data being in the weights itself?

varispeed•16m ago

I stopped paying attention to GPT-5.x releases, they seem to have been severely dumbed down.

pscanf•11m ago

I quite like the GPT models when chatting with them (in fact, they're probably my favorites), but for agentic work I only had bad experiences with them.

They're incredibly slow (via official API or openrouter), but most of all they seem not to understand the instructions that I give them. I'm sure I'm _holding them wrong_, in the sense that I'm not tailoring my prompt for them, but most other models don't have problem with the exact same prompt.

Does anybody else have a similar experience?

nikanj•6m ago

Same, and I can't put my finger on the "why" either. Plus I keep hitting guard rails for the strangest reasons, like telling codex "Add code signing to this build pipeline, use the pipeline at ~/myotherproject as reference" and codex tells me "You should not copy other people's code signing keys, I can't help you with this"

tom1337•3m ago

Yea absolutely. I am using GPT 5.2 / 5.2 Codex with OpenCode and it just doesn't get what I am doing or looses context. Claude on the other side (via GitHub Copilot) has no problem and also discovers the repository on it's own in new sessions while I need to basically spoonfeed GPT. I also agree on the speed. Earlier today I tasked GPT 5.2 Codex with a small refactor of a task in our codebase with reasoning to high and it took 20 minutes to move around 20 files.

Huckle: Detect operational problems 30–90 days before they appear in metrics

Show HN: A minimalist dungeon-crawler card game built with Deno

Missiles a Month vs. 7 Interceptors – Why Centcom Shifted to Factories

Toaster Settings: AI Agents and Classical French Cooking Techniques [video]

The Sky Tonight

Padel Chess – tactical simulator for padel

How OpenClaw's Memory System Works

Show HN: Build a knowledge graph from unstructured text in Python

I built a free site that can tell you if your hardware can run a model

PgBeam, a globally distributed PostgreSQL proxy

Words on Words on Words

Syntaqlite: High-fidelity devtools that SQLite deserves

Show HN: Flotilla – An orchestrator for persistent agent fleets on Apple Silicon

Show HN: I can no longer afford the silicon. Here is my autonomous HPC agent

When Science Goes Agentic

Java 26 is here, and with it a solid foundation for the future

The Los Angeles Aqueduct Is Wild

Consent.txt – compile one AI policy into robots.txt, AIPREF, and headers

Women are being abandoned by their partners on hiking trails

Show HN: Chrome extension that hijacks any site's own API to modify it

Reducing quarantine delay 83% using Genetic Algorithms for playbook optimization

Node.js blocks PR from dev because he used Claude Code to create it

Python 3.15's JIT is now back on track

Remote Control for Agents

Danger Coffee: Mold-Free Remineralized Coffee Replaces What Regular Coffee Takes

Building a dry-run mode for the OpenTelemetry collector

LotusNotes

Austin draws another billionaire as Uber co-founder joins California exodus

Deep Data Insights for Polymarket Traders

Show HN: A simple dream to fit in every traveler's pocket

GPT‑5.4 Mini and Nano

Comments

Huckle: Detect operational problems 30–90 days before they appear in metrics

Show HN: A minimalist dungeon-crawler card game built with Deno

Missiles a Month vs. 7 Interceptors – Why Centcom Shifted to Factories

Toaster Settings: AI Agents and Classical French Cooking Techniques [video]

The Sky Tonight

Padel Chess – tactical simulator for padel

How OpenClaw's Memory System Works

Show HN: Build a knowledge graph from unstructured text in Python

I built a free site that can tell you if your hardware can run a model

PgBeam, a globally distributed PostgreSQL proxy

Words on Words on Words

Syntaqlite: High-fidelity devtools that SQLite deserves

Show HN: Flotilla – An orchestrator for persistent agent fleets on Apple Silicon

Show HN: I can no longer afford the silicon. Here is my autonomous HPC agent

When Science Goes Agentic

Java 26 is here, and with it a solid foundation for the future

The Los Angeles Aqueduct Is Wild

Consent.txt – compile one AI policy into robots.txt, AIPREF, and headers

Women are being abandoned by their partners on hiking trails

Show HN: Chrome extension that hijacks any site's own API to modify it

Reducing quarantine delay 83% using Genetic Algorithms for playbook optimization

Node.js blocks PR from dev because he used Claude Code to create it

Python 3.15's JIT is now back on track

Remote Control for Agents

Danger Coffee: Mold-Free Remineralized Coffee Replaces What Regular Coffee Takes

Building a dry-run mode for the OpenTelemetry collector

LotusNotes

Austin draws another billionaire as Uber co-founder joins California exodus

Deep Data Insights for Polymarket Traders

Show HN: A simple dream to fit in every traveler's pocket