> We’ll post to @openaidevs once the new pricing is in full effect. In $10… 9… 8…
There is also speculation that they are only dropping the input price, not the output price (which includes the reasoning tokens).
Input: $2.00 / 1M tokens
Cached input: $0.50 / 1M tokens
Output: $8.00 / 1M tokens
https://openai.com/api/pricing/
Now cheaper than gpt-4o and same price as gpt-4.1 (!).
The speculation of only input pricing being lowered was because yesterday they gave out vouchers for 1M free input tokens while output tokens were still billed.
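For anyone pricing out a workload under the new numbers, a quick back-of-the-envelope sketch (rates hard-coded from the pricing above; the token counts are made-up example values):

```python
# Rough cost estimate for an o3 request under the new pricing (USD per 1M tokens).
INPUT_PER_M = 2.00
CACHED_INPUT_PER_M = 0.50
OUTPUT_PER_M = 8.00  # output includes reasoning tokens

def o3_cost(input_tokens, cached_input_tokens, output_tokens):
    return (input_tokens * INPUT_PER_M
            + cached_input_tokens * CACHED_INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: 20k fresh prompt tokens, 80k cached, 10k output (incl. reasoning)
print(f"${o3_cost(20_000, 80_000, 10_000):.4f}")  # ~$0.16 per request
```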
This is where the naming choices get confusing. "Should" o3 cost more or less than GPT-4.1? Which is more capable? A generation 3 of tech intuitively feels less advanced than a 4.1 of a (similar) tech.
- o4 is reasoning
- 4o is not
They simply do not do a good job of differentiating. Unless you work directly in the field, it is likely not obvious what is the difference between "our most powerful reasoning model" and "our flagship model for complex tasks."
"Does my complex task need reasoning or not?" seems to be how one would choose. (What type of task is complex but does not require any reasoning?) This seems less than ideal!
Also no idea why he thinks Roo is handicapped when Claude Code nerfs the thinking output and requires typing "think"/"think hard"/"think harder"/"ultrathink" just to expand the max thinking tokens... which on "ultrathink" only sets it to 32k, when the max in Roo is 51200 and it's just a setting.
From my experience (so not an ultimate truth), Claude is not great at deciding to plan on its own: it dives immediately into coding.
If you ask it to think step-by-step, it still doesn't do it. Gemini 2.5 Pro is good at that planning, but terrible at actual coding.
So you can use Gemini as planner and Claude as programmer and you get something decent on RooCode.
This “think wisely” that you have to repeat 10x in the prompt is absolutely true
Edit: I think I know where our miscommunication is happening...
The "think"/"ultrathink" series of magic words are a claudecode specific feature used to control the max thinking tokens in the request. For example, in claude code, saying "ultrathink" sets the max thinking tokens to 32k.
On other clients these keywords do nothing. In Roo, max thinking tokens is a setting. You can just set it to 32k, and then that's the same as saying "ultrathink" in every prompt in claudecode. But in Roo, I can also set up different settings profiles to use for each mode (with different max thinking token settings), configure the mode prompt, system prompt, etc. No magic keywords needed... and you have full control over the request.
Claude Code doesn't expose that level of control.
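For reference, the "setting" being discussed corresponds to the thinking budget on Anthropic's Messages API. A minimal sketch, assuming an API key in the environment (the model id and budget values are illustrative, not a claim about what Roo or Claude Code actually send):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Extended thinking with an explicit token budget; this is roughly what
# "ultrathink" (32k) or Roo's 51200 setting would translate to per request.
resp = client.messages.create(
    model="claude-sonnet-4-20250514",   # illustrative model id
    max_tokens=64_000,                  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 51_200},
    messages=[{"role": "user", "content": "Plan the refactor before writing code."}],
)
print(resp.content)
```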
I have a suspicion that's how they were able to get gpt-4-turbo so fast. In practice, I found it inferior to the original GPT-4 but the company probably benchmaxxed the hell out of the turbo and 4o versions so even though they were worse models, users found them more pleasing.
a proxy to that may be the anecdotal evidence of users who report back in a month that model X has gotten dumber (started with gpt-4 and keeps happening, esp. with Anthro and OpenAI models). I haven't heard such anecdotal stories about Gemini, R1, etc.
Some users. For me the drop was so huge it became almost unusable for the things I had used it for.
You pay a monthly fee, but Gemini is completely jammed for 5-6 hours while North America is working.
Google is best in pure AI research, both in quality and volume. They have sucked at productization for years. Not just AI but other products as well. A real mystery.
Now I'm feeling similarly with their image generation (which is the only reason I created a paid account two months ago, and the output looks more generic by default).
This time around it felt pretty stark. I used ChatGPT to create at most 20 different image compositions. And after a couple of good ones at first, it felt worse after. One thing I've noticed recently is that when working on vector art compositions, the results start more simplistic, and often enough look like clipart thrown together. This wasn't my experience first time around. Might be temperature tweaks, or changes in their prompt that lead to this effect. Might be some random seed data they use, who knows.
Anecdotally, it's quite clear that some models are throttled during the day (eg Claude sometimes falls back to "concise mode" - with and without a warning on the app).
You can tell if you're using Windsurf/Cursor too - there are times of the day where the models constantly fail to do tool calling, and other times they "just work" (for the same query).
Finally, there are cases where it was confirmed by the company, like GPT-4o's sycophantic tirade that very clearly impacted its output (https://openai.com/index/sycophancy-in-gpt-4o/)
Trusting these LLM providers today is as risky as trusting Facebook as a platform, when they were pushing their “opensocial” stuff
You've also made the mistake of conflating what's served via API platforms which are meant to be stable, and frontends which have no stability guarantees, and are very much iterated on in terms of the underlying model and system prompts. The GPT-4o sycophancy debacle was only on the specific model that's served via the ChatGPT frontend and never impacted the stable snapshots on the API.
I have never seen any sort of compelling evidence that any of the large labs tinkers with their stable, versioned model releases that are served via their API platforms.
> At the time of writing, there are two major versions available for GPT-4 and GPT-3.5 through OpenAI’s API, one snapshotted in March 2023 and another in June 2023.
openaichat/gpt-3.5-turbo-0301 vs openaichat/gpt-3.5-turbo-0613, openaichat/gpt-4-0314 vs openaichat/gpt-4-0613. Two _distinct_ versions of the model, and not the _same_ model over time like how people like to complain that a model gets "nerfed" over time.
When a new model comes out I test the waters a bit with some more ambitious queries and get impressed when it can handle them reasonably well. Over time I take it for granted and then just expect it to be able to handle ever more complex queries and get disappointed when I hit a new limit.
Which is why the base model wouldn't necessarily show differences when you benchmarked them.
There are tons of benchmarks that don't show any regressions. Even small and unpublished ones rarely show regressions.
Like I suspect if there was a "new" model which was best-of-256 sampling of gpt-3.5-turbo that too would seem like a really exciting model for the first little bit after it came out, because it could probably solve a lot of problems current top models struggle with (which people would notice immediately) while failing to do lots of things that are a breeze for top models (which would take people a little bit to notice).
Hmm, that's evidently and anecdotally wrong:
interesting take, I wouldn't be surprised if they did that.
If you could do this automatically, it would be a game changer: you could run the top 5 models in parallel and select the best answer every time.
But it's not practical, because you are the bottleneck: you have to read all 5 solutions and compare them.
Remember, they have access to the RLHF reward model, against which they can evaluate all N outputs and pick the most "rewarded" answer to send.
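The selection step itself is trivial once you have a reward model; in this sketch `generate` and `score` are hypothetical placeholders standing in for a lab's actual sampling and reward infrastructure:

```python
def best_of_n(prompt, generate, score, n=256):
    """Sample n candidate answers and return the one the reward model likes most.

    generate(prompt) -> str and score(prompt, answer) -> float are placeholders;
    a lab would batch these calls, but the selection logic is the same.
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))
```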
o3 is still o3 (no nerfing) and o3-pro is new and better than o3.
If we were lying about this, it would be really easy to catch us - just run evals.
(I work at OpenAI.)
If we did change the model, we'd release it as a new model with a new name in the API (e.g., o3-turbo-2025-06-10). It would be very annoying to API customers if we ever silently changed models, so we never do this [1].
[1] `chatgpt-4o-latest` being an explicit exception
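In API terms, the distinction looks roughly like this (a sketch using the Python SDK; the dated o3 snapshot id is illustrative, while `chatgpt-4o-latest` is the documented exception mentioned above):

```python
from openai import OpenAI

client = OpenAI()

# Pinned, dated snapshot: the contract is that this never changes silently.
pinned = client.chat.completions.create(
    model="o3-2025-04-16",  # illustrative dated snapshot id
    messages=[{"role": "user", "content": "Hello"}],
)

# Floating alias: explicitly documented to be updated without warning.
floating = client.chat.completions.create(
    model="chatgpt-4o-latest",
    messages=[{"role": "user", "content": "Hello"}],
)
```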
It's the same logic of why UB in C/C++ isn't a license to do whatever the compiler wants. We're humans and we operate on implications, common-sense assumptions and trust.
https://cloud.google.com/products?hl=en#product-launch-stage...
"At Preview, products or features are ready for testing by customers. Preview offerings are often publicly announced, but are not necessarily feature-complete, and no SLAs or technical support commitments are provided for these. Unless stated otherwise by Google, Preview offerings are intended for use in test environments only. The average Preview stage lasts about six months."
I bet someone at Google would be a bit surprised to see someone jumping to legalese to act like this...novelty...is inherently due to the preview status, and based on anything more than a sense that there's no net harm done to us if it costs the same and is better.
I'm not sure they're wrong.
But it also leads to a sort of "nobody knows how anything works because we have 2^N configs and 5 bits" - for instance, 05-06 was also upgraded to 06-05. Except it wasn't, if you sent variable thinking to 05-06 after upgrade it'd fail. (and don't get me started on the 5 different thinking configurations for Gemini 2.5 flash thinking vs. gemini 05-06 vs. 06-05 and 0 thinking)
It's a preview model - for testing only, not for production. Really not that complicated.
Why are you in the comments section of an engineering news site?
(note: beyond your, excuse me while I'm direct now, boorish know-nothing reply, the terms you are citing have nothing to do with the thing people are actually discussing around you, despite your best efforts. It doesn't say "we might swap in a new service, congrats!", nor does it have anything to say about that. Your legalese at most describes why they'd pull 05-06, not forward 05-06 to 06-05. This is a novel idea.)
And I mean I genuinely do not understand what you are trying to say. Couldn't parse it.
Do you understand that even if it did say that, that wasn't true either? It was some weird undocumentable half-beast?
I have exactly your attitude about their cavalier use of preview for all things Gemini, and even people's use of the preview models.
But I've also been on this site for 15 years and am a bit wow'd by your interlocution style here -- it's quite rare to see someone flip "the 3P provider swapped the service on us!" into "well they said they could turn it off, of course you should expect it to be swapped for the first time ever!" insert dull sneer about the quality of other engineers
I am done with this thread. We are going around in circles.
It's a cute argument, as I noted, I'm emotionally sympathetic to it even, it's my favorite "get off my lawn." However, I've also been on the Internet long enough to know you write back, at length, when people try anti-intellectualism and why-are-we-even-talking-about-this as interaction.
"b. Disclaimer. PRE-GA OFFERINGS ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OR REPRESENTATIONS OF ANY KIND. Pre-GA Offerings (i) may be changed, suspended or discontinued at any time without prior notice to Customer and (ii) are not covered by any SLA or Google indemnity. Except as otherwise expressly indicated in a written notice or Google documentation, (A) Pre-GA Offerings are not covered by TSS, and (B) the Data Location Section above will not apply to Pre-GA Offerings."
It’s always worth considering that this may be your problem. If you still don’t get it, the only valuable reply is one which asks a question. Also, including “it’s not that complicated” only serves to inflame.
If the "preview release" you were using was v0.3, and suddenly it started being v0.6 without warning, that would be insane. The only point of providing a version number is to give people an indicator of consistency. The datestamp is a version number. If they didn't want us to expect consistency, they should not have given it a version number. That's the whole point of rolling release branches, they have no version. You don't have "v2.0" of a rolling release, you just have "latest". They fucked up by giving it a datestamp.
This is an extremely old and well-known problem with software interfaces. Either you version it or you don't. If you do version it, and change it, you change the version, and give people dependent on the old version some time to upgrade. Otherwise it breaks things, and that pisses people off. The alternative is not versioning it, which is a signal that there is no consistency to be expected. Any decent software developer should have known all this.
And while I'm at it: what's with the name flip-flopping? In 2014, GCP issued a press release explaining it was no longer using "Preview", but "Alpha" and "Beta" (https://cloudplatform.googleblog.com/2014/10/new-release-pha...). But the link you showed earlier says "Alpha" and "Beta" are now deprecated. But no press release? I guess that's our bad for not constantly reading the fine print and expecting it to revert back to something from 11 years ago.
Speaking of a new name. I'll donate the API credits to run a "choose a naming scheme for AI models that isn't confusing AF" for OpenAI.
https://openai.com/index/introducing-o3-and-o4-mini/
o3 scored 91.6 on AIME 2024 and 83.3 on GPQA.
o4-mini scored 93.4 on AIME 2024 and 81.4 on GPQA.
Then, the new announcement
https://help.openai.com/en/articles/6825453-chatgpt-release-...
o3 scored 90 on AIME 2024 and 81 on GPQA.
o4-mini wasn't measured
---
Codeforces is the same, but they have a footnote that they're using a different dataset due to saturation, but still have no grounding model to compare with
o3 pro is a different thing - it’s not just o3 with maximum remaining effort.
Here's the current state with version numbers as far as I can piece it together (using my best guess at naming of each component of the version identifier. Might be totally wrong tho):
1) prefix (optional): "gpt-", "chatgpt-"
2) family (required): o1, o3, o4, 4o, 3.5, 4, 4.1, 4.5,
3) quality? (optional): "nano", "mini", "pro", "turbo"
4) type (optional): "audio", "search"
5) lifecycle (optional): "preview", "latest"
6) date (optional): 2025-04-14, 2024-05-13, 1106, 0613, 0125, etc (I assume the last ones are a date without a year for 2024?)
7) size (optional): "16k"
Some final combinations of these version number components are as small as 1 ("o3") or as large as 6 ("gpt-4o-mini-search-preview-2024-12-17").
Given this mess, I can't blame people assuming that the "best" model is the one with the "biggest" number, which would rank the model families as: 4.5 (best) > 4.1 > 4 > 4o > o4 > 3.5 > o3 > o1 (worst).
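Given the component breakdown above, a rough parser makes the structure (and the mess) explicit. This regex is reverse-engineered guesswork against published model ids, not any official grammar:

```python
import re

# Rough parser for the naming components listed above; best-guess only.
PATTERN = re.compile(
    r"^(?P<prefix>gpt-|chatgpt-)?"
    r"(?P<family>o1|o3|o4|4o|3\.5|4\.5|4\.1|4)"
    r"(?:-(?P<quality>nano|mini|pro|turbo))?"
    r"(?:-(?P<type>audio|search))?"
    r"(?:-(?P<lifecycle>preview|latest))?"
    r"(?:-(?P<date>\d{4}-\d{2}-\d{2}|\d{4}))?"
    r"(?:-(?P<size>\d+k))?$"
)

for name in ["o3", "gpt-4o-mini-search-preview-2024-12-17",
             "gpt-3.5-turbo-16k", "chatgpt-4o-latest"]:
    m = PATTERN.match(name)
    print(name, "->", {k: v for k, v in m.groupdict().items() if v} if m else None)
```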
As an analogy, think of it like this:
o3-low ~ Ford Mustang with the accelerator gently pressed
o3-medium ~ Ford Mustang with the accelerator pressed
o3-high ~ Ford Mustang with the accelerator heavily pressed
o3 pro ~ Ford Mustang GT
Even though a Mustang GT is a different car than a Mustang, you don’t give it a totally different name (eg Palomino). The similarity in name signals it has a lot of the same characteristics but a souped up engine. Same for o3 pro.
Fun fact: before GPT-4, we had a unified naming scheme for models that went {modality}-{size}-{version}, which resulted in names like text-davinci-002. We considered launching GPT-4 as something like text-earhart-001, but since everyone was calling it GPT-4 anyway, we abandoned that system to use the name GPT-4 that everyone had already latched onto. Kind of funny how our original unified naming scheme made room for 999 versions, but we didn't make it past 3.
Edit: When I say the Mustang GT is a different car than a Mustang - I mean it literally. If you bought a Mustang GT and someone delivered a Mustang with a different trim, you wouldn't say "great, this is just what I ordered, with the same features/behavior/value." That we call it a different trim is a linguistic choice to signal to consumers that it's very similar, and built on the same production line, but comes with a different engine or different features. Similar to o3 pro.
> As o3-pro uses the same underlying model as o3, full safety details can be found in the o3 system card.
- o3 pro is based on o3
- o3 pro uses the same underlying model as o3
- o3 pro is similar to o3, but is a distinct thing that's smarter and slower
- o3 pro is not o3 with longer reasoning
In my analogy, o3 pro vs o3 is more than just an input parameter (e.g., not just the accelerator input) but less than a full difference in model (e.g., Ford Mustang vs F150). It's in between, kind of like car trim with the same body but a stronger engine. Imperfect analogy, and I apologize if this doesn't feel like it adds any clarity. At the end of the day, it doesn't really matter how it works - what matters is if people find it worth using.
I didn't read the ToS, like everyone else, but my guess is that degrading model performance at peak times will be one of the things that can slip through. We are not suggesting you are running a different model but that you are quantizing it so that you can support more people.
This can't happen with open-weight models, where you load the model, allocate the memory, and run the thing. With OpenAI/Claude, we don't know which model is running, how large it is, what it is running on, etc. None of that is provided, and there is only one reason I can think of: to be able to reduce resources unnoticed.
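If you wanted to test the "degraded at peak hours" theory rather than speculate, the check is cheap: hit the same pinned model with the same fixed prompts at different times of day and log the results for later scoring. A rough sketch (the prompt set, snapshot id, and file name are all placeholders):

```python
import datetime, json
from openai import OpenAI

client = OpenAI()
PROMPTS = ["Factor 391 into primes.", "Write a regex for ISO dates."]  # your fixed eval set

def snapshot_run(model="o3-2025-04-16"):  # illustrative pinned snapshot id
    results = []
    for p in PROMPTS:
        r = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": p}]
        )
        results.append({"prompt": p, "answer": r.choices[0].message.content})
    return {"ts": datetime.datetime.utcnow().isoformat(), "results": results}

# Append one record per run; score/diff the log offline to see whether quality
# actually moves with time of day, instead of relying on vibes.
with open("degradation_log.jsonl", "a") as f:
    f.write(json.dumps(snapshot_run()) + "\n")
```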
This is HN and not reddit.
"I didn't read the ToS, like everyone else, but my guess..."
Ah, there it is.
However starting from a week ago, the o3 responses became noticeably worse, with G2.5P staying about the same (in terms of what I've come to expect from the two models).
This alongside the news that you guys have decreased the price of o3 by 80% does really make it feel like you've quantized the model or knee-capped thinking or something. If you say it is wholly unchanged I'll believe you, but not sure how else to explain the (admittedly subjective) performance drop I've experienced.
But yes, perhaps the answer is that about a week ago I started asking subconsciously harder questions, and G2.5P handled them better because it had just been improved, while o3 had not so it seemed worse. Or perhaps G2.5P has always had more capacity than o3, and I wasn't asking hard enough questions to notice a difference before.
o4-mini-high, o4-mini, o3, o3-pro, gpt-4o
Oy.
The thing that gets me is it seems to be lying about fetching a web page. It will say things are there that were never on any version of the page and it sometimes takes multiple screenshots of the page to convince it that it's wrong.
It had a few bugs here or there when they pushed updates, but it didn't get worse.
My question is not whether this is true (it is) but why it's happening.
I am willing to believe the aider community has found that Gemini has maintained approximately equivalent performance on fixed benchmarks. That's reasonable considering they probably use a/b testing on benchmarks to tell them whether training or architectural changes need to be reverted.
But all versions of aider I've tested, including the most recent one, don't handle Gemini correctly so I'm skeptical that they're the state of the art with respect to bench-marking Gemini.
For benchmarks, either Gemini writes code that adheres to the required edit format, builds successfully, and passes unit tests, or it doesn't.
I primarily use aider + 2.5 pro for planning/spec files, and occasionally have it do file edits directly. Works great, other than stopping it mid-execution once in a while.
IMO 2.5 Pro 03-25 was insanely good. I suspect it was also very expensive to run. The 05-06 release was a huge regression in quality, with most people saying it was a better coder and a worse writer. They tested a few different variants and some were less bad than others, but overall it was painful to lose access to such a good model. The just-released 06-05 version seems to be uniformly better than 05-06, with far fewer "wow this thing is dumb as a rock" failure modes, but it still is not as strong as the 03-25 release.
Entirely anecdotally, 06-05 seems to exactly ride the line of "good enough to be the best, but no better than that" presumably to save costs versus the OG 03-25.
In addition, Google is doing something notably different between what you get on AI Studio versus the Gemini site/app. Maybe a different system prompt. There have been a lot of anecdotal comparisons on /r/bard and I do think the AI Studio version is better.
The main leaderboard page that you linked to is updated quite frequently, but it doesn't contain multiple benchmarks for the same exact model.
They inflated expectations and then released to the public a model that underperforms
> Today, we dropped the price of OpenAI o3 by 80%, bringing the cost down to $2 / 1M input tokens and $8 / 1M output tokens.
> We optimized our inference stack that serves o3—this is the same exact model, just cheaper.
In the API, we never make silent changes to models, as that would be super annoying to API developers [1]. In ChatGPT, it's a little less clear when we update models because we don't want to bombard regular users with version numbers in the UI, but it's still not totally silent/opaque - we document all model updates in the ChatGPT release notes [2].
[1] chatgpt-4o-latest is an exception; we explicitly update this model pointer without warning.
[2] ChatGPT Release Notes document our updates to gpt-4o and other models: https://help.openai.com/en/articles/6825453-chatgpt-release-...
(I work at OpenAI.)
We don't make the hobbyist mistake of randomly YOLO-ing various "quantization" methods after all training is done and calling it a day, at all. Quantization was done before it went live.
They probably also have cheap code or cheap models that normalize requests to increase cache hit rate.
In this case you didn’t even get the same answer, you only happened to have one sentence in the answer match.
> Regardless of whether caching is used, the output generated will be identical. This is because only the prompt itself is cached, while the actual response is computed anew each time based on the cached prompt
That's not true at all and is exactly what prompt caching is for. For one, you can at least populate the attention KV Cache, which will scale with the prompt size. It's true that if your prompt is larger than the context size, then the prompt size no longer affects inference speed since it essentially discards the excess.
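To make the KV-cache point concrete, here is a minimal sketch with HuggingFace transformers (GPT-2 is only a stand-in model): the shared prefix is run through the model once, and the cached keys/values are reused for each continuation. This is essentially what provider-side prompt caching buys you, while every response is still computed fresh, as the quoted explanation says.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prefix = "You are a helpful assistant. Answer concisely.\n\n"  # shared prompt prefix
prefix_ids = tok(prefix, return_tensors="pt").input_ids

with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values  # the "prompt cache"

def continue_greedily(suffix, max_new_tokens=20):
    # Deep-copy so the shared prefix cache can be reused across requests.
    past = copy.deepcopy(prefix_cache)
    ids = tok(suffix, return_tensors="pt").input_ids
    out_tokens = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            out = model(ids, past_key_values=past, use_cache=True)
            past = out.past_key_values
            next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy decode
            out_tokens.append(next_id.item())
            ids = next_id
    return tok.decode(out_tokens)

# Two different user turns reuse the same cached prefix; the answers themselves
# are still generated anew each time.
print(continue_greedily("User: How do I cook pasta?\nAssistant:"))
print(continue_greedily("User: What is a KV cache?\nAssistant:"))
```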
My mind immediately goes to rowhammer for some reason.
At the very least this opens up the possibility of some targeted denial of service
No? Eg "how to cook pasta" is probably asked a lot.
Once new MacBooks and iPhones have enough memory onboard this is going to be a disaster for OpenAI and other providers.
If I was OpenAI (or Anthropic for that matter) I would remain scared of Google, who is now awake and able to dump Gemini 2.5 pro on the market at costs that I'm not sure people without their own hardware can compete with, and with the infrastructure to handle everyone switching to them tomorrow.
OpenAI vs Anthropic on Google Trends
https://trends.google.com/trends/explore?date=today%203-m&q=...
ChatGPT vs Claude on Google Trends
https://trends.google.com/trends/explore?date=today%203-m&q=...
Do you not think that batch inference gives at least a bit of a moat whereby unit costs fall with more prompts per unit of time, especially if models get more complicated and larger in the future?
I get that the point is to be the last man standing: poach customers by lowering the price, and perhaps attract a few people who wouldn't have bought a subscription at the higher price. I just question how long investors can justify pouring money into OpenAI. OpenAI is also the poster child for modern AI, so if they fail the market will react badly.
Mostly I don't understand Silicon Valley venture capital, but between dumping prices, making wild purchases with investor money, and mostly only leading on branding, why isn't this a sign that OpenAI is failing?
That seems likely to me, all of the LLM providers have been consistently finding new optimizations for the past couple of years.
Right now the new Gemini surpassed their o3 (barely) in benchmarks for significantly less money, so they cut pricing to stay competitive.
I bet they haven't released o4 not because it's not competitive, but because they are playing the Nvidia game: release a new product that is just enough better to convince people to buy it. So IMO they are holding the full o4 model back to have something to release after the competition releases something better than their top horse.
Many, many companies would be thrilled to keep paying current model prices even with no performance improvement for the next 2-3 years
We still have so many features to build on top of current capabilities
They need lots of energy and customers don’t pay much, if they pay at all
The developers of AI models do have a moat, the cost of training the model in the first place.
It's 90% of the low effort AI wrappers with little to no value add who have no moat.
Or is the price drop an attempt to cover up bad news about the outage with news about the price drop?
This makes no sense. No way a global outage will get less coverage than the price drop.
Also the earliest sign of price drop is this tweet 20 hrs ago (https://x.com/OpenAIDevs/status/1932248668469445002), which is earlier than the earliest outage reports 13hrs ago on https://downdetector.com/status/openai/
Have you seen today's outage on any news outlet? I have not. Is there an HN thread?
That said, I'm absolutely willing to hear people out on "value-adds" I am missing out on; I'm not a knee-jerk hater (For context, I work with large, complex & private databases/platforms, so its not really possible for me to do anything but ask for scripting suggestions).
Also, I am 100% expecting a sad day when I'll be forced to subscribe, unless I want to read dick pill ads shoehorned in to the answers (looking at you, YouTube). I do worry about getting dependent on this tool and watching it become enshittified.
Just switch to a competitors free offering. There are enough to cycle through not to be hindered by limits. I wonder how much money I have cost those companies by now?
How anyone believes there is any moat for anyone here is beyond me.
I've never used anything like it. I think new Claude is similarly capable
I think if your goal is to have properly written language using older writing styles, then you're correct.
The fact that these tend to be written in an older writing style is to me incidental. You could rewrite all your college text books in contemporary social media slang and I would still consider them high-quality texts.
In my experience, o4-mini and o4-mini-high are far behind o3 in utility, but since I’m rate-limited for the latter, I end up primarily using the former, which has kind of reinforced the perception that OpenAI’s thinking models are behind the competition altogether.
It's great with those, however!
Just yesterday, they reported an annualized revenue run rate of 10B. Their last funding round in March valued them at 300B. Despite losing 5B last year, they are growing really fast - 30x revenue with over 500M active users.
It reminds me a lot of Uber in its earlier years—fast growth, heavy investment, but edging closer to profitability.
For OpenAI, the more people use the product, the more they spend on compute, unless they can supplement it with other ways of generating revenue.
I unfortunately don't think OpenAI will be able to hit sustained profitability (see Netflix for another example)
Netflix has been profitable for over a decade though? They reported $8.7 billion in profit in 2024.
What? Netflix is incredibly profitable.
Obviously, lots of nerds on HN have preferences for Gemini and Claude, and having used all three I completely get why that is. But we should remember we're not representative of the whole addressable market. There were probably nerds on like ancient dial-up bulletin boards explaining why Betamax was going to win, too.
Again: I don't know. I've got no predictions. I'm just saying that the logic where OpenAI is outcompeted on models themselves and thus automatically lose does not hold automatically.
Similarly, nearly all AI products but especially OpenAI are heavily _under_ monetized. OpenAI is an excellent personal shopper - the ad revenue that could be generated from that rivals Facebook or Google.
You could override its suggestions with paid ones, or nerf the bot's shopping abilities so it doesn't overshadow the sponsors, but that will destroy trust in the product in a very competitive industry.
You could put user-targeted ads on the site not necessarily related to the current query, like ads you would see on Facebook, but if the bot is really such a good personal shopper, people are literally at a ChatGPT prompt when they see the ads and will use it to comparison shop.
(with many potential variants)
I'd say dropping the price of o3 by 80% due to "engineers optimizing inferencing" is a strong sign that they're doing exactly that.
This is marginally less true for embedding models and things you've fine-tuned, but only marginally.
- Carefully interleaving shared memory loading with computation, and the whole kernel with global memory loading.
- Warp shuffling for softmax.
- Avoiding memory access conflicts in matrix multiplication.
I'm sure the guys at ClosedAI have many more optimizations they've implemented ;). They're probably eventually going to design their own chips or use photonic chips for lower energy costs, but there's still a lot of gains to be made in the software.
Optimizing serving isn't unlikely: all of the big AI vendors keep finding new efficiencies, it's been an ongoing trend over the past two years.
They finally implemented DeepSeek open source methods for fast inference?
With the race to get new models out the door, I doubt any of these companies have done much to optimize cost so far. Google is a partial exception – they began developing the TPU ten years ago and the rest of their infrastructure has been optimized over the years to serve computationally expensive products (search, gmail, youtube, etc.).
The more inference customers OpenAI has, the easier it is for them to reach profitability.
* We have people uploading tons of zero-effort slop pieces to all manner of online storefronts, and making people less likely to buy overall because they assume everything is AI now
* We have an uncomfortable community of, to be blunt, actual cultists emerging around ChatGPT, doing all kinds of shit from annoying their friends and family all the way up to divorcing their spouses
* Education is struggling in all kinds of ways due to students using (and abusing) the tech, with already strained administrations struggling to figure out how to navigate it
Like, yeah, if your only metric is OpenAI's particular line going up, it's looking alright. And much like Uber, its success seems to be corrosive to the society in which it operates. Is this supposed to be good news?
A great communicator on the risks of AI being too heavily integrated into society is Zak Stein. As someone who works in education, they see first-hand how people are becoming dependent on this stuff rather than pursuing any kind of self-improvement, just handing over all their thinking to the machine. It is very bizarre, and I have been seeing it a lot more in my personal experience over the last few months.
Plus there's the fact that "thinking models" can't really solve complex tasks / aren't really as good as they are believed to be.
OpenAI is very good at this as well because of their brand name. For many people ChatGPT is all they know. That's the one that's in the news. That's the one everybody keeps talking about. They have many millions of paying users at this point.
This is a non trivial moat. If you can only be successful by not serving most of the market for cost reasons, then you can't be successful. It's how Google has been able to guard its search empire for a quarter century. It's easy to match what they do algorithmically. But then growing from a niche search engine that has maybe a few tens of thousands of users (e.g. Kagi) to Google scale serving essentially most of this planet (minus some fire walled countries like Russia and China), is a bit of a journey.
So Google rolling out search integration is a big deal. It means they are readying themselves for that scale and will have billions of users exposed to this soon.
> Their last funding round in March valued them at 300B. Despite losing 5B last year, they are growing really fast
Yes, they are valued based on world+dog needing agentic AIs and subscribing to the extent of tens or hundreds of dollars/month. That's supposed to outstrip the revenue of things like MS Office in its prime.
5B loss is peanuts compared to that. If they weren't burning that, their ambition level would be too low.
Uber now has a substantial portion of the market. They have about 3-4 billion revenue per month. A lot of cost obviously. But they managed 10B profit last year. And they are not done growing yet. They were overvalued at some point and then they crashed, but they are still there and it's a pretty healthy business at this point, and that reflects in their stock price. It's basically valued higher now than at the time of the Softbank investment pre-IPO. Of course a lot of stuff needed to be sorted out for that to happen.
They’re not letting the competition breathe
(It's "contact us" pricing, so I have no idea how much that would set you back. I'm guessing it's not cheap.)
I don't see this happening with for example deepseek.
Is it possible they are saving on resources by having it answer that way?
When I worked at Netflix I sometimes heard the same speculation about intentionally bad recommendations, which people theorized would lower streaming and increase profit margins. It made even less sense there as streaming costs are usually less than a penny. In reality, it’s just hard to make perfect products!
(I work at OpenAI.)
Example, I asked it to write something. And then I asked it to give me that blob of text in markdown format. So everything it needed was already in the conversation. That took a whole minute of doing web searches and what not.
I actually dislike using o3 for this reason. I keep the default to 4o. But sometimes I forget to switch back and it goes off boiling the oceans to answer a simple question. It's a bit too trigger happy with that. In general all this version and model soup is impossible to figure out for non technical users. And I noticed 4o is now sometimes starting to do the same. I guess, too many users never use the model drop down.
takes tinfoil hat off
Oh, nvm, that makes sense.
I am not saying they haven't improved the laziness problem, but it does happen anecdotally. I even got similar sort of "lazy" responses for something I am building with gemini-2.5-flash.
How exactly will passport check prevent any training?
At most this will block API access to your average Ivan, not a state actor
It generally does not. No idea if there are edge cases where it does, but that's definitely not the norm for the average user.
https://community.openai.com/t/session-expired-verify-organi...
https://community.openai.com/t/callback-from-persona-id-chec...
https://community.openai.com/t/verification-issue-on-second-...
https://community.openai.com/t/verification-not-working-and-...
https://community.openai.com/t/organization-verfication-fail...
https://community.openai.com/t/help-organization-could-not-b...
https://community.openai.com/t/to-verify-an-organization-acc...
Yesterday:
- Input: $10.00 / 1M tokens
- Cached input: $2.50 / 1M tokens
- Output: $40.00 / 1M tokens

Today:
- Input: $2.00 / 1M tokens
- Cached input: $0.50 / 1M tokens
- Output: $8.00 / 1M tokens
https://archive.is/20250610154009/https://openai.com/api/pri...

First, I tried enabling o3 via OpenRouter since I have credits with them already. I was met with the following:
"OpenAI requires bringing your own API key to use o3 over the API. Set up here: https://openrouter.ai/settings/integrations"
So I decided I would buy some API credits with my OpenAI account. I ponied up $20 and started Aider with my new API key set and o3 as the model. I get the following after sending a request:
"litellm.NotFoundError: OpenAIException - Your organization must be verified to use the model `o3`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate."
At that point, the frustration was beginning to creep in. I returned to OpenAI and clicked on "Verify Organization". It turns out, "Verify Organization" actually means "Verify Personal Identity With Third Party" because I was given the following:
"To verify this organization, you’ll need to complete an identity check using our partner Persona."
Sigh. I click "Start ID Check" and it opens a new tab for their "partner" Persona. The initial fine print says:
"By filling the checkbox below, you consent to Persona, OpenAI’s vendor, collecting, using, and utilizing its service providers to process your biometric information to verify your identity, identify fraud, and conduct quality assurance for Persona’s platform in accordance with its Privacy Policy and OpenAI’s privacy policy. Your biometric information will be stored for no more than 1 year."
OK, so now, we've gone from "I guess I'll give OpenAI a few bucks for API access" to "I need to verify my organization" to "There's no way in hell I'm agreeing to provide biometric data to a 3rd party I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder. How do I get my $20 back?"
[1] https://techcrunch.com/2025/04/13/access-to-future-ai-models...
which, after using it, fair! It found a zero day
- generating synthetic data to train their own models
- hacking and exploitation research
etc
Source?
Link: https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...
[0] https://www.aisi.gov.uk/work/replibench-measuring-autonomous...
It's also the only LLM provider which has this.
What OpenAI has that the others don't is SamA's insatiable thirst for everyone's biometric data.
Agreeing to Persona's terms, especially for biometric identity verification, involves both privacy and long-term data security risks. Here's a clear breakdown of the main risks you should be aware of:

1. Biometric Data Collection
Risk: Biometric identifiers (like facial recognition, voiceprints, etc.) are extremely sensitive and irreplaceable if compromised.
What they collect: Persona may collect a selfie, video, and metadata, and extract biometric templates from those for facial comparison and liveness detection.
If leaked or abused: Unlike passwords, you can't change your face. A future data breach or misuse could lead to permanent identity compromise.
2. Data Storage & Retention
Risk: Persona says biometric data is kept for up to one year, but you're relying on their internal policies, not a legal guarantee.
There’s no technical detail on how securely it’s stored or whether it’s encrypted at rest.
Worst-case scenario: Poorly secured biometric templates could be stolen, reused, or matched against other data sets by bad actors or governments.
3. Third-Party Sharing and Surveillance Risks
Risk: Your biometric and ID data may be shared with subprocessors (partners/vendors) that you haven’t explicitly vetted. Persona may transfer your data to cloud providers (like AWS, GCP), verification specialists, or fraud prevention services.
Depending on jurisdiction, data could be subject to subpoenas, surveillance laws, or government backdoors (especially in the U.S.).
4. Consent Ambiguity & Future Use
Risk: The fine print often includes vague consent for "quality assurance", "model improvement", or "fraud detection". This opens the door to retraining algorithms on your biometric data—even if anonymized, that's still a use of your body as data.
Their privacy policy may evolve, and new uses of your data could be added later unless you opt out (which may not always be possible).
Should You Agree? Only if:
You absolutely need the service that requires this verification.
You’re aware of the privacy tradeoff and are okay with it.
You trust that Persona and its partners won’t misuse your biometric data—even a year down the line.
If you're uneasy about this, you're not alone. Many developers and privacy advocates refuse to verify with biometrics for non-critical services, and companies like OpenAI are increasingly facing criticism for requiring this.

The AG office followed up and I got my refund. Worth my time to file, because we should stop letting companies get away with this stuff where they show up with more requirements after you've paid.
Separately they also do not need my phone number after having my name, address and credit card.
Has anyone got info on why they are taking everyone’s phone number?
>> Wednesday's 9th Circuit decision grew out of revelations that between 2013 and 2019, X mistakenly incorporated users' email addresses and phone numbers into an ad platform that allows companies to use their own marketing lists to target ads on the social platform.
>> In 2022, the Federal Trade Commission fined X $150 million over the privacy gaffe.
>> That same year, Washington resident Glen Morgan brought a class-action complaint against the company. He alleged that the ad-targeting glitch violated a Washington law prohibiting anyone from using “fraudulent, deceptive, or false means” to obtain telephone records of state residents.
>> X urged Dimke to dismiss Morgan's complaint for several reasons. Among other arguments, the company argued merely obtaining a user's phone number from him or her doesn't violate the state pretexting law, which refers to telephone “records.”
>> “If the legislature meant for 'telephone record' to include something as basic as the user’s own number, it surely would have said as much,” X argued in a written motion.
To me the obvious example is fraud/abuse protection.
I’m pretty sure you do. Claude too. The only chatbot company I’ve made an account with is Mistral specifically because a phone number was not a registration requirement.
Also, Netflix wasn't initially selling ads, and now, after drastically increasing the price of their plans in the last few years, the ad-supported subscription is probably their #1 plan, because most people aren't willing to shell out 15 to 25 USD/EUR every month to watch content that is already littered with ads.
So, at the end of your day, company X has an overdetailed profile of you, rather than each advertiser. (And also, at least in the US, can repackage and sell that data into various products if it chooses)
When I signed up I had to do exactly that.
They may still buy data from ad companies and store credit cards, etc.
Many of them link users based on phone number.
ChatGPT has the capacity to modify behavior more subtly than any advertising ever devised. Aggregating knowledge on the person on the other end of the line is key in knowing how to nudge them toward the target behavior. (Note this target behavior may be how to vote in an election, or how to feel about various hot topics.)
It also, as Google learned, enables you to increase your revenue per placement. Advertisers will pay more for placement with their desired audience.
> To me the obvious example is fraud/abuse protection.
Phones are notorious for spam...Seriously. How can the most prolific means of spam be used to prevent fraud and abuse? (Okay, maybe email is a little more prolific?) Like have you never received a spam call or text? Obviously fraudsters and abusers know how to exploit those systems... it can't be more obvious...
What would you do instead?
I would think in a world where we constantly get spam calls and texts that people would understand that a phone number is not a good PKI. I mean we literally don't answer calls from unknown numbers because of this. How is it that we can only look at these things in one direction but not the other?
Phone number is the only way to reliably stop MOST abuse on a freemium product that doesn't require payment/identity verification upfront. You can easily block VOIP numbers and ensure the person connected to this number is paying for an actual phone plan, which cuts down dramatically on bogus accounts.
Hence why even Facebook requires a unique, non-VOIP phone number to create an account these days.
I'm sure this comment will get downvoted in favor of some other conspiratorial "because they're going to secretly sell my data!" tinfoil post (this is HN of course). But my explanation is the actual reason.
I would love if I could just use email to signup for free accounts everywhere still, but it's just too easily gamed at scale.
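For what it's worth, the line-type check being described can be approximated with the open-source phonenumbers library; real services use paid carrier-lookup APIs, and as the replies below argue, the classification is far from reliable:

```python
import phonenumbers
from phonenumbers import PhoneNumberType

def looks_like_voip(raw_number, default_region="US"):
    """Very rough line-type check; the underlying number-range data is incomplete."""
    num = phonenumbers.parse(raw_number, default_region)
    if not phonenumbers.is_valid_number(num):
        return True  # treat invalid numbers as rejectable
    return phonenumbers.number_type(num) == PhoneNumberType.VOIP

# Purely illustrative gate at signup time:
if looks_like_voip("+1 650-555-0100"):
    print("reject: VOIP or invalid number")
else:
    print("accept")
```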
- Parent talks about a paid product. If they want to burn tokens, they are going to pay for it.
- Those phone requirements do not stop professional abusers, organized crime, or state-sponsored groups. Case in point: Twitter is overrun by bots, scammers, and foreign info-ops swarms.
- Phone requirements might hinder non-professional abusers at best, but we are sidestepping the issue of whether these corporations deserve enough trust to compel regular users to sell themselves. Maybe the business model just sucks.
Also, if they don't do freemium they're getting way more valuable information about you than just a phone number.
Only requiring the phone number for API users feels needlessly invasive and is not explained by a vague "countering fraud and abuse" for a paid product...
Your explanation is inconsistent with the link in these comments showing Twitter getting fined for doing the opposite.
> Hence why even Facebook requires a unique, non-VOIP phone number to create an account these days.
Facebook is the company most known for disingenuous tracking schemes. They just got caught with their app running a service on localhost to provide tracking IDs to random shady third party websites.
> You can easily block VOIP numbers and ensure the person connected to this number is paying for an actual phone plan, which cuts down dramatically on bogus accounts.
There isn't any such thing as a "VOIP number", all phone numbers are phone numbers. There are only some profiteers claiming they can tell you that in exchange for money. Between MVNOs, small carriers, forwarding services, number portability, data inaccuracy and foreign users, those databases are practically random number generators with massive false positive rates.
Meanwhile major carriers are more than happy to give phone numbers in their ranges to spammers in bulk, to the point that this is now acting as a profit center for the spammers and allowing them to expand their spamming operations because they can get a large number of phone numbers those services claim aren't "VOIP numbers", use them for spamming the services they want to spam, and then sell cheap or ad-supported SMS service at a profit to other spammers or privacy-conscious people who want to sign up for a service they haven't used that number at yet.
Google tried this with Google Plus and Google Wave, failed spectacularly, and has ironically stopped with this idiotic "marketing by blocking potential users". I can access Gemini Pro 2.5 without providing a blood sample or signing parchment in triplicate.
[1] Not really though, because a significant percentage of OpenAI's revenue is from spammers and bulk generation of SEO-optimised garbage. Those are valued customers!
Difficulty: Impossible
Maybe you’re thinking of deep research mode which is web UI only for now.
Anthropic exposes reasoning, which has become a big reason to use them for reasoning tasks over the other two despite their pricing. Rather ironic when the other two have been pushing reasoning much harder.
[1] https://ai.google.dev/gemini-api/docs/thinking#summaries [2] https://discuss.ai.google.dev/t/massive-regression-detailed-...
Seems familiar…
[1] https://www.forbes.com/advisor/investing/cryptocurrency/what...
> I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder
openai persona verification site:community[.]openai[.]com
e.g. a thread with 36 posts beginning Apr 13:
"OpenAI Non-Announcement: Requiring identity card verification for access to new API models and capabilities"
But always good to be on look out for shenanigans :)
(So I’m remaining locked out of my linkedin account.)
Really a shame OpenAI left their non-profit (and open) roots, could have been something different but nope, the machine ate them whole.
I never heard of OpenRouter prior to this thread, but will now never use them and advocate they never be used either.
Enlightened self-interest is when you realize that you win by being good to your customers, instead of treating customer service like a zero-sum game.
Contact support and ask for a refund. Then a charge back.
https://withpersona.com/legal/privacy-policy
To me it looks like an extremely aggressive data pump.
Which kind of would make the entire “discussion” moot and pointless
Just send them a random passport photo from the Internet, what's the deal? Probably they are just vibe-verifying the photo with "Is it legit passport?" prompt anyways.
this is absurd, how do they define "person"? On the internet I can be another person from another country in a minute, another minute I will be a different person from a different country.
Since Mossad and CIA is essentially one organization they already do it, 100%.
This should be illegal. How many are going to do the same as you, but then think that the effort/time/hassle they would waste to try to get their money back would not be worth it? At which point you've effectively donated money to a corp that implements anti-consumer anti-patterns.
Hello Human Resource, we have all your data, please upload your bio-metric identity, as well as your personal thoughts.
Building the next phase of a corporate totalitarian state, thank you for your cooperation.
I've always wondered how OpenAI could get away with o3's astronomical pricing, though. What does o3 do better than any other model to justify the premium cost?
What? 3 out of 4 companies I consulted for that started using AI for coding marked cost as an important criterion. The 4th one has virtually infinite funding, so they just don't care.
And those aren't average customers.
On Twitter, some people say that some models perform better at night when there is less demand, which allows them to serve a non-quantized model.
Since the models are only available through the API and there is no test to check which version of the model is served, it's hard to know what we're buying...
I wonder if "we quantized it lol" would classify as false advertising for modern LLMs.