
GPT-5.2

https://platform.openai.com/docs/guides/latest-model
98•atgctg•40m ago

Comments

sfmike•23m ago
Everything is still based on 4/4o, right? Is training a new model just too expensive? Maybe they could consult the DeepSeek team about cost-constrained new models.
verdverm•21m ago
Apparently they have not had a successful pre-training run in 1.5 years.
fouronnes3•16m ago
I want to read a short sci-fi story set in 2150 about how, mysteriously, no one has been able to train a better LLM for 125 years. The binary weights are studied with unbelievably advanced quantum computers, but no one can really train a new AI from scratch. This starts cults, wars and legends, and ultimately (by the third book) leads to the main protagonist learning to code by hand, something that no human left alive still knows how to do. Could this be the secret to making a new AI from scratch, more than a century later?
armenarmen•11m ago
I’d read it!
Wowfunhappy•17m ago
I thought whenever the knowledge cutoff increased, that meant they'd trained a new model. I guess that's completely wrong?
brokencode•7m ago
Typically, I think, but you could also pre-train your previous model on new data.

I don’t think it’s publicly known for sure how different the models really are. You can improve a lot just by improving the post-training set.

elgatolopez•11m ago
Where did you get that from? The cutoff date says August 2025. Looks like a newly pretrained model.
catigula•10m ago
The irony is that DeepSeek is still running with a distilled 4o model.
zamadatix•23m ago
https://openai.com/index/introducing-gpt-5-2/
system2•23m ago
"Investors are putting pressure, change the version number now!!!"
exe34•21m ago
I'm quite sad about the S-curve hitting transformers hard. For a short period, we had the excitement of "ooh, if GPT-3.5 is so good, GPT-4 is going to be amazing! ooh, GPT-4 has sparks of AGI!" But now we're back to version inflation for inconsequential gains.
verdverm•18m ago
2025 is the year most Big AI labs released their first real thinking models.

Now we can create new samples and evals for more complex tasks to train up the next gen: more planning, decomposition, context, agentic-oriented work.

OpenAI has largely fumbled their early lead; the exciting stuff is happening elsewhere.

ToValueFunfetti•4m ago
Take this all with a grain of salt as it's hearsay:

From what I understand, nobody has done any real scaling since the GPT-4 era. 4.5 was a bit larger than 4, but not as much as the orders of magnitude difference between 3 and 4, and 5 is smaller than 4.5. Google and Anthropic haven't gone substantially bigger than GPT-4 either. Improvements since 4 are almost entirely from reasoning and RL. In 2026 or 2027, we should see a model that uses the current datacenter buildout and actually scales up.

Xiol•22m ago
Yawn.
josalhor•21m ago
From GPT 5.1 Thinking:

ARC AGI v2: 17.6% -> 52.9%

SWE-bench Verified: 76.3% -> 80%

That's pretty good!

verdverm•16m ago
We're also in benchmark-saturation territory. I've heard it speculated that Anthropic emphasizes benchmarks less in their publications because internally they don't care about them nearly as much as making a model that works well day-to-day.
quantumHazer•11m ago
Seems pretty false if you look at the model card and website for Opus 4.5, which is… (checks notes) their latest model.
poormathskills•16m ago
For a minor version update (5.1 -> 5.2), that's a way bigger improvement than I would have guessed.
catigula•10m ago
Yes, but it's not good enough. They needed to surpass Opus 4.5.
minimaxir•9m ago
Note that GPT 5.2 newly supports an "xhigh" reasoning level, which could explain the better benchmarks.

It'll be interesting to see the cost-per-task on ARC AGI v2.
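A minimal sketch of how that level would be requested, assuming the Responses API shape used for earlier GPT-5 models; the "gpt-5.2" model name and the "xhigh" effort value are taken from this thread rather than verified against the docs:

    # Hypothetical sketch: asking for the new "xhigh" reasoning effort
    # via the OpenAI Responses API (model id and effort value assumed).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.responses.create(
        model="gpt-5.2",                # assumed model identifier
        reasoning={"effort": "xhigh"},  # value reported in this thread
        input="Solve the grid puzzle described below step by step...",
    )
    print(response.output_text)

Higher effort generally trades more (billed) reasoning tokens for accuracy, which is why cost-per-task matters alongside the score.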

causal•7m ago
That ARC AGI score is a little suspicious. That's a really tough benchmark for AI. Curious if there were improvements to the test harness, because that's a wild jump in general problem-solving ability for an incremental update.
egeres•20m ago
It baffles me to see these last two announcements (GPT 5.1 as well) devoid of any metrics, benchmarks, or quantitative analysis. Could it be because they are behind Google/Anthropic and don't want to admit it?

(Edit: I'm sorry, I didn't read enough on the topic, my apologies.)

zamadatix•19m ago
This isn't the announcement; it's the developer docs intro page for the model. The announcement is at https://openai.com/index/introducing-gpt-5-2/. It still doesn't answer cross-comparison, but at least it has the benchmark metrics they want to show off.
fulafel•18m ago
So GDPval is OpenAI's own benchmark. PDF link: https://arxiv.org/pdf/2510.04374
mattas•13m ago
Are benchmarks the right way to measure LLMs? Not because benchmarks can be gamed, but because the most useful outputs of models aren't things that can be bucketed into "right" and "wrong." Tough problem!
Sir_Twist•9m ago
[delayed]
olliepro•5m ago
Do you have a better way to measure LLMs? Measurement implies quantitative evaluation... which is the same as benchmarks.
k2xl•12m ago
The ARC AGI 2 bump to 52.9% is huge. Shockingly, GPT 5.2 Pro does not add too much more (54.2%) for the increased cost.
zug_zug•11m ago
For me the last remaining killer feature of ChatGPT is the quality of the voice chat. Do any of the competitors have something like that?
FrasiertheLion•10m ago
Try ElevenLabs.
bigyabai•10m ago
Qwen does.
Robdel12•7m ago
I have found Claude's voice chat to be better. I only recently tried it because I liked ChatGPT's enough, but I think I'm going to use Claude going forward. I find myself getting interrupted by ChatGPT a lot whenever I do use it.
Tiberium•9m ago
The only table where they showed comparisons against Opus 4.5 and Gemini 3:

https://x.com/OpenAI/status/1999182104362668275

https://i.imgur.com/e0iB8KC.png

JanSt•9m ago
The benchmarks are very impressive. Codex and Opus 4.5 are really good coders already, and they keep getting better.

No wall yet, and I think we might have crossed the threshold of models being as good as or better than most engineers already.

GDPval will be an interesting benchmark, and I'll happily use the new model to test spreadsheet (and other office work) capabilities. If they can keep going like this just a little bit further, many office workers will stop being useful... I don't know yet how to feel about this.

Great for humanity, probably, but what about the individuals?

dandiep•7m ago
Still no GPT 5.x fine-tuning?

I emailed support a while back to see if there was an early access program (99.99% sure the answer is yes). This is when I discovered that their support is 100% done by AI and there is no way to escalate a case to a human.
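For reference, a minimal sketch of the fine-tuning call shape with the OpenAI Python SDK; the training file ID is a placeholder and the GPT-5.x model name is hypothetical, which is exactly the point: only model families OpenAI has enabled for fine-tuning are accepted.

    # Sketch of a fine-tuning job request with the OpenAI Python SDK.
    # "gpt-5.2" is hypothetical here; the API would reject any base
    # model that is not enabled for fine-tuning.
    from openai import OpenAI

    client = OpenAI()

    job = client.fine_tuning.jobs.create(
        training_file="file-XXXXXXXX",  # placeholder ID from files.create()
        model="gpt-5.2",                # hypothetical; expect a rejection today
    )
    print(job.id, job.status)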

jazzyjackson•6m ago
Containment breach is going to occur from a disgruntled customer convincing the customer service bot it needs to get a hold of a supervisor
orliesaurus•7m ago
I told all my friends to upgrade or they're not my friends anymore /s
ImprobableTruth•7m ago
An almost 50% price increase. The benchmarks look nice, but are they 50% nicer...?
sigmar•6m ago
Are there any specifics about how this was trained, especially since 5.1 is only a month old? I'm a little skeptical of benchmarks these days and wish they'd put this up on LMArena.
johnsutor•5m ago
https://platform.openai.com/docs/models/gpt-5.2 More information on the price, context window, etc.
gkbrk•5m ago
Is this the "Garlic" model people have been hyping? Or are we not there yet?

Base UI

https://base-ui.com
1•handfuloflight•1m ago•0 comments

Google's GenTabs turn browser tabs into interactive apps

https://blog.google/technology/google-labs/gentabs-gemini-3/
1•py4•2m ago•0 comments

The Component Gallery

https://component.gallery/
1•weakfish•3m ago•0 comments

How to build a personal webpage from scratch

https://rutar.org/writing/how-to-build-a-personal-webpage-from-scratch/
1•fanf2•3m ago•0 comments

When Accuracy Meets Parallelism in Diffusion Language Models

http://66.42.62.31:1313/blogs/text-diffusion/
1•snyhlxde•4m ago•1 comments

Show HN: Bring screencasts into your editor with CodeMic

https://CodeMic.io/#hn
1•seansh•4m ago•1 comments

Flock cameras remained active after officials asked to be turned off

https://therecord.media/flock-safety-cameras-remained-active-after-cities-asked-turned-off
3•ghouse•6m ago•1 comments

ToGo – Python bindings for TG (Fast point-in-polygon)

https://github.com/mindflayer/togo
1•mindflayer•7m ago•1 comments

Anthropic donates MCP to the Linux Foundation for open and accessible AI

https://aaif.io/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation-aaif-...
1•Santosh83•7m ago•0 comments

Simple teflon coating boosts hydrogen production efficiency by 40%

https://techxplore.com/news/2025-12-simple-teflon-coating-boosts-hydrogen.html
1•geox•9m ago•0 comments

Light-bending ferroelectric controls blue and UV could transform chipmaking

https://phys.org/news/2025-12-material-blue-ultraviolet-advanced-chipmaking.html
1•westurner•9m ago•0 comments

Anthropic's Vision Advantage Is a Lot Like Apple's from the 2010s

https://danielmiessler.com/blog/anthropics-vision-advantage
1•wavelander•9m ago•0 comments

AI Can Write Your Code. It Can't Do Your Job

https://terriblesoftware.org/2025/12/11/ai-can-write-your-code-it-cant-do-your-job/
2•speckx•10m ago•0 comments

Why GPT-5.2 is our model of choice for Augment Code Review

https://www.augmentcode.com/blog/why-gpt-5-2-is-our-model-of-choice-for-augment-code-review
7•knes•10m ago•3 comments

AI Agent Security: A curated list of tools for red teaming and defense

https://github.com/ProjectRecon/awesome-ai-agents-security
1•ProjectRecon•10m ago•1 comments

100% Local LLM. Mistral Vibe vs. Opencode. A Claude Code Alternative? [video]

https://www.youtube.com/watch?v=WKBzcpU88zo
1•grigio•11m ago•0 comments

Most used programming languages in 2025

https://devecosystem-2025.jetbrains.com
1•birdculture•14m ago•0 comments

Bionetta: Efficient Client-Side Zero-Knowledge Machine Learning Proving

https://arxiv.org/abs/2510.06784
1•badcryptobitch•15m ago•1 comments

System76 Launches Pop!_OS 24.04 LTS with COSMIC Desktop

https://www.phoronix.com/news/System76-Ships-Pop-OS-24.04
3•mikece•15m ago•0 comments

Show HN: CyberCage – Security platform for AI tools and MCP servers

https://cybercage.io/
4•ziyasal•16m ago•2 comments

Show HN: Free Security audit that checks what other tools miss

https://domainoptic.com/
1•renbuilds•16m ago•0 comments

Tool UI

https://www.tool-ui.com
1•handfuloflight•17m ago•0 comments

Protocolo Flux: RBS Funded by Cosmic Canon (ZKP Roadmap)

https://paquinobr-svg.github.io/manifiesto-flux/
1•ProtocoloFLUX•17m ago•0 comments

Rivian goes big on autonomy, with custom silicon, Lidar, and a hint at robotaxis

https://techcrunch.com/2025/12/11/rivian-goes-big-on-autonomy-with-custom-silicon-lidar-and-a-hin...
4•ryan_j_naughton•17m ago•0 comments

Comparing AI Agents to Cybersecurity Professionals in Real-World Pen Testing

https://arxiv.org/abs/2512.09882
1•littlexsparkee•19m ago•1 comments

Marco Rubio bans Calibri font at State Department for being too DEI

https://techcrunch.com/2025/12/10/marco-rubio-bans-calibri-font-at-state-department-for-being-too...
3•rbanffy•21m ago•0 comments

Hyper-Scalers Are Using CXL to Lower the Impact of DDR5 Supply Constraints

https://www.servethehome.com/hyper-scalers-are-using-cxl-to-lower-the-impact-of-ddr5-supply-const...
1•rbanffy•23m ago•0 comments

Over 10k Docker Hub images found leaking credentials, auth keys

https://www.bleepingcomputer.com/news/security/over-10-000-docker-hub-images-found-leaking-creden...
3•todsacerdoti•24m ago•0 comments

Maybe AI is a regular platform shift

https://frontierai.substack.com/p/maybe-ai-is-a-regular-platform-shift
1•cgwu•25m ago•0 comments

GovSignals is solving government procurement using Trigger.dev

https://trigger.dev/customers/govsignals-customer-story
1•semicognitive•26m ago•0 comments