The gap between open weights LLMs and closed source LLMs

https://blog.doubleword.ai/frontier-os-llm

77•kkm•2h ago

Comments

samat•1h ago

Article confuses open source models with open weights models.

Not the same thing.

It’s used correctly in the article’s body, but the title is misleading.

NitpickLawyer•1h ago

Literally no one cares. There are "full" open certified GMO free grass fed training data blah blah models. Apertus, Olmo, etc. No one cares. For all intents and purposes people use the term to describe a model that you can run locally and are allowed to modify and re-release. The rest is useless semantics. No one can "rEpRoDuCe" a model anyway.

throwuxiytayq•1h ago

No-one cares to quit social media or stop using Windows, but it’s a goal worthy of discussion all the same.

The name is bad, doesn’t even make any fucking sense and it gives open source a bad rep.

komadori•1h ago

I wouldn't say that no one cares, but obviously many fewer people care when the cost of "recompiling" a model from its open source training pipeline is so high. Also, if you only have the weights, you can still use it to generate training data for a new model (i.e. distillation) so it's inherently less locked down than closed source binaries were.

judge2020•33m ago

open source vs source-available. Companies taking an extremely cautious approach to AI can't use source data that is potentially a violation of copyright (pending worldwide court decisions and/or regulation on said topic). Although that cat is already out of the bag for basically every stock-traded company using LLMs trained on non-licensed data, so I don't see there being much actual risk in using them.

reinitctxoffset•1h ago

I was advocating for "available weight" as a value neutral term for a while.

I gave up. No one cares. And no one will ever tell the truth about the training anyways.

Substantial and growing freedom beats zero freedom ever again.

jackconsidine•1h ago

Achilles and the tortoise [0] is usually a fallacy. If the tortoise has a head start, then Achilles will never catch it because in the time it takes Achilles to reach the tortoise's location the tortoise has moved some degree further, ad infinitum. Obviously not real because Achilles will pass the tortoise -- I think a fallacy because the framing creates a fake asymptote (they will both pass the point where they're approaching a tie).

In this case it may actually apply though, no? Open models get better from closed model distillation?

[0] https://en.wikipedia.org/wiki/Zeno%27s_paradoxes

profsummergig•1h ago

IMHO, the biggest problem with the future of open weights models is that currently, open weights models are the result of philanthropy by some private org. (e.g. DeepSeek).

The spigot can be turned off at any time.

Until there's some sort of "community owned hardware", open weights models are always at risk of being discontinued.

Shitty-kitty•1h ago

It's just a smart business decision that allows their models to compete and gain market-share against much pricier private models. No philanthropy there.

NitpickLawyer•1h ago

Yeah, but the biggest plus for open models is that they can never be taken away. In other words, whatever capabilities they reach (even if there will never be another model), those stay forever. That can't be said for API-based models where a provider can sunset models whenever they feel like (i.e. gpt5-mini will soon be gone, and replaced by a more expensive 5.4-mini, same for goog, etc).

And there will always be incentivised parties that release models. Nvda for one has every incentive to keep the nemotron line going, as they're directly profiting from people running this. And the models aren't really far from open SotA anyway.

Goog will probably continue to release the small models, since they'll use them for browser stuff anyway, and know that they'll leak. So for them it's a win-win to release the small models and gain some dev market share.

And the chinese labs also have incentives to keep releasing models, and will likely continue to get gov support to do so (yay commercial wars between nations).

felooboolooomba•1h ago

> they can never be taken away

Your right to 3d print whatever you want is about to be taken away (in California).

What software you can run on your computer can already be restricted.

Absolutely everything can be taken away. The simplest way to remove open models is probably to declare them a tool that terrorists could use. Crazy? Yes, the world is totally crazy these days.

jacobgold•1h ago

It would be interesting to know how much of a boost the closed models companies are giving the open models.

If the closed models stop improving will the progress of open models slow?

amluto•1h ago

> It would be interesting to know how much of the "distillation" boost is helping the open weight models keep up.

Some people in China surely know.

> Like if the closed models stop improving will all the open models also stop improving?

Seems extremely unlikely, unless the models all hit some kind of wall soon. The Chinese companies may be behind the US in compute capacity, but they have excellent researchers [0] who are probably approximately as good as their US counterparts at the kind of problem generation and RL that is currently working so well.

I would be very surprised, though, if the models cannot continue to be improved rapidly in any area that allows a tight feedback loop like programming, at least up to the point where we puny humans lose the ability to define objective functions.

(And, conversely, I don’t expect magic in fields where the feedback is slow or expensive. A model is not about to reliably invent a wonderful medicine for the same reason that a large and extremely competent pharma company cannot: the evaluation process is extremely slow and it’s so expensive that the kind of utterly enormous corpus that is driving the current progress in coding is simply not available. Running RL on m iterations of n medication-development trajectories each is going to cost n*m times $10-100 million and take m years if it’s even possible at all.)
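To put rough numbers on that n*m estimate, here is a minimal sketch. The $10-100 million per trajectory and the one-year-per-iteration figures are the ones from the comment above; the values n = 100 and m = 5 and the function name are made up purely for illustration.

```python
def rl_campaign_cost(n_trajectories: int, m_iterations: int,
                     cost_per_trajectory_usd: int) -> tuple[int, int]:
    """Return (total cost in USD, wall-clock years) for a hypothetical RL campaign.

    Assumes iterations run one after another and each takes about a year,
    while the n trajectories inside one iteration run in parallel.
    """
    total_usd = n_trajectories * m_iterations * cost_per_trajectory_usd
    years = m_iterations
    return total_usd, years


# Hypothetical campaign size, not taken from any real program.
N, M = 100, 5

low_usd, years = rl_campaign_cost(N, M, 10_000_000)    # $10M per trajectory
high_usd, _ = rl_campaign_cost(N, M, 100_000_000)      # $100M per trajectory

print(f"{N} trajectories x {M} iterations: "
      f"${low_usd / 1e9:.0f}B to ${high_usd / 1e9:.0f}B over about {years} years")
```

Even this small made-up campaign lands at $5B to $50B and about five years, which is the point of the comparison with coding, where one trajectory costs a few seconds of compute.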

[0] The US advantage in this space will likely decline, since the brain drain from the rest of the world via the US university system to US labs is drying up.

typs•32m ago

Perhaps. RL env companies based in the U.S. sell to Chinese labs quite a bit too (though at a discount, once they're no longer on the frontier)! And it would make sense that a lot of these problems, which are based on work in the U.S. enterprise economy, would be coming from the U.S.

justindotdev•1h ago

at first glance, these graphs are confusing

nsingh2•44m ago

Yea these plots are too noisy and dense. Especially that second one, lines all over the place.

gunalx•25m ago

Utterly unreadable on mobile

gehsty•1h ago

Interesting to consider this in line with recent US export bans. Could the US be squandering its lead by letting the open source, largely Chinese labs catch up (in terms of model quality available to the masses)? Will US labs be able to maintain the lead without users being able to use their latest models?

llmslave•1h ago

The gap is huge and I'm tired of reading these articles constantly

Gigachad•7m ago

Are you talking about hosted vs the ones you can easily run locally? Because there are open models that require hundreds of GB of VRAM which are apparently pretty close.

JumpCrisscross•59m ago

Now let’s look at the economics of buying versus renting. I’ve seen a lot of attention given to hardware capital costs. But a comment the other day got me thinking about power costs, too—at what performance differential do these factors intersect to make on-prem economically competitive with datacenters for businesses?

cj•58m ago

At what point will model advancement start to have minimal to no value for the majority of use cases?

Or is the idea that more advanced models will unlock more use cases?

dabinat•37m ago

I believe the open model party will eventually end. Perhaps because companies realize it’s too much of a commercial advantage, countries don’t want to give other countries commercial or military help, or maybe even an outright ban after someone uses an open model to guide them through how to make a bomb.

_pdp_•37m ago

Frankly it does not matter if there is a gap because for most practical use-cases the end user can barely perceive the difference in intelligence.

On paper frontier models will be ahead of the curve, but I don't think anyone will be able to tell if a piece of work, say a landing page, is created with Fable or GLM, and that is the point. The perceptible intelligence will reach a point beyond which it is no longer considered, except for some narrow use-cases.

christina97•34m ago

The Chinese models will not overtake the frontier US ones given the current way things are going. The US models derive their lead from incredible efforts to source more and higher quality (mostly synthetic) data via great feats (e.g. generating with humongous teacher models that could never feasibly serve interactive traffic). The Chinese models advance via heroic efforts to optimize models and great feats to secure more and higher quality training data from the US frontier models.

For a (Chinese) open weight model to surpass the (US lab) frontier models, this equation must flip: the Chinese labs must entirely retool from harvesting frontier model data to building the data systems and efforts that produce novel data, as well as procuring latest generation hardware en masse for this. This does not happen easily. Also, training a frontier scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.

andy99•22m ago

> Chinese labs must entirely retool from harvesting frontier model data to producing the data systems and efforts to produce novel data

Even if your characterization is accurate, they could do this tomorrow and are not so myopic that they wouldn’t have thought about it. I don’t see this as a barrier, and I see a lot of the same underestimation of Asia that’s been happening for 50 years. There’s not some innate American advantage to building LLMs, and personally I think whatever head start the US has is going to be squandered on delays from the export control “too dangerous for release” LARPing we’re seeing.

elisbce•19m ago

Chinese frontier models don't need to catch up in every category. They just need to win in coding, and that's exactly where they are going. The gap went from 12+ months to 1-2 months with the latest release of GLM 5.2. Coding is a task where you don't need heroic efforts to find rare and long-tail training data; you can just outsmart your competitor by optimizing algorithms and training recipes. This is something they can do at scale with the money and talent pool.

redox99•1h ago

That only affects people in California. Whereas Fable being shut down affects people all over the world.

anticorporate•25m ago

There's also, importantly, a distinction between what we are told we can no longer use, and what can actually be taken away.

Open source and open hardware can be called illegal by a government, but, if we collectively invest our energy into open alternatives, they can't be taken away in the same sense. I can build a RepRap printer and I can use a local AI model. It's on all of us to make sure that the open alternatives are viable, maybe in the current global political reality now more than ever.

Making something illegal isn't a disincentive for everyone. When they start banning books, some of us start assembling printing presses.

vitally3643•31m ago

Just like declaring piracy illegal stopped piracy and removed pirated materials from everyone's computers.

Everything cannot, in fact, be taken away. Don't propagandize yourself. Some things, like information, are free. Not even China can prevent all its citizens from accessing Western internet. USGov simply does not have the resources to find and audit every hard drive and USB stick in the country for illegal files. The internet cannot be censored 100% without literally cutting every cable and confiscating every radio.

The software that runs on my computer cannot, in fact, be restricted. It can be declared illegal, but there literally is no mechanism by which it can be enforced other than a government goon standing over my shoulder 24/7.

Some freedoms really cannot be removed without utterly implausible amounts of effort. Arguing otherwise is helping to erode freedom. So stop it.

Simran-B•24m ago

Remote attestation?

jfim•44m ago

True, but the capabilities and knowledge of that model are also frozen in time, so the value of that model declines over time.

A model that writes code without knowledge of any language or library changes for half a decade is less useful. A 2021-era ChatGPT would be quite quaint in 2026.

Right now the Chinese labs might have incentives to release their models for free, and maybe Google is happy to release open weights today, but I'm sure there are already bean counters at Google salivating at the idea of having Gemini in Chrome as part of a Google AI monthly subscription just like YouTube premium and other Google subscriptions.

UncleOxidant•44m ago

> Nvda for one has every incentive to keep the nemotron line going

Their releases so far have been kind of lackluster compared to Qwen and other Chinese models. My suspicion is that Nvidia won't be releasing models that appear to compete with frontier models because that would upset their big customers.

fridder•1h ago

We need a SETI@Home but for model training

kamranjon•1h ago

Have been thinking about this a lot lately.

0x3f•1h ago

Consumer hardware over the internet is not really suitable for this, AFAIK.

baby_souffle•34m ago

There's some really early days work on making training loops robust to failure but they all have trade-offs right now.

I remain hopeful that we'll be able to democratize the entire tech stack for this tech.

Azantys•1h ago

I think model training is pretty hard to do efficiently on a vastly distributed network. If the model can't fit into the VRAM of the node, your performance becomes so bad it's useless, so a distributed model could only be properly trained if the size of the model doesn't exceed the VRAM of the majority of the nodes. Maybe there is a different way of doing training, but this would be the only way I can see. And it would still be much worse than just using a big datacenter where everything is fully interconnected.

BOINC projects work great because each task usually needs only a little compute and memory, so every old desktop and laptop can contribute. Training a model which can compete and is not tiny is neither low-compute nor low-memory. BOINC tasks usually take minutes, or sometimes hours, but not weeks or months like training a model from scratch. But something like 7B or lower could maybe be trained like this. I'm not sure, but I think someone is already working on something like this; I don't remember the name of the project.
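For a feel of the VRAM constraint, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 16 bytes per parameter for mixed-precision training with Adam (fp16 weights and gradients plus fp32 master weights and two optimizer moments), ignores activations entirely, and uses a 24 GB consumer card as a hypothetical node. The function names are made up for illustration.

```python
# Rule-of-thumb bytes per parameter for mixed-precision Adam training:
# 2 (fp16 weights) + 2 (fp16 grads) + 4 + 4 + 4 (fp32 master weights, momentum, variance).
# Activations and framework overhead are NOT included, so real usage is higher.
BYTES_PER_PARAM_MIXED_ADAM = 16


def training_state_gb(n_params: int) -> float:
    """Approximate memory for weights, gradients and optimizer state, in GB (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM_MIXED_ADAM / 1e9


def fits_on_node(n_params: int, vram_gb: float) -> bool:
    """True if the unsharded training state alone fits in the node's VRAM."""
    return training_state_gb(n_params) <= vram_gb


CONSUMER_VRAM_GB = 24  # hypothetical volunteer GPU

for label, n_params in [("1B", 1_000_000_000), ("7B", 7_000_000_000)]:
    print(f"{label}: ~{training_state_gb(n_params):.0f} GB of training state, "
          f"fits in {CONSUMER_VRAM_GB} GB: {fits_on_node(n_params, CONSUMER_VRAM_GB)}")
```

Under these assumptions a 1B model needs about 16 GB and a 7B model about 112 GB, so anything much beyond 1B would need sharding, offloading, or a leaner optimizer to run on a single volunteer node.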

wuschel•14m ago

My understanding is that in addition to your comment and the development of a method to separate the training data for distributed learning, the latency/bandwidth of systems connected on the internet is a challenge, too. Information has to be sent around before and after any hypothetical number crunching.
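A rough sketch of the bandwidth side, assuming the naive scheme where every node uploads a full set of fp16 gradients after each step. The 7B model size, the 100 Mbit/s home uplink, and the function name are all hypothetical; real projects use compression or less frequent synchronization precisely to avoid this.

```python
def gradient_upload_seconds(n_params: int, bytes_per_grad: int,
                            uplink_mbit_per_s: float) -> float:
    """Seconds to upload one full gradient copy over a link of the given speed.

    Ignores latency, protocol overhead and any compression, so this is a
    lower bound for the naive "send everything every step" approach.
    """
    bits_to_send = n_params * bytes_per_grad * 8
    return bits_to_send / (uplink_mbit_per_s * 1e6)


# Hypothetical setup: 7B parameters, fp16 gradients (2 bytes), 100 Mbit/s uplink.
seconds = gradient_upload_seconds(7_000_000_000, 2, 100)
print(f"One gradient upload: {seconds:.0f} s (~{seconds / 60:.1f} min)")
```

With these numbers a single synchronization takes about 1120 seconds, close to 19 minutes, before any number crunching happens, versus a fraction of a second over datacenter interconnects.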

ainka-ainka•33m ago

Here's a project trying that - https://nousresearch.com/nous-psyche

calebkaiser•23m ago

This has been a (noble) goal of lots of different projects in the community for a long time. Federated learning projects like Flower have been chipping away at it for a long time. There are many many hurdles to be cleared before anything in this area is super feasible as an alternative, but I applaud everyone who works on it.

recursive•1h ago

This seems backwards. Access to Fable can be removed. I don't see how an open weight model can ever be put back into the bag though.

Smaug123•1h ago

The model itself, sure; the comment is about the production of more advanced models (to keep open weights near the frontier).

ForHackernews•1h ago

It's not pure philanthropy: https://gwern.net/complement

notnullorvoid•1h ago

> Until there's some sort of "community owned hardware"

Or until some bright people figure out drastically more efficient means of training.

UncleOxidant•46m ago

> The spigot can be turned off at any time.

True. And it's possible that this has already happened at Alibaba Qwen - at least for the smaller models that people had a chance of running at home (122B and smaller).

gunalx•31m ago

We'll see. The Qwen team has always released a few close-to-SOTA but proprietary models in between their open releases. We did get 3.6 35B and 27B, so it's not all set in stone yet.

It's highly unlikely we get another open Llama model, though, after the Llama 4 flop, even if their Muse Spark seems pretty good.

slashdave•11m ago

Training these models is not a "hardware" problem.

jmyeet•7m ago

How is this a complaint? Once you have the model, you have the model. Download DeepSeek-R1 671B and you have it. You might not get improvements in the future, just like you may not ever get a future release of an open source project. Is that an indictment of open source?

But consider the alternative. OpenAI and Anthropic can shut off your account or API key at any time for any reason. How is this better? You have way more security when you're running your own model.

amunozo•59m ago

Why are we assuming only American labs can innovate? DeepSeek already innovated a lot in efficiency, for example.

Schiendelman•30m ago

It's really unclear how much innovation DeepSeek has actually done, vs training on frontier model conversations.

slopinthebag•17m ago

Wym it's unclear? They publish their research...

CuriouslyC•11m ago

Coding is a case where it's possible to programmatically generate large amounts of data relatively cheaply. China could realistically surpass the US in coding while still being behind in many other areas.

kulahan•4m ago

How so? You'll soon have your choice of a very old OAI model or a new Chinese model, because the USG has no interest in letting you access the newest models without explicit permission.

nomel•1m ago

> or a new Chinese model,

Trained using the same very old OAI models, unless things flip, as they said.

