The problem is that the training set required for a good model is so large that it's really hard to make a good one without including almost all known written text.
In particular, there's one SQL question that has tripped up every other model of similar or smaller size that I've tried, like Devstral 24B, Falcon 3 7B, Qwen2.5-coder 14B and Phi 4 14B.
The question contains a key point which is obvious to most humans, and which all of the models I tried previously failed to pick up on. GPT-OSS picked up on it and made a reasonable assumption.
It's also much more thorough at explaining code compared to the other models, again including details the others miss.
Now if only I had a GPU that could run the whole thing...
That seems like a good focus. Why learn details that can change within days of release? Instead, train the models to have good general knowledge and to be really good at using tools, and you won't have to re-train them from scratch just because some JS library now has a different API; the model goes out to fetch the latest APIs/gossip when needed.
"The main use-case for fine-tuning small language models is for erotic role-play, and there’s a serious demand."
Ah.
NitpickLawyer•1h ago
I think the mention of the "horny people" is warranted; they are an important part of the open-model community (and the first to explore the idea of "identities / personas" for LLMs, AFAIK). Plenty of fine-tuning know-how trickled from there into common knowledge.
There's one thing I would have liked to see explored, perhaps: the idea that companies might actually want what -oss offers. While the local LLM communities might want freedom and a horny assistant, businesses absolutely do not. In fact they put a lot of effort into implementing (sometimes less than ideal) guardrails to keep the models on track. For very easy use cases like support chatbots and the like, businesses will always prefer something that errs on the side of less than useful but "safe", rather than have the bot start going crazy with sex/slurs/insults/etc.
I do have a problem with this section though:
> Really open weight, not open source, because the weights are freely available but the training data and code is not.
This is factually incorrect. The -oss models are by definition open source. Apache 2.0 is an open source license (I think even the purists agree with this). The requirement of sharing "training data and code" is absolutely not a prerequisite for being open source (and historically it was never required; the craze surrounding LLMs suddenly made this a thing. It's not).
Here's the definition of source in "open source":
> "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
Well, for LLMs the weights are the "preferred form for making modifications". The labs themselves modify models the same way you are allowed to by the license! They might use more advanced tools or better datasets, but in the end the definition still holds. And you get all the other stuff, like the right to modify, re-release, etc. I really wish people would stop proliferating this "open weight" nonsense.
Models released under open source licenses are open source: gpt-oss, the Qwens and Mistrals (Apache 2.0), the DeepSeeks (MIT), etc.
Models released under non-open-source licenses also exist, and they're not open source because the licenses they're released under aren't. Llamas, Gemmas, etc.
jononor•59m ago
When someone joins your data science team, you would give them all of this code and data. You wouldn't just hand them the weights and say: the weights are the source, modify them to improve the model, I look forward to seeing your MR next week.
EDIT: Heck, sometimes the way to make improvements (modifications) is just to improve the data and not touch the training code at all. It is often one of the most powerful ways. You still need the training code though, and evaluation to measure the impact.
charcircuit•8m ago
> The “source code” for a work means the preferred form of the work for making modifications
The GPL refers to a form of the artifact being released.
mejutoco•52m ago
> The labs themselves modify models the same as you are allowed to by the license
Do the labs not use source code?
It is a bit like arguing that releasing a binary executable is releasing the source code. One could claim developers modify the binary the same as you are allowed to.
NitpickLawyer•28m ago
The weights are part of the source code. When running inference on a model you use the architecture, config files and weights together. All of these are released. Weights are nothing but "hardcoded values". The way you reach those values is irrelevant in the license discussion.
Let's take a simple example: I write a chess program made up of a source file with 10 "if" statements, a config file that maps the variables used in the if statements to a "hardcoded values" file storing the actual values (see the sketch below). It would be a crappy chess program, but I hope you agree that I could release it as open source and no-one would bat an eye. You would also be granted the right to edit those hardcoded values, if you so wish. You'd perhaps make the chess bot better or worse, but you would be allowed to edit it, just like I would. That's the preferred way of modifying it. Me providing the methods I used to reach those 10 hardcoded values has zero bearing on whether my crappy chess bot is open source or not. Do we agree on that?
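Concretely, a minimal sketch in Python (every name and number here is made up purely for illustration):

    # The "hardcoded values" file, inlined as a dict. Editing these
    # numbers is the preferred way of changing the bot's behaviour,
    # the same way editing weights changes a model.
    VALUES = {
        "pawn": 1.0, "knight": 3.0, "bishop": 3.2, "rook": 5.0,
        "queen": 9.0, "center_bonus": 0.3, "check_bonus": 0.5,
        "capture_bonus": 0.4, "castled_bonus": 0.6,
        "doubled_pawn_penalty": -0.4,
    }

    def score(position):
        """Ten if-statements wired to the hardcoded values."""
        s = 0.0
        if position.get("pawns"):           s += VALUES["pawn"] * position["pawns"]
        if position.get("knights"):         s += VALUES["knight"] * position["knights"]
        if position.get("bishops"):         s += VALUES["bishop"] * position["bishops"]
        if position.get("rooks"):           s += VALUES["rook"] * position["rooks"]
        if position.get("queens"):          s += VALUES["queen"] * position["queens"]
        if position.get("controls_center"): s += VALUES["center_bonus"]
        if position.get("gives_check"):     s += VALUES["check_bonus"]
        if position.get("can_capture"):     s += VALUES["capture_bonus"]
        if position.get("castled"):         s += VALUES["castled_bonus"]
        if position.get("doubled_pawns"):   s += VALUES["doubled_pawn_penalty"]
        return s

    print(score({"pawns": 8, "knights": 2, "castled": True}))  # 14.6

Ship VALUES alongside score() under Apache 2.0 and anyone can study and edit either one.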
Now instead of 10 values, make it 100 billion. Hey, that's an LLM!
> It is a bit like arguing that releasing a binary executable is releasing the source code.
That's the misconception. Weights are not a binary executable. In other words, there isn't another level above weights that the labs use to "compile" the weights. The weights exist from the beginning to the end, and the labs edit the weights if they want to modify the models. And so can you. There isn't a "compilation" step anywhere in the course of training a model.
tuckerman•51m ago
I also believe the four freedoms are violated to some extent (at least in spirit) by releasing just the weights, and for some that might be enough to call something not open source. Your "freedom to study how the program works, and change it to make it do what you wish" is somewhat infringed by not having the training data. Additionally, gpt-oss added an (admittedly very minimal) usage policy that somewhat infringes on the first freedom, i.e. "the freedom to run the program as you wish, for any purpose".
BoorishBears•49m ago
Most "vibes" people are missing that it as only has 5B active parameters.
They read 120B and expect way more performance than a 24B parameter model, even though empricaly a 120B model with 5B active parameters is expected to perform right around there.
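For reference, one back-of-the-envelope heuristic that gets passed around for MoE models (a rough rule of thumb, not an exact law) puts the dense-equivalent size at the geometric mean of total and active parameters:

    # Community rule of thumb for a MoE's "dense-equivalent" size:
    # geometric mean of total and active parameter counts.
    # A rough heuristic, not an exact law.
    from math import sqrt

    total, active = 120e9, 5e9
    print(f"~{sqrt(total * active) / 1e9:.0f}B dense-equivalent")  # ~24B

which is exactly why a 24B dense model is about the right expectation.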
jchw•42m ago
Consider the following: it is possible to release binaries under the Apache2 license. Microsoft has, at least at one point, released a binary under the BSD license. These binaries are not open source because they are not source.
This isn't the same argument as given in the article though, so I guess it is a third position.
NitpickLawyer•22m ago
Agreed. But weights are not binaries in the licensing context. For weights to be binaries, there would have to be another layer of abstraction above the weights that the labs use as the preferred way of modifying the model and then "compile" into weights. That layer does not exist. When you train a model you start with the weights (randomly initialised; they can be 0, 1, or any value, whatever works best). But you start with the weights, and at every step of the training process you modify those weights. Not another layer, not another abstraction. The weights themselves.
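As a toy sketch of that claim (plain numpy, a made-up one-matrix "model", nothing like real lab training code):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 4))  # the weights exist from step zero

    for step in range(100):
        x = rng.normal(size=4)
        y_true = x.sum()                 # toy target
        y_pred = (W @ x).sum()
        # gradient of the squared error with respect to W
        grad = 2 * (y_pred - y_true) * np.outer(np.ones(4), x)
        W -= 0.01 * grad                 # the same object, edited in place

    # No compile step anywhere: what you release at the end is the very
    # W you allocated at the start, just with different numbers in it.

The W you ship is the same array you initialised; there is no separate source form that got compiled away.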