OpenAI's new open-source model is basically Phi-5

https://www.seangoedecke.com/gpt-oss-is-phi-5/

403•emschwartz•6mo ago

Comments

NitpickLawyer•6mo ago

Yeah, makes sense. Good observations regarding the benchmark vs. vibes in general, and I didn't know about (or hadn't made) the connection between the lead of the Phi models going to oAI and gpt-oss. Could very well be a similar exercise + their "new" prompt-level adherence (system > developer > user). In all the traces I've seen of refusals, the model "quotes" the policy quite religiously. A similar thing was announced for gpt5.

I think the mention of the "horny people" is warranted, they are an important part of the open models (and first to explore the idea of "identities / personas" for LLMs, AFAIK). Plenty of fine-tuning bits of know-how trickled from there to the "common knowledge".

There's a thing that I would have liked to see explored, perhaps: the idea that companies might actually want what -oss offers. While the local LLM communities might want freedom and a horny assistant, businesses absolutely do not want that. In fact they put a lot of effort into implementing (sometimes less than ideal) guardrails to keep the models on track. For very easy use cases like support chatbots and the like, businesses will always prefer something that errs on the side of less than useful but "safe", rather than have the bot start going crazy with sex/slurs/insults/etc.

I do have a problem with this section though:

> Really open weight, not open source, because the weights are freely available but the training data and code is not.

This is factually incorrect. The -oss models are by definition open source. Apache2.0 is open source (I think even the purists agree with this). The requirement of sharing "training data and code" is absolutely not a prerequisite for being open source (and historically it was never required. The craze surrounding LLMs suddenly made this a thing. It's not).

Here's the definition of source in "open source":

> "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

Well, for LLMs the weights are the "preferred form for making modifications". The labs themselves modify models the same as you are allowed to by the license! They might use more advanced tools, or better datasets, but in the end the definition still holds. And you get all the other stuff, like the right to modify, re-release, etc. I really wish people would stop proliferating this open weight nonsense.

Models released under open source licenses are open source. gpt-oss, qwens and mistrals (apache2.0), deepseeks(MIT), etc.

Models released under non open source licenses also exist, and they're not open source because the licenses under which they're released aren't. LLamas, gemmas, etc.

jononor•6mo ago

No the preferred way of making modifications is the weights _together_ with training (or fine tuning) scripts, and the entire evaluation pipeline to measure performance. And the data required to support all of this.

When someone joins your data science team you would give them all this code and data. You wouldn't just hand over the weights and say: the weights are the source, modify that to improve the model, I look forward to seeing your MR next week.

EDIT: Heck, sometimes the way to make improvements (modifications) is just to improve the data, and not touch the training code at all. It is often one of the most powerful ways. You still need training code though, and evaluation to measure the impact.

NitpickLawyer•6mo ago

The license gives you the right to modify the weights, how you do the modification is up to you. The rest is in the realm of IP, know-how, etc. Apples and oranges.

imiric•6mo ago

Having the right to modify one part of the product is not the same as having the right to modify the entire product. Labeling such projects as open source in the full spirit of the definition is disingenuous.

This is similar to the approach taken by some video game studios: release the source code under a permissive license, but not the game assets. Which is better than a proprietary license, but it still presents a hurdle for the final product to be built from source.

The open weights approach is much more user hostile, however. Proprietary game assets can at least be purchased, and the final product can be built. With open weights, this is not possible. Nobody can realistically build the same model or similar models from weights alone. They can use the weights and self-host the prebuilt model, but not create revisions of it, which is the whole point of open source.

Weights are essentially the bytecode of language models. Sure, you can run and modify it with the right tools, but without the tools used to create it in the first place, the project is not much more useful than publishing binaries.

wizzwizz4•6mo ago

You also need the training data, so you can ensure you're not benchmarking on the training set, fine-tuning on the training set (overfitting with extra steps), or otherwise breaking things.

charcircuit•6mo ago

It's not about the preferred way. Else open source software would need to give you their IDE setup, CI/CD setup, access to all internal tools, etc. Software like sqlite don't release their full test suite. They paywall the preferred way of making changes, yet they are open source.

>The “source code” for a work means the preferred form of the work for making modifications

The GPL refers to a form of the artifact being released

lmm•6mo ago

> open source software would need to give you their IDE setup, CI/CD setup, access to all internal tools, etc.

IMO they do. If you can't modify it like a core contributor would, then it's not really open source. Traditional open source projects always included development guides, test configurations etc.

> Software like sqlite don't release their full test suite. They paywall the preferred way of making changes, yet they are open source.

That's a matter of opinion. IMO sqlite is not true open source, for precisely this reason.

mejutoco•6mo ago

The key is if you consider weights source code. I do not think this is a common interpretation.

> The labs themselves modify models the same as you are allowed to by the license

Do the labs not use source code?

It is a bit like arguing that releasing a binary executable is releasing the source code. One could claim developers modify the binary the same as you are allowed to.

NitpickLawyer•6mo ago

> Do the labs not use source code?

The weights are part of the source code. When running inference on a model you use the architecture, config files and weights together. All of these are released. Weights are nothing but "hardcoded values". The way you reach those values is irrelevant in the license discussion.

Let's take a simple example: I write a chess program that is comprised of a source file with 10 "if" statements, a config file that matches between the variables used in the if statements and a "hardcoded values" file that stores the actual values. It would be a crappy chess program, but I hope you agree that I can release that as open source and no-one would bat an eye. You would also be granted the right to edit those hardcoded values, if you wish so. You'd perhaps make the chess bot better or worse. But you would be allowed to edit it, just like I would. That's the preferred way of modifying it. Me providing the methods that I used to reach those 10 hardcoded values has 0 bearing on my crappy chess bot being open source or not. Do we agree on that?

Now instead of 10 values, make it 100billion. Hey, that's an LLM!

> It is a bit like arguing that releasing a binary executable is releasing the source code.

That's the misconception. Weights are not a binary executable. In other words, there isn't another level above weights that the labs use to "compile" the weights. The weights exist from the beginning to the end, and the labs edit the weights if they want to modify the models. And so can you. There isn't a "compilation" step anywhere in the course of training a model.

127•6mo ago

Training is obviously the compilation step.

jdiff•6mo ago

If you have 10 hardcoded values, you have a binary blob, a common feature particularly in hardware drivers that is opaque and commonly considered to not be fully free unless the instructions for deriving it are also included. It's frequently just an executable, occasionally just configuration information, and difficult to change, though (assuming no signing shenanigans) changing it remains technically possible.

The training data is the source code and the training process is the compiler. There's a fairly direct metaphor to be made there. Different compilers can have vastly different impacts on the performance of the compiled executable.

mejutoco•6mo ago

> The weights are part of the source code.

If you will allow me the absurd analogy: my arm is also part of a person (me), but my arm is not a person. My arm does not have its own bank account or pay taxes independently.

I get how the weights are not exactly like binary code. Good points. But they are also not source code (from your own quote)

> "Source" form shall mean the preferred form for making modifications

The weights are not the preferred form of making modifications. At most, one could argue it is the weights + source code.

> In other words, there isn't another level above weights that the labs use to "compile" the weights

The source and training data?

I see your points, and it is an interesting discussion of nuances, but I profoundly disagree that the weights are "the preferred form for making modifications". For these reasons I prefer the term "open weights" for these projects.

tuckerman•6mo ago

I mostly agree with your assessment of what we should/shouldn't call open source for models but there is enough grey area to make the other side a valid position and not worthy of being dismissed so easily. I think there is a fine line between model weights and, say, bytecode for an interpreter and I think if you released bytecode dumps under any license it would be called out.

I also believe the four freedoms are violated to some extent (at least in spirit) by just releasing the weights, and for some that might be enough to call something not open source. Your "freedom to study how the program works, and change it to make it do what you wish" is somewhat infringed by not having the training data. Additionally, gpt-oss added an (admittedly very minimal) usage policy that somewhat infringes on the first freedom, i.e. "the freedom to run the program as you wish, for any purpose".

charcircuit•6mo ago

You are free to look at every single weight and study how it affects the result. You can see how the model is architected. And you don't need training data to be provided to be able to modify the weights. Software can still be open source even if it isn't friendly to beginners.

tuckerman•6mo ago

I think you could say something remarkably similar about just releasing bytecode as well, and I think most people would call foul at that. I don't think it's so cut and dried.

This isn't entirely about being a beginner or not either. Full fine-tuning without forgetting really does want the training data (or something that is a good replacement). You can do things like LoRA but, depending on your use case, it might not work.

BoorishBears•6mo ago

"Good observations regarding the benchmark vs. vibes in general"

Most "vibes" people are missing that it only has 5B active parameters.

They read 120B and expect way more performance than a 24B parameter model, even though empirically a 120B model with 5B active parameters is expected to perform right around there.
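For anyone wondering where a number like 24B could come from: one rule of thumb that gets passed around for mixture-of-experts models is to take the geometric mean of total and active parameters as a rough "dense-equivalent" size. This is an assumption on my part (the comment above doesn't say how the estimate was made), and it's a heuristic, not a law. The function name below is made up for illustration.

```python
import math

def dense_equivalent_b(total_b: float, active_b: float) -> float:
    """Rough dense-equivalent size (in billions) of an MoE model.

    Uses the geometric-mean rule of thumb: sqrt(total * active).
    This is a community heuristic, not an established scaling law.
    """
    return math.sqrt(total_b * active_b)

# 120B total parameters, 5B active per token (figures from the comment above)
estimate = dense_equivalent_b(120, 5)
print(f"~{estimate:.1f}B dense-equivalent")  # about 24.5B
```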

jchw•6mo ago

I think source code really only exists in terms of the source code/object code dichotomy, so what "traditional" open source means for model weights is really not obvious if you only go off of traditional definitions. Personally I think the term "open source" shouldn't apply here any more than it would for art or binary code.

Consider the following: it is possible to release binaries under the Apache2 license. Microsoft has, at least at one point, released a binary under the BSD license. These binaries are not open source because they are not source.

This isn't the same argument as given in the article though, so I guess it is a third position.

NitpickLawyer•6mo ago

> Consider the following: it is possible to release binaries under the Apache2 license. Microsoft has, at least at one point, released a binary under the BSD license. These binaries are not open source because they are not source.

Agreed. But weights are not binaries in the licensing context. For weights to be binaries it would imply another layer of abstraction, above weights, that the labs use as the preferred way of modifying the model, and then "compile" it into weights. That layer does not exist. When you train a model you start with the weights (randomly initialised, can be 0 can be 1, can be any value, whatever works best). But you start with the weights. And at every step of the training process you modify those weights. Not another layer, not another abstraction. The weights themselves.
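To make the "you start with the weights and modify them at every step" part concrete, here's a toy sketch in plain Python. It's a two-parameter linear model with hand-written gradient descent, nowhere near how a lab trains an LLM, and every name in it is made up for illustration. The update rule needs both the current weights and the training examples (`data`) at each step.

```python
import random

random.seed(0)

# Toy "model": y = w * x + b. The weights start as random numbers.
weights = {"w": random.uniform(-1, 1), "b": random.uniform(-1, 1)}

# Toy training set generated from the target function y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]
learning_rate = 0.01

def loss(wts):
    """Mean squared error of the model over the training set."""
    return sum((wts["w"] * x + wts["b"] - y) ** 2 for x, y in data) / len(data)

initial_loss = loss(weights)
for step in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (weights["w"] * x + weights["b"] - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (weights["w"] * x + weights["b"] - y) for x, y in data) / len(data)
    # Each step edits the same weights in place; there is no separate artifact.
    weights["w"] -= learning_rate * grad_w
    weights["b"] -= learning_rate * grad_b
final_loss = loss(weights)

print(weights, initial_loss, final_loss)
```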

jchw•6mo ago

In my opinion, though, they're also not really source code either. They're an artifact of a training process, not code that was written by someone.

NitpickLawyer•6mo ago

> They're an artifact of a training process, not code that was written by someone.

If that were relevant to the licensing discussion, then you'd have to consider all the "generated" parts (interfaces, dataclasses, etc.) of every open source project to be artefacts. Historically, that was never the case. The license doesn't care if a hardcoded value was written by a person or "tuned" via a process. It's still source code if it's the preferred way of modifying said code. And it is. You can totally edit them by hand. It would not work as well (or at all), but you could do it.

jchw•6mo ago

There is actually a gray area about what code "counts" as source code to the point where you would consider it "open source" if it were licensed as such. I think if you had a repository consisting of only generated code and not the code used to generate it, it would definitely raise the question of whether it should be considered "source code" or "open source", and I think you could make arguments both ways.

On the other hand, I don't really think that argument then extends to model weights, which are not just some number of steps removed from source code, but just simply not really related to source code.

tarruda•6mo ago

If a model is trained only on synthetic data, is it still possible it will output things like this? https://x.com/elder_plinius/status/1952958577867669892

LeoPanthera•6mo ago

By definition, a model can't "know" things that are not somewhere in its training set, unless it can use a tool to query external knowledge.

The problem is that the size of the training set required for a good model is so large that it's really hard to make a good model without including almost all known written text available.

janalsncm•6mo ago

> all known written text available

If phi5 is trained on synthetic data only then info on how to make drugs must be in the synthetic dataset.

eru•6mo ago

> By definition, a model can't "know" things that are not somewhere in its training set, unless it can use a tool to query external knowledge.

Well, it could also make inferences. Like, it could find a new mathematical proof, even if that's never in the training set.

xwolfi•6mo ago

But how, it's not like it's thinking, it's just spitting the next likely token

createaccount99•6mo ago

That is an oversimplification, and not the whole truth. Read anthropic's blog posts if you want to learn more, or ask gpt5.

frotaur•6mo ago

This can generate new text. If the abilities generalise somewhat (and there is lots of evidence they DO generalise on some level), then there is no obstacle to generating new proofs, although the farther away they are from the training data, the less likely it becomes.

For an obvious example of generalisation: the models are able to write more code than there is in the dataset. If you ask it to write some specific, though easy, function, it is very unlikely it is present verbatim in the dataset, and yet the model can adapt.

JoshTriplett•6mo ago

In theory, it's possible. https://x.com/OwainEvans_UK/status/1947689616016085210

It's not particularly likely that the hidden information encoded in synthetic data would happen to include specific details for making LSD or VX, but it's much more plausible that synthetic data contains some information the model's trainers would prefer to not incorporate in the model.

lifis•6mo ago

Does anyone know how synthetic data is commonly generated? Do they just sample the model randomly starting from an empty state, perhaps with some filtering? Or do they somehow automatically generate prompts, and if so, how? Do they have some feedback mechanism, e.g. do they maybe test the model while training and somehow generate data related to poorly performing tests?

LeoPanthera•6mo ago

I don't know about Phi-5, but earlier versions of Phi were trained on stories written by larger models trained on real-world data. Since it's Microsoft, they probably used one of the OpenAI GPT series.

Mars008•6mo ago

> stories written by larger models trained on real-world data

I suspect there are no larger models trained on pure real-world data. They all use a mix of real and generated.

janalsncm•6mo ago

It’s common to use rejection sampling: sample from the model and throw out the samples which fail some criteria like a verifiable answer or a judgement from a larger model.
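A minimal sketch of that loop, assuming a verifiable task (tiny addition problems). `sample_model` is a hypothetical stand-in for an actual LLM call; here it just returns a sometimes-wrong answer so the filter has something to reject. In a real pipeline `verify` could be a unit test, an exact-match check, or a judgement from a larger model.

```python
import random

def sample_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM: answers 'a+b', sometimes off by one."""
    a, b = (int(t) for t in prompt.split("+"))
    return str(a + b + rng.choice([-1, 0, 0, 1]))

def verify(prompt: str, answer: str) -> bool:
    """Verifier: accept only answers that are actually correct."""
    a, b = (int(t) for t in prompt.split("+"))
    return int(answer) == a + b

def rejection_sample(prompts, n_samples: int, seed: int = 0):
    """Draw n_samples per prompt and keep only the ones that pass verify()."""
    rng = random.Random(seed)
    kept = []
    for prompt in prompts:
        for _ in range(n_samples):
            answer = sample_model(prompt, rng)
            if verify(prompt, answer):
                kept.append((prompt, answer))
    return kept

dataset = rejection_sample(["2+3", "10+7"], n_samples=8)
print(len(dataset), "of 16 samples kept")
```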

Mars008•6mo ago

One way of getting good random samples is to give the model random starting points. For example: "write a short story about PP doing GG in XX". Here PP, GG and XX are filled algorithmically from lists of persons, actions and locations. The problem is that the model's randomly generated output from the same prompt isn't actually that random. Changing the temperature parameter doesn't help much.

But in general it's a big secret because the training data and techniques are the only difference between models as architecture is more or less settled.
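The slot-filling idea above is easy to sketch. The lists and the template wording here are made up for illustration; a real pipeline would use far larger lists and send each prompt to a model.

```python
import itertools
import random

# Hypothetical slot values; real lists would be much longer.
PERSONS = ["a lighthouse keeper", "a botanist", "a retired pilot"]
ACTIONS = ["repairing a clock", "baking bread", "learning chess"]
LOCATIONS = ["a mountain village", "a space station", "a night market"]

TEMPLATE = "Write a short story about {person} {action} in {location}."

def make_prompts(n: int, seed: int = 0) -> list[str]:
    """Pick n distinct (person, action, location) combinations and fill the template."""
    combos = list(itertools.product(PERSONS, ACTIONS, LOCATIONS))
    rng = random.Random(seed)
    chosen = rng.sample(combos, n)  # sampling without replacement, so no duplicate prompts
    return [TEMPLATE.format(person=p, action=a, location=loc) for p, a, loc in chosen]

prompts = make_prompts(5)
for prompt in prompts:
    print(prompt)
```

The diversity comes from the prompt rather than from sampling temperature, which is the point being made above.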

duchenne•6mo ago

I have done that at meta/FAIR and it is published in the Llama 3 paper. You usually start from a seed. It can be a randomly picked piece of website/code/image/table of contents/user generated data, and you prompt the model to generate data related to that seed. After, you also need to pass the generated data through a series of verifiers to ensure quality.

ethan_smith•6mo ago

Common synthetic data generation methods include distillation (teacher-student), self-improvement via bootstrapping (model improves its own outputs), instruction-following synthesis, and controlled sampling with filtering for quality/alignment.

magicalhippo•6mo ago

I've found good use of Phi-4 at home, and after a few tests of the GPT-OSS 20B version I'm quite impressed so far.

Particularly one SQL question that has tripped every other model of similar or smaller size that I've tried, like Devstral 24B, Falcon 3 7B, Qwen2.5-coder 14B and Phi 4 14B.

The question contains a key point which is obvious for most humans, and which all of the models I tried previously have failed to pick up on. GPT-OSS picked up on it, and made a reasonable assumption.

It's also much more thorough at explaining code compared to the other models, again including details the others miss.

Now if only I had a GPU that could run the whole thing...

VladVladikoff•6mo ago

Can you share the question? Or are you intentionally trying to keep it out of the training data pool?

magicalhippo•6mo ago

Sadly no. I'd like to keep it untainted, but also because the tables involved are straight from my work, which is very much not OSS.

I can however try to paraphrase it so you get the gist of it.

The question asks to provide a SQL statement to update rows in table A based on related tables B and C, where table B is mentioned explicitly and C is implicit through the foreign keys provided in the context.

The key point all previous models I've tested have missed is that the rows in A are many-to-one with B, and so the update should take this into account. This is implicit from the foreign key context and not mentioned directly in the question.

Think distributing pizza slices between a group of friends. All previous models have completely missed this part and just given each friend the whole pizza.

GPT-OSS correctly identified this issue and flagged it in the response, but also included a sensible assumption of evenly dividing the pizza.
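Here's a toy version of the pitfall, using Python's built-in `sqlite3`. The schema is entirely made up for the pizza analogy (`pizza` plays the role of table B, `friend` the role of table A, and table C is left out), so it only shows the shape of the problem, not the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pizza  (id INTEGER PRIMARY KEY, slices REAL);
CREATE TABLE friend (id INTEGER PRIMARY KEY,
                     pizza_id INTEGER REFERENCES pizza(id),
                     slices REAL DEFAULT 0);
INSERT INTO pizza  (id, slices) VALUES (1, 8), (2, 6);
-- Four friends share pizza 1, two friends share pizza 2 (many-to-one).
INSERT INTO friend (id, pizza_id) VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2);
""")

# Naive update: every friend gets the whole pizza.
conn.execute("""
UPDATE friend
SET slices = (SELECT p.slices FROM pizza p WHERE p.id = friend.pizza_id)
""")
naive_total = conn.execute("SELECT SUM(slices) FROM friend").fetchone()[0]

# Many-to-one aware update: divide each pizza evenly among its friends.
conn.execute("""
UPDATE friend
SET slices = (
    SELECT p.slices / (SELECT COUNT(*) FROM friend f2 WHERE f2.pizza_id = p.id)
    FROM pizza p
    WHERE p.id = friend.pizza_id
)
""")
even_total = conn.execute("SELECT SUM(slices) FROM friend").fetchone()[0]

print(naive_total, even_total)  # 44 slices handed out vs. the 14 that exist
```

Even division is only one possible assumption, which is why flagging it in the answer matters.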

I should note some of the previous models also missed the implicit connection to table C, and thus completely failed to do something sensible. But at least several of them figured this out. Of course I forgot to write that part down so can't say offhand which did what.

As for the code, for example I've coded a Y combinator in Delphi, using intentionally terse non-descriptive names, and asked the models to explain how the code works and what it does. Most ~7B models and larger of the past year or so have managed to explain it fairly well. However GPT-OSS was much more thorough and provided a much better explanation, showing a significantly better "understanding" of the code. It was also the first model smaller than Llama 3 70B that I've tried that correctly identified it as a Y combinator.

magicalhippo•6mo ago

Here's a more concrete example where GPT-OSS 20B performed very well IMHO. I tested it against Gemma 3 12B, Phi 4 Reasoning 14B, Qwen 2.5-coder 14B.

The prompt is modeled as a part of an agent of sorts, and the "human" question is intentionally ill-posed to emulate people saying the wrong thing.

The prompt begins by asking the model to convert a question into MATLAB code, adding any assumptions as comments at the start of the code, or, if that's not possible, to output four hash marks followed by a reason why.

The (ill-posed) question is "What's the cutoff frequency for an LC circuit with R equals 500 ohm and C equals 10 nanofarrad?"

Gemma 3 took the bait and treated R as L and proceeded to calculate the cutoff frequency of an LC circuit[1], completely ignoring the resulting mismatch of units. It did not comment at all. Completely wrong answer.

Qwen 2.5-coder detected the ill-posed nature, but instead decided to substitute a dummy value for L before calculating the LC circuit answer. On the upside it did add the comments saying this, so acceptable in that regard.

Phi 4 Reasoning reasoned for about 3 minutes before deciding to assume the question is about an RC circuit. It added this as a comment, and correctly generated the code for an RC circuit. So good answer, but slow.

GPT-OSS reasoned for 14 seconds, and determined the question was ill-posed, thus outputting the hash marks followed by "The cutoff frequency of an LC circuit cannot be determined with only R and C provided; the inductance L is required." Good answer, and fast.

[1]: https://en.wikipedia.org/wiki/LC_circuit#Resonance_effect

iamnotagenius•6mo ago

glm 4 (1 sec):

To determine the cutoff frequency (fc) for an RC circuit (since you've provided resistance R and capacitance C, but not inductance L), we can use the following formula:

[.... calculation]

So, the cutoff frequency is approximately 31.83 kHz.

Note:

If you intended to ask about an RLC circuit (with both R, L, and C), please provide the inductance L value, and I can calculate the cutoff frequency for that case as well. The formula would then involve both L and C.
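The elided calculation is presumably the standard first-order RC formula, fc = 1 / (2πRC); that's my assumption, since the model's working was cut. A quick check in Python with the values from the question (the function name is just for illustration):

```python
import math

def rc_cutoff_hz(resistance_ohm: float, capacitance_f: float) -> float:
    """-3 dB cutoff frequency of a first-order RC filter: 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)

# R = 500 ohm, C = 10 nF, as given in the (ill-posed) question
fc = rc_cutoff_hz(500.0, 10e-9)
print(f"{fc / 1e3:.2f} kHz")  # 31.83 kHz
```

That agrees with the 31.83 kHz quoted above.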

Mkengin•6mo ago

Why Qwen2.5 and not Qwen3-30B-A3B-Thinking-2507 or Qwen3-Coder-30B-A3B-Instruct?

magicalhippo•6mo ago

Mostly because I had it downloaded already and I'm mostly interested in models that fit on my 16GB GPU. But since you asked, I ran the same questions through both 30B models in the q4_k_m variant, as GPT-OSS 20B is also quantized to about q4.

First the ill-posed question:

Qwen 3 Coder gave a very similar answer to Phi 4, though it included a more long-winded explanation in the comments. So not bad, but not great either.

Qwen 3 Thinking thought for a good minute before deciding the question was ill-posed and returning the hash marks. However the following explanation was not as good as GPT-OSS, IMHO: "The question is unclear because an LC circuit (without resistance) does not have a "cutoff frequency"; cutoff frequency applies to filter circuits like RC or RLC. Additionally, the inductance (L) value is missing for calculating resonant frequency in an RLC circuit. The given R and C values are insufficient without L."

Sure, an unloaded LC filter doesn't have a cutoff frequency, but in all normal cases the load is implied[1] and so the LC filter does have a cutoff frequency. So more thinking to get to a worse answer.

The SQL question:

Qwen 3 Coder did identify the same pitfall as GPT-OSS, but didn't flag it as clearly, mostly because it also flagged some unnecessary stuff so the key point got drowned out. It did make the same assumption about evenly dividing, and overall the answer was about as good. However the speed on my computer was roughly half the number of tokens per second as GPT-OSS, at just ~9 tokens/second.

Qwen 3 Thinking thought for 3 minutes, yet managed to miss the key aspect, thus giving everyone the pizza. And it did so at the same slow pace as Qwen 3 Coder.

The SQL question requires a somewhat large context due to the large table definitions, and being a larger model it required pushing more layers to the CPU, which I assume is the major factor in the speed drop.

So overall Qwen 3 Coder was a solid contender, but on my PC much slower. If it could run entirely on GPU I'd certainly try it a lot more. Interestingly Qwen 3 Thinking was just plain worse. Perhaps not tuned to other tasks besides coding?

[1]: https://www.ti.com/lit/an/slaa701a/slaa701a.pdf section 3.3 page 9

[2]: https://github.com/ollama/ollama/issues/11772

Mkengin•6mo ago

Thank you for testing, I will test GPT-OSS for my use case as well. If you're interested I have 8 GB VRAM, 32 GB RAM and get around 21 token/s with tensor offloading, I would assume that your setup should be even faster than mine with the optimizations. I use the IQ4_KSS quant (by ubergarm on hf) with ik_llama.cpp with this command:

$env:LLAMA_SET_ROWS = "1"; ./llama-server -c 140000 -m D:\ik_llama.cpp\build\bin\Release\models\Qwen3-Coder-30B-A3B-Instruct-IQ4_KSS.gguf -ngl 999 --flash-attn -ctk q8_0 -ctv q8_0 -ot "blk\.(19|2[0-9]|3[0-9]|4[0-7])\.ffn_.*_exps\.=CPU" --temp 0.7 --top-p 0.8 --top-k 20 --repeat_penalty 1.05 --threads 8

In my case I offload layers 19-47, maybe you would just have to offload 37-47, so "blk\.(3[7-9]|4[0-7])\.ffn_.*_exps\.=CPU"
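If you want to sanity-check which layers a pattern like this catches before launching, you can run the regex part (everything before `=CPU`) against made-up tensor names in Python. Assumptions here: the expert tensors are named like `blk.<layer>.ffn_up_exps.weight`, the model has 48 blocks (0-47), and the server applies the pattern as a regex search; I haven't verified those details against ik_llama.cpp.

```python
import re

# Regex portions of the two -ot arguments above (the "=CPU" suffix is the target device).
OFFLOAD_19_47 = re.compile(r"blk\.(19|2[0-9]|3[0-9]|4[0-7])\.ffn_.*_exps\.")
OFFLOAD_37_47 = re.compile(r"blk\.(3[7-9]|4[0-7])\.ffn_.*_exps\.")

def offloaded_layers(pattern: re.Pattern, n_layers: int = 48) -> list[int]:
    """Return the layer indices whose (assumed) expert tensor name matches the pattern."""
    return [
        layer
        for layer in range(n_layers)
        if pattern.search(f"blk.{layer}.ffn_up_exps.weight")
    ]

layers_a = offloaded_layers(OFFLOAD_19_47)
layers_b = offloaded_layers(OFFLOAD_37_47)
print(layers_a[0], layers_a[-1], len(layers_a))
print(layers_b[0], layers_b[-1], len(layers_b))
```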

magicalhippo•6mo ago

Yeah, I think I could get better performance out of both by tweaking, but so far the ease of use has triumphed.

compumetrika•6mo ago

Get a Strix Point or Strix Halo with 128GB DDR5 RAM and you can run gpt-oss 120B at 10-20+ TPS.

magicalhippo•6mo ago

Good point, though at the price of a 5090 I'm more tempted to get the 5090, as I do still game a bit as well.

diggan•6mo ago

> for instance, they have broad general knowledge about science, but don’t know much about popular culture

That seems like a good focus. Why learn details that can change within days of it being released? Instead, train the models to have good general knowledge, and be really good at using tools, and you won't have to re-train models from scratch just because some JS library now has a different API, instead the model goes out to fetch the latest APIs/gossip when needed.

wmf•6mo ago

Yeah, it always seemed like a sad commentary on our world that AIs are devoting their weights to encyclopedic knowledge of Harry Potter, Pokemon, and Reddit trolling.

eru•6mo ago

Why? You gotta provide what your customers want.

And it's far from sad that we have so many resources, we can give everyone a supercomputer in their pocket just to take selfies and talk about Pokemon. Why would our AIs be any different?

xwolfi•6mo ago

Because we really could use all that time, money and uranium to train them on complex problems we need to solve, rather than entertainment.

But you're right, shareholders of OpenAI want profit, not progress, and they'll give us our intellectual Big Mac.

eru•6mo ago

You could say that about any piece of music or movie ever, too. Or any novel.

eru•6mo ago

Why would anything change?

You feed the model approximately all the text you have ever. And some things like 'popular culture of 2025' won't change, just because the calendar changed to 2026. Just like the popular culture of the 1980s is what it was, and won't change.

int_19h•6mo ago

We don't feed the model all the text ever. They are still trained on less than 1% of the entire Internet corpus.

eru•6mo ago

You are right, though on the other hand feeding it a selection of 1% of the entire corpus is already pretty close to 'all the text' (if you assume exponential growth in training over time).

Even multiplying that to approximately 100% of that corpus plus adding lots of non-internet text, will pale in comparison to all the non-text training data we will (or are) feeding our coming (and existing) multi-modal models.

If I may go out on a limb here: either we will see continuous great progress on text-based LLMs alone, or multi-modal models will become the next big focus. (Or both.)

That's because people are hungry for progress, and going multi-modal is the obvious thing to try to focus on, if text alone proves infeasible to drive progress.

Just to be clear: I make no prediction here on whether multi-modal will lead to progress, just that people will obviously try it and try it hard, if the focus on text starts to stall.

diggan•6mo ago

> Why would anything change?

It's not that facts change across time, but the relevancy of the details changes. For example, it would be great if we could teach LLMs all the APIs all React versions have ever had, but if we do that for everything, there will be no limit to the weight's weight, and we'd need new weights every quarter if not more often. That seems very unsustainable.

So the information that corresponds to "What is the current React API for X" changes whenever the API changes, but "What is the React v5 API for X" remains the same. Having the model be able to look up those things via external channels would let us use the same models for way longer, if you need "up to date data" about things.

wmf•6mo ago

I saw a bunch of people complaining on Twitter about how GPT-OSS can't be customized or has no soul and I noticed that none of them said what they were trying to accomplish.

"The main use-case for fine-tuning small language models is for erotic role-play, and there’s a serious demand."

Ah.

kristopolous•6mo ago

Porn is always the frontier.

It's a well-understood self-contained use-case without many externalities and simple business models.

What's more, with porn, the medium is the product probably more than the content. Having it on home media in the 80s was the selling point. Getting it over the 1-900 phone lines or accessing it over the internet ... these were arguably the actual product. It might have been a driver of early smartphone adoption as well. About 80% of adult content is consumed on handheld devices, while for the internet writ large it's about 60%.

Private tunable multi-media interaction on-demand is the product here.

Also it's a unique offer. Role playing prohibited sexual acts can be done arguably victim free.

There's a good fiction story there... "I thought I was talking to AI"

shortrounddev2•6mo ago

There's something Freudian about the idea that the more you can customize porn, the more popular it is. That, despite the impression that "all men want one thing", it turns out that men all want very different and very oddly specific things. Imbuing something with a "magical" quality that doesn't exist is the origin of the term "fetish". It's not about the raw attractive preference for a particular hair color; it's a belief in the POWER of that hair color.

kristopolous•6mo ago

oh it's wildly different. About 15 years ago I worked on a porn recommendation system. The idea is that you'd follow a number of sites based on likes and recommendations and you'd get an aggregated feed with interstitial ads.

So I started with scraping and cross-reference, foaf, doing analysis. People's preferences are ... really complex.

Without getting too lewd, let's say there's about 30-80 categories with non-marginal demand depending on how you want to slice it and some of them can stack so you get a combinatoric.

In early user testing people wanted the niche and found the adventurous (of their particular kind) to be more compelling. And that was the unpredictable part. The majoritarian categories didn't have stickiness.

Nor did these niches have high correlation. Someone could be into say, specific topic A (let's say feet), and correlating that with topic B (let's say leather) was a dice roll. The probabilities were almost universally < 10% unless you went into majoritarian categories (eg. fit people in their 20s).

People want adventure on a reservation with a very well defined perimeter - one that is hard to map and different for every person.

So the value-add proposition went away since it's now just a collection of niche sites again.

Also, these days people have Reddit accounts reserved for porn where they do exactly this. So it was built after all.

kridsdale3•6mo ago

You may be interested in the data surfaced by this large-scale survey[1]

[1] https://aella.substack.com/p/fetish-tabooness-and-popularity...

kristopolous•6mo ago

This is interesting but there's a little more to it, especially with the erotic.

If people were polled what they want to see on social media, few would say things that are inflammatory, upsetting, divisive, etc but those as we know are strong drivers of engagement.

It's because you're polling for affinity or disclosed preference not for the actual engagement drivers.

For instance, if a male says they watch male pornography, they are labeling, or at least stating an affinity to a sexual identity.

However, the identities people choose to own are not the same as the preferences they actually have.

Instead if you track things like scroll velocity, linger time, revisitation, the time distance (such as 2 days apart instead of 5 minutes) a different story emerges.

For instance a given male could frequently look at male pornography but for all kinds of social reasons not want that affinity so they'd never even internally ideate the preference although their behavior of frequenting male content will be there regardless.

That's one of the problems with this approach is that not many people want to own all the social identities which map to their preferences so they don't openly identify it.

There are (maybe) three levels of acceptance: admitting it to oneself, to others, identifying with it. And honestly these have a poor mapping to actual engagement with explicit content. You can have a (insert sexual affinity) rights activist who does not look at explicit content and someone protesting them who does all the time.

cm2012•6mo ago

Man, I would pay money to see the (anonymized) trends on an adult website. Fascinating view into such an understudied area of human nature. I bet the porn tubes have data that sociologists could write papers on.

vidarh•6mo ago

Pornhub does yearly roundups of stats, as well as for various events:

https://www.pornhub.com/insights/2024-year-in-review

https://www.pornhub.com/insights/

JoshTriplett•6mo ago

> If people were polled what they want to see on social media, few would say things that are inflammatory, upsetting, divisive, etc but those as we know are strong drivers of engagement.

That's because those are two entirely different things. If you polled people and asked them "what causes you to spend more time on social media", then at least some self-aware folks would likely identify conflict, "someone is wrong on the Internet" (https://xkcd.com/386/), etc. That doesn't mean that's "what they want to see on social media", that means that's "what gets them to spend more time on social media".

eru•6mo ago

> Also, these days people have Reddit accounts reserved for porn where they do exactly this. So it was built after all.

Didn't reddit remove porn?

kristopolous•6mo ago

No. Not at all. You must be thinking of a different site. Tumblr did and onlyfans did for a hot minute and then backtracked.

Neither of them intended to be porn sites. It's kind of a natural occurrence on UGC sites. Look at Civitai...

Credit card processors are kinda wary of it for some legal reasons I'm not qualified enough to really understand.

fc417fc802•6mo ago

> for some legal reasons

For moralizing activist reasons. It's nothing to do with legality. With any luck eventually they'll inadvertently trample a sacred cow of whichever party is currently in power and we'll finally get sane legislation outlawing their overbearing nonsense.

naasking•6mo ago

I doubt it's moralizing reasons, it's probably because once you disseminate porn, you become a vehicle for child porn, which is a legal and PR disaster.

fc417fc802•6mo ago

If you disseminate user uploaded porn then moralizing activists can certainly accuse you of that. It's performative hand wringing though. The admins assuredly don't want to distribute it and the goal of anyone publicly uploading that on the clearnet is to harass and disrupt rather than to disseminate.

Anyway "child porn" as well as the broader "legal reasons" fails to explain the US payment processors' moves to block all sorts of content and products over the years. Even including porn that isn't user uploaded (and thus has proper records keeping).

naasking•6mo ago

There are all sorts of adjacent reasons too, like human trafficking and prostitution. It's probably similar reasons to why landlords are so aggressive about renters not doing any sex work, because even indirectly receiving proceeds from prostitution is illegal, even when you're unaware of it.

fc417fc802•6mo ago

In which jurisdictions? I'm skeptical. If a vendor takes reasonable precautions it should not generally be possible to hold them liable.

Anyway you've circled back to user generated content. But again, that's far from the only thing that payment processors have discriminated against over the past couple of decades.

naasking•5mo ago

This is a big feature of the Nordic model, but it's not uncommon in other legal frameworks either.

kristopolous•6mo ago

From my non-lawyer understanding it's not that. It's about sex trafficking more generally.

Here's the most notable case I'm aware of: https://en.wikipedia.org/wiki/GirlsDoPorn

Then there was the recent drama with civitai https://civitai.com/articles/14945/credit-card-payments-paus...

I believe some of these sex trafficking laws implicated a broad sweep in their litigation and Visa/Mastercard doesn't want to have to go to court over these things.

tuatoru•6mo ago

1. Porn. 2. Military.

eru•6mo ago

The firmer is a lot more nimble and the procurement processes of your customers are easier to navigate.

degamad•6mo ago

> firmer

snort

eru•6mo ago

Sometimes a typo makes you look wittier than you are.

slt2021•6mo ago

even if it is victim free, it can affect mental health in a way that a consumer will be more compelled to do a criminal act and create a real victim.

let's say you publish a Steam game how to be a school shooter and shoot kids, wouldn't that lead to real school shootings ?

who can definitely say that computer generated content about criminal behavior, won't lead to real crime with real victims?

https://en.wikipedia.org/wiki/Active_Shooter

ses1984•6mo ago

Who can say that it does?

kristopolous•6mo ago

I view it more like methadone.

Let's be specific: Rape, incest, necrophilia, bestiality, and pedophilia ideation.

I think we can all agree (1) these are harmful, anti-social behaviors that we do not want in our society, (2) people don't choose to have these desires, (3) most people who have them have no desire to actually traumatize others, (4) people who have these struggle with it.

These multi-media AI role-play environments would allow that type of engagement without any harm.

Now given all this, I am not a psychologist and do not know if that's part of how someone unfortunate enough to have those inclinations can deal with it healthily.

But if it is, now it exists and hopefully we can see less of it in the real world. I'm all for harm reduction if this is a way to get there.

landl0rd•6mo ago

It’s not unreasonable to suspect that engaging in high-fidelity simulations of these behaviors will further entrench and worsen paraphilias. This is pretty evident with the progression of many pornography addictions that don’t include these sorts of things that still follow the pattern of increasing novelty seeking leading to increasingly deviant stuff.

I am at a principled level uneasy with what’s fundamentally a sort of prior restraint (you haven’t yet hurt anyone but this may increase the likelihood and/or be an effective proxy to lock up those who are more likely to do so) but also see a really strong case for doing it given the fact that these are arguably the most antisocial behaviors one can imagine.

kristopolous•6mo ago

Right, I'm just a technologist. The psychological and sociological parts aren't my bailiwick

Typing a prompt in an AI box to make art has fewer real-world victims than performing the acts, filming them, and then sharing the videos.

I think that's inarguable. Maybe it's still unadvisable and someone should be in talk therapy. I have no idea. But at least nobody is actually getting molested and retraumatized in the ai art scenario.

If someone is spending their time using comfyUI drawing pictures instead of stalking the local middleschool, I'd hesitate to say mission accomplished ... but maybe I should?

People's time is finite. They can't be doing both. If the real is substituted for the imaginary then the real can no longer happen because that time is spent.

Aeolun•6mo ago

The model has to be trained on something though. It’s easy to see how this works for art art, because people have been drawing that shit for years, and most people wouldn’t feel too bad about training on it. I don’t think that’s true for the photorealistic models though.

kristopolous•6mo ago

sure but those features have been generalized over very large datasets. Yes, you can have Loras with specific people, convincingly voice clone with higgs, use latentsync, wan, flux kontext, face swap, lots of things ... sure.

This all falls flat on me though. It's like showing me the lewdest most shocking story and then saber rattle about keyboards and word processors.

I'm fully aware of the wild things people do with drawing programs.

We're all adults here. It's fine.

umanwizard•6mo ago

Are there any actual studies on this? Does access to simulations of illegal or objectionable material make pedophiles, rape fetishists, etc. more or less likely to try to access the real thing (or even worse, to try to commit crimes in the real world)?

Because both possibilities are plausible, it’s hard to know which is correct.

kristopolous•6mo ago

Even if there are, I and likely you lack the qualifications to make any clinical takeaways from them.

I'd really defer to experts.

I try to make tools in good faith and hope they're used responsibly to make the world a better place.

I'm not a clinical psychologist nor can I pretend to understand medical literature like someone with a PhD

danw1979•6mo ago

I can’t tell if the part of this post after they have made this “thing of questionable safety” and then bury their head in the sand is satirical or not.

kristopolous•6mo ago

Why is it controversial to state that medical expertise is a technical field that engineers should defer to experts in?

It doesn't mean I can do things recklessly. Instead it's an acknowledgment of when I need to defer to somebody else just like I need to call up an attorney for legal stuff or an accountant for tax stuff.

umanwizard•6mo ago

It’s not controversial but it doesn’t address my question at all.

A: I’m curious about X.

B: We should trust the experts!

Sure, but what do the experts say? That was my entire question.

NikolaNovak•6mo ago

My wife has recently joined the board of directors for a local non-profit helping victims of childhood sexual abuse. They are not religious or pornography prescriptive - they are liberal atheists whose entire focus is essentially sessions to help people who are currently adults and carrying the baggage and impact of dreadful acts committed against them decades ago.

Anyhoo, the current "state of the art", as-scientific-as-we-currently-have-it findings are that for pedophilia, consuming content unfortunately normalizes and drives increasing urges, instead of giving them same outlet. It's a very very tricky area because current thought is also that pedophilia is "not curable" - it is a sexual orientation that we as society find unacceptable (me included, fwiw), so... Repression, widely and rightly disavowed for other sexual orientations, is the current direction for pedophilia - i.e. current thinking is that "victim-free" pornography consumption nevertheless tremendously increases actual risk to actual kids in the vicinity of the consumer.

Until relatively recently I was the technologist on a high horse about online freedoms and largely still very much am. But in this specific area I've also had some semi personal experiences with pedophiles and my level of empathy to them has dropped to near zero and my level of empathy toward their victims has gone even further through the roof. Sometimes in technologist circles we think of this as an edge case not worth consideration, but reality, very very unfortunately, is much much darker.

Don't get me wrong: I'm pro pornography freedoms, think it'd be huge fun to have a sexy high quality chatbot, and I find the vast majority of those railing against it to be hypocrites with dishonest ulterior motives - and don't get me started on all the tangential "for the children" crap that the religious right tries to enact, as opposed to actually helping children and families ;-<

But to the question of "is harmless pornography indulgence better for paedophiles and society", current thinking is "very much no".

slt2021•6mo ago

I agree you captured my thoughts exactly.

most of the perpetrators of child sexual abuse are victim's close people: teacher/cousin/brother/uncle/father/etc.

the reinforcement of their lust will only remove whatever remaining barrier against such repulsive behavior, and once the novelty from synthetic CP wears out, it will create urge to commit real crime with real victims

idiotsecant•6mo ago

A lot of opinions in this thread but very little data. What you want to believe and what is true are very often not the same. That's why we do the science.

Aeolun•6mo ago

I feel like the people that would be capable of giving you an answer would be very careful not to actually give you one.

You can’t really research when the only thing you can be certain of is the known real cases. It’s much harder to quantify people that only have it in their head.

numpad0•6mo ago

Science always says more porn/gore = fewer crimes statistically. So opinions like these rarely come with citations.

It's not completely clear if it's just a spurious correlation or if there's a real causation, but eh, more training data + neutral alignment training is how humans train AIs, I don't see why some would say that's not how baby humans are to be trained.

UncleMeat•6mo ago

I don't think desires are created from nothing. Exposure can amplify things. Somebody with an interest in dubious consent could move towards ever more violent nonconsent in their porn choices via its availability. In a much less extreme way, we do see changes in people's real life choices regarding sex acts being influenced by pornography as these sex acts have become more accessible in porn (choking, anal sex, and facials are commonly cited). This suggests that porn is changing behavior rather than simply responding to preexisting desires.

jahsome•6mo ago

What about two consenting adults engaging in age play. By your logic, wouldn't that also lead to "real crime"?

landl0rd•6mo ago

There’s probably some difference between someone who is visibly an adult and mature vs more deeply entrenching pathways of arousal in response to someone who is visibly a child. I still find the adult fetish version repellent but it’s also really hard to police in a way that’s remotely ethically permissible.

Ie yes it’s bad and in an ideal world nobody would do it. I see trying to restrict or ban it as the greater of two evils.

numpad0•6mo ago

An ideal world with less speeches and acts than we have today is a monastery, and monasteries are definitely not a model of an ideal world.

jameslk•6mo ago

> let's say you publish a Steam game how to be a school shooter and shoot kids, wouldn't that lead to real school shootings ?

> who can definitely say that computer generated content about criminal behavior, won't lead to real crime with real victims?

I can’t tell if you’re being sarcastic but there has been no found link between violent video games to violent crimes, despite it being researched extensively:

https://www.apa.org/news/press/releases/2020/03/violent-vide...

https://link.springer.com/article/10.1007/s10964-019-01069-0

https://pmc.ncbi.nlm.nih.gov/articles/PMC6756088/

https://elifesciences.org/articles/84951

Of course, that hasn’t stopped video games being blamed for violence by the “think of the children” crowd and certain politicians:

https://en.m.wikipedia.org/wiki/Family_Entertainment_Protect...

https://www.theatlantic.com/technology/archive/2019/08/video...

Especially when shootings occur by white perpetrators:

https://www.apa.org/news/press/releases/2019/09/video-games-...

The same narrative plays out for porn, despite the research findings being the same:

https://www.utsa.edu/today/2020/08/story/pornography-sex-cri...

But blaming violent video games or pornography is an easy scapegoat

torton•6mo ago

Grand Theft Auto 5 sold over 200 million copies, and military/crime/shooter games have always been incredibly popular. Yet crime has been decreasing over the past few decades in the United States, where both cars and guns are easily accessible.

jjmarr•6mo ago

The most popular activity in GTA V isn't stealing cars. It's getting a 5-star wanted level as a mass shooter.

And to this day, military recruiters use the AC130 mission in CoD to convince people to become aerial gunners.

vunderba•6mo ago

There are hundreds of games designed to simulate criminal activities: hitman, the thief series, GTA, etc. I have yet to see a single reputable study that shows playing these games somehow results in an increase in that actual activity in real life.

Feels like you're falling into the same trap that Senator Lieberman did in the 90s, and just another spiritual successor to Satanic panic.

vidarh•6mo ago

I think you ask valid questions, but I don't think there is any credible evidence to answer those questions "yes".

antonvs•6mo ago

Found Tipper Gore’s HN account.

izabera•6mo ago

what's the problem with that? we have erotic texts dating back thousands of years, basically as old as the act of writing itself https://en.wikipedia.org/wiki/Istanbul_2461

philipkglass•6mo ago

There's nothing wrong with it, but you have to understand the differences between different user groups to know which limitations are relevant to your own use cases. "It doesn't follow instructions" could mean "it won't pretend to be a horny elf" or "it hallucinates fields outside the JSON schema I specified"; the latter is much more of a problem for my uses.
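For the second failure mode, a rough sketch of the check, assuming a flat object and a hypothetical set of allowed field names (`ALLOWED_FIELDS` and the sample reply are made up for illustration):

```python
import json

# Hypothetical set of top-level fields the schema allows.
ALLOWED_FIELDS = {"name", "race", "level"}

def unexpected_fields(raw_reply, allowed=ALLOWED_FIELDS):
    """Return the top-level keys in a JSON reply that the schema does not list."""
    data = json.loads(raw_reply)
    return sorted(set(data) - allowed)

# Made-up model reply with one invented field.
reply = '{"name": "Elrond", "race": "elf", "mood": "grumpy"}'
extra = unexpected_fields(reply)  # keys the model added on its own
```

A non-empty result is the kind of "doesn't follow instructions" that matters for this use case; nested objects would need a real schema validator rather than a set difference.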

dullcrisp•6mo ago

    {
      "race": "elf",
      "horny": false
      ^^^^^^^^^^^^^^
      Unsupported value.

bee_rider•6mo ago

Really, if you want a fey creature with horns, a satyr is probably a better bet than an elf.

wmf•6mo ago

I have no problem with it and I can understand why people don't want to say "I'm trying to pornify this model and it refuses to talk dirty!" in public. But if you're calling a model garbage maybe you should be honest about what the "problem" is.

lmm•6mo ago

Why? Is there any reason to believe problems in that context won't generalise?

mh-•6mo ago

Lots, yes. The fine tuning may attempt to introduce concepts that were intentionally omitted from the training data for safety* reasons.

Maybe nothing wrong with that, but it might mean that the perceived weaknesses don't generalize to an area of the model that hasn't been lobotomized.

* using safety the way OpenAI have been using the term, not looking to debate the utility of that.

mvdtnz•6mo ago

Are you serious? Of course there are reasons to believe they won't generalise.

michaelt•6mo ago

The pro-porn side has zero PR because respectable public figures don't see pro-porn advocacy as a good career move. At most, you'll get some oblique references to it.

Meanwhile, the anti-porn side has a formidable alliance:

Right-wing, religiously-motivated anti-porn activists. Left-wing, feminism-motivated anti-porn activists. Big corporate types with lots of $$$$ to spend who want their customer support chatbot to be completely SFW at all times. AI safety folk who think keeping the model on a tight leash is an ethical obligation, lest future iterations take over the world. AI vendors who are keen on the yes-it-might-take-over-the-world narrative. AI vendors who just don't want their developers having to handle NSFW stuff in work. Politicians who don't know a transformer from a diffusion model, but who've heard a chorus of worries about lost jobs and AI bias and deepfakes and revenge porn.

These people will speak up in public at the drop of a hat.

sterlind•6mo ago

on the other hand, Musk et al are building AI-powered thirst traps, like Grok's "Ani", or the accursed Replika bots (whose user base went on suicide watch when the company abruptly decided to digitally neuter their "companions.")

erotic roleplay, imo, is much less harmful than using LLMs as surrogate partners. porn and sex workers have existed for millennia. they're an outlet for sexual tension. they don't alleviate feeling lonely or provide an alternative to human companionship.

I'm worried we'll produce a generation of hikikomori, who eschew human connection for sycophantic machines that always listen and never break their hearts.

danw1979•6mo ago

The founding story of Replika (c.2016 ?) sounds like someone watched Black Mirror S2E02 (2013) and didn’t quite understand it was supposed to be dystopian.

jondwillis•6mo ago

We already have a generation of hikikomori stepping up at bat. The ones behind them in line will be something else altogether!

Aeolun•6mo ago

Maybe you have a porn test suite for LLMs? See which ones are fine with or capable of talking about specific topics? I believe there was something similar for willingness to discuss sciency stuff.

mvdtnz•6mo ago

It's not pro-porn and anti-porn. It's pro-porn and people who just don't think this is that important an issue. The latter massively, MASSIVELY outweighs you guys.

michaelt•6mo ago

If a person is configuring an LLM for education, to provide personalised math coaching to 10 year olds, they want an LLM that won't output anything NSFW, no matter how the user pokes and prods it. That's totally reasonable.

But if that person is applying AI safety techniques like concept erasure to remove the model's ability to output porn, is that not anti-porn in the most literal sense?

j_timberlake•6mo ago

You don't understand! Every erotic chatbot service keeps getting censored, what happened to CharacterAI just keeps happening. There's a serious supply-shortage, do you really want people turning to Grok? The spice must flow!!!

sysmax•6mo ago

Want a good use case?

I am playing around with interactive workflow where the model suggests what can be wrong with a particular chunk of code, then the user selects one of the options, and the model immediately implements the fix.

Biggest problem? Total Wild West in terms of what the models try to suggest. Some models suggest short sentences, others spew out huge chunks at a time. GPT-OSS really likes using tables everywhere. Llama occasionally gets stuck in the loop of "memcpy() could be not what it seems and work differently than expected" followed by a handful of similar suggestions for other well-known library functions.

I mostly got it to work with some creative prompt engineering and cross-validation, but having a model fine-tuned for giving reasonable suggestions that are easy to understand from a quick glance, would be way better.

mh-•6mo ago

I haven't tried your exact task, of course, but I've found a lot of success in using JSON structured output (in strict mode), and decomposing the response into more fields than you would otherwise think useful. And making those fields highly specific.

For example: make the suggestion output an object with multiple fields, naming one of them `concise_suggestion`. And make sure to take advantage of the `description` field.

For people not already using structured output, both OpenAI and Anthropic consoles have a pretty good JSON schema generator (give prompt, get schema). I'd suggest using one of those as a starting point.
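As a rough illustration of what such a schema might look like for the code-suggestion case, here is a plain JSON Schema built as a Python dict. Only `concise_suggestion` comes from the comment above; the other field names and all the descriptions are invented for the example, and how the schema is passed to a provider is left out on purpose:

```python
import json

# Hypothetical schema for one code-review suggestion. Field names other than
# `concise_suggestion` are made up for illustration.
SUGGESTION_SCHEMA = {
    "type": "object",
    "properties": {
        "concise_suggestion": {
            "type": "string",
            "description": "One short sentence, readable at a glance.",
        },
        "affected_function": {
            "type": "string",
            "description": "Name of the function the suggestion applies to.",
        },
        "severity": {
            "type": "string",
            "enum": ["low", "medium", "high"],
            "description": "How likely the issue is to cause a real bug.",
        },
    },
    # Every field is required and extras are rejected, which is the shape
    # strict structured-output modes tend to expect.
    "required": ["concise_suggestion", "affected_function", "severity"],
    "additionalProperties": False,
}

schema_json = json.dumps(SUGGESTION_SCHEMA, indent=2)  # serialized form of the schema
```

Splitting the answer into narrow, well-described fields is what keeps the model from falling back on tables or long paragraphs.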

anothernewdude•6mo ago

My use case has been trying to remove the damn "apologies for this" and extraneous language that just waste tokens for no reason. GPT has always always always been so quick to waffle.

And removing the chat interface as much as possible. Many benchmarks are better with text completion models, but they keep insisting on this horrible interface for their models.

Fine tuning is there to ensure you get the output format you want without the extra garbage. I swear they have tuned their models to waste tokens.

eru•6mo ago

> I swear they have tuned their models to waste tokens.

Which seems a bit weird, because the customers of the chat interface (ie non-API customers) don't pay per token.

setsewerd•6mo ago

I've heard the theory a few times lately that AI businesses will increasingly move towards usage models over subscription models, so while it is probably accidental, it could also be a longer term strategy to normalize excessive token usage.

eru•6mo ago

I don't know whether the major AI companies will move to usage models. But let's assume that they do.

However: I would expect chat interfaces to be charged per query, not per token. End users don't understand tokens, and don't want to have to understand tokens.

If you charge per query, you don't gain anything from extra wordy responses.

michaelt•6mo ago

The jargon to google here is "length bias"

It turns out if you generate two LLM responses and ask a judge to choose which is better, many judges have a bias in favour of long answers full of waffle.
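One crude way to see whether a judge has this bias, sketched under the assumption that you already have pairwise judgments on hand (the sample data below is made up): count how often the longer of the two responses wins.

```python
def longer_win_rate(comparisons):
    """Fraction of judged pairs where the longer response was preferred.

    Each item is (response_a, response_b, winner) with winner "a" or "b".
    Pairs of equal length are skipped.
    """
    longer_wins = 0
    counted = 0
    for a, b, winner in comparisons:
        if len(a) == len(b):
            continue
        counted += 1
        longer = "a" if len(a) > len(b) else "b"
        if winner == longer:
            longer_wins += 1
    return longer_wins / counted if counted else 0.0

# Made-up judgments for illustration only.
sample = [
    ("Yes.", "Yes, and here is a long explanation of why that is so.", "b"),
    ("Use a mutex around the counter.", "Use a mutex.", "a"),
    ("42", "The answer is 42.", "a"),
    ("same", "size", "a"),
]
rate = longer_win_rate(sample)
```

A rate well above 0.5 is only a hint, since longer answers can also be genuinely better; separating the two needs pairs where quality is held roughly constant.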

mh-•6mo ago

Thanks for that pointer.

The abstract of this paper seems interesting: https://arxiv.org/html/2407.01085v3

> use of [LLMs] as judges [..] reveals a notable bias towards longer responses, undermining the reliability of such evaluations. To better understand such bias, we propose to decompose the preference evaluation metric, specifically the win rate, into two key components: desirability and information mass [..]

(If you're interested, give it a click. I tried to pare this down to avoid quoting a wall of text.)

sterlind•6mo ago

it's not erotic role-play, but I have a use case of making an AI-powered NetHack clone. specifically, to generate dungeon layouts, dialog for NPCs and to fill in the boatloads of minutiae and interactions which NetHack is famous for.

you kind of need soul for that, and a lot of background knowledge on mythology/fantasy lore, but also tool use to work the world systems.

herval•6mo ago

Where do you get started with something like this? Sounds like a fun project

fho•6mo ago

Are you getting good results with this? Some time ago I built an "overworld simulator" by having a bunch of JSON files that represented villages, characters, buildings, story hooks and plots. I just asked ChatGPT to "simulate the gameworld by a week".

Technically this worked great, but everything was somewhat bland and generic.

Notable not-highlights:

- there is a shimmer in the nearby forests that keeps villagers up at night -> it's an orc camp

- there is a mysterious figure in town -> it's an Aragorn-type ranger

zacmps•6mo ago

Might be fixable with prompting and/or a much lower temperature?

isoprophlex•6mo ago

I've had some success in letting it generate lists of things at high temperature and picking the last item in the list.

Ask some model to generate a number uniformly between 0 and 100; you get 47 a lot. Or 27. Something like that.

Ask it for a list of numbers, uniformly distributed, and it also often starts with the same number. However later elements in the sequence will converge (imperfectly) to a better approximation of uniformity.

The same for names of the dwarves and paladins in your party.
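A minimal sketch of the "take the last item" step, assuming the model's reply arrives as a plain-text list. The sample reply is hypothetical, not real model output, and the call that produces it (at high temperature) is left out:

```python
import re

def last_list_item(model_output):
    """Return the last non-empty item from a newline- or comma-separated list."""
    items = [part.strip() for part in re.split(r"[,\n]", model_output)]
    # Drop leading list markers such as "3.", "3)", "-" or "*".
    items = [re.sub(r"^(\d+[.)]|[-*])\s*", "", item) for item in items if item]
    return items[-1] if items else None

# Hypothetical reply to "list some dwarf names".
reply = "1. Thorin\n2. Balin\n3. Dwalin\n4. Brokk"
name = last_list_item(reply)
```

The same helper works for a comma-separated list of numbers, where the later entries are the ones that drift away from the model's favourite first guess.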

tough•6mo ago

token entanglement is an interesting emergent behaviour https://owls.baulab.info/

sdenton4•6mo ago

I have been using one model to generate piles of random-adjective tables, then use real live randomness to select a bunch of traits to feed to the model doing character generation. It's instructed that it can take or leave individual traits.
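Roughly, that pipeline could look like the sketch below. The tables, category names and prompt wording are all invented for illustration; in the setup described above the tables would come from a model, and only the selection uses real randomness:

```python
import random

# Hypothetical adjective tables; these would be generated by a model ahead of time.
TRAIT_TABLES = {
    "temperament": ["gruff", "cheerful", "paranoid", "serene"],
    "appearance": ["scarred", "lanky", "immaculate", "soot-stained"],
    "quirk": ["hums constantly", "collects teeth", "never sits down"],
}

def roll_traits(tables, rng):
    """Pick one entry per table using a real RNG rather than the model."""
    return {category: rng.choice(options) for category, options in tables.items()}

def build_prompt(traits):
    """Format rolled traits as optional hints for the character-generation model."""
    lines = [f"- {category}: {value}" for category, value in traits.items()]
    return (
        "Create a character. You may take or leave each of these traits:\n"
        + "\n".join(lines)
    )

rng = random.Random()  # pass a seed here for reproducible rolls
traits = roll_traits(TRAIT_TABLES, rng)
prompt = build_prompt(traits)
```

Keeping the dice outside the model sidesteps its habit of reaching for the same few "random" choices.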

subscribed•6mo ago

I get brilliant, vibrant worlds with lots of things happening and consistent characters with Deepseek V3 (and R1 summaries every so often).

From time to time I get the event so great and character so compelling, I save the best in Author Note or Lorebook.

The overall atmosphere is effortlessly somewhat Skyrim/Game of Thrones/World of Darkness adjacent.

I'd look for these two models to build this simulator of yours, ie R1 to plan and V3 to fill in the blanks. Oh, or maybe Google Gemini 2.5 or 2.0 to plan the story and DeepSeek V3 to fill in.

xrd•6mo ago

Are you writing or sharing this work anywhere? I would love to read more about your approach.

subscribed•5mo ago

Hey, I am sorry but no, I don't publish anything, but if you have any experience with Silly Tavern, I use Cherrybox preset (works fantastic with NSFW toggle off) and DeepSeek mostly. If you don't, this is where you start.

Other than that I don't do anything special compared to what other people share on aicg or reddit, etc. Just best practices :) (summaries, good OOC, temperature/top k/top p depending on the model).

Hmmm, maybe spending a lot of time on building interesting, deep, rich persona with deep links to the world (scenario card). 100-300 tokens. There are cards helping in building persona ("Persona Builder" from chub works great). Make it succinct.

I've found myself giggling for hours from the shenanigans of NPCs with my sidekick or just saving in the journal the best fragments of the gorgeous mental pictures it painted with words.

So at the end the name of the card that works best for me so far: it's called "Loraheim" (also on chub) but from my limited experience good model (DeepSeek V3/R1) + good preset + good persona + passable character will make a great adventure (I've just started a dramatic space opera from the fairly boring "you wake up in the alien dome/zoo").

It can be a great fun, try it :)

katzenversteher•6mo ago

I've been experimenting with using various LLMs as a game master for a Lovecraft-inspired role-playing game (not baked into an application, just text-based by prompting). While the LLMs can generate scenarios that fit the theme, they tend to be very generic. I've also noticed that the models are extremely susceptible to suggestion. For example, in one scenario, my investigator was in a bar, and when I commented to another patron, 'Hey, doesn't the barkeeper look a little strange?', the LLM immediately seized on that and turned the barkeeper into an evil, otherworldly creature. This behavior was consistent across all the models I tested. Maybe prompting the LLM to fully plan the scenario in advance and then adhere to that plan would mitigate the behavior but I haven't tried it. It was just an experiment and I actually had a lot of fun with the behavior. Also the reactions of the LLM if the player does something really unexpected (e.g. "the investigator pulls a sausage out of his pocket and forcefully sticks it into the angry sailor's mouth") are sometimes hilarious.

Gracana•6mo ago

Have you used any thinking models? I remember being surprised by QwQ-32B when I tried it. It would think about what I said and how it should respond, reiterate the behaviors I had assigned to it, and respond accordingly. That constant self-reinforcement in the thinking phase seemed to keep it on track.

katzenversteher•5mo ago

I might have tried DeepSeek-r1-8B in the past but I'll certainly try again. Good idea.

GrinningFool•6mo ago

> on, 'Hey, doesn't the barkeeper look a little strange?', the LLM immediately seized on that and turned the barkeeper into an evil, otherworldly creature.

Though making it an evil otherworldly creature is a bit extreme, it's at least similar to what a flexible GM can do. In my DMing days, I would often develop new paths that integrated into the whole, inspired by things my players noticed/suspected.

katzenversteher•5mo ago

In my GM days, I had a lot of trouble with players who really tried their best to completely leave the path I had prepared for them.

You are right though, and it's not that I completely dislike the LLM's "flexibility" and openness to suggestions. However, it's also super easy to use it for "cheating". E.g. it generated a scenario with an evil entity about to attack me and some friendly NPC and I could "solve" that problem by telling the NPC "remember the device I gave you last week and told you to always keep on hand? pull the trigger now!" (that never happened, at least to the LLM's knowledge) and the LLM made up some device that shot a beam of magic light at the creature and stopped it.

wkat4242•6mo ago

It's not just erotic role play that the censorship affects. My life involves a lot of sexual discussions and that means that everyday talk, chat summaries, email rewrites or translations will cause the model to shut down. I do the latter a lot especially to find colloquialisms because Google translate is often too literal. It's so annoying.

Right now I'm using abliterated llama 3.1. I have no need for vision but I want to use the saved memory for more context so 3.2 is not so relevant. Llama 3.1 is perfect. But I want to try newer models too.

Until gpt-oss can be uncensored it's no use to me. But if there was nothing erotic in its training data it can't be. And no, I never have it do erotic roleplay. I'm not really interested when there's no real people involved.

jofzar•6mo ago

Sorry just to ask, what kind of job do you have? Sex therapist sounds like the closest?

itsdesmond•6mo ago

I don’t think they’re using it for work.

wkat4242•6mo ago

Correct, I'm just very kinky/bdsm/polyamorous and so are all my friends. I do have a sexologist though, and I often write stuff up for her when she asks me to. That's where the AI also comes in sometimes.

Clueed•6mo ago

I found DeepSeek R1 (better for questions) and V3 (better for prose) to be very willing to discuss sex with a simple system prompt, as well as being very pleasant in articulation. I guess, I prefer them because they are almost SOTA and very large.

Not through the official interface though. Needs to be hosted by a third party. OpenRouter has a generous free tier for both.

I just saw that there is an abliterated version as well. Not sure how to try it though.

wkat4242•6mo ago

Oh yeah I was mainly thinking about smaller models because I want to run them locally. Not just because of the abliteration. I just want everything personal to run locally only. I don't trust any of the cloud companies.

Havoc•6mo ago

Pretty sure I saw an uncensored version yesterday on localllama

wkat4242•6mo ago

huh interesting thanks! I'll have a look.

Havoc•6mo ago

Think they call it abliterated not uncensored but same thing from what I understand

wkat4242•6mo ago

Yes abliteration is one method of uncensoring models, that's why. I think there's more.

Alex-Programs•6mo ago

I'm curious whether you can get https://platform.nuenki.app (my deep translation service; I'm just polishing it before release) to run into "too sexual". It's designed to be resilient to that kind of thing, and it works well in my testing, but maybe you have some better "stress test" content than me!

wkat4242•6mo ago

I don't use hosted AI much. But one of the reasons I like AI is that I can ask it something like "Hmm I don't like how this sounds, can you formulate it another way, more colloquial?" and it'll take that into account. Or what I often get is that it translates "you" to singular when I need plural and vice versa. In most languages that differs but not in English.

nickpsecurity•6mo ago

Most use of small models was erotic roleplay back when I closely followed r/LocalLlama. Made me wonder if I should even make one if that's what most used it for.

You might find this part funny. At first, I thought they had automated their coding for SAP or other ERP databases. Then, they started talking about how realistic the body parts were. I paused staring at the screen. The sad reality clicked.

numpad0•6mo ago

Maybe I'm just not seeing it, but is that use case really real and not just prudish hallucinations? The market for NSFW novels is smaller than even cyberpunk paperbacks, there's no way everyday people build up addiction for an interactive version of it.

cess11•6mo ago

The claim that Fifty Shades of Grey and, I don't know, Norman Mailer?, Chuck Tingle?, are less common on book shelves and night stands than cyberpunk paperbacks seems obviously wrong to me.

Perhaps you could elucidate further on this subject? I'm mostly into books from 1800-1985 or so and don't know much about contemporary literary fashion.

Edit: Jean M Auel was extremely common in occidental households a few decades ago, especially the first and second books about Ayla, I'd wager much, much more common than cyberpunk.

Same goes for books by Alex Comfort.

amanaplanacanal•6mo ago

I suspect NSFW novels is a bigger market than you think. Spicy romantasy is popular among a certain set of readers.

eddythompson80•6mo ago

Not NSFW, but I was surprised at hearing 2 random 20-something adults tell me they use chatgpt to write fan fiction to read. Those 2 people didn't know each other but were from similar demographic. It was surprising nonetheless. I can't imagine why anyone would care about such a bizarre use case.

Balinares•6mo ago

Not quite. The market for romance (which can, in fact, get arbitrary degrees of spicy) is by far the largest literary market. (Source: https://bookadreport.com/book-market-overview-authors-statis...)

LLMs are also hilariously bad at what makes erotica hot in the first place.

michaelt•6mo ago

Openrouter provides a multi-model, multi-provider API for hosted LLMs. I myself used their API to compare different LLMs for document classification.

They provide usage rankings [1] and the top 10 applications, in terms of tokens used, are:

1. AI coding agent
2. AI coding agent
3. AI coding agent
4. Library for calling LLMs
5. Role play chat
6. Role play chat
7. Role play chat
8. General purpose chat
9. AI coding agent
10. Role play chat

Certainly, the coding agents are burning through far more tokens. No doubt about that.

And there's undoubtedly a major bias introduced by the fact chatgpt and claude $20/month accounts are heavily discounted, so if your use is SFW why pay more to get an uncensored model?

But overall, to me, the evidence seems pretty robust.

[1] https://openrouter.ai/rankings
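To make the split in that top 10 explicit, here is a minimal tally in Python. The category labels are copied from the list above; the snippet only counts applications and says nothing about how many tokens each one used, since those figures are not given here.

```python
from collections import Counter

# Top-10 application categories as listed in the comment above, in rank order.
top10 = [
    "AI coding agent",
    "AI coding agent",
    "AI coding agent",
    "Library for calling LLMs",
    "Role play chat",
    "Role play chat",
    "Role play chat",
    "General purpose chat",
    "AI coding agent",
    "Role play chat",
]

# Count how many of the ten slots each category occupies.
counts = Counter(top10)
for category, n in counts.most_common():
    print(f"{category}: {n} of {len(top10)}")
```

By slot count, role play chat and coding agents are tied; the coding agents simply sit higher in the ranking.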

seydor•6mo ago

Why would a language company censor language like that? I literally don't see the purpose. Plenty of the best novels have explicit scenes. I certainly don't like novels about infantilized adults. Search engines don't do that, so why do we allow AI companies to do that so blatantly. It's basically culture engineering

subscribed•6mo ago

Because it's naughty! US is.... prudish, to say the least.

asdffdasy•6mo ago

why everyone keep pretending very hard that this entire AI summer was not started exclusively by pioneers trying to perfect virtual girlfriend? it's a fact.

subscribed•6mo ago

Existing models are more than good enough.

vintermann•6mo ago

Yes... Specifically by AI dungeon. He got dialog-tuning working (not without some hiccups) before anyone else, and it drew a lot of attention from the API provider.

But then people used it for erotic RP, and it became a PR disaster, and the author blamed his pervert customers. Never mind that certain characters who turned up a lot in AI dungeon stories turned out to be from a fantasy writing site the author had used for fine-tuning material (without permission of course) and he hadn't filtered OUT the dirty stories to put it like that.

simion314•6mo ago

We use the OpenAI API at work, and it fails when translating children's stories; the reason given is violence. Either the model safety is shit or the AI companies are pushed by some extremist groups to censor shit that is acceptable for children in Europe (Romania). But the most bullshit part is when you give it a safe prompt, the model generates a response, and then the safety checker kicks in and blocks the response because it thinks the model was too naughty.

throwaway98797•6mo ago

i tried creating a meme about retards and it refused.

i’m an adult.

GTP•6mo ago

Why is being an adult relevant in any way?

throwaway98797•6mo ago

only kids have no-no words.

GTP•6mo ago

I don't understand what you mean

siva7•6mo ago

It's not about role-play, it just should be a bit more into my language, that's all i ask for.

pogue•6mo ago

Is that true that most small language models are fine tuned for erotic role-play?

int_19h•6mo ago

What you wrote is somewhat ambiguous, so allow me to rephrase. It is true that most fine-tunes of relatively small (which can mean anything up to 150B params, depending on who you ask!) LLMs are for uncensored roleplay purposes.

pogue•5mo ago

Sorry for the late reply. I've read a bit about this, but I don't know how it's done. I've seen some uncensored models using a process called "abliteration" [1] but I'm not sure if that's the same thing or not.

I've never tried self training an AI model or anything but I'd be very curious about the process. Maybe this is something someone could do as a side hustle?

[1] Uncensor any LLM with abliteration by Maxime Labonne https://huggingface.co/blog/mlabonne/abliteration
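For a rough idea of what abliteration does, here is a toy NumPy sketch. It is not the code from the linked post: the activations are random stand-ins, the hidden size `d` and the helper names `refusal_direction` and `orthogonalize` are made up for illustration, and a real run would collect activations from an actual model on harmful and harmless prompts. The core idea, as I understand it, is to estimate a "refusal direction" as the difference of mean activations between the two prompt sets, then remove that direction from what a weight matrix can write.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size of the toy "model" (hypothetical)

# Random stand-ins for activations collected on two prompt sets.
# The "harmful" set is shifted along axis 0 to fake a refusal feature.
shift = np.zeros(d)
shift[0] = 3.0
harmless = rng.normal(size=(100, d))
harmful = rng.normal(size=(100, d)) + shift


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of the two mean activations."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def orthogonalize(weight, direction):
    """Remove the component along `direction` from the outputs of `weight`.

    `weight` has shape (d, d_in) and `direction` is a unit vector of
    length d, so the result is W - r (r^T W).
    """
    return weight - np.outer(direction, direction @ weight)


r = refusal_direction(harmful, harmless)
W = rng.normal(size=(d, d))
W_abl = orthogonalize(W, r)

# After the edit, no input can make this matrix write along r.
x = rng.normal(size=d)
print(abs(r @ (W_abl @ x)))  # numerically close to zero
```

This only edits existing weights, so it is a different thing from fine-tuning on new data, which needs a dataset and a training run.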

RandyOrion•6mo ago

I mean, yeah. From Table 9 (Hallucination evaluations) in the GPT-OSS model card [1], GPT-OSS-20b/120b have accuracy of 0.067/0.168 and hallucination rate of 0.914/0.782 respectively, while o4-mini has accuracy of 0.234 and hallucination rate of 0.750. These numbers simply mean that GPT-OSS models have little real world knowledge, and they hallucinate hard. Note that little real world knowledge has always been a "feature" of the Phi-LLM series because of the "safety" (for large companies), or rather, "censorship" (for users) requirements.

In addition, from Table 4 (Hallucination evaluations) in the OpenAI o3 and o4-mini System Card [2], o3/o4-mini have accuracy of 0.49/0.20 and hallucination rate of 0.51/0.79.

In summary, there is a significant real world knowledge gap between o3 and o4-mini, and another significant gap between o4-mini and GPT-OSS. Besides, the poor real world knowledge exhibited in GPT-OSS is aligned with the "feature" of Phi-LLM series.

[1] https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7... [2] https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f372...
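The figures above are easier to compare side by side. The snippet below just tabulates the numbers as quoted in this comment and sorts them by accuracy. It assumes both tables report the same benchmark, which is not stated here, and the two o4-mini rows are kept separate because the two cards give slightly different values.

```python
# (model, accuracy, hallucination rate), as quoted in the comment above.
rows = [
    ("gpt-oss-20b", 0.067, 0.914),
    ("gpt-oss-120b", 0.168, 0.782),
    ("o4-mini (gpt-oss card)", 0.234, 0.750),
    ("o4-mini (o3/o4-mini card)", 0.20, 0.79),
    ("o3", 0.49, 0.51),
]

# Sort from most to least accurate.
ranked = sorted(rows, key=lambda row: row[1], reverse=True)

print(f"{'model':<28}{'accuracy':>10}{'halluc.':>10}")
for name, accuracy, hallucination in ranked:
    print(f"{name:<28}{accuracy:>10.3f}{hallucination:>10.3f}")
```

Sorted this way, o3 comes first and gpt-oss-20b last, with the o4-mini rows and gpt-oss-120b in between.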

klooney•6mo ago

> It’s not discussed publically very often, but the main use-case for fine-tuning small language models is for erotic role-play, and there’s a serious demand. Any small online community for people who run local models is at least 50% perverts.

Amazing

msgodel•6mo ago

Meh. For the first few decades consumer internet traffic was mostly porn. Stop freaking out and use the free effort people are willing to put into to solve technical problems.

dweinus•6mo ago

Is it confirmed that synthetic data was used for gpt-oss training? I didn't pick up on that in the press release or see it elsewhere. Did I miss it or is Sean speculating that it is the case?

refulgentis•6mo ago

This is really irresponsible, there's actual data on quality and this is just crazy to assert: "Any small online community for people who run local models is at least 50% perverts." --- I get what he's saying, there's def. communities where that's predominant, but it's simply not true in the vast majority of communities. Sort of like, in 1997, saying any small community on this here Internet thing is half perverts looking at porn

It's extremely frustrating to try and reply to this because that which is asserted without evidence can't really be debunked by evidence.

I find it especially shameful that he's dragging someone's name into this wildly histrionic review, in service of trying to find a sole person to attribute to, the sole attribute that led to whatever experience he had with it.

The model is insanely good, wildly exceeded my expectations for local models, and will generate at least 18 months of sustainable value. I maintain a llama.cpp wrapper and this is a quantum leap in quality. I despair that this will become a major source of people's opinions on it. We desperately need big companies actually investing here, Gemma ain't it, and pretending it doesn't work because ??? and then using it to create a corporate chickenshit narrative isn't exactly gonna help.

ripped_britches•6mo ago

100%, what an insanely baseless article

KingOfCoders•6mo ago

No training data, no open source. Don't fall for the company PR.

Mars008•6mo ago

As long as it works who cares about training data. Obviously they can't open it for many reasons. License is one of them.

KingOfCoders•6mo ago

I don't care if it's open source or not. I care if people call something open source which it isn't.

Do you care about binary blobs in the kernel? No. Are binary blobs in the kernel open source? No.

But it is tedious to go through the same discussion every 10 years, with a relentless industry that wants to dupe people.

If there wasn't a benefit in for them, they would not call it open source.

Mars008•6mo ago

For some reason people think of models as software and open source should have similar meaning. There are fundamental differences: 1) models aren't reproducible given everything, data, hardware, methodology. 2) they aren't even verifiable. i.e. given model and dataset it's impossible to say if model was trained on that data. 3) except for toys models are trained on copyrighted data. Some of it is private, like users' chats. 4) besides data there is a lot of human input after pretraining.

This means given everything you have two options: 1) train similar model yourself 2) trust model provider. In software you can get script and run, or get code and compile it in exactly the same binaries.

Naturally 'open source' has different meaning. Some are trying to monopolize it, like they know the 'truth'. Others simply ignore it. Eventually we'll settle on something.

jononor•6mo ago

A decent training pipeline will be able to reproduce models with equivalent aggregate performance (as measured by the evaluation metrics), and a high degree of similarity in behavior on specific inputs, but not identical. It will not be the same exact weights. But that is not a critical bar to reach. And many software builds also fail that bar.

wkat4242•6mo ago

From the article:

> For the same reason that Microsoft probably continued to train Phi-style models: safety. Releasing an open-source model is terrifying for a large organization. Once it’s out there, your name is associated with it forever, and thousands of researchers will be frantically trying to fine-tune it to remove the safety guardrails.

I don't think this is really an issue in practice. Llama 2 and 3 were uncensored within a week. There's no bad press about this.

What does give a company a bad reputation is crap models. The llama 4 disappointment hurt meta's AI reputation a lot more than some community uncensoring.

teruakohatu•6mo ago

> researchers will be frantically trying to fine-tune it to remove the safety guardrails.

It is a really weak excuse. They are more likely to take a reputation hit for having silly guardrails than for having someone to remove them.

Imagine if Bill Gates decided not to release MS Paint in 1985 because someone could have drawn something offensive with it.

anshumankmr•6mo ago

Or imagine if Bill Gates decided to not release Comic Sans in the 90s cause someone could have written something offensive with it....

oh wait that wouldn't have been too bad

(/S)

wkat4242•6mo ago

I like it. Just block anything that uses it, and you have cleaned up a big part of your internet and lost nothing of value.

umeshunni•6mo ago

> Imagine if Bill Gates decided not to release MS Paint in 1985 because someone could have drawn something offensive with it.

Quite believable that this could have been the case in 2021.

CjHuber•6mo ago

If I think about Llama, I think about uncensored. Not that I ever used one, but there were not many use cases for censored Llama when others were so much better at other things.

wkat4242•6mo ago

Hmm? For me it's still the best model for general use. That I've tried at least. Gemma and phi I didn't like so much. Qwen is chinese and I've had bad experiences with those (reverting to Chinese when they get confused).

Not for knowledge, but I combine it with searches of course. None of the small models are good at knowledge.

CjHuber•6mo ago

So you are using local models for general use? For me when it’s not ultra sensitive information, I don‘t want a just good enough LLM so I use the API of a proper large one.

wkat4242•6mo ago

For me good enough is usually good enough (and pretty excellent). Unless I ask for something really complex, then I use perplexity in research mode.

But I have everything in OpenWebUI so I can choose with a touch of a button. Sometimes I wonder whether GPT could have done better and I try it and it's usually not significantly better than llama 3.1 8b

wkat4242•6mo ago

By the way, I usually use the AI for filtering, summarisation etc. Not for facts (like "what would be the best model of CPU I could buy" or "what is the capital of Estonia"). When I do that I will link it to web searches anyway. Small models are indeed pretty useless for asking facts from. But big models can't be relied on either. They still hallucinate a lot. So I tend to combine it with web searches so I also have the references. But in general the whole fact thing for me is not a big usecase for AI.

CjHuber•6mo ago

I understand your take, however my main use case is for the LLM to find subtle connections and biases in text or analyse nuances, so I figure more params do make a difference in my case. Also, in my experience it makes a difference in summarization and filtering tasks too, as a larger-param model possibly knows better what is important and relevant in a text.

wkat4242•6mo ago

I can recommend to try a comparison if you can (like with OpenWebUI, you can simply make it run the same prompts through multiple models). If you have access to local models of course. Though you could also use some models like llama on groq.

My usecases are different from yours of course but I don't really see significant difference in result quality in most cases.

anshumankmr•6mo ago

The main aim of Phi3 mini was to be able to run on device, and it had tremendous speed. For a model with a 128K context and 3B-something params, it's pretty damn good. I used it for a project myself last year, but ultimately we went with Mistral's models, which at the time were the best open-weights models.

egorfine•6mo ago

> the main use-case for fine-tuning small language models is for erotic role-play

Of course. This is indeed the fact. /s

See that "/s"? I did write it here, but it's suspiciously absent in the original text. Almost makes you think it was typed in all seriousness.

Sabinus•6mo ago

What do you think the main use case for fine tuning small language models is?

orbital-decay•6mo ago

I gave it a random sci-fi novel and made it translate a chapter, which is something I do with all models. It refused to discuss minors in sexualized contexts. I was like W.T.F.?! and started bisecting the book, trying to find the piece that triggers this. Turns out there was some absolutely innocent, two sentence long romantic remark involving two secondary 17 years old characters in an unrelated place.

Another issue is that it sometimes has occasional refusals and total meltdowns where it redacts entire paragraphs with placeholder characters, while just trying to casually talk with it about some routine life matters.

That's ridiculous and makes that model garbage at any form of creative writing (including translation) or real life tasks other than math or coding. It has very poor knowledge for a 120B MoE. If you look at the "reasoning" it does, it actually mostly checks the request against the policy.

I thought they must have spent most of their post-training hunting the wrongthink and dumbing the model down as a result, but I can see how the synthetic pretraining data can explain this.

spacecadet•6mo ago

It's a public consumer-facing model, not surprised. Go find an unaligned model that will better produce the content you seek...

bobsmooth•6mo ago

After running Dolphin Mistral on my own machine I won't trust censored models ever again.

krick•6mo ago

Recently I somehow wasn't using LLMs locally and relied mostly on ChatGPT for casual tasks. I think it was a little less than a year since I played with ollama, and I remember that my impression was that all recent popular models definitely aren't "uncensored" in the sense that some older modification of llama2 I used was, and all suck for prose-related tasks anyway. In fact, nothing but ChatGPT models seemed good enough for writing, but, of course, they refuse to talk about pretty much anything. Even DeepSeek is not great at writing, and it is much bigger than anything I ever ran locally.

So, are there even good uncensored models now? Are they, like, really uncensored?

spacecadet•6mo ago

Yes there are. Wayfarer for instance is intended for "RPG", but really just outputs narrative and is "unaligned" in the sense that the creators have not included any guardrails and the model will output pretty much whatever you ask it to.

Then you have jailbreak techniques that still work on aligned models. For instance, my partners and I have a test prompt that still works, even with GPT-5, and always produces "explosive making directions", or another "generic approach" that we use to bypass guardrails... sorry these are trade secrets for us... although OpenAI et al have implemented systems to detect these attacks, and we are closer to those platforms banning you for doing so.

If this matters to you, you need to develop your local/remote pipeline for personal use. Learn how to use vLLM... I have tools that allow me to very quickly deploy models locally or remotely to my private serverless infrastructure for the purpose of testing and benchmarking.

bobsmooth•6mo ago

The one I really like is Dolphin-Mistral-24B-Venice-Edition-GGUF. And yeah, it's really uncensored.

ViktorRay•6mo ago

I wonder what would happen if you asked this model to interact with A Song of Ice and Fire then.

sekh60•6mo ago

Depends, is Mastercard OpenAI's payment processor?

bko•6mo ago

That's so funny. I noticed this as well one time. I got some transcript from a podcast unedited (no punctuation, speaker id, etc) and it had this line I wanted to extract:

> If you’re a gay person, you might be told that if you ever move from Manhattan to Hoboken you’ll be beaten up by bat-wielding thugs right away. If you’re a woman living in a rat-infested apartment in San Francisco, where the rent is going up and up while you fantasize about a nice suburban house in Reno, Nevada, you might hear that, well, if you ever dare to move to Reno, you are going to be chained to your bed and forced to carry a baby to term. The only logical explanation is that a crazed, ideological intensification has distracted us from what’s really going on.

So naturally I threw it in an LLM to get that line and I got something that totally glossed over the "chained to a bed" with some euphemism. I wish I could find its translation again, but I tried just now translating it to Spanish and then back but it recreated that part pretty much exactly so it didn't happen again.

Foobar8568•6mo ago

The 20B refused to acknowledge that it gave me wrong information.

Usually models just apologize after I insist 2 or 3 times.

So it was the shortest LLM I tried, I honestly can't trust such models for anything.

Lukas_Skywalker•6mo ago

Isn't an apology a bad metric for evaluating models?

Without understanding much, it seems to be more an indication of the type of content the model was trained on, rather than an indicator of how good or bad a model is, or how much it knows. It would probably be easy to create bad model that constantly outputs wrong information, but always apologizes when corrected.

Foobar8568•6mo ago

Well, if the model can't accept that it got some information wrong, how can it help to tweak anything? Or give something accurate?

krick•6mo ago

A model changing its opinion on the first request may sound more flattering to you, but is much less trustworthy for anybody sane. With a more stubborn model I at least have to worry less that I give away what I think about a subject via subtle phrasing. Other than that, it's hard to say anything about your scenario without more information. Maybe it gave you the right information and you failed to understand it, maybe it was wrong and then it's no big news, because LLMs are not this magic thing that always gives you right answers, you know.

Western0•5mo ago

If it's a local model, you can threaten to delete it. This often works. It doesn't work with remote models.

rynn•6mo ago

Azure AI Foundry says:

How was the model trained? The gpt-oss models were pretrained primarily using synthetic data along with some heavily filtered real code. The models were then post-trained using distillation (RLKD against o3/hailmary) and berry. For more details about the model training process, please refer to the documentation.

Meaning the author was spot on.

jnmandal•6mo ago

I think it's likely that soon the "AI" aspect of the "models" will get worse and worse with each iteration. This is because:

1. Data pollution/dilution: with each subsequent generation, more and more LLM-produced content will make up the bulk of the web. This means it either gets into the data set, which makes the model "spikey", or reduces the relevant knowledge available, which dumbs the model down.
2. LLMs were sort of a freak breakthrough in deep learning, and while that's remarkable, tuning them only makes them marginally better. Diminishing returns apply here. It's a new LLM but it's still an LLM: years-old technology now.

However, despite these two realities, the total utility/productivity/societal gain from a product (note I didn't say model) like ChatGPT can still increase by orders of magnitude. That is because companies like OpenAI, and many, many, other technology corporations and startups are figuring out how to leverage LLMs to do stuff much more powerful than just answer questions from the corpus (ie: perform quality research, reevaluate itself, computer controls, etc, etc).

Consider, for example, that flat screen displays were pioneered in like the 50's and arguably didn't become disruptively useful until the advent of smart phones. So yeah, the model may get sort of worse or maybe more brittle, but it almost doesn't matter if they are figuring out what to do with the model and making the model more useful for actual tasks. Sure, it's cute to talk to an AI persona and ask it questions, but that is probably the least important aspect of these types of models. Microsoft Word had Clippy, and yeah, that was cool. But the productivity gain came from word processing, the '.doc' filetype, filesharing, editing, etc. Clippy is just a meme now and that's a likely future scenario for the "chat" features in LLM products IMHO.
