I'm not sure what they'd get from training on that
I had a very basic React question about useState while porting some vanilla code last week. Every model of every stripe I've tried it on has been confidently and completely incorrect about it, up to stating the code absolutely will not work, even when I take a turn to assert that I ran it and it does. So there's plenty of shit in there already.
If human response is "That's BS", "fuck off", or something similar, mark as bad assistant message.
If human response is "huh" or "cool", mark as good assistant message.
If on ChatGPT, watch how much scrolling the user does. If there's a lot, it's somewhat likely that the LLM outputted something useful.
That strategy would have holes, of course, but as long as it's better than guessing, something like that would be a useful heuristic.
Even very weak human signals can be immensely valuable over large enough datasets.
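A minimal sketch of that kind of labeling heuristic, assuming a conversation is just a list of (role, text) pairs. The keyword lists come from the examples above; the function names and the "unknown" fallback are hypothetical, and the scrolling signal is left out because it needs client-side telemetry.

```python
import re

# Keyword lists taken from the examples in the comment above (illustrative only).
NEGATIVE = ("that's bs", "fuck off")
POSITIVE = ("huh", "cool")


def label_assistant_message(human_reply: str) -> str:
    """Return 'bad', 'good', or 'unknown' for the preceding assistant message."""
    text = human_reply.lower().replace("\u2019", "'").strip()
    if any(phrase in text for phrase in NEGATIVE):
        return "bad"
    # Match positive words only as whole words, so "school" does not count as "cool".
    words = set(re.findall(r"[a-z']+", text))
    if words & set(POSITIVE):
        return "good"
    return "unknown"


def label_conversation(turns):
    """Label each assistant turn using the human turn that directly follows it."""
    labels = []
    for (role, text), (next_role, next_text) in zip(turns, turns[1:]):
        if role == "assistant" and next_role == "human":
            labels.append((text, label_assistant_message(next_text)))
    return labels
```

Most replies will come back as "unknown", which is fine: the point is a weak signal that only has to beat guessing across a very large number of conversations.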
Marking is not a trivial task though. Use some AI system to mark it and you get a 99.something% filter maybe but whatever that remainder is leaks through. Over time your filter may get worse as a result.
In Gemini you can turn off Gemini Apps Activity (warning: deletes your chat log, you need to copy paste everything into notes)
Highly recommended.
The real process involves submitting a request on another one of OpenAI's sites and awaiting a confirmation email (either their privacy or platform site).
Feel deceived and violated? Yeah, you, me and millions of other people, welcome to the club.
So maybe we'll get better reasoning and therefore better generated data/content in the wild, without this negative feedback loop everyone is worried about.
Are you sure?
If AI is meant to sound nearly identical to a human, you don't need more training data.
If it's meant to act as a natural language encyclopedia, we'll never get there with LLMs, which amount to natural language processing on top of a massively compressed dataset.
... and sigh ... it looks like I'm twiddling with a noddy!
------
The user wants to know how many “I”s are in the word “antidisestablishmentarianism.” They probably mean the letter "I" in lowercase, though the question was a little broader with "Is."
I’ll treat it case-insensitively and count the total occurrences of the letter "i" (both uppercase and lowercase). I can count them manually now, or use a tool for faster analysis.
Analyzed
word = "antidisestablishmentarianism"
word_lower = word.lower()
word_lower.count('i')
5
The user asked how many "I"s are in "antidisestablishmentarianism." I’m taking it to mean whether it’s counting "i" or "I", and I’ll consider them as the same for simplicity. So, I’ll count the 5 occurrences of "i" in the word. I manually broke it down and verified there are indeed five "i"s, so the final count should be 5. Pretty straightforward, now that it's all clear!
Done
There are 5 “i”s in “antidisestablishmentarianism.”
------
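For anyone who wants to check the count in the transcript independently, here is a small sketch that lists where each "i" falls instead of only returning a total. It is not part of the transcript; the variable names are mine.

```python
word = "antidisestablishmentarianism"

# 1-based positions of every "i", compared case-insensitively.
positions = [idx for idx, ch in enumerate(word.lower(), start=1) if ch == "i"]

print(positions)       # [4, 6, 14, 23, 26]
print(len(positions))  # 5
```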
You can come up with complicated definitions of "core reasoning" where this doesn't count, and I understand there's great demand for such definitions, but this clearly satisfies common-sense understanding of the term.
sudp
Command 'sudp' not found, did you mean:
command 'sudo' from deb sudo (1.9.9-1ubuntu2.4)
command 'sudo' from deb sudo-ldap (1.9.9-1ubuntu2.4)
command 'sup' from deb sup (20100519-3)
command 'sfdp' from deb graphviz (2.42.2-6)
Try: sudo apt install <deb name>
Of course there is ambiguity, though: more context would be hard to distinguish from core reasoning, and vice versa.
I think LLMs/AI mean we can substitute reasoning with vast accumulations and relations between contexts.
Remember, RLHF gives the models some, and perhaps most, of these chains of thought when there isn’t sufficient text to scrape for each family of problems. When I see that chain of thought, the first thing I think of is my peers who had to write, rewrite, nudge, and correct these chains of thought, and not core reasoning.
The CoT has that same overexplained step-by-step style so many RLHF’ers will be accustomed to, and much of it was authored/originated by them. And due to the infinite holes it feels like plugging, I don't call that RL reasoning.
Did we read the same response? It did write a lot of reasons, but it didn't do any reasoning at all; it just suddenly wrote "5" here:
So, I’ll count the 5 occurrences of "i" in the word.
There was no reasoning at all to arrive at 5, so no, your example just proves how these models are great at faking reasoning.
No. Synthetic data is being used to improve LLMs.
I know they're training with synthetic data; I didn't realize that has been done at scale for long enough to really know whether it improved things (assuming the metrics it's improving are well defined).
The Llama 3 models were post-trained on almost entirely synthetic data. Yes, it works. No, the model doesn't collapse (unless you want it to, of course).
What they did is use Model n-1 to classify, filter and enhance the datasets for Model n.
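A rough sketch of the filtering part of that idea, under loud assumptions: `toy_previous_model_score` is a stand-in I made up, not a real model or anything from the Llama pipeline, and the threshold is arbitrary. The classify and enhance steps are not shown.

```python
from typing import Callable, Iterable, List


def filter_dataset(
    candidates: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep candidates that the previous-generation scorer rates at or above threshold."""
    return [text for text in candidates if score(text) >= threshold]


def toy_previous_model_score(text: str) -> float:
    """Stand-in for Model n-1: NOT a real model, just a crude length-and-repetition check."""
    words = text.split()
    if not words:
        return 0.0
    unique_ratio = len(set(words)) / len(words)
    long_enough = 1.0 if len(words) >= 4 else 0.0
    return unique_ratio * long_enough


candidates = [
    "The function returns the sorted list in ascending order.",
    "spam spam spam spam spam spam",
    "ok",
]
kept = filter_dataset(candidates, toy_previous_model_score, threshold=0.5)
print(kept)  # only the first candidate survives
```

In a real setup the scorer would be a call into the previous model; the structure (score every candidate, keep what clears the bar) is the only thing this is meant to show.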
That doesn't mean there aren't ways to train a model incorporating synthetic data without seeing model collapse
This line of thought was exacerbated by that one paper that was then parroted (hah!) by every influencer / negativist in the space. It didn't matter that the paper was badly executed, that their setup was flawed, or that it got rendered moot by the existence of the Llama 3 models. People still quote that, or the "articles" stemming from it.
A simple example would be chess ai. The core knowledge is rules of the game. We have human generated examples of plays, but we don’t really need them - we can (and we did) synthesize data to train ai.
A similar pattern can be used for all math/physics/programming/reasoning.
No it can't. The pattern for chess worked since it was an invented problem where we have simple outcome checks; we can't do the same for natural problems where we don't have easily judged outcomes.
So you can do it for arithmetic and similar, where you can generate tons of questions and answers, but you can't use this for things that are fuzzier, like physics or chemistry or math theorem choices. In the end we don't really know what a good math theorem is like. It has to be useful, but how do you judge that? Not just any truthy mathematical statement is seen as a theorem; most statements don't lead anywhere.
Once we have a universal automated judge that can judge any kind of human research output, then sure, your statement is true, and we can train research AI that way. But we don't have that, or science would look very different than it does today. And I'd argue that such a judge needs to be AGI on its own, so it's circular.
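To make the arithmetic case concrete, here is a minimal sketch of generating questions whose answers can be checked exactly. The names `make_problem` and `judge` are hypothetical; the point is only that the judge is trivial when the outcome is mechanically verifiable, which is exactly what the fuzzier domains lack.

```python
import random


def make_problem(rng: random.Random):
    """Generate one arithmetic question together with its ground-truth answer."""
    a, b = rng.randint(0, 999), rng.randint(0, 999)
    op = rng.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"What is {a} {op} {b}?", answer


def judge(proposed: int, truth: int) -> bool:
    """The automated judge: trivial here because the outcome is exactly checkable."""
    return proposed == truth


rng = random.Random(0)
dataset = [make_problem(rng) for _ in range(5)]
for question, answer in dataset:
    print(question, answer)
```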
If you've noticed, most LLM interfaces have a "thumbs up" or "thumbs down" response. The prompt may provide novel data. The text generated is synthetic. You don't need an automated judge, the user is providing sufficient feedback.
Same thing goes for the other disciplines.
You might be interested in some of the details of how AlphaGo (and especially the followup version) works.
Go is a problem where it's very difficult to judge a particular position, but they were still able to write a self-improving AI system that can reach _very_ high quality results starting from nothing, and only using computing power.
There does not appear to me to be any fundamental reason the same sort of techniques can't work for arbitrary problems.
> But I'd argue that such a judge needs to be AGI on its own, so it's circular.
But is it circular in a way that means it can't exist, or can it run in circles like AlphaGo and keep improving itself?
I have a best fit line. Then I take random data on that line to train a new line.
I pretty much get the same line.
From an intuitive perspective... it doesn't get worse. At worst it stays the same.
Now imagine something a bit more complex. I have a best fit curve that's very close to a line.
I use random data from that curve to train a new best fit line.
I get something different now. Not necessarily worse.
I mean, literally just take all your ideas of ML and imagine them on the 2D plane doing curve fitting. Retraining new lines from generated data doesn't necessarily make things worse.
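Here is the thought experiment as a small sketch, assuming ordinary least squares and samples taken exactly on the fitted line (no extra noise), which is how I read the description. The specific numbers (slope 2, intercept 1, the 0.05 curvature) are made up for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)

# Generation 0: fit a line to noisy "real" data.
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
line0 = fit_line(xs, ys)

# Generation 1: sample points exactly on line0 and refit; the same line comes back.
xs1 = [rng.uniform(0, 10) for _ in range(200)]
ys1 = [line0[0] * x + line0[1] for x in xs1]
line1 = fit_line(xs1, ys1)


# Curve case: the "model" is slightly curved, and the retrained model is a line again.
def curve(x):
    return 2.0 * x + 1.0 + 0.05 * x * x


xs2 = [rng.uniform(0, 10) for _ in range(200)]
line2 = fit_line(xs2, [curve(x) for x in xs2])

print(line0, line1, line2)
```

`line1` matches `line0` up to floating-point error, while `line2` is a different line from the one underneath the curve: different, not necessarily worse.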
If people at OpenAI, Anthropic, or Google said this, that would be interesting. But I don't think it makes sense any more to treat academic computer scientists as relevant experts here.
Any understanding of how they work is largely theoretical, so that seems like a reasonable place for academics to lean in and join the conversation.
There wasn't any known active AI back then, but statistics on popular ideas and internet content was already a thing, and speech pollution based on those assessments had already started to spread fast, manually outputted.
Sure, a lot of good content came out since then. But the amount of garbage... it's immense and very difficult to sort out automatically.
The major issue is that this garbage then _became_ the norm. Only people who lived back then can remember what it was. For new folk, it looks just like a generational shift. However, it is quite obvious that some aspects of this shift were... unnatural (in the sense of not being spontaneous cultural manifestations).
Pay attention.
I mentioned explicitly that I see what happened as distinct from a natural generational shift.
There are many phenomena around that era to support what I am saying. Like, for example, the first massive political campaign to leverage internet as its primary vehicle.
(Most of my disks are still MBR as they're not big enough to be worth the hassle of using GPT.)
Should it eventually happen for LLM outputs, I hope we name it Slop Wars.
Wikipedia had some comical instances where high-quality contributors accidentally ventured into other areas, where they spontaneously transformed into ignorant trolls.
Humans have been trained on "human-generated data" (cultural artifacts) for centuries, and quality is not down. AI is only an accelerator of this process, but there is nothing inherent in creating "artifacts" that would pollute the original training data.
If anything, we should be worried about destroying nature, because that's the original inspiration for human-produced artifacts.
So yes, you can’t compare humans generating and picking influential content to AIs doing so.
GAI is a dead end IMO anyway; we’ve seen much more success with machine learning. GAI is good for fooling humans into thinking they see glimmers of intelligence.
In fact, I wouldn't be surprised if this tainted information somehow enriches a dataset by providing an extra dimension for training specialized heuristics. Maybe this will turn out to be how LLM hallucination gets solved: by being able to accurately identify AI-generated material and, as a result, becoming more robust at both identifying and avoiding the generation of nonsense.
Humans learn to discern what/who to best pay attention to via all manner of heuristics. I don't see in principle why LLMs (or something like them in the future) won't eventually be able to do the same.
behnamoh•6h ago
[0] Like statistics of words, etc.
jsheard•5h ago
I noticed that a big chunk of the default Llama 4 system prompt is devoted to suppressing various GPT-isms, which to me implies they weren't able to keep their newer training set from being contaminated by competing models.
> You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these.
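If you want to see how often those phrases show up in a given piece of text, a crude sketch like the following would do. The phrase list is copied from the quote above; the function name is hypothetical, and plain substring matching will overcount (for example, "remember" matches inside longer sentences that are not moralizing at all).

```python
# Phrases copied from the quoted system prompt excerpt.
DISCOURAGED = [
    "it's important to",
    "it's crucial to",
    "it's essential to",
    "it's unethical to",
    "it's worth noting",
    "remember",
]


def count_discouraged(text: str) -> dict:
    """Count case-insensitive occurrences of each discouraged phrase."""
    # Normalize curly apostrophes so both spellings of "it's" match.
    normalized = text.lower().replace("\u2019", "'")
    counts = {phrase: normalized.count(phrase) for phrase in DISCOURAGED}
    return {phrase: n for phrase, n in counts.items() if n > 0}


sample = "It’s important to note the units. Remember, it's worth noting the sign too."
print(count_discouraged(sample))
```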