Haven't all the big labs been doing this for a couple years now? It's a good idea, with great execution, but it's far from novel.
The paper is very accessible (it's mostly written by Anthropic researchers), and Section 4 summarises their findings really well. They were themselves really surprised by the results:
> We were initially very skeptical of these findings, because they seemed clearly too good to be true, and suspiciously close to training with actual labels. To ensure we didn’t accidentally train on the labels, (1) we re-ran the experiment several times on different datasets, (2) we copied the dataset into a new file, excluding any labels before re-running our algorithm with that file, and (3) one coauthor independently replicated the findings on the Claude 3.5 Haiku base model using a different codebase.
(emphasis mine)
Techniques can be arbitrarily old & common in industry, yet still make for a novel academic paper: the first to document & evaluate key aspects in that separate (& often lagging) canon.
I didn't read the whole paper, but it seems important that you still need real ground truth to measure improvement, so you still need to get real labels at some point. The task they focus on where LLMs have "superhuman" performance is guessing the gender of blog authors. While humans are bad at this, humans are decent at remembering their own gender, and a bunch of them are willing to write a blog post, so there's obviously a better way to get supervised examples than asking humans to guess labels: you collect posts from authors whose gender is known. In other words, "human-generated labels are low quality" should not be taken to mean "good labels are not available, so we should go fully unsupervised".
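To make that concrete, a toy sketch of harvesting labels from author-declared metadata rather than annotator guesses. The field names here are invented, not from the paper:

```python
# Toy sketch: build supervised (text, label) pairs from authors'
# self-declared profiles instead of third-party annotator guesses.
# "author_profile" / "self_reported_gender" are invented field names.
def labeled_examples(posts):
    for post in posts:
        gender = post.get("author_profile", {}).get("self_reported_gender")
        if gender is not None:          # keep only self-declared authors
            yield post["text"], gender
```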
So since you already need some real ground truth to know whether your algorithm accomplished anything, I think it's fair to ask: when would you commit to using _all_ your labeled data for evaluation and none for fine-tuning, as described in this work? Logical consistency seems valuable, sure, but it seems like you'd really want to use both consistency and some (small?) amount of labeled examples, plus a perhaps larger amount of self-labeled examples. In their loop where they revise labels to be more coherent, it seems natural to imagine that pre-provided labels should be stickier than self-generated ones, but not immutable, since there's always some chance of noise in your upstream data-generation process.
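Something like this, maybe: a rough sketch of how "stickier but not immutable" could look in a score-based revision loop. `consistency_score` and both penalty weights are made up for illustration; this is not the paper's actual objective.

```python
# Hypothetical objective for a label-revision loop that trades logical
# consistency against a "stickiness" cost for flipping existing labels.
# Human-provided labels cost more to flip than self-generated ones,
# but neither is immutable. All weights are illustrative.
GOLD_FLIP_COST = 5.0   # cost to flip a human-provided label
SELF_FLIP_COST = 1.0   # cost to flip a self-generated label

def assignment_score(labels, gold, previous, consistency_score):
    """labels/previous: dict of index -> label; gold: the provided subset."""
    score = consistency_score(labels)  # mutual-coherence term to maximize
    for i, lab in labels.items():
        if i in gold and lab != gold[i]:
            score -= GOLD_FLIP_COST    # sticky, but can still be overridden
        elif i in previous and lab != previous[i]:
            score -= SELF_FLIP_COST    # self-labels are cheaper to revise
    return score
```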
My immediate thought is how this differs from just naively asking each question individually to the LLM multiple times and taking the majority consensus as ground truth. The search algorithm probably implicitly does this, though I guess there is some benefit in providing it other related questions as well. I think I remember similar papers dealing with LLM self-interrogation, using the idea that "true" statements must be logically consistent, so the same underlying explanation should hold for perturbations or related questions as well.
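That naive baseline is easy to state in code. A minimal sketch, where `query_llm` is a hypothetical stand-in for whatever sampling API you'd actually use:

```python
from collections import Counter

def query_llm(question: str) -> str:
    """Hypothetical stand-in: one sampled answer to `question`."""
    raise NotImplementedError

def consensus_label(question: str, k: int = 7) -> str:
    # Ask the same question k times at nonzero temperature and take
    # the majority answer as the pseudo-ground-truth.
    answers = [query_llm(question) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```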
The flaw seems to be that it's still beholden to the training data. Any misconceptions that are internalized during pretraining won't actually be fixed, and in fact they'll only be propagated further.
It looks like she's a science communicator rather than a scientist herself. That's interesting... I'm not used to seeing academic papers that include an author devoted entirely to the writing aspect. (Then again, maybe I just haven't noticed?)
This might seem like a nit, but "superhuman" is a VERY strong term to my mind. It doesn't suggest "better than the average human off the street at a particular random task"; it suggests "better than humans are capable of getting with training, at a high percentile level".
One of the biggest advantages of LLMs as a tool is that they are generally quite good at a broad variety of tasks without needing a ton of further domain-specific training. Humans tend to be the opposite.
It doesn't seem like they gave much training to the human annotators they recruited. Whereas an LLM trained on the internet has been trained on a LOT of blog posts + associated metadata. And nobody has ever really bothered figuring out "how would we best train humans to identify gender of blog post authors" - there's very little economic incentive for it. It's not like we generally train people to write in gender-specific ways in school either, so we haven't been formally instructed on potential differences. We'd have to rely on broad-brush generalizations if not given an opportunity to deep dive to try to find more specific tendencies.
But if you paid people to study a big chunk of the corpus they're using here for a couple of years, consciously focusing on post style, contents, and author gender, and then tested them on posts from the authors you held out... how well could they do?
The term is often used in fiction, particularly in superhero comics and fantasy, but it can also be used metaphorically to describe extraordinary effort or achievement in real life (e.g., "It took a superhuman effort to finish the marathon").
(Definition from Gemini)
It seems reasonable to me to use the term simply to mean the model's performance on a benchmark exceeded that of the human annotators. Computers have always been superhuman at many tasks, even before LLMs.
How do you know what normal human capabilities are for an unusual task that humans have not trained for? Is identifying the gender of the author of a blog post 80% of the time "extraordinary"? How do I know what a human is capable of doing for that with training?
If a person with no programming experience asked Claude or ChatGPT to produce some code, they'd get better code than their "normal" human capability could produce. So: superhuman coders?
But also today, I have asked Claude and ChatGPT to do coding tasks for me that both models got stuck on. Then I fixed them myself because I've had a lot of training and practice. So: not superhuman? But wait, the model output the broken code faster than I would've. So: superhuman again?
Extraordinary shouldn't be so easily disputable.
LLMs have superhuman breadth and superhuman speed. I haven't seen superhuman depth in any capability yet. I've seen them show "better than untrained median person" and often "better than hobbyist" depth. But here the authors claim "superhuman capabilities", which pretty specifically means more than just breadth or speed.
https://en.wikipedia.org/wiki/Superhuman
First line: "The term superhuman refers to humans, humanoids or other beings with abilities and other qualities that exceed those naturally found in humans."
Golly, I wonder what that model based its first sentence on.