LLMs are more persuasive than incentivized human persuaders

140•flornt•8mo ago

Comments

metalcrow•8mo ago

My guess for the reason behind this is that LLMs have more facts memorized, and thus can make more reasonable and well-researched sounding answers. If you ask an LLM vs a Human "Is a stack in computer science a) a data structure that is first in first out or b) a data structure that is first in last out" the LLM can say stuff resembling "Based on Dijkstra's algorithm proof given in 1943 and the nature of Turing complete languages being traditionally a top-down oriented system, a stack is ..." while a human is just going to say "It's B because that's what a stack is".

CJefferson•8mo ago

Based on reading bad AI-generated student essays, it’s worse than that: LLMs are happy to “fill in the blanks” with whatever made-up fact would make their argument look best.

Most people can’t lie that smoothly, and most readers don’t check carefully, unless they are already an expert in the area.

Any kind of maths proof is particularly bad, they will look convincing and clear until you read them very carefully and see all the holes.

AlexCoventry•8mo ago

It depends on the AI. ChatGPT's higher models (o1-pro/o3/o4-mini-high) have some kind of limited capability to detect errors in the user's thinking, and have relatively few hallucinations.

UltraSane•8mo ago

I've had fun debates about things like p-zombies with Gemini 2.5 Pro

energy123•8mo ago

o3 has twice the hallucinations of o1 according to their own hallucination benchmark

smeej•8mo ago

It's funny you mention this, because my father operates exactly as you describe the LLMs, making facts up on the spot, lying smoothly and keeping track of the lies...

...and he's built his whole career in sales because of it.

He despises the existence of Google, because the last thing a pathological bullshitter wants is fact-checking in pockets!

It's taken me nearly 40 years to understand that anchoring statements in reality is just a completely meaningless endeavor for him. He does not care what is true. He cares only what is convincing.

I've been wondering for about a year now why I feel like I can tell LLM work from human work so much more easily than most people, when the only "tell" I can put my finger on is, "The hair stands up on the back of my neck," but this explains ALL of it.

MattGaiser•8mo ago

In his line of work, it doesn't matter what is true.

Llamamoe•8mo ago

I feel like a good half of humanity operates this way, with it being far more prevalent in boomers than younger generations. It doesn't matter what is backed by evidence to them, instead they rely on anecdotes and persuasive quips and factoids. Having a friend who claims to have experienced X and listing off several other anecdotes means more to them than any amount of evidence.

smeej•8mo ago

The truly scary thing to me is watching them start to believe the anecdotes they've stolen from people and presented as their own stories actually did happen to them, as they lose their marbles.

I've spent much of my life learning to tell when people are making things up, but telling when they genuinely believe something that's completely wrong is a very different skill.

It's especially frustrating when they change the narrative of a real story about something where there were multiple witnesses (e.g., my mom and my siblings), then come to believe the narrative, and then accuse us of conspiring to gaslight them.

On the one hand, I get why that would be disorienting and scary, to have a whole group of people telling you you're wrong about something you're sure you remember. On the other hand...karma?

armchairhacker•8mo ago

LLMs also never get tired of arguing. They'll respond to every point from a gish-gallop and provide quality-sounding replies to points that are obviously (to an informed person) flawed or seem (but aren't necessarily) mal-intentioned.

EDIT: LLMs also aren't egocentric; they'll respond in the other person's style (grammar, tone, and perhaps maintain their "subtext" like assumptions), and they're less likely to omit important information that would be implicit to them but not the other person.

sitkack•8mo ago

Any qualities you ascribe to an LLM are part of its RLHF; ask it to get irritated or lazy and it will simulate those qualities. They are high dimensional text simulators. They can and do simulate anything.

koakuma-chan•8mo ago

I asked an LLM and it said "A stack is a data structure that follows the Last In, First Out (LIFO) principle. This means that the last element added to the stack is the first element to be removed."

hansmayer•8mo ago

Yikes:( I am so worried about the damage that will be caused by the misuse of these tools. Already a lot of young folks will just mindlessly trust whatever the magic oracle spits out at them. We need to go back to testing people with pen and paper I suppose.

koakuma-chan•8mo ago

I mean, is it wrong? It seems correct. Unless I'm missing something.

hansmayer•8mo ago

Oops, my bad. I seem to have misread. Sorry.

thinkcritical•8mo ago

No, a stack is LIFO like it said. A queue is FIFO or in other words LILO “Last In Last Out”.
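For anyone who wants to check the distinction rather than take anyone's word for it, here is a minimal sketch in Python. The variable names are just illustrative; it uses a plain list as the stack and `collections.deque` as the queue.

```python
from collections import deque

# A stack removes the most recently added item first (LIFO).
stack = []
for item in ("a", "b", "c"):
    stack.append(item)  # push onto the top
stack_order = [stack.pop() for _ in range(3)]  # pop from the same end

# A queue removes the oldest item first (FIFO, equivalently LILO).
queue = deque()
for item in ("a", "b", "c"):
    queue.append(item)  # enqueue at the back
queue_order = [queue.popleft() for _ in range(3)]  # dequeue from the front

print(stack_order)  # ['c', 'b', 'a']
print(queue_order)  # ['a', 'b', 'c']
```

The only difference between the two is which end items are removed from.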

koakuma-chan•8mo ago

My last job was at the office. I had my work queue implemented as a stack of files. I would sit at my desk and, in an infinite loop, pop files from my stack and process them. Occasionally, my supervisor would come and push a new file onto my stack. A naive worker would think that, once I was done with my stack, I could finally get some sleep, but no. Our office implemented something called "work stealing," where, once I was done with my own work, I had to visit a random co-worker and pop files from their stack.
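A rough single-threaded sketch of that office, assuming a hypothetical `run_worker` helper and made-up file names. Real work-stealing schedulers are concurrent and usually have thieves take from the opposite end of a deque; this version just follows the description above and pops from the top of a random co-worker's stack.

```python
import random

def run_worker(me, stacks, rng):
    """Process files for worker `me`; return the files in processing order."""
    processed = []
    while True:
        if stacks[me]:
            processed.append(stacks[me].pop())  # pop from my own stack (LIFO)
            continue
        # My stack is empty: find co-workers who still have files.
        victims = [w for w in stacks if w != me and stacks[w]]
        if not victims:
            return processed  # nothing left anywhere, finally some sleep
        victim = rng.choice(victims)  # visit a random co-worker
        processed.append(stacks[victim].pop())  # steal from the top of their stack

# Hypothetical office: each worker's files, last element is the top of the stack.
stacks = {
    "me": ["file1", "file2"],
    "alice": ["file3"],
    "bob": ["file4", "file5"],
}
done = run_worker("me", stacks, random.Random(0))
print(done[:2])  # ['file2', 'file1']
```

The first two files always come off my own stack in LIFO order; the order of the stolen files depends on which co-worker gets picked.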

lovasoa•8mo ago

No. The LLM's answer is correct.

jstanley•8mo ago

Why is that a bad answer?

hansmayer•8mo ago

Sorry - I misread the LLM answer - actually the LLM produced a correct answer here

lovasoa•8mo ago

No it is not: https://en.wikipedia.org/wiki/FIFO_%28computing_and_electron...

louthy•8mo ago

> No it is not…

That’s a queue, not a stack. The LLM response was correct.

danielbln•8mo ago

But a stack is commonly LIFO, not FIFO?!

idonotknowwhy•8mo ago

This reads like a line from a QwQ or Qwen3 CoT chain :)

Karrot_Kream•8mo ago

I read this and I see a common thinking fallacy: when someone is inclined to believe something a priori, they fit the evidence to their a priori beliefs.

hansmayer•8mo ago

No, it's fairly simple - I misread

abtinf•8mo ago

It’s subtle but I would regard this as an incorrect answer.

The structure of the LLM answer is:

A is B; B exhibits property C.

The correct answer is:

A exhibits property C; B is the class of things with property C; therefore A is B.

There is a crucial difference between these two.

literalAardvark•8mo ago

This doesn't apply to all prompts, and the prompt was not provided. Natural language is a fickle thing.

moffkalast•8mo ago

This kind of pointless hair splitting is why people would rather talk to an LLM.

Benjammer•8mo ago

This kind of “hair splitting” is the foundation of current prompt engineering though…

Matthyze•8mo ago

I think you've read too much early Wittgenstein. That is simply not how people communicate.

Sharlin•8mo ago

The gap between LLM and human cases was greater in the deceptive case. This may, of course, simply reflect the fact that random humans are bad at lying.

hammock•8mo ago

Reminds me of the horrific state of student debate competitions today where the winning strategy is to incomprehensibly rattle off as many arguments as quickly as possible with strange breathy sounds in between

azemetre•8mo ago

Do you have a YouTube video demonstrating this? My only experience with debate is from the TV show Community.

justonceokay•8mo ago

This one is very short but conveys the idea well. Not all debate is like this but it is definitely a real phenomenon

https://youtu.be/LMO27PAHjrY

cwmoore•8mo ago

A small step for a man, a giantleapfrogmankind.

OJFord•8mo ago

Hamdiddle-eedah-hamdiddle-ah (do do do do dodododo expi-ali-do-cious)

What is the point of that? They're incomprehensible. (For those who haven't watched it: the video just shows people talking very fast, it doesn't explain why, kind of implies it's somehow good or impressive.)

nimih•8mo ago

The point is to win debate tournaments. In particular, it is (or at least was, when I competed in policy debate in high school and college in the 00s) strategically advantageous to maximize the number of distinct arguments, each with their own set of supporting evidence (usually read verbatim from a prepared excerpt of a news article or authoritative reference or whatever), you make within the allocated time. This incentivizes talking extremely quickly, which requires a fair bit of practice to become proficient at (and to understand).

OJFord•8mo ago

And the judges of these tournaments not only understand it too (I can understand an opponent understanding if they've practiced the same thing) but seriously value it in scoring?

Again/stepping back: what is the point of winning a debate tournament like this, or that values this 'debate'?

umanwizard•8mo ago

What’s the point of winning a chess tournament or any other intellectual game/sport?

OJFord•8mo ago

What's intellectual about slurring some words really quickly?

bardan•8mo ago

The equivalent would be speed chess, wouldn't it? There is nothing intellectual about speaking as fast as possible.

nimih•8mo ago

> And the judges of these tournaments not only understand it too (I can understand an opponent understanding if they've practiced the same thing)

Generally yes, although a good team will slow down and speak more or less like normal people if they have a so-called "lay judge" who wouldn't be able to understand them going at full speed.

> but seriously value it in scoring?

You don't really get "scored" on how fast you speak (there's no points system), but, as I mentioned, there are strategic reasons to speak quickly.

For instance, a time-honored strategy is to spew out a huge number of roughly-orthogonal arguments (e.g. "my opponent's policy failed to support the resolution we are debating this tournament, and thus shouldn't win for procedural reasons." "my opponent's policy would destabilize the Kashmir conflict and thus lead to global thermonuclear war." "my opponent's policy would preclude this alternative policy I am now presenting, and my policy is better, ergo my opponent's policy is bad in terms of opportunity cost." and so on), and then circle back later in the debate and further develop any arguments your opponent failed to adequately address (perhaps because they can't speak as fast as you).

An interesting counter-example from when I was actively debating is that at least one team on the national circuit was arguing (somewhat successfully, if I remember right) that speaking fast was a reason to actively vote against a team. The rough gist of the argument was something along the lines that being trained to speak quickly (and have the huge amount of prep-work required to really get value out of the skill) was something really only accessible to affluent/"privileged" kids (although that latter term was a couple years away from entering the common lexicon, I think), and then connecting it back to the central topic of that debate season so as to undermine whatever position the other team had originally presented (but, of course, pointing out that the impact of their argument was occurring in the real world, right now, contra the assumed fiction of their opponent's policy proposal or whatever, and thus a more urgent reason to vote for them).

> what is the point of winning a debate tournament like this, or that values this 'debate'?

For the most part, it's a fun and challenging game for the people involved, the same reason people play chess or go bowling. There's a lot of work and creativity that goes into preparing for a tournament, and the debate rounds themselves reward being able to think quickly on your feet and work well with your debate partner. You get a lot of practice at speaking in public, to a hostile audience no less, which is imo an incredibly valuable life skill (and can be very exciting).

prisenco•8mo ago

https://en.wikipedia.org/wiki/Spreading_(debate)

1oooqooq•8mo ago

not even Idiocracy predicted that one.

AlexCoventry•8mo ago

These students are probably intellectually gifted, they're just playing a stupid game for the sake of an item on their resume.

namaria•8mo ago

I question the intellect of anyone engaging in silly games with the sole purpose of impressing other people.

bardan•8mo ago

Seems like a competition that started reasonably and mutated into nonsense over time as the rules were exploited (and never modified, I guess). If it's an established debate style and offered to kids as legitimate you can't blame them. Kids do what is available to them.

namaria•8mo ago

Everyone has choices to make in life.

smeej•8mo ago

I'm accustomed to listening to regular speech at 2-3x speed, but apparently that's entirely different than listening to a human try to speak 2-3x faster than normal, because I could barely pick intelligible syllables out of that mess.

This is such an example of getting what you incentivize, not what matters.

jimbokun•8mo ago

What the fuck is wrong with the people running these debates that they reward these techniques?

xrhobo•8mo ago

It is quite strange. One would think a judge would easily throw this out.

I mean there is probably not a specific rule I could point to that a high school athlete couldn't ride a bike or a motorcycle in a 400m track run either.

There is probably not a specific rule that you can't shoot the shot put out of a cannon either.

I would just assume the judges have the slightest bit of common sense.

sebastiennight•8mo ago

> I mean there is probably not a specific rule I could point to that a high school athlete couldn't ride a bike or a motorcycle in a 400m track run either.

There most definitely is such a rule, and there most definitely are people who have tried to do that and been the cause of the original rule wording, and others still who have tried to do so by "creatively interpreting" said rule.

Have you met humans?

timeforcomputer•8mo ago

"Because we raise the trigger and only two carrying noodles, and only two can announce in this network but their excess cites their examine this places where the apparatus of military power torches the ground"

He makes an intriguing point.

sebastiennight•8mo ago

This is hilarious and reminds me of when I was exactly that age, and learning to spit out Busta Rhymes's "Break Your Neck" [0] at full speed.

When Busta makes more intelligible listening than the arguments of your debate team, you know debate is broken.

[0]: Start 2 minutes in, give it a try: https://youtu.be/W7FfCJb8JZQ?feature=shared&t=120

upghost•8mo ago

This is a consequence of the fact that any argument not responded to "flows across" the score sheet and is automatically a win for the team making the argument, no matter how silly. So a "natural" tendency would be to ignore ridiculous arguments like "not paying for school lunches will cause children to hyperventilate, and by the butterfly effect will lead to infinite hurricanes in developing nations causing a collapse of the global economy and intergalactic war and genocide". But if the opposite team fails to acknowledge the argument then that is the same as conceding it will happen.

beeflet•8mo ago

Which is pretty ridiculous. The purpose of a debate should be to change/consolidate the hearts and minds of the audience to your side. To this end, it's usually sufficient to pick apart a few of the key points of your opponent's argument. Nitpicking every aspect of your opponent usually comes off as uncharismatic.

Brevity is really important in a debate. Especially in the modern day where someone might turn you into a chad vs soyjack meme.

And if anything, what happens before the debate is more important than what happens during it. Our dear president showed us you can become the leader of the free world using playground insults and ad-libbed speeches if you choose the right demographics and look good in a suit.

johnisgood•8mo ago

Debates these days (especially political ones) are just unnecessary, totally unrelated ad hominems, and people yelling over each other.

Yup to your last sentence. It irritated me how off-topic his responses were.

amenhotep•8mo ago

He looks awful in a suit!

thih9•8mo ago

I guess winning like this cheapens the victory. Then again, this strategy continues to be used at all levels of disputes and politics. I wish there was a way to stop that, not just in student debates.

api•8mo ago

That’s just like the larger discourse. The Gish gallop is standard practice.

Are there no rules in debates? There should be. You’re not allowed to punch someone in basketball so why should you be allowed to DOS people with bullshit in a debate?

Der_Einzige•8mo ago

Btw - my first-author NeurIPS datasets and benchmarks paper takes basically all the evidence that this debate community (American hs and college level policy and LD debate) produced over its recent history and makes it easy for LLMs and people to consume.

They’ve been quietly open sourcing all of their arguments for like 20+ years.

This dataset is so large and good entirely because of speed reading and the current state of debate tournament competitive dynamics. Spreading might be objectively absurd to listeners but the effects of it are literally good for society.

https://arxiv.org/abs/2406.14657

https://huggingface.co/datasets/Yusuf5/OpenCaselist

SoftTalker•8mo ago

I learned a stack is like a stack of plates in a cafeteria. That seems a better answer than either of those.

nickpsecurity•8mo ago

They also have more persuasive conversations in their pretraining data. That includes tons of marketing material, cons, and bullying. They are also as bold as you want them to be about imitating such tactics. They have no remorse or legal accountability either.

rstuart4133•8mo ago

> My guess for the reason behind this is that LLMs have more facts memorized,

From https://ai.meta.com/research/cicero/ :

    When playing 40 games against human players, CICERO achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game

There are not a lot of facts to know when playing Diplomacy. It's all about manipulating the other guy with words.

jfengel•8mo ago

Welp, we're boned.

kragen•8mo ago

Which incentivized human persuaders? Are we talking about top salespeople and litigators, or are we talking about average college freshmen?

It says they recruited participants from the US through Prolific and paid them £10.12 per hour, so probably more like the latter.

lordofgibbons•8mo ago

Does it matter? The difference is only 6 months of LLM progress.

kragen•8mo ago

It matters if studies like this matter, that is, it matters to people who are interested in what has currently happened rather than what might happen in the future. 6 months of LLM progress keeps not looking like what I expected.

On the other hand, if you're content with your pre-existing predictions about what would happen, which I think is actually a reasonable position, there's no reason to read the paper.

fzzzy•8mo ago

Is progress faster or slower than you expected?

kragen•8mo ago

An astounding amount of both.

fzzzy•8mo ago

That makes sense. Progress does seem very lumpy. Commercialization and secrets mean that what we see publicly may not be state of the art. And I certainly never thought that Zuckerberg would be the one providing open model weights. It's a strange new world.

Morizero•8mo ago

Basically the same finding as the controversial Zurich paper on using LLMs to change opinions in the "change my view" subreddit

echelon•8mo ago

I'm hoping we see a flood of LLMs just like that Zurich piece, but at 10,000x scale. Perhaps even open source platforms to run your hobby LLM bot farm.

Social media has turned into cancer. It'd be riveting to watch it turn into bots talking to other bots. Social media wouldn't go away, but I get the feeling people will engage more with real life again.

As the platforms see less growth and fewer real users, we might even see a return to protocols and open standards instead of monolithic walled gardens.

Tryk•8mo ago

Source?

_boffin_•8mo ago

https://regmedia.co.uk/2025/04/29/supplied_can_ai_change_you...

reducesuffering•8mo ago

We're fast approaching the point that any value someone can provide through a digital interface could be better done by a model. What do we use digital interfaces for? Practically everything.

Oh, well, not being a plumber, electrician, or farmer... but our society's current productivity, technology, and automation reduced the share of the population that needs to be farmers from 80% to 1.3% in the US. Can you imagine what the equivalent of 1 billion digital engineers unlocks in understanding and implementing robotics?

sidibe•8mo ago

Yes, when the knowledge jobs are all done best by AI the rest will follow shortly. We will need to adapt to being "useless" as far as work goes and find other sources of worth. There's still a lot of people who want to compare it to Bitcoin hype around here, IMO the next few years everything is going to change way faster than it ever has.

For the record I always thought Kurzweil and that crowd were clowns, now I think I was the wrong one

hansmayer•8mo ago

> IMO the next few years everything is going to change way faster

Honestly, after hearing this for the past 20 years (ever since ML and LLMs became a thing), it is actually more like the level-5 autonomous car hype and less like Bitcoin. Only that the driverless car hype never required such a humongous investment bubble, as does the Statistical-Text-Generator-as-AI one.

tuatoru•8mo ago

A challenge for people who think this way: be first in line to have a robot change your six-month-old daughter's nappy.

sidibe•8mo ago

Now that'd be crazy, like letting a Tesla drive you around in the back seat.

Give it a decade though and people won't think twice about it, though I do hope we'd still do that kind of thing ourselves

alpaca128•8mo ago

Meanwhile I haven't seen any real progress that I'd care about in a while.

Is GPT-4xyz better than the last one? I'm sure some benchmark numbers say that. But the number of applications where occasional hallucinations don't matter is very small, and where it matters nothing really changed. Companies are trying to use it for customer support but that predictably turned out to be a legal risk. Klarna went all-in on AI and regrets it now.

Some media are talking about Microsoft writing 30% of their new code with AI, but what Nadella actually said is less impressive: "maybe 20-30% of the code that is inside of our repos today in some of our projects are probably all written by software". Which, coincidentally, is the ratio of code that can be autocompleted by an IDE without an LLM, according to JetBrains.

I have yet to see any evidence that anything will change way faster than it ever has, aside from the readiness of many younger people to use it in everyday life for things it really shouldn't be used for.

sidibe•8mo ago

Yes they have gotten better. If you give Gemini 2.5 the right context it seems to solve whatever. Drop in the folder + docs and it tends to be right about how to proceed now. I think people who don't find LLMs useful now aren't trying with the right context.

Der_Einzige•8mo ago

I’m with you. Weak version of a singularity seems likely. Recursive self improvement isn’t just possible, it’s inevitable. Models are capable of extrapolation, but they don’t even need it to do good interpolation which itself is enough to get us recursive self improvement.

I tend to think that it’ll have an optimistic ending. The key to solving most political problems is eliminating scarcity.

Nevermark•8mo ago

A clear case where LLMs exceed humans is in identifying solutions to disparate shallow constraints involving what would normally require very wide searches of more knowledge than any of us will ever have.

A simple case I have found is looking for existing terms or creating new ones. Say I have a series of concepts whose names share a nice linguistic pattern that emphasizes their close relationship, except for one. I can describe the regularly named concepts, then ask for suggestions for the remaining concept.

The LLM pulls from virtually every topic with domain terminology, repurposable languages (Greek, Roman), words from fiction, all the way to creative construction of new words, tenses, etc to come up with great proposals in seconds.

I could imagine that crafting persuasive wording would be a similar challenge. Choosing the right words, right phrasing, etc. to carry as much positive connotation, implication of solidity, avoiding anything sounding challenging or controlling, etc. from all of human language and its huge space of emotional constraints and composites.

Very shallow but very wide reasoning/searching/balancing done in very little time.

And with an ability to avoid giving any unnecessary purchase for disagreement, being informed of all the myriad of typical and idiosyncratic ways people get hung up on failed persuasions. Whether in general or specific topic related.

LLM generated writing can be stereotypical.

But the more constraints put on requested material, the more their ability to construct really very original high quality, or even cleverly unique, prose in real time shines.

pottertheotter•8mo ago

Do you have any examples where you’ve used them for this? Would be interesting to see.

Nevermark•8mo ago

A simple example is I have a series of three ordered connected things.

I called the first one “pre” since it “precedes” the others.

I called the last one “pro” since it “proceeds” from the others.

These are somewhat poetic/whimsical terms, for an abstract arrangement, but it’s nice to have good terms even for non-serious stuff.

I couldn’t come up with an equally concise term for the middle.

Claude came up with “per” as in “through” for the middle thing.

Couldn’t be more fitting.

make3•8mo ago

publish this at ACL

sitkack•8mo ago

Inventing words (neologisms) with LLMs is a fun pastime, or giving it descriptions and having it define the concept in various languages.

andix•8mo ago

It's not programmers who should be scared about getting replaced by AI. It's obviously salespeople who should ;)

thethirdone•8mo ago

Based on the data in table 3, I would attribute most of the difference to length of advice. The LLMs' average word count (29.4) is more than double the human word count (13.25). Most other measures do not have a significant ratio. "Difficult word count" would be the only other with a ratio higher than 2, but that is inherited from total word count.
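As a quick check of the "more than double" claim, here is the arithmetic using only the two averages quoted above (the variable names are mine, not the paper's):

```python
# Average word counts as quoted above from table 3.
llm_words = 29.4
human_words = 13.25

ratio = llm_words / human_words
print(f"LLM advice is about {ratio:.2f}x as long as human advice")
```

So the LLM advice runs to roughly 2.2 times the length of the human advice.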

I think it would be difficult to truly convince me to answer differently in a test with 14 words where 30 would have enough space to actually convey an argument.

I would be very interested to see the test rerun while limiting LLM response length or encouraging long responses from humans.

jstanley•8mo ago

If you think writing more words will be more persuasive, just... write more words?

The test already incentivises being persuasive! If writing more words would do that, and the incentivised human persuaders don't write more words and the LLMs do, then I think it's fair to say that LLMs are more persuasive than incentivised human persuaders.

thethirdone•8mo ago

Sure. I am not contesting that LLMs are more persuasive in this context. That basic result comes through very clearly in the paper. It's not as clear how relevant this is to other situations though. I think it's quite likely that humans given the instruction to increase word count might outperform LLMs. People are very unlikely to have practiced the specific task of giving advice on multiple choice tests whereas LLMs have likely gotten RLHF training which likely helps in this situation.

I always try to pick out as many tidbits as possible from papers that might be applicable in other situations. I think the main difference of word count may be overshadowing other insights that may be more relevant to longer form argumentation.

aspenmayer•8mo ago

> I would be very interested to see the test rerun while limiting LLM response length or encouraging long responses from humans.

I don’t know if that would have the effect you want. And if you’re more likely to have hallucinations at lower word counts, that matters for those who are scrupulous, but many people trying to convince you of something believe the ends justify the means, and that honesty or correspondence to reality are not necessary, just nice to have.

Asking chatbots for short answers can increase hallucinations, study finds - https://news.ycombinator.com/item?id=43950684 - May 2025 (1 comment)

which is reporting on this post:

Good answers not necessarily factual answers: analysis of hallucination in LLMs - https://news.ycombinator.com/item?id=43950678 - May 2025 (1 comment)

thethirdone•8mo ago

I'm not sure what effect you think I want. The suggestion was just to increase the "interestingness" of the study. It seems to me like the main difference between LLM and human shown was length of response. Controlling for that variable and rerunning the experiment would help show other differences.

I do think it's distinctly possible that LLMs will be much less convincing due to increased hallucinations at a low word count. I also think that may have less of an effect for dishonest suggestions. Simply stating a lie confidently is relatively effective.

I would prefer advising humans to increase length rather than restricting LLMs because of the cited effects.

aspenmayer•8mo ago

> I would prefer advising humans to increase length rather than restricting LLMs because of the cited effects.

I would advise the opposite to humans, as your advice is playing to the strengths of AI/LLMs and away from the strengths of humans versus AI/LLMs.

thethirdone•8mo ago

Advising the opposite to humans does not make sense. 13 words is already tiny to convince someone. The choices I was thinking were restricting LLM word count and increasing human word count. The goal is specifically to make them more comparable.

The given study does not show any strength of humans over LLMs. Both goal metrics (truthful and deceptive) are better for LLMs than humans. If you are misinterpreting my advice as general advice for people not under the study's conditions, I would want to see the results of the proposed rerun before suggesting that.

However, if length of text is legitimately convincing regardless of content, I don't know why humans should avoid using that. If LLMs end up more convincing to humans than other humans simply because humans are too prideful to make their arguments longer, that seems like the worst possible future.

aspenmayer•8mo ago

> If LLMs end up more convincing to humans than other humans simply because humans are too prideful to make their arguments longer, that seems like the worst possible future.

People aren’t too proud to make long arguments, they just take more time and effort to make for humans, and so historically, humans subconsciously consider longer arguments as more intellectually rigorous whether they are or not, and so length of a written piece is used as a kind of lazy heuristic corresponding with quality. When we're comparing the output of humans to that of other humans, this kind of approach may work to a certain extent, but AI/LLMs seem to be better at writing long pieces of text upon demand than humans. That humans find the LLM output more convincing if it is longer is not surprising to me, but I’ll agree with you that it isn’t a good sign either. The metric has become a target.

Nevermark•8mo ago

Each of us could benefit from a respective loyal model of our own, critiquing and marking up any persuasive material from others.

tuatoru•8mo ago

I'm seeing a lot of ads from Replika about loyal models...

roywiggins•8mo ago

Great, so Internet arguments devolve to Pokémon battles between our respective LLMs.

> ChatGPT, I choose YOU!

ChatGPT uses GISH GALLOP.

Nevermark•8mo ago

Well, isn’t that essentially how most of us navigate most fine points of science, or deep mathematics?

Let the real scientists settle things for us?

Alternatively, many people today let opinion media and/or their associated group allegiances settle what they believe. Much worse.

godelski•8mo ago

It is CRITICAL that we be realistic about what fulfills the optimization objectives in the models that we train. I think there's been a significant unwillingness to acknowledge that objectives like "human preference" (RLHF, DPO, etc.) not only help models become more accurate and sound more natural in speech, BUT ALSO optimize the models to be deceptive and convincing when they are wrong. It's easy to see, because you know what's preferable to a lie? A lie that you don't know is a lie. You (may) prefer the truth, but if you cannot differentiate the truth from a lie, you'll choose based on some other criterion. We all know that lies frequently win out here. If you doubt this, just turn on the news or talk to someone who belongs to the opposite political party from yours.

This creates a very poorly designed tool! A good tool should fail as loudly as possible, in that it alerts the user of the failure and does its best to specify the conditions that led to this. This isn't always possible, but if you look at physical engineers you'll see that this is where they spend a significant portion of their time. Even in software I'd argue we do a lot here, but also that it is easy to brush off (we all love those compiler messages... right?). Clearly right now LLMs are in a state where we don't know how to make their failures more visible, and honestly, that is okay. But what is not okay is to pretend that this is not current reality and pretend that there are no dangers or consequences that this presents. We dismiss this because we catch some obvious errors and over-generalize the error quality, but that just means we suffer from Murray Gell-Mann Amnesia. It's REALLY hard to measure what you don't know. Importantly, we can't even begin to resolve these issues and build the tools we want (the ones we pretend these are!) if we ignore the reality of what we have. You cannot make things better if you are unwilling to recognize their limitations.

Everyone here is an engineer, researcher, or builder. This framework of thinking should be natural to us! We should also be able to understand that there's a huge difference between critiques and limitations and dismissing things. I'm an AI critic, but also very optimistic. I'm a researcher and spending my life working on this topic. It'd be insane to do such a thing if I thought it was a fruitless or evil effort. But it would be equally insane to pursue a topic with pure optimism. If I were to blind myself to limits and paint everything as a trivial to solve problem, I'd never be able to solve any of those problems. Ignoring or dismissing technical issues and limitations is the domain of the MBA managers, not engineers.

mhuffman•8mo ago

Sam Altman must be literally vibrating at the thought of tacking on ads at the end of a "persuasive" interaction about whatever. "... and remember to try new Oreo-flavored Pringles and tell them Gippity sent you with this 20% off code, because we are best friends and we can trust each other!"

staindk•8mo ago

I'm not worried about blatant advertising like you put forth.

There's so much dirty subliminal or informal advertising that you can do with these things.

booleandilemma•8mo ago

I can't wait til I have to argue with my manager because I said one thing and the LLM said another thing.

baal80spam•8mo ago

It's already happening, I experienced this firsthand.

alpaca128•8mo ago

Obviously the solution is to use an LLM to argue with the manager, for increased productivity at the workplace /s

xqcgrek2•8mo ago

Politicians everywhere, remember this for your next campaign.

amelius•8mo ago

I guess our brightest minds will soon use them in advertisements, then.

make3•8mo ago

people have been using language AI for advertising since Adsense

marcosdumay•8mo ago

Well, that's exactly what we train them for. Persuading people they are right while recovering quiz-style facts from memory.

turbojeet•8mo ago

What will happen to jeets?

Aeyxen•8mo ago

It's always amusing to watch people act shocked when LLMs beat average humans at persuasion. The actual headline here should be: 'A system trained on terabytes of successful human persuasion is better at persuasion than a random person on a crowdwork platform.' No mystery—just the mechanics of scale and exposure.

But guess what? Now, finally, we can co-opt LLMs for things humans fumble: e.g., real-time conversational tutoring, adaptive negotiation agents, or even scalable personal 'bullshit detectors' as countermeasures. I hope the conversation doesn't turn to AI-safetyism and restricting LLMs, and is instead more about building stuff. Let's build, not block.

sebastiennight•8mo ago

As a marketer with a couple of decades of experience I can tell you that there is way more financial incentive in "slightly persuading [consumer] to tilt towards [product] in their next purchase, and spend more and earlier" than there ever will be towards "next-level unbiased tutor in anything".

The "super tutor" stuff that is always mentioned as the utopian outcome (along with "cures for cancer") is, unsurprisingly, never something being worked on by the person or lab quoting these examples.

I guess anything goes in B2B settings, but there is a valid reason to be cautious about these advances when it comes to mass-market consumer-facing applications.

Aeyxen•8mo ago

I understand your perspective as a marketer, but I think you're creating a false dichotomy. Yes, persuasion tech has stronger financial incentives, but that doesn't prevent beneficial applications from emerging simultaneously.

The "super tutor" isn't some distant fantasy - millions already use ChatGPT, Claude and similar tools daily for personalized learning. They're imperfect but genuinely helpful for programming, languages, math, and countless other topics.

Look at what happened with YouTube: millions of people transformed themselves into programmers, musicians, mechanics, and countless other professions through free video tutorials. Khan Academy revolutionized math education. Coursera and edX brought university courses to anyone with internet. This wasn't utopian thinking - it was practical technology solving real educational problems at scale.

What's different now is that LLMs enable the missing piece: personalization. The one-on-one adaptive experience that was previously limited to those who could afford human tutors at $50-100/hour is now available to anyone at negligible marginal cost.

Your skepticism about cancer applications, too, ignores the technological trajectory we've been on for decades. Just as YouTube and online platforms democratized education, technology has been steadily dismantling bottlenecks in medical research.

The Human Genome Project initially cost $3 billion and took 13 years. Today you can sequence a genome for under $1,000 in days. This wasn't utopian thinking; it was technological progress following its natural course.

Think what LLMs will do here.
