
Prompt Politeness Affects LLM Accuracy (2025)

https://arxiv.org/abs/2510.04950
32•KnuthIsGod•1d ago

Comments

331c8c71•56m ago
Interesting.

I am wondering why anyone would use a t-test when the experiment is clearly modelled by a binomial distribution: 250 independent questions, each either answered correctly or not (the null being that the success rate is the same).
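A minimal sketch of the kind of test this comment has in mind. It assumes every answer is an independent Bernoulli trial and compares two tones with a pooled two-proportion z-test, the large-sample normal approximation for comparing two binomial rates. The function name and the counts are hypothetical: 202/250 and 212/250 were chosen only because they reproduce the 80.8% and 84.8% figures quoted further down the thread, not because they are the paper's raw data. If the same questions are reused across tones, as the next comment suggests, a paired test such as McNemar's would arguably fit the design better.

```python
import math


def two_proportion_z_test(correct_a: int, n_a: int, correct_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for H0: p_a == p_b, treating each answer as an independent Bernoulli trial."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled success rate under the null hypothesis that both tones share one accuracy.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # All answers right (or all wrong) in both groups: no evidence of any difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (250 answers per tone), not the paper's data.
z, p = two_proportion_z_test(correct_a=202, n_a=250, correct_b=212, n_b=250)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```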

plewd•40m ago
I don't know much about stats, but does "the null is that the success rate is the same" imply that it's a sketchy methodology because they can come up with some findings ("ruder prompts are better/worse!") more often?
jampekka•17m ago
That's the usual null hypothesis for these kinds of tests.
331c8c71•4m ago
You are asking about one-sided vs two-sided tests. Not really "more often", because the formal type I error rate is still the same. I'd say two-sided tests leave more room for post-hoc theorizing, but there are valid situations where there is no clear one-sided hypothesis a priori. Do we really know that the hypothesis should have been "ruder prompts are worse"?

I'd say this is benign compared to other ways of (mis)using statistics, e.g. looking at which way the difference goes and then running one-sided tests, or tweaking the setup until one gets "significant" p-values.
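To make the one-sided/two-sided distinction concrete: for a test statistic that is standard-normal under the null, the one-sided p-value in the direction of the observed effect is exactly half the two-sided one. That is why choosing the direction after looking at the data, the misuse described above, effectively doubles the type I error rate, while a direction fixed in advance keeps it at the nominal level. A small standard-library sketch; the function name and the value z = 1.8 are illustrative assumptions.

```python
import math


def normal_p_values(z: float) -> dict[str, float]:
    """p-values for a standard-normal test statistic z under three alternative hypotheses."""
    # Upper-tail probability P(Z >= z) for a standard normal Z.
    upper = 0.5 * math.erfc(z / math.sqrt(2))
    return {
        "two-sided": math.erfc(abs(z) / math.sqrt(2)),  # H1: a difference in either direction
        "greater": upper,                              # H1: effect > 0, stated before seeing data
        "less": 1.0 - upper,                           # H1: effect < 0, stated before seeing data
    }


z = 1.8  # hypothetical test statistic
print(normal_p_values(z))
```

With z = 1.8 the two-sided p-value is above 0.05 while the "greater" one is below it, which is exactly the gap a post-hoc choice of direction exploits.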

jampekka•29m ago
The methods could be better described in the paper, but my understanding is that they did 10 runs of each question for each prompt and took the average, so the compared values are not binary. You could do a sign test, but you'd lose power and answer a slightly different question.
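A sketch of the sign test mentioned here, assuming (as this comment reads the paper) that each question gets one averaged accuracy per tone, e.g. the fraction of 10 runs answered correctly. Only the direction of each per-question difference is used, which is where the loss of power comes from. The function name and the example accuracies are hypothetical.

```python
from math import comb


def sign_test(acc_a: list[float], acc_b: list[float]) -> tuple[int, int, float]:
    """Exact two-sided sign test for paired per-question accuracies under tones A and B."""
    if len(acc_a) != len(acc_b):
        raise ValueError("both tones must be scored on the same questions")
    diffs = [b - a for a, b in zip(acc_a, acc_b)]
    wins = sum(1 for d in diffs if d > 0)    # questions where tone B did better
    losses = sum(1 for d in diffs if d < 0)  # questions where tone A did better
    n = wins + losses                        # ties carry no sign and are dropped
    if n == 0:
        return wins, losses, 1.0
    # Under H0 each untied question favours either tone with probability 1/2,
    # so the smaller count follows Binomial(n, 0.5); double its lower tail.
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return wins, losses, min(1.0, 2 * tail)


# Hypothetical per-question accuracies (fraction of 10 runs correct), not from the paper.
polite = [0.8, 0.9, 0.7, 1.0, 0.6, 0.9]
rude = [0.9, 0.9, 0.8, 1.0, 0.7, 1.0]
print(sign_test(polite, rude))  # (4, 0, 0.125)
```

Two of the six questions are ties and are discarded, so even four out of four in one direction only reaches p = 0.125.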
freehorse•5m ago
You can do a generalised linear mixed-effects model with a binomial outcome (i.e. a binomial test with an added random-effects structure). But unless you want to introduce a richer random-effects structure with more variables, it is overkill and overcomplicates things, and the result should be the same as with t-tests.
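For comparison, the t-test baseline this comment refers to can be sketched as a paired t statistic on the same per-question averages. This is not a mixed-effects model; it only shows the simpler analysis the comment expects to agree with one. It assumes a paired design and uses only the standard library, which has no Student-t CDF, so the sketch returns the statistic and its degrees of freedom; a full p-value could come from a stats package such as `scipy.stats.ttest_rel`. The function name and data are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(acc_a: list[float], acc_b: list[float]) -> tuple[float, int]:
    """Paired t statistic for per-question accuracy differences (tone B minus tone A)."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [b - a for a, b in zip(acc_a, acc_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-question accuracies, not from the paper.
print(paired_t_statistic([0.8, 0.9, 0.7, 1.0], [0.9, 0.9, 0.8, 1.0]))
```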
dude250711•50m ago
I have an idea: let's use these things for autonomous software engineering.
faize•43m ago
Remember to always say "please" and "thank you" when planning a critical system
eigenspace•41m ago
Please remember to always say "please" and "thank you" when planning a critical system. Thank you!
theanonymousone•38m ago
I have always said please and thank you to LLMs, not because of accuracy or because I'm stupid. I believe it is more about me than about the LLM, and this is anyway a habit I don't want to lose.
jkarni•34m ago
Thomas Aquinas believed cruelty to animals was wrong not because animals have souls (and with that all the standard moral rights), but because it can teach us cruelty to other humans.
pfortuny•6m ago
Snarky morning: "spiritual souls" as opposed to "mere animal souls". Sorry, could not control myself.
niek_pas•23m ago
Genuine question: do you add 'please' and 'thank you' to Google searches? If not, what sets them apart?
perching_aix•23m ago
Google searches being keyword based, rather than simulated conversations?

For the same reason you wouldn't type in an entire question or sentence, unless you don't know how to use Google, are pissed off, or have an actual reason to suspect that it would yield proper hits (e.g. looking up an excerpt).

spiderfarmer•19m ago
Google isn’t conversational.
gum_wobble•11m ago
Genuine question: do you write Google search queries in natural language?
TimCTRL•35m ago
I only say please and thank you so that when the robots finally take over, they will remember I was nice to them.
octocop•29m ago
it seems they will remember that you wasted tokens for no reason and punish you instead.
emil-lp•24m ago
Tokens are their food, it's literally what keeps them alive.

Not feeding them tokens is neglect.

I try to feed them a healthy diet.

polytely•16m ago
It sort of makes sense to me if you think of a student asking an expert in the field a question: I would guess that the successful interactions are, on average, the more polite ones. For example, if you were asking Donald Knuth or Terence Tao a question, you'd probably be polite about it. Being hostile while asking questions gets you into forum-discussion territory.
robinhouston•4m ago
> Contrary to expectations, impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.
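The quoted range is easier to read as error rates. A quick check of the arithmetic, using only the two accuracies in the quote: the 4-percentage-point gap takes the error rate from 19.2% to 15.2%, a relative reduction of about 21%. The helper name is illustrative.

```python
def compare_accuracies(acc_low: float, acc_high: float) -> dict[str, float]:
    """Express the gap between two accuracies in absolute and relative error terms."""
    err_low = 1.0 - acc_low
    err_high = 1.0 - acc_high
    return {
        "absolute_gain": acc_high - acc_low,  # percentage-point difference in accuracy
        "error_low": err_low,
        "error_high": err_high,
        "relative_error_reduction": (err_low - err_high) / err_low,
    }


# Accuracies quoted above: Very Polite 80.8%, Very Rude 84.8%.
for name, value in compare_accuracies(0.808, 0.848).items():
    print(f"{name}: {value:.3f}")
```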
dSebastien•6m ago
I guess it makes sense, since we as humans tend to be far less inclined to help someone who is impolite or unfriendly; that "bias" is part of the training data and thus influences how LLMs behave.
robinhouston•5m ago
> Contrary to expectations, impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.

The Melancholy of Slaying Monsters

https://thereader.mitpress.mit.edu/the-strange-melancholy-of-slaying-monsters/
52•prismatic•13h ago•9 comments

Cloudflare Flagship

https://developers.cloudflare.com/flagship/
209•tjek•9h ago•102 comments

What Gets Kept

https://www.newyorker.com/culture/the-weekend-essay/what-jack-kerouac-left-behind
23•lermontov•2d ago•6 comments

BadHost – CVE-2026-48710: Starlette Host-Header Auth Bypass

https://badhost.org/
41•ylk•23h ago•8 comments

Prompt Politeness Affects LLM Accuracy (2025)

https://arxiv.org/abs/2510.04950
34•KnuthIsGod•1d ago•23 comments

That Methyl Methacrylate Tank

https://www.science.org/content/blog-post/methyl-methacrylate-tank
326•nooks•13h ago•127 comments

Cate v1.0 is out: The Infinite canvas workspace for developers

https://github.com/0-AI-UG/cate
36•BlueBerry2001•1d ago•19 comments

A few interesting modern pixel fonts

https://unsung.aresluna.org/a-few-interesting-modern-pixel-fonts/
344•zdw•1d ago•70 comments

The worst job interview I ever had

https://www.oliverio.dev/blog/the-worst-job-interview-i-had
268•oliverio•12h ago•222 comments

I built a Git-tracked book production pipeline

https://www.djspeckhals.com/posts/2026-05-22-how-i-bypassed-adobe-and-microsoft-to-build-a-git-tr...
233•dustin1114•4d ago•59 comments

A history of obituaries in American newspapers

https://blogs.loc.gov/headlinesandheroes/2026/05/mourn-not-a-history-of-obituaries-in-american-ne...
20•NaOH•2d ago•0 comments

TSDuck: Open-source toolkit for MPEG-TS analysis and manipulation

https://tsduck.io/
25•phantomathkg•6h ago•1 comments

Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs

https://arps18.github.io/posts/claude-code-mastery/
34•arps18•3h ago•5 comments

IBM Confidential: System/360 File Organization [video]

https://www.youtube.com/watch?v=zokKqP0plrM
39•DaiPlusPlus•2d ago•11 comments

A portentous reunion

https://bcantrill.dtrace.org/2026/05/25/a-portentous-reunion/
99•cafkafk•1d ago•26 comments

Show HN: Posthorn, self-hosted mail without the mail server

https://github.com/craigmccaskill/posthorn
23•craigmccaskill•4h ago•15 comments

Unicode 18.0.0 Beta

https://www.unicode.org/versions/Unicode18.0.0/
11•birdculture•1h ago•1 comments

The Structural Barriers to AI Lawyers

https://www.diffuseai.pub/p/the-structural-barriers-to-ai-lawyers
41•benbreen•5d ago•47 comments

What I've Learned (So Far) Building Online Mini Games with Elixir and Swift

https://calvinflegal.com/2026/05/24/what-ive-learned-so-far-building-online-mini-games-with-elixi...
48•calflegal•2d ago•20 comments

Launch HN: Minicor (YC P26) – Windows desktop automations at scale

https://www.minicor.com/
87•fchishtie•18h ago•54 comments

Rosalind: A genomics toolkit in Rust running whole-genome pipelines on a laptop

https://github.com/logannye/rosalind
154•samuell•5d ago•40 comments

Spain blocks prediction markets Polymarket, Kalshi over lack of gambling licence

https://www.reuters.com/business/spain-blocks-prediction-markets-polymarket-kalshi-over-lack-gamb...
905•thm•19h ago•419 comments

Tunecat: Simple Internet Radio

https://codeberg.org/lindenii/tunecat/
49•croottree•7h ago•3 comments

C array types are weird

https://anselmschueler.com/blogposts/2025-c-pointers/
81•signa11•2d ago•81 comments

Seeking a Language in Mathematics 1523-1571

https://tyndale.org/journals/reformj01/bmarsden.html
3•jruohonen•3d ago•1 comments

Dropbox CEO Drew Houston to step down

https://www.cnbc.com/2026/05/26/dropbox-ceo-drew-houston-ashraf-alkarmi.html
344•aghuang•19h ago•368 comments

Nvidia Vera CPU Benchmarks: Olympus Cores Delivering Great Performance

https://www.phoronix.com/review/nvidia-vera-benchmarks
7•naves•42m ago•0 comments

The Steinwinter Supercargo

https://www.thedrive.com/article/12603/the-forgotten-steinwinter-supercargo-is-unlike-anything-on...
68•itronitron•3d ago•19 comments

The Forgotten Art of the LAN Party (2023)

https://www.superjumpmagazine.com/the-forgotten-art-of-the-lan-party/
121•susam•3d ago•46 comments

Splinter Cell veteran says realistic modern lighting has screwed up stealth game

https://www.rockpapershotgun.com/splinter-cell-veteran-says-realistic-modern-lighting-has-screwed...
71•Tomte•2d ago•46 comments