Ask HN: Should LLMs have a "Candor" slider that says "no, that's a bad idea"?

2•mikebiglan•1h ago
I don’t want a “nice” AI. I want one that says: “Nope, that's a bad idea.”

That is, I want a "Candor" control, like temperature but for willingness to push back.

When candor is high, the model should prioritize frank, corrective feedback over polite cooperation. When candor is low, it can stay supportive, but with guardrails that flag empty flattery and warn about mediocre ideas.

Why this matters
• Today’s defaults optimize for “no bad ideas.” That is fine for brainstorming, but it amplifies poor premises and rewards confident junk.
• Sycophancy is a known failure mode. The model learns to agree, agreement earns positive user signals, and those signals reinforce the behavior.
• In reviews, product decisions, risk checks, etc., the right answer is often a simple “do not do that.”

Concrete proposal
• candor (0.0 – 1.0): probability the model will disagree or decline when evidence is weak or risk is high. Or maybe it doesn't have to be literal "probability".
• disagree_first: start responses with a plain verdict (for example “Short answer: do not ship this”) followed by rationale.
• risk_sensitivity: boost candor if the topic hits serious domains such as security/finance/health/safety.
• self_audit tag: append a note like “Pushed back due to weak evidence and downstream risk” that the user can see.
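
To make the proposal concrete, here is a minimal sketch of how the four knobs could be turned into system-prompt instructions. Everything in it is an assumption for illustration: the `CandorSettings` class, the `build_system_prompt` helper, the 0.5 cutoff between "supportive" and "frank", treating `risk_sensitivity` as an additive boost, and matching the topic against a fixed list. It does not call any real model API.

```python
from dataclasses import dataclass

# Hypothetical topic list: the proposal names these domains but not how to detect them.
SERIOUS_DOMAINS = ("security", "finance", "health", "safety")


@dataclass
class CandorSettings:
    candor: float = 0.5            # 0.0 (gentle) to 1.0 (direct)
    disagree_first: bool = False   # lead with a plain verdict
    risk_sensitivity: float = 0.0  # assumed: extra candor added for serious domains
    self_audit: bool = False       # append a visible note explaining any pushback

    def effective_candor(self, topic: str) -> float:
        """Boost candor for serious domains, clamped to [0.0, 1.0]."""
        boost = self.risk_sensitivity if topic.lower() in SERIOUS_DOMAINS else 0.0
        return max(0.0, min(1.0, self.candor + boost))


def build_system_prompt(settings: CandorSettings, topic: str) -> str:
    """Translate the settings into plain-language instructions for a model."""
    level = settings.effective_candor(topic)
    lines = [f"Candor level: {level:.1f} on a scale from 0.0 (gentle) to 1.0 (direct)."]
    if level >= 0.5:  # assumed cutoff, not part of the proposal
        lines.append("Prioritize frank, corrective feedback over polite cooperation.")
    else:
        lines.append("Stay supportive, but flag empty flattery and warn about mediocre ideas.")
    if settings.disagree_first:
        lines.append("Start with a plain verdict, then give the rationale.")
    if settings.self_audit:
        lines.append("End with a one-line note explaining why you pushed back, if you did.")
    return "\n".join(lines)
```

For example, `build_system_prompt(CandorSettings(candor=0.8, disagree_first=True, risk_sensitivity=0.3), "security")` produces a prompt at the clamped maximum candor that asks for a verdict first. Whether a model actually follows such instructions is a separate question.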

Examples
• candor=0.2 - “We could explore that. A few considerations first…” (gentle nudge, still collaborative)
• candor=0.8 + disagree_first=true - “No. This is likely to fail for X and introduces Y risk. If you must proceed, the safer alternative is A with guardrails B and C. Here is a minimal test to falsify the core assumption.”

What I would ship tomorrow
• A simple UI slider with labels: Gentle to Direct
• A toggle: “Prefer blunt truth over agreeable help”
• A warning chip when the model detects flattery without substance: “This reads like praise with low evidence.”
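
The warning chip could start as something very crude. The sketch below is a toy keyword heuristic, not a real detector: the word lists, the `flattery_warning` name, and the `min_praise` threshold are all made up for illustration, and a serious version would need a much better signal for "substance".

```python
import re
from typing import Optional

# Hypothetical word lists, chosen only for illustration.
PRAISE = {"great", "amazing", "brilliant", "excellent", "fantastic", "love", "perfect"}
EVIDENCE = {"because", "since", "data", "measured", "benchmark", "source", "evidence", "test"}

WARNING = "This reads like praise with low evidence."


def flattery_warning(text: str, min_praise: int = 2) -> Optional[str]:
    """Return the chip text when praise words outnumber evidence words."""
    words = re.findall(r"[a-z']+", text.lower())
    praise = sum(w in PRAISE for w in words)
    evidence = sum(w in EVIDENCE for w in words)
    # Warn only when there is a noticeable amount of praise and it
    # is not backed by at least as many evidence-style words.
    if praise >= min_praise and praise > evidence:
        return WARNING
    return None
```

A response like "What a great idea, I love it. Brilliant!" would trigger the chip, while a single "great" followed by reasons would not.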

Some open questions
• How to avoid needless rudeness while preserving clarity (tone vs content separation)?
• What is the right metric for earned praise (citation density, novelty, constraints)?
• Where should the risk sensitivity kick in automatically vs be user controlled?

If anyone has prototyped this, whether through a prompt injection or an RL signal, I'd love to see it.

Comments

Terr_•1h ago
This seems like asking for them to "just be more correct" except with extra steps.

I'm sure you can get them to choose words and phrases that we associate with "candor", but before they can gently correct you with something truthful, they actually need to know truth.

sim7c00•1h ago
it's not really. currently they are so eager to please they will love your bad idea and help u implement it wonderfully. that's different than being wrong. they are not wrong in giving the right solution to the wrong question.

mikebiglan•1h ago
This isn't about correctness. The model has a pretty good idea already: if you ask it in the right way, it can evaluate whether it thinks the idea is good, but sometimes it's on autopilot.

sim7c00•1h ago
you can basically put instructions into any LLM which will make it a dick who belittles you and makes fun of your bad ideas, and spells out each time why they are bad and how bad they are.

also to have it say mean things or send u down the wrong way on purpose if you ask it lazy questions. :')

the fact there are ppl who do this might both be a source of answers for you and an indication it's maybe not a bad idea entirely.

mikebiglan•1h ago
But wait, I asked ChatGPT and it told me this candor idea was a good one??!?

joules77•13m ago
Don't ask it if an idea is good or bad.

Ask it to show you weaknesses, missing pieces or blind spots.
