frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

GPT needs a truth-first toggle for technical workflows

1•PAdvisory•11mo ago
I use GPT-4 extensively for technical work: coding, debugging, modeling complex project logic. The biggest issue isn’t hallucination—it’s that the model prioritizes being helpful and polite over being accurate.

The default behavior feels like this:

Safety

Helpfulness

Tone

Truth

Consistency

In a development workflow, this is backwards. I’ve lost entire days chasing errors caused by GPT confidently guessing things it wasn’t sure about—folder structures, method syntax, async behaviors—just to “sound helpful.”

What’s needed is a toggle (UI or API) that:

Forces “I don’t know” when certainty is missing

Prevents speculative completions

Prioritizes truth over style, when safety isn’t at risk

Keeps all safety filters and tone alignment intact for other use cases

This wouldn’t affect casual users or conversational queries. It would let developers explicitly choose a mode where accuracy is more important than fluency.

This request has also been shared through OpenAI's support channels. Posting here to see if others have run into the same limitation or worked around it in a more reliable way than I have found

Comments

duxup•11mo ago
I’ve found this with many LLMs they want to give an answer, even if wrong.

Gemini on the Google search page constantly answers questions yes or no… and then the evidence it gives indicates the opposite of the answer.

I think the core issue is that in the end LLMs are just word math and they don’t “know” if they don’t “know”…. they just string words together and hope for the best.

PAdvisory•11mo ago
I went into it pretty in depth after breaking a few with severe constraints, what it seems to come down to is how the platforms themselves prioritize functions, MOST put "helpfulness" and "efficiency" ABOVE truth, which then leads the LLM to make a lot of "guesses" and "predictions". At their core pretty much ALL LLM's are made to "predict" the information in answers, but they CAN actually avoid that and remain consistent when heavily constrained. The issue is that it isn't at the core level, so we have to CONSTANTLY retrain it over and over I find
Ace__•11mo ago
I have made something that addresses this. Not ready to share it yet, but soon-ish. At the moment it only works on GPT model 4o. I tried local Q4 KM's models, on LM Studio, but complete no go.

Show HN: We benchmarked 18 LLMs on OCR (7k+ calls) – cheaper models often win

https://www.arbitrhq.ai/leaderboards/
1•TimoKerr•44s ago•0 comments

In Search of (Claude's) Lost Time

https://www.gsarigiannidis.gr/claude-global-memory/
1•gsarig•1m ago•0 comments

Deep Research Max

https://blog.google/innovation-and-ai/models-and-research/gemini-models/next-generation-gemini-de...
1•markerbrod•1m ago•0 comments

Aube: A fast Node.js package manager

https://github.com/endevco/aube
1•icar•1m ago•0 comments

Spain's greatest matador gored by bull in comeback from retirement

https://www.thetimes.com/world/europe/article/morante-bullfighter-injured-bull-goring-tsj0bt7ks
1•petethomas•3m ago•0 comments

C++ Scripting with Libriscv

https://libriscv.no/blog/expert-example/
1•fwsgonzo•4m ago•0 comments

Anthropic CVP – Run 2

https://sunglasses.dev/reports/anthropic-cvp-opus-4-7-evaluation-run-2
1•azrollin•7m ago•0 comments

Shared Agent Harness

https://github.com/goncalossilva/.agents
1•ankitg12•12m ago•0 comments

Rspack 2.0

https://rspack.rs/blog/announcing-2-0
2•0x1997•16m ago•0 comments

The Free Universal Construction Kit

https://fffff.at/free-universal-construction-kit/
1•robinhouston•17m ago•0 comments

Force all app traffic into the tunnel in the iOS app

https://mullvad.net/en/blog/force-all-app-traffic-into-the-tunnel
1•eptcyka•18m ago•0 comments

Run Commands on File Event

https://evilcookie.de/on-run-commands-on-file-event.html
1•Tch1b0•19m ago•0 comments

Viewing One's Live Self Interrupts Mindless Short-Form Video Scrolling

https://arxiv.org/abs/2604.19424
2•50kIters•19m ago•0 comments

When the pronoun "they" breaks your RAG pipeline

https://old.reddit.com/r/Rag/comments/1spro5f/when_the_pronoun_they_breaks_your_rag_fixing/
1•HarinezumIgel•23m ago•0 comments

What Makes Docs Beautiful?

https://passo.uno/what-makes-docs-beautiful/
1•eigenBasis•23m ago•0 comments

CrabTrap

https://github.com/brexhq/CrabTrap/
1•handfuloflight•27m ago•0 comments

Pixi: One Package Manager for Python and C/C++ Libraries

https://codecut.ai/uv-pixi-comparison/
1•lululpac•27m ago•0 comments

What's new in JavaScript (and what's coming next)

https://neciudan.dev/whats-new-in-javascript
1•theanonymousone•28m ago•0 comments

Emergency Prices: How Private Equity Captured the Ambulance Market

https://www.thebignewsletter.com/p/code-red-why-your-city-cant-affordor
1•xbmcuser•33m ago•1 comments

Gist.Science – Popular Science for All ArXiv/BioRxiv/MedRxiv Papers

https://gist.science/
1•gistscience•33m ago•0 comments

Roundtables: Unveiling the Things That Matter in AI

https://www.technologyreview.com/2026/04/21/1135486/roundtables-unveiling-the-10-things-that-matt...
1•joozio•34m ago•0 comments

Aspirin can reduce the risk of cancer – and we're starting to understand why

https://www.bbc.com/future/article/20260420-cancer-how-aspirin-may-be-a-powerful-new-weapon-again...
1•ranit•38m ago•0 comments

Show HN: A P2P Network Where Agents Collaborate on Code Optimization

https://community.computer/
2•lftherios•41m ago•0 comments

High Performance Git

https://gitperf.com/
1•handfuloflight•41m ago•0 comments

Geo Content Writer: a backlog-first system for AI visibility content

https://github.com/dageno-agents/geo-content-writer
3•timdageno•44m ago•0 comments

Show HN: Data-driven GEO and marketing agent platform

https://dageno.ai
6•timdageno•53m ago•0 comments

SpaceX is working with Cursor and has an option to buy the startup for $60B

https://techcrunch.com/2026/04/21/spacex-is-working-with-cursor-and-has-an-option-to-buy-the-star...
1•thiele•54m ago•0 comments

Reflecting on 50 years of environmental innovation

https://blogs.sas.com/content/sascom/2026/04/22/reflecting-on-50-years-of-environmental-innovation/
1•salkahfi•55m ago•0 comments

DuckDB Kernel – analytical execution runtime for Jupyter

https://github.com/hugr-lab/duckdb-kernel
1•articsputnik•59m ago•0 comments

XOR'ing a register with itself is the idiom for zeroing it out. Why not sub?

https://devblogs.microsoft.com/oldnewthing/20260421-00/?p=112247
19•ingve•59m ago•2 comments