
AI hallucinates. Do you ever double-check the output?

6•jackota•1h ago
I've been building AI workflows, and they randomly hallucinate and do something stupid, so I end up manually checking everything anyway before approving the AI-generated content (messages, emails, invoices, etc.), which defeats the whole point.

Anyone else? How did you manage it?

Comments

AlexeyBrin•1h ago
You can't be 100% sure the AI won't hallucinate. If you don't want to check it manually, you can have a different AI check it and, if it finds something suspect, flag it for a human to verify. Even better, have two different AIs check the output and flag it if they don't agree.
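
A minimal sketch of that two-checker idea; call_model() and the model names are placeholders for whatever LLM API you actually use, not any particular provider's:

    # Sketch of "two checkers, escalate on disagreement".

    def call_model(model: str, prompt: str) -> str:
        """Hypothetical wrapper: send `prompt` to `model`, return its text reply."""
        raise NotImplementedError("wire this to your LLM provider")

    def cross_check(output: str, source: str) -> str:
        prompt = (
            "Does OUTPUT contain any claim not supported by SOURCE? "
            "Answer strictly YES or NO.\n\n"
            f"SOURCE:\n{source}\n\nOUTPUT:\n{output}"
        )
        # Ask two different models the same question.
        verdicts = {call_model(m, prompt).strip().upper()
                    for m in ("checker-model-a", "checker-model-b")}
        # Approve only if both checkers answer NO; any YES, or a disagreement
        # between the two, goes to a human.
        return "auto_approved" if verdicts == {"NO"} else "needs_human_review"
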
Gioppix•1h ago
I also don't trust LLMs, but I still find automations useful. Even with a human in the loop they save a bunch of time. Clicking "Approve & Send" is much quicker than writing the email out manually, and I just rewrite the 5% that contains hallucinations.
Zigurd•1h ago
You have put your finger on why agent-assisted coding often doesn't suck while other use cases of LLMs often do. Lint and the compiler get their licks in before you even smoke-test the code. There aren't two layers of deterministic, algorithmic checking for your emails or invoices.

So before concluding that coding agents prove AI can be useful in general, look for use cases with similar characteristics.
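
For illustration, one deterministic "lint layer" for an invoice could look something like the following; the field names are invented:

    # Purely deterministic checks that run before any human (or second model)
    # ever looks at the generated invoice.

    from datetime import date

    def lint_invoice(inv: dict) -> list[str]:
        problems = []
        line_total = sum(item["qty"] * item["unit_price"] for item in inv["items"])
        if abs(line_total - inv["total"]) > 0.01:
            problems.append(f"total {inv['total']} != sum of line items {line_total:.2f}")
        try:
            if date.fromisoformat(inv["due_date"]) < date.fromisoformat(inv["issued"]):
                problems.append("due_date is before the issue date")
        except ValueError:
            problems.append("dates are not valid ISO dates")
        return problems  # empty list == passed this layer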

7777777phil•1h ago
I have been building research automation with LangGraph for the past two months. We always put a human-in-the-loop checkpoint after each critical step; it might be annoying now, but I think it will save us long-term.
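
The checkpoint idea itself doesn't require LangGraph; a bare-bones sketch of pausing after each critical step for an explicit human approval might look like this:

    # Not LangGraph itself, just the shape of the idea: stop after every
    # critical step and require an explicit human OK before continuing.

    from dataclasses import dataclass

    @dataclass
    class Checkpoint:
        step: str
        output: str
        approved: bool = False

    def run_with_checkpoints(steps, approve):
        """steps: list of (name, fn) pairs; approve: callable that sees each Checkpoint."""
        history, data = [], None
        for name, fn in steps:
            data = fn(data)
            cp = Checkpoint(step=name, output=str(data))
            cp.approved = approve(cp)      # the human decision point
            history.append(cp)
            if not cp.approved:
                break                      # halt the pipeline on rejection
        return history

    # Example wiring: trivial steps, console prompt standing in for the reviewer.
    if __name__ == "__main__":
        steps = [("draft", lambda _: "draft text"),
                 ("summarize", lambda d: d.upper())]
        run_with_checkpoints(steps,
                             lambda cp: input(f"approve {cp.step}? [y/N] ").lower() == "y")
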
codingdave•1h ago
Ever? More like always. Keeping humans in the loop is the current best practice. If you truly need to automate something that cannot afford a human checkpoint, find a deterministic solution for it, not LLMs.
varshith17•36m ago
Build validation layers, not trust. For structured outputs (invoices, emails), use JSON schemas plus fact-checking prompts, where a second AI call verifies critical fields against the source data before you see it. Real pattern: AI generates → automated validation catches type/format errors → second LLM does adversarial review ("check for hallucinated numbers/dates") → you review only flagged items plus random samples. That turns "check everything" into "check exceptions" and cuts review time by 80%.
casualscience•17m ago
Also lets 50% of errors through
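
A rough sketch of the validate-then-triage flow varshith17 describes, with the second-LLM pass stubbed out and the field names invented:

    # "Check exceptions, not everything": deterministic schema checks, an
    # adversarial second pass (stubbed here), then human review only for
    # flagged items plus a random sample.

    import random

    REQUIRED_FIELDS = {"invoice_id": str, "amount": float, "due_date": str}

    def schema_errors(doc: dict) -> list[str]:
        return [f"{k}: expected {t.__name__}"
                for k, t in REQUIRED_FIELDS.items()
                if not isinstance(doc.get(k), t)]

    def adversarial_check(doc: dict, source: dict) -> list[str]:
        # Stand-in for the second-LLM review ("check for hallucinated
        # numbers/dates"); here it just compares each field to the source record.
        return [k for k in REQUIRED_FIELDS if doc.get(k) != source.get(k)]

    def triage(docs, sources, sample_rate=0.1):
        flagged, auto_ok = [], []
        for doc, src in zip(docs, sources):
            issues = schema_errors(doc) + adversarial_check(doc, src)
            (flagged if issues or random.random() < sample_rate else auto_ok).append(doc)
        return flagged, auto_ok  # humans review `flagged`; `auto_ok` goes out untouched

The random sample is what lets you measure how many errors the automated layers are actually letting through, which is the number casualscience is pointing at.
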
exabrial•29m ago
The new guys on my team do not check it. They already had problems checking their own work; AI is just amplifying an existing human problem.

Ask HN: How do you find the "why" behind old code decisions?

27•siddhibansal9•18h ago•31 comments

How do I make $10k (What are you guys doing?)

22•b_mutea•6h ago•37 comments

Ask HN: What AI feature looked good in demos but failed in real usage? Why?

8•kajolshah_bt•6h ago•3 comments

Locked out of my GCP account for 3 days, still charged, can't redirect domain

7•lifeoflee•5h ago•2 comments

Ask HN: Do you have any evidence that agentic coding works?

442•terabytest•3d ago•448 comments

Ask HN: How realistically far are we from AGI?

2•HipstaJules•4h ago•4 comments

Ask HN: What's the current best local/open speech-to-speech setup?

5•dsrtslnd23•6h ago•0 comments

Ask HN: What 'AI feature' created negative ROI in production?

5•kajolshah_bt•7h ago•2 comments

Ask HN: Does DDG no longer honor "site:" prefix?

18•everybodyknows•16h ago•6 comments

Tell HN: 2 years building a kids audio app as a solo dev – lessons learned

134•oliverjanssen•2d ago•75 comments

Tell HN: Cursor agent force-pushed despite explicit "ask for permission" rules

6•xinbenlv•12h ago•7 comments

Ask HN: Best practice securing secrets on local machines working with agents?

8•xinbenlv•1d ago•11 comments

Ask HN: Why are so many rolling out their own AI/LLM agent sandboxing solution?

30•ATechGuy•2d ago•12 comments

Why is software still built like billions don't exist in 2026?

8•yerushalayim•7h ago•8 comments

Ask HN: Is Claude Down for You?

26•philip1209•19h ago•19 comments

Ask HN: COBOL devs, how is AI coding affecting your work?

168•zkid18•4d ago•183 comments

Ask HN: How do you authorize AI agent actions in production?

5•naolbeyene•1d ago•4 comments

Ask HN: What is your opinion on non-mainstream mobile OS options (e.g. /e/OS)?

5•sendes•1d ago•3 comments

Ask HN: Have you managed to switch to Bluesky for tech people?

9•fuegoio•18h ago•10 comments

Ask HN: What's the best virtual Linux desktop experience on macOS for devs?

7•darkteflon•19h ago•4 comments

Ask HN: Revive a mostly dead Discord server

20•movedx•2d ago•28 comments

Tell HN: Drowning in information but still missing everything

10•akhil08agrawal•1d ago•8 comments

Tell HN: We have not yet discovered the rules of vibe coding

4•0xbadcafebee•16h ago•0 comments

From Sketch to Masterpiece: Understanding Stable Diffusion Img2Img

2•bozhou•12h ago•0 comments

Ask HN: Modern test automation software (Python/Go/TS)?

7•rajkumar14•21h ago•3 comments

Ask HN: How do you verify cron jobs did what they were supposed to?

6•BlackPearl02•1d ago•9 comments

Ask HN: Is there any good open source model with reliable agentic capabilities?

5•baalimago•1d ago•1 comment

Ask HN: Does "Zapier for payment automation" exist?

8•PL_Venard•2d ago•13 comments

Tell HN: Claude session limits getting small

23•pragmaticalien8•2d ago•15 comments