AI hallucinates. How do you keep it from fucking up automations?

4•Gioppix•1w ago

Every time I build simple automations LLMs find a way to screw up something. At the end of the day I still have to manually review critical actions (emails, sms, invoices...). Why bother automating then? How do you manage it?

Comments

downboots•1w ago

https://en.wikipedia.org/wiki/Bernoulli_trial

Gioppix•1w ago

I remember studying this in uni lol. How do you use it?

storystarling•1w ago

I found the only way to make this work reliably is to treat the LLM as a fallible component inside a state machine rather than the controller. I've been using LangGraph to enforce structured outputs and run validation checks before any side effects happen. If the output doesn't match the schema or business logic it just retries or halts. It seems like a lot of boilerplate initially but it is necessary if you want to trust the system with actual invoices.

chrisjj•1w ago

So when this issues a valid but garbage invoices, then what?

nik282000•1w ago

If you have to manually validate everything then what did you save by using an LLM? DIY and know it will work the first time.

Robust and Interactable World Models in Computer Vision [video]

Nestlé couldn't crack Japan's coffee market.Then they hired a child psychologist

Notes for February 2-7

Study confirms experience beats youthful enthusiasm

The Big Hunger by Walter J Miller, Jr. (1952)

The Genus Amanita

We have broken SHA-1 in practice

Ask HN: Was my first management job bad, or is this what management is like?

Ask HN: How to Reduce Time Spent Crimping?

KV Cache Transform Coding for Compact Storage in LLM Inference

A quantitative, multimodal wearable bioelectronic device for stress assessment

Why Big Tech Is Throwing Cash into India in Quest for AI Supremacy

How to shoot yourself in the foot – 2026 edition

Eight More Months of Agents

From Human Thought to Machine Coordination

The new X API pricing must be a joke

Show HN: RMA Dashboard fast SAST results for monorepos (SARIF and triage)

Show HN: Source code graphRAG for Java/Kotlin development based on jQAssistant

Python Only Has One Real Competitor

Tmux to Zellij (and Back)

Ask HN: How are you using specialized agents to accelerate your work?

Passing user_id through 6 services? OTel Baggage fixes this

DavMail Pop/IMAP/SMTP/Caldav/Carddav/LDAP Exchange Gateway

Visual data modelling in the browser (open source)

Show HN: Tharos – CLI to find and autofix security bugs using local LLMs

Oddly Simple GUI Programs

The New Playbook for Leaders [pdf]

Interactive Unboxing of J Dilla's Donuts

OneCourt helps blind and low-vision fans to track Super Bowl live

Rudolf Vrba