A gate that stops LLMs asserting facts not present in the source document

https://narrativelogic.co.uk/tommy.html

2•davidtome•2h ago

Comments

davidtome•2h ago

Hi HN — I’m David, a theatre nurse in the UK. I built this after noticing a recurring problem when using AI systems to analyse policy, governance and regulatory documents.

LLMs often produce fluent answers that look correct but include things that are not actually stated in the source document. Sometimes they infer missing elements (actors, actions, enforcement mechanisms), and sometimes they silently omit parts of the document structure. In governance or legal contexts that matters, because the difference between what a document states and what a model assumes can have evidentiary implications.

This project experiments with a simple idea: before an LLM reasons about a document, a deterministic gate checks whether the structural components needed for interpretation are explicitly present.

The gate looks for things like:

• Actor: who is responsible
• Action: what must be done
• Conditions / dependencies
• Outcomes / consequences

If these are missing, the system records that absence instead of letting the model silently infer it. The result is an admissibility record that shows exactly what the document explicitly contains and what it does not.
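To make the idea concrete, here is a rough sketch of what a deterministic gate of this kind could look like. It is not the project's actual code: the keyword patterns, the `ComponentRecord` type and the `gate` function are all hypothetical, and a real gate would need far more careful rules than a few regular expressions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword patterns, chosen only for illustration.
# The post does not describe the real gate's rules.
PATTERNS = {
    "actor": re.compile(r"\b(the (controller|commissioner|company|board)|we|you)\b", re.I),
    "action": re.compile(r"\b(must|shall|is required to|will)\b", re.I),
    "condition": re.compile(r"\b(if|unless|where|provided that|within \d+ days)\b", re.I),
    "outcome": re.compile(r"\b(penalty|fine|liable|result in|failure to)\b", re.I),
}


@dataclass
class ComponentRecord:
    component: str           # "actor", "action", "condition" or "outcome"
    present: bool            # True only if the text states it explicitly
    evidence: Optional[str]  # the matched wording, or None when absent


def gate(clause: str) -> List[ComponentRecord]:
    """Check one clause for each structural component, recording absences too."""
    records = []
    for name, pattern in PATTERNS.items():
        match = pattern.search(clause)
        records.append(
            ComponentRecord(name, match is not None, match.group(0) if match else None)
        )
    return records


# Example clause (invented for this sketch, not taken from a real notice).
for record in gate("The controller must notify the Commissioner within 30 days."):
    print(record)
```

The point of the sketch is that a missing component produces an explicit record with `present=False` rather than being left for a model to fill in.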

Each element in the output is labelled as one of four types:

• Grounded: directly supported by the document
• Citable: external statute or regulation
• Inferred: logically implied but not stated
• Absent: the document does not contain this information
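One way to picture the four labels is as a small data model. The sketch below is an assumption about how such a record might be represented, not the project's schema: the `Label` enum, the `Element` type and the consistency rule in `check` (grounded and citable elements carry a source reference, absent ones do not) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Label(Enum):
    GROUNDED = "grounded"  # directly supported by the document
    CITABLE = "citable"    # external statute or regulation
    INFERRED = "inferred"  # logically implied but not stated
    ABSENT = "absent"      # the document does not contain this information


@dataclass
class Element:
    claim: str
    label: Label
    source: Optional[str] = None  # quoted span or citation, if any


def check(element: Element) -> List[str]:
    """Return consistency problems for one element (assumed rule, see lead-in)."""
    problems = []
    if element.label in (Label.GROUNDED, Label.CITABLE) and not element.source:
        problems.append(f"{element.label.value} element needs a source reference")
    if element.label is Label.ABSENT and element.source:
        problems.append("absent element should not carry a source reference")
    return problems
```

Keeping the label as a closed enum means downstream code cannot quietly introduce a fifth, vaguer category.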

There are two LLM layers involved in the demo:

• A locally running Llama-3B model handles relatively basic tasks inside the pipeline (segmentation and structural checks).
• A hosted Anthropic model is used as the descriptive layer that converts the structured record into readable output.

Anthropic usage is relatively expensive for me, so please be kind and don't go crazy with my credits. If that credit runs out, the system automatically falls back to full Llama-3B output.
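The fallback can be pictured roughly as follows. This is a sketch under assumptions, not the demo's code: `CreditExhausted`, `describe` and the two callables are hypothetical stand-ins, and no real SDK calls are shown.

```python
from typing import Callable, Tuple


class CreditExhausted(Exception):
    """Hypothetical error raised by a hosted-model wrapper when credit runs out."""


def describe(
    record: dict,
    hosted: Callable[[dict], str],
    local: Callable[[dict], str],
) -> Tuple[str, str]:
    """Render the record with the hosted model, falling back to the local one.

    Returns the text together with the name of the layer that produced it.
    """
    try:
        return hosted(record), "hosted"
    except CreditExhausted:
        return local(record), "local"
```

Returning the layer name alongside the text lets the output say which model wrote the description, which matters when the two differ in quality.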

The system is designed primarily for authoritative texts such as governance documents, legislation, regulatory notices, and policies.

If you want something to test it with, this UK ICO enforcement notice tends to work well:

https://ico.org.uk/media2/xfbl1uaa/lastpass-uk-ltd-penalty-n...

It’s long, structured, and contains a mix of explicit commitments and contextual explanation, which tends to highlight the difference between grounded statements and inferred ones.

Curious what people here think about the idea of structural gating before LLM inference for document analysis. Happy to answer questions about the architecture or the reasoning behind it.
