Hi HN — I’m David, a theatre nurse in the UK. I built this after noticing a recurring problem when using AI systems to analyse policy, governance and regulatory documents.
LLMs often produce fluent answers that look correct but include things that are not actually stated in the source document. Sometimes they infer missing elements (actors, actions, enforcement mechanisms), and sometimes they silently omit parts of the document structure. In governance or legal contexts that matters, because the difference between what a document states and what a model assumes can have evidentiary implications.
This project experiments with a simple idea: before an LLM reasons about a document, a deterministic gate checks whether the structural components needed for interpretation are explicitly present.
The gate looks for things like:
• Actor — who is responsible
• Action — what must be done
• Conditions / dependencies
• Outcomes / consequences
If any of these are missing, the system records the absence instead of letting the model silently infer them. The result is an admissibility record that shows exactly what the document explicitly contains and what it does not.
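To make the idea concrete, here is a rough sketch of what a deterministic gate could look like. All of the names and patterns below are mine for illustration, not the project's actual code; the real checks are more involved than a handful of regexes.

```python
# Hypothetical sketch of a structural gate. REQUIRED_SLOTS and gate()
# are illustrative names; the patterns are deliberately crude examples.
import re

REQUIRED_SLOTS = {
    "actor": re.compile(r"\bthe (Commissioner|controller|processor)\b"),
    "action": re.compile(r"\b(shall|must|is required to)\b"),
    "condition": re.compile(r"\b(if|unless|where|subject to)\b", re.IGNORECASE),
    "outcome": re.compile(r"\b(penalty|fine|liable|enforcement)\b", re.IGNORECASE),
}

def gate(clause: str) -> dict:
    """Record which structural slots are explicitly present in a clause.
    Absences are recorded, never filled in by inference."""
    return {slot: ("present" if pat.search(clause) else "absent")
            for slot, pat in REQUIRED_SLOTS.items()}

record = gate("The controller must notify the Commissioner if a breach occurs.")
# The clause names an actor, an action and a condition, but no outcome,
# so the record marks "outcome" as absent rather than guessing one.
```

The key property is that the gate is deterministic: the same clause always produces the same record, so downstream LLM output can be audited against it.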
Each element in the output is labelled as one of four types:
• Grounded — directly supported by the document
• Citable — external statute or regulation
• Inferred — logically implied but not stated
• Absent — the document does not contain this information
There are two LLM layers involved in the demo:
• A locally running Llama-3B model handles relatively basic tasks inside the pipeline (segmentation and structural checks).
• A hosted Anthropic model is used as the descriptive layer that converts the structured record into readable output.
Anthropic usage is relatively expensive for me (please be kind and don't go crazy with my credits), so the system will automatically fall back to full Llama-3B output if that credit runs out.
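The fallback wiring is simple in principle. A minimal sketch, with the two model calls abstracted as plain callables (the real project's interfaces and error types are assumptions here):

```python
# Hypothetical fallback logic: prefer the hosted model for the readable
# summary, drop to the local Llama model when the hosted call fails.
def describe(record: dict, hosted_call, local_call) -> str:
    try:
        return hosted_call(record)
    except RuntimeError:  # e.g. credits exhausted or quota exceeded
        return local_call(record)

def exhausted(record: dict) -> str:
    raise RuntimeError("credits exhausted")

# When the hosted layer errors out, the local model's output is used.
out = describe({"actor": "present"}, exhausted, lambda r: "llama summary")
```

Because the admissibility record is produced before either model writes prose, the fallback only changes how readable the description is, not what it is allowed to claim.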
The system is designed primarily for authoritative texts such as governance documents, legislation, regulatory notices, and policies.
If you want something to test it with, this UK ICO enforcement notice tends to work well: https://ico.org.uk/media2/xfbl1uaa/lastpass-uk-ltd-penalty-n...
It’s long, structured, and contains a mix of explicit commitments and contextual explanation, which tends to highlight the difference between grounded statements and inferred ones.
Curious what people here think about the idea of structural gating before LLM inference for document analysis. Happy to answer questions about the architecture or the reasoning behind it.