HalluciGuard – open-source middleware to detect and mitigate LLM hallucinations

https://github.com/Hermes-Lekkas/HalluciGuard

2•Hermes_dev•1h ago

Comments

Hermes_dev•1h ago

  Hi HN,


  Hallucinations remain one of the biggest obstacles to moving LLM applications from "cool demo" to "reliable production." Whether it’s a RAG pipeline inventing
  citations or an autonomous agent fabricating data, the lack of a reliable "truth layer" erodes user trust in whatever is built on top of the model.


  I’m sharing HalluciGuard, an open-source middleware layer (AGPLv3) designed to act as a security and reliability buffer between LLM providers and your end-users.

  GitHub: https://github.com/Hermes-Lekkas/HalluciGuard


  How it works
  Instead of just hoping the model is right, HalluciGuard intercepts the LLM response and runs it through a multi-signal verification pipeline:


   1. Factual Claim Extraction: It uses lightweight LLMs to break down a response into discrete, verifiable factual claims.
   2. Multi-Signal Scoring: It evaluates each claim using four distinct signals:
       * Self-Consistency: LLM-as-a-judge verification.
       * Linguistic Heuristics: Detecting "uncertainty language" and high-risk patterns.
       * RAG Cross-Reference: Verifying claims directly against your retrieved documents.
       * Web Verification: Optionally pulling real-time snippets from search engines (like Tavily).
   3. Risk Flagging: It returns a comprehensive GuardedResponse with an overall "Trust Score" and flags specific claims as SAFE, MEDIUM, or CRITICAL risk.
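
As a rough illustration only (not HalluciGuard's actual code or API), steps 2 and 3 could be sketched as below. The signal names, weights, thresholds, the `guard` function, and the choice of a plain mean for the overall Trust Score are all assumptions made for this example; only the name GuardedResponse and the SAFE/MEDIUM/CRITICAL labels come from the description above. Claim extraction and the signals themselves are stubbed out as precomputed scores in [0, 1].

```python
from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    SAFE = "SAFE"
    MEDIUM = "MEDIUM"
    CRITICAL = "CRITICAL"


# Hypothetical weights for the four signals; not taken from the project.
WEIGHTS = {
    "self_consistency": 0.4,
    "linguistic": 0.1,
    "rag": 0.3,
    "web": 0.2,
}


@dataclass
class ClaimResult:
    text: str
    score: float
    risk: Risk


@dataclass
class GuardedResponse:
    content: str
    trust_score: float
    claims: list = field(default_factory=list)


def combine_signals(signals):
    """Weighted mean over the signals that are present (None means skipped)."""
    used = {k: v for k, v in signals.items() if k in WEIGHTS and v is not None}
    if not used:
        return 0.0  # no evidence at all: treat the claim as untrusted
    total_weight = sum(WEIGHTS[k] for k in used)
    return sum(WEIGHTS[k] * v for k, v in used.items()) / total_weight


def classify(score):
    """Map a claim score to a risk label using assumed thresholds."""
    if score >= 0.75:
        return Risk.SAFE
    if score >= 0.4:
        return Risk.MEDIUM
    return Risk.CRITICAL


def guard(content, claim_signals):
    """Score each extracted claim and wrap the original response text."""
    claims = []
    for text, signals in claim_signals.items():
        score = combine_signals(signals)
        claims.append(ClaimResult(text, score, classify(score)))
    # Assumption: overall trust is the mean claim score (1.0 if no claims).
    trust = sum(c.score for c in claims) / len(claims) if claims else 1.0
    return GuardedResponse(content, trust, claims)
```

Renormalizing the weights over the signals that are actually present means an optional signal such as web verification can be switched off without dragging every score down.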


  Key Features
   * Provider Agnostic: Native support for OpenAI (GPT-5.x), Anthropic (Claude 4.6), Google (Gemini 3.1), and local models via Ollama.
   * OpenClaw Integration: We’ve built a native interceptor for the OpenClaw agent framework, allowing you to monitor agent actions and thoughts in real-time.
   * Streaming Support: Performs analysis asynchronously so you don't lose the "real-time" feel of streaming responses.
   * Cost-Optimization Cache: We cache verification results locally to reduce your API bills by avoiding redundant checks for common facts.
   * LangChain Ready: Includes a drop-in CallbackHandler for existing LangChain projects.
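
To make the caching idea concrete, here is a minimal sketch of how a local verification cache might look. It is not the project's implementation: the `VerificationCache` class, the SHA-256 key over a whitespace- and case-normalized claim, and the one-hour TTL are all assumptions chosen for illustration.

```python
import hashlib
import time


class VerificationCache:
    """In-memory cache of claim scores with a time-to-live (hypothetical)."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (timestamp, score)

    @staticmethod
    def _key(claim):
        # Normalize case and whitespace so trivially different phrasings
        # of the same claim share one cache entry.
        normalized = " ".join(claim.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_verify(self, claim, verify):
        """Return a cached score, or call verify(claim) and store the result."""
        key = self._key(claim)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        score = verify(claim)  # the expensive call (LLM judge, web search, ...)
        self._store[key] = (now, score)
        return score
```

A TTL matters here because a cached verdict about a time-sensitive fact can go stale; how long entries should live is a tuning decision this sketch does not try to answer.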

  Why AGPLv3?
  We believe the "Truth Layer" of the AI stack should be owned by the community, not hidden behind a proprietary corporate API.


  What’s Next?
  We are currently working on "Lookahead Verification" (v0.9), which will attempt to auto-correct hallucinations during the token generation phase before the user
  ever sees them.


  I'd love to get the community's feedback on our scoring heuristics and hear about the edge cases you're seeing in production.

  Happy to answer any technical questions about the architecture or the benchmark results we've seen so far!

verdverm•1h ago

The feedback I have is that HN is not really interested in projects that were started less than an hour ago (see git history), with an even newer HN account and submission.
