
Show HN: Director-AI – token-level NLI+RAG

https://github.com/anulum/director-ai
1•anulum•1h ago
Hey HN,

After watching too many agents confidently lie in production, I built Director-AI.

It sits between your LLM and the user, scoring every generated token with:
• 0.6 × DeBERTa-v3 NLI (contradiction detection)
• 0.4 × RAG against your own ChromaDB knowledge base
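Here is a minimal sketch of how a 0.6/0.4 blend could combine the two signals into one coherence score. It rests on assumptions the post does not state: the NLI side contributes `1 - P(contradiction)`, and the RAG side contributes a support score already normalized to [0, 1]. The function name `coherence_score` is hypothetical and is not Director-AI's API.

```python
def coherence_score(p_contradiction: float, rag_support: float,
                    w_nli: float = 0.6, w_rag: float = 0.4) -> float:
    """Blend an NLI signal and a retrieval signal into one score in [0, 1].

    Hypothetical sketch: assumes the NLI model yields P(contradiction) and the
    retrieval step yields a support score in [0, 1] (e.g. a normalized similarity).
    """
    for name, value in (("p_contradiction", p_contradiction),
                        ("rag_support", rag_support)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # A low contradiction probability counts as high NLI agreement.
    nli_agreement = 1.0 - p_contradiction
    return w_nli * nli_agreement + w_rag * rag_support


# A claim the NLI model barely contradicts, with strong KB support:
print(round(coherence_score(p_contradiction=0.1, rag_support=0.8), 2))  # 0.86
```

Because the weights sum to 1, the score stays in [0, 1] and can be compared directly against a single threshold.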

If the coherence score falls below the threshold, the Rust kernel halts the stream before that token is sent.
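The halt rule can be sketched as a generator that only releases a token once the text including it passes the check. This is an illustration, not the project's Rust kernel: `gated_stream`, `toy_score`, and the default threshold of 0.5 are all hypothetical, since the post does not give a default.

```python
from typing import Callable, Iterable, Iterator


def gated_stream(tokens: Iterable[str],
                 score_fn: Callable[[str], float],
                 threshold: float = 0.5) -> Iterator[str]:
    """Yield tokens only while the running text stays coherent.

    Hypothetical sketch: `score_fn` scores the text produced so far *including*
    the candidate token; if that score falls below `threshold`, the candidate
    is never yielded and the stream stops.
    """
    text = ""
    for token in tokens:
        candidate = text + token
        if score_fn(candidate) < threshold:
            return  # halt before the failing token reaches the user
        text = candidate
        yield token


# Toy scorer (assumption): any text mentioning "1999" is treated as incoherent.
def toy_score(text: str) -> float:
    return 0.0 if "1999" in text else 1.0


stream = ["The ", "Eiffel ", "Tower ", "opened ", "in ", "1999", "."]
print("".join(gated_stream(stream, toy_score)))  # The Eiffel Tower opened in
```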

Key technical bits:
• Works with any OpenAI-compatible endpoint (Ollama, vLLM, llama.cpp, Groq, OpenAI, Claude…)
• StreamingKernel + windowed scoring
• GroundTruthStore.add() for easy fact ingestion
• Dual licensing: AGPL open source + commercial (closed-source/SaaS OK)
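One way windowed scoring can coexist with "halt before the token is sent" is to buffer a few tokens and release them only after the check that covers them passes. The sketch below checks every `stride` tokens (4, echoing the "every ~4 tokens" figure in the thread) over the last `window` tokens. The 32-token window, the 0.5 threshold, and the name `windowed_gate` are assumptions, not the actual StreamingKernel behavior.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def windowed_gate(tokens: Iterable[str],
                  score_fn: Callable[[str], float],
                  threshold: float = 0.5,
                  stride: int = 4,
                  window: int = 32) -> Iterator[str]:
    """Score every `stride` tokens over the last `window` tokens.

    Hypothetical sketch: tokens are held in a small buffer and released only
    after the check covering them passes, so a failing chunk is never emitted.
    """
    if stride < 1 or window < stride:
        raise ValueError("need stride >= 1 and window >= stride")
    recent = deque(maxlen=window)  # sliding context the scorer sees
    pending = []                   # tokens generated but not yet released
    for token in tokens:
        recent.append(token)
        pending.append(token)
        if len(pending) == stride:
            if score_fn("".join(recent)) < threshold:
                return  # halt: the buffered chunk is dropped, never sent
            yield from pending
            pending.clear()
    # The final partial chunk at end of stream gets one last check.
    if pending and score_fn("".join(recent)) >= threshold:
        yield from pending


# Toy scorer (assumption): any window containing "BAD" is incoherent.
chunks = list(windowed_gate(["a", "b", "c", "d", "e", "BAD", "g"],
                            lambda t: 0.0 if "BAD" in t else 1.0))
print(chunks)  # ['a', 'b', 'c', 'd']
```

The trade-off is latency: the first token of each chunk waits for the rest of the chunk plus one scoring call, in exchange for one check per `stride` tokens instead of one per token.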

Honest LLM-AggreFact numbers are in the repo (66.2% balanced accuracy with streaming enabled). I'm not claiming SOTA on static NLI; the value is in the live gating plus the custom KB system.
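For context on the metric: balanced accuracy is the mean of per-class recall, so a detector cannot score well just by always predicting the majority class. Below is a stdlib-only sketch with the same definition as scikit-learn's `balanced_accuracy_score`. The labels are made up for illustration and are not AggreFact data.

```python
def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Mean recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Made-up labels: 1 = supported, 0 = unsupported (hallucinated).
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
always_supported = [1] * 8
print(balanced_accuracy(y_true, always_supported))  # 0.5, despite 75% plain accuracy
```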

Repo + full examples: https://github.com/anulum/director-ai

Would love feedback on the scoring weights, halt logic, or kernel design. What hallucination problems are you solving today?

Comments

soletta•1h ago
Sounds interesting. What makes DeBERTa + RAG any better at detecting contradictions in the context than a frontier LLM, and why? I see that the NLI scorer itself was evaluated, but I’d love to see data on how the full system performs vs. SotA if you have any on hand.
anulum•1h ago
@soletta Great question: this is exactly why we built it this way.

*Short answer*: frontier LLMs are excellent at static self-critique, but terrible for *real-time token-by-token streaming guardrails* because of latency, cost, and lack of persistent custom memory.

*Why DeBERTa + RAG wins here*:
- *Latency*: DeBERTa-v3-base + the Rust kernel scores every ~4 tokens in ~220 ms (AggreFact eval). A frontier LLM call (GPT-4o/Claude 3.5) takes 400–2000 ms per check. You can’t do that mid-stream without killing UX.
- *Cost*: Frontier self-checking at scale = real money. This runs fully local/offline after the one-time model download.
- *Custom knowledge*: The 0.4×-weighted RAG component pulls from your GroundTruthStore (ChromaDB). Frontier models don’t have a live, updatable external fact base unless you keep stuffing the context (expensive, and limited by the context window).
- *Determinism & auditability*: A small, fine-tunable NLI model + a fixed vector DB = reproducible decisions. LLM-as-judge setups are stochastic and hard to debug in prod.
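A back-of-the-envelope check of the latency point, using only the figures quoted in the comment. Two assumptions are not stated there: the ~220 ms is read as the cost of one check, and checks run serially and block the stream. That makes this a worst case; overlapping scoring with generation would hide some of the cost. The helper `added_latency_s` is hypothetical.

```python
def added_latency_s(n_tokens: int, stride: int, per_check_ms: float) -> float:
    """Total blocking time in seconds if one check runs every `stride` tokens.

    Assumes checks run serially and block the stream (worst case).
    """
    n_checks = -(-n_tokens // stride)  # ceiling division
    return n_checks * per_check_ms / 1000.0


# 200-token answer, one check every 4 tokens, per-check costs from the comment:
for label, ms in [("DeBERTa + Rust kernel", 220),
                  ("LLM judge, fast", 400),
                  ("LLM judge, slow", 2000)]:
    print(f"{label}: {added_latency_s(200, 4, ms):.1f} s")
# DeBERTa + Rust kernel: 11.0 s
# LLM judge, fast: 20.0 s
# LLM judge, slow: 100.0 s
```

Under these assumptions, the local scorer is roughly 1.8× to 9× cheaper per check than the quoted LLM-judge range. Whether ~55 ms of added time per token fits a UX budget depends on how much scoring can overlap with generation.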

We’re completely transparent: the NLI scorer alone is *not SOTA* (66.2% balanced accuracy on the 29k-sample LLM-AggreFact benchmark; see the full table vs. MiniCheck, Bespoke, and HHEM in the repo). The value is the live system: NLI + user KB + an actual streaming halt that no one else ships today.

Full end-to-end comparisons vs. LLM-as-judge in streaming setups are next on the roadmap (happy to run them on any dataset you care about).

Have you tried frontier self-critique in real streaming agents? What broke for you?

Repo benchmarks: https://github.com/anulum/director-ai#benchmarks

Will A.I. Take Away Our Basic Skills?

https://paperrobots.substack.com/p/will-ai-take-away-our-basic-skills
1•NomNew•1m ago•0 comments

Show HN: Free online audio translator that translates voice instantly

https://audioconvert.ai/audio-translator
1•Katherine603•2m ago•0 comments

Plugin to give Claude Code perception (screen, system audio and mic context)

https://twitter.com/ashu_trv/status/2026296815860203888/
1•ash-ishh•3m ago•0 comments

Show HN: Squidy – How I stopped losing AI agent context mid-project

https://rendernet.com.br/squidyrun/
1•marcfox182•8m ago•0 comments

Show HN: Easyemailfinder.com (5 Free Credits)

https://easyemailfinder.com
1•faalbane•12m ago•0 comments

The Internet Was Weeks Away from Disaster and No One Knew [video]

https://www.youtube.com/watch?v=aoag03mSuXQ
1•trinsic2•17m ago•1 comments

Tesla Lab – 20 computational experiments

https://github.com/consigcody94/tesla-lab
1•sentinelowl•19m ago•1 comments

Show HN: NovelStar – a functional novel writing suite in a single HTML file

https://github.com/pixeldude84/novelstar
1•pixeldude84•19m ago•0 comments

Claude Code Anywhere

https://happy.engineering
1•vismit2000•24m ago•0 comments

Detecting AI scammers and bringing back the control to humans

https://veritrue.ai/
1•cheroll•27m ago•2 comments

I hacked ChatGPT and Google's AI – and it only took 20 minutes

https://www.bbc.com/future/article/20260218-i-hacked-chatgpt-and-googles-ai-and-it-only-took-20-m...
2•leephillips•30m ago•1 comments

RSA-signed prompt envelopes for OpenClaw agents

https://github.com/Mediocr3Mik3/open-claw-spa
1•Mediocr3Mik3•32m ago•1 comments

Connectors: Discord, Notion, and Slack Now Wired into Every Debate

https://www.askverdict.ai/updates/connectors-notion-discord-slack
1•thegdsks•32m ago•0 comments

A Computational Perspective on NeuroAI and Synthetic Biological Intelligence

https://arxiv.org/abs/2509.23896
1•andsoitis•32m ago•0 comments

A faithful, native Windows Notepad clone built in Zig using raw Win32 APIs

https://github.com/leebase/lfznotepad
1•garbagepatch•33m ago•1 comments

Optimism Engine – The first AI engine with a deterministic Safety Layer

https://optimism-engine.vercel.app/
1•sucharithan•33m ago•1 comments

Worried Europeans can now cut Azure's phone cord completely

https://www.theregister.com/2026/02/25/microsoft_azure_local/
2•abdelhousni•34m ago•0 comments

Show HN: Marcus –AI math tutor that guides you to answers instead of giving them

https://marcusmath.com
1•sbharadwaj•35m ago•2 comments

Show HN: I built a persistent LSM-Tree storage engine in Go from scratch

1•Jyotishmoy•35m ago•0 comments

Human brain cells playing Doom

https://www.youtube.com/watch?v=yRV8fSw6HaE
1•noosphr•36m ago•1 comments

Add repo line count to coverage drip emails

https://gitauto.ai/blog/what-are-dora-metrics
1•nishiohiroshi•38m ago•0 comments

I don't know how you get here from "predict the next word."

https://www.grumpy-economist.com/p/refine
2•qsi•39m ago•0 comments

A high-quality OSS graphical session manager and dashboard for pi.dev agent

https://dwsy.github.io/pi-session-manager/en/
1•sinenomine•40m ago•0 comments

Show HN: AI-assert – Constraint verification for LLM outputs (278 lines, Python)

https://github.com/kaantahti/ai-assert
1•kaantahti•45m ago•0 comments

US farmers are rejecting multimillion-dollar datacenter bids for their land

https://www.theguardian.com/technology/2026/feb/21/us-farmers-datacenters
5•carabiner•46m ago•2 comments

Show HN: Prince Cloud – Create PDFs with AI Agents

https://prince.cloud
2•mikeday•48m ago•0 comments

What I Saw Inside Apple's U.S. Chip Supply Chain

https://www.wsj.com/tech/what-i-saw-inside-apples-effort-to-rebuild-the-u-s-chip-supply-chain-28f...
5•Brajeshwar•48m ago•0 comments

Apple Needs to Copy Samsung's New Security Smartphone Screen ASAP

https://www.wsj.com/tech/personal-tech/samsung-galaxy-s26-privacy-display-d5bce9ab
9•Brajeshwar•48m ago•3 comments

Stop babysitting your AI. OpenKoi iterates

https://openkoi.ai
1•yongqianme•51m ago•1 comments

Hacker Used Anthropic's Claude to Steal Sensitive Mexican Government Data

https://news.bloomberglaw.com/privacy-and-data-security/hacker-used-anthropics-claude-to-steal-se...
3•alephnerd•53m ago•0 comments