
Show HN: UQLM – Closed-book hallucination detection with UQ

https://github.com/cvs-health/uqlm

2•virenbajaj•1h ago

Comments

virenbajaj•1h ago

Hi HN,

We built an open-source library for closed-book hallucination detection called UQLM (Uncertainty Quantification for Language Models).

It packages over 20 types of scorers from the latest uncertainty quantification literature behind one API, so you can calculate a confidence score (between 0 and 1) for an LLM's response without reference to external knowledge. It accepts any LangChain BaseChatModel, so it works with every major provider that LangChain wraps.

UQLM has six families of scorers:

1. Black-box scorers: measure the consistency across multiple responses to the same prompt.
2. White-box scorers: use the token log-probabilities of the model's output to estimate confidence.
3. LLM-as-a-judge scorers: use a single judge or a panel of judges.
4. Long-text scorers: work like black-box scorers, but first decompose the response into sentences or claims. They also support uncertainty-aware decoding, which drops low-confidence claims or sentences and reconstructs the output.
5. Code generation scorers: estimate the functional correctness of a piece of code without running it, using functional equivalence methods.
6. Ensemble scorers: compute a tunable weighted combination of other scorers.
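To make the difference between the first two families and the ensemble concrete, here is a minimal, self-contained Python sketch of three of the ideas: black-box consistency, white-box log-probability confidence, and a weighted ensemble. It is not UQLM's API or implementation. The function names are hypothetical, token-set Jaccard similarity is a deliberately simple stand-in for the semantic similarity measures used in the literature, and the inputs are invented.

```python
import math


def jaccard(a: str, b: str) -> float:
    # Token-set Jaccard similarity |A & B| / |A | B| over lowercase words.
    # A crude stand-in for the semantic similarity measures real scorers use.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty responses are trivially identical
    return len(ta & tb) / len(ta | tb)


def black_box_consistency(original: str, samples: list[str]) -> float:
    # Black-box idea: average similarity between the original response and
    # extra responses sampled for the same prompt; stays in [0, 1].
    if not samples:
        raise ValueError("need at least one sampled response")
    return sum(jaccard(original, s) for s in samples) / len(samples)


def white_box_confidence(token_logprobs: list[float]) -> float:
    # White-box idea: geometric mean of token probabilities, i.e.
    # exp(mean log-probability); stays in (0, 1] when all log-probs are <= 0.
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def ensemble(scores: dict[str, float], weights: dict[str, float]) -> float:
    # Ensemble idea: weighted average of component scores. Dividing by the
    # weight total keeps the result in [0, 1] whenever every score is.
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in scores) / total


# Usage with made-up inputs (not real model output):
bb = black_box_consistency(
    "Paris is the capital of France",
    ["The capital of France is Paris", "Paris is the capital of France", "Lyon"],
)
wb = white_box_confidence([-0.1, -0.2, -0.3])
combined = ensemble({"black_box": bb, "white_box": wb},
                    {"black_box": 0.5, "white_box": 0.5})
print(round(bb, 3), round(wb, 3), round(combined, 3))
```

Normalizing the ensemble weights keeps the combined score on the same 0-to-1 scale as its inputs, which is what allows a single threshold to be applied to any scorer or combination of scorers.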

A canonical use case for UQLM is to use one or many of the scorers to flag low confidence responses from an LLM in production for human review instead of sending them to the user.
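A minimal sketch of that review-routing pattern, assuming each response already carries a confidence score in [0, 1] from some scorer. The ScoredResponse type, the route function, and the 0.7 threshold are hypothetical placeholders, not part of UQLM.

```python
from dataclasses import dataclass


@dataclass
class ScoredResponse:
    # Hypothetical container: a prompt, the LLM's response, and a
    # confidence score in [0, 1] produced by any scorer.
    prompt: str
    response: str
    confidence: float


def route(
    items: list[ScoredResponse], threshold: float = 0.7
) -> tuple[list[ScoredResponse], list[ScoredResponse]]:
    # Responses at or above the threshold go to the user; the rest are held
    # for human review. 0.7 is an arbitrary placeholder that would need
    # tuning on labeled data for a real deployment.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    send, review = [], []
    for item in items:
        (send if item.confidence >= threshold else review).append(item)
    return send, review


send, review = route([
    ScoredResponse("Capital of France?", "Paris", 0.92),
    ScoredResponse("Capital of Australia?", "Sydney", 0.41),
])
print([r.response for r in send], [r.response for r in review])
```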

We’d love to hear your feedback about the library’s methods and its use in production.

Show HN: Voice control coding agents on your machine via smartwatch / CarPlay

https://dashvox.ai

1•Zante•3m ago•0 comments

Show Faith by Works Digital Dragnet – Untitledsource

https://untitledsource.substack.com/p/show-faith-by-works-digital-dragnet

1•untiledsource•3m ago•0 comments

Email from 10k Feet

https://maneeshaxyz.substack.com/p/email-from-10000-feet

1•maneeshaxyz•4m ago•0 comments

NIS2 Email Security: Mapping Article 21 Controls to DMARC, MTA-STS, and Dane

https://dmarcguard.io/blog/nis2-email-security/

2•meysamazad•4m ago•0 comments

Are blue zones real? Answering that question is harder

https://www.statnews.com/2026/05/04/are-blue-zones-real-new-scrutiny-longevity-hot-spots/

2•mfld•4m ago•0 comments

Free IoT security scorecard mapped to ETSI EN 303 645 and EU CRA

https://tangibles-book.com/tools/iot-security-scorecard

1•yoelf22•5m ago•0 comments

I hired 5 people to sit behind me and make me productive for a month (2023)

https://www.lesswrong.com/posts/gp9pmgSX3BXnhv8pJ/i-hired-5-people-to-sit-behind-me-and-make-me-p...

2•LorenDB•5m ago•0 comments

Radxa Dragon Q8B: A Laptop Cosplaying as an SBC?

https://bret.dk/radxa-dragon-q8b-a-laptop-cosplaying-as-an-sbc/

1•gainsurier•5m ago•0 comments

ToolPix – Free Online Image, PDF, Coding and AI Tools

https://toolpix.pythonanywhere.com/

1•uthakkan•7m ago•0 comments

RFC 2119 for Agents.md: Make Your Agent Rules Compact and Strict

https://fertsch-roever.com/blog/rfc-2119-for-agents-md-make-your-agent-rules-compact-and-strict/

1•And-y•7m ago•0 comments

Governing a Codebase as a Commons

https://b-erdem.github.io/2026/06/01/governing-a-codebase-as-a-commons.html

1•baris_erdem•7m ago•0 comments

WhatsKept: Local agent-queryable WhatsApp from iOS encrypted backup

https://github.com/alkait/WhatsKept

1•tenthead•7m ago•0 comments

Virtual Memory: Page Tables, TLBs, and Linux Internals

https://blog.codingconfessions.com/p/virtual-memory

1•ibobev•9m ago•0 comments

Show HN: One-click open-source ecommerce starter (Magento), drive it with Claude

https://ecommerce-ai-starter.graycore.io

1•damienwebdev•12m ago•0 comments

Nvidia Cosmos 3

https://developer.nvidia.com/blog/develop-physical-ai-reasoning-world-and-action-models-with-nvid...

3•tosh•12m ago•0 comments

[video] WebRTC for the Streamer – How Whip/WebRTC Can Improve Streaming

https://www.youtube.com/watch?v=NOVH70nH7jw

1•Sean-Der•14m ago•1 comments

NPM packages from RedHat have been compromised

https://github.com/RedHatInsights/javascript-clients/issues/492

51•kurmiashish•15m ago•8 comments

Blue Origin Moon Lander Completes Testing at NASA Vacuum Chamber

https://www.nasa.gov/missions/artemis/blue-origin-moon-lander-completes-testing-at-nasa-vacuum-ch...

2•tcp_handshaker•15m ago•0 comments

Windows GOG DOS Games on M-Series Macs

https://f055.net/technology/windows-gog-dos-games-on-m-series-macs/

3•f055•16m ago•0 comments

Introducing Pulse: A Post-Scrum Manifesto and Agent Workflow for AI-Native Teams

https://github.com/pssah4/digital-innovation-agents

1•pssah4•17m ago•0 comments

AeroThemePlasma: Alternative shell for KDE Plasma that replicates Windows 7 look

https://gitgud.io/aeroshell/atp/aerothemeplasma

2•Tiberium•17m ago•0 comments

Major Israeli tech firms start layoffs as AI revolution roils industry

https://www.timesofisrael.com/major-israeli-tech-firms-commence-sweeping-layoffs-as-ai-revolution...

2•tcp_handshaker•19m ago•0 comments

Cosmos 3: Omnimodal World Models for Physical AI

https://research.nvidia.com/labs/cosmos-lab/cosmos3/

2•armcat•19m ago•0 comments

Show HN: Prela – Purely Algebraic Relation Combinators

https://github.com/remysucre/prela

1•remywang•19m ago•0 comments

The CIA crash that opened a fraught month in Mexico–US relations

https://english.elpais.com/international/2026-05-26/the-cia-crash-that-opened-a-fraught-month-in-...

2•wslh•19m ago•1 comments

Gemma 4 26B on a consumer GPU: build pain, throughput, and BFCL numbers

https://algollabs.com/blog/gemma4-bfcl

1•tumbak•20m ago•0 comments

Simple batch decoding of unary codes

https://fgiesen.wordpress.com/2026/05/30/simple-batch-decoding-of-unary-codes/

1•ibobev•21m ago•0 comments

When AI Starts Writing Systems Code

https://www.coreauto.com/blog/when-ai-starts-writing-systems-code

4•yurivish•22m ago•0 comments

Building DNA from Scratch in C

https://twitter.com/TheVixhal/status/2061108499636265114

3•ibobev•22m ago•0 comments

Ask HN: Are there companies that use agent-based modeling?

3•hamburgererror•23m ago•1 comments