Verge: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

https://arxiv.org/abs/2601.20055
2•vikashjohn2505•1h ago

Comments

vikashjohn2505•1h ago
We present a neurosymbolic framework that combines LLMs with SMT solvers to produce verification-guided answers through iterative refinement. Our approach decomposes LLM outputs into atomic claims, autoformalizes them into first-order logic, and verifies their logical consistency using automated theorem proving. We introduce three key innovations: (1) multi-model consensus via formal semantic equivalence checking to ensure logic-level alignment between candidates, eliminating the syntactic bias of surface-form metrics; (2) semantic routing that directs different claim types to appropriate verification strategies: symbolic solvers for logical claims and LLM ensembles for commonsense reasoning; and (3) precise logical error localization via Minimal Correction Subsets (MCS), which pinpoint the exact subset of claims to revise, transforming binary failure signals into actionable feedback. Our framework classifies claims by their logical status and aggregates multiple verification signals into a unified score with a variance-based penalty. The system iteratively refines answers using structured feedback until acceptance criteria are met or convergence is achieved. This hybrid approach delivers formal guarantees where possible and consensus verification elsewhere, advancing trustworthy AI. With the GPT-OSS-120B model, VERGE demonstrates an average performance uplift of 18.7% at convergence across a set of reasoning benchmarks compared to single-pass approaches.
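For readers unfamiliar with Minimal Correction Subsets, here is a minimal sketch of the idea, not the VERGE implementation. It assumes claims are already formalized as propositional checks and uses a brute-force truth table where the paper uses an SMT solver. The names `is_satisfiable`, `minimal_correction_subset`, and the example claims are all hypothetical.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Sequence, Tuple

Assignment = Dict[str, bool]
# A claim is a (name, predicate over a truth assignment) pair. Hypothetical encoding.
Claim = Tuple[str, Callable[[Assignment], bool]]


def is_satisfiable(claims: Sequence[Claim], variables: Sequence[str]) -> bool:
    """Brute-force check: does any truth assignment satisfy every claim?"""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(check(assignment) for _, check in claims):
            return True
    return False


def minimal_correction_subset(
    claims: Sequence[Claim], variables: Sequence[str]
) -> List[str]:
    """Return names of a smallest set of claims whose removal restores consistency.

    Subsets are tried in order of increasing size, so the first hit has minimum
    cardinality and is therefore also a minimal correction subset.
    """
    for size in range(len(claims) + 1):
        for removed in combinations(range(len(claims)), size):
            kept = [c for i, c in enumerate(claims) if i not in removed]
            if is_satisfiable(kept, variables):
                return [claims[i][0] for i in removed]
    # Not reachable: removing every claim always leaves a satisfiable (empty) set.
    raise AssertionError("unreachable")


# Toy example (assumed, not from the paper): four atomic claims, one contradictory.
variables = ["man", "mortal"]
claims: List[Claim] = [
    ("socrates_is_man", lambda a: a["man"]),
    ("men_are_mortal", lambda a: (not a["man"]) or a["mortal"]),
    ("socrates_is_mortal", lambda a: a["mortal"]),
    ("socrates_is_immortal", lambda a: not a["mortal"]),
]

to_revise = minimal_correction_subset(claims, variables)
print(to_revise)
```

In this toy case the only single claim whose removal makes the rest consistent is `socrates_is_immortal`, so that is the one flagged for revision instead of a bare "inconsistent" signal. The brute-force search is exponential in both claims and variables; a real system would delegate the satisfiability checks to a solver.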
phoenixrecruit•1h ago
How did you make sure that your MCS feedback is actually used in the next iteration?

Using AI Engines for DSP [pdf]

https://reach.avnet.com/rs/730-NST-988/images/AVNET-ADIUVO-AMD-AIE%20for%20DSP%20White%20Paper-FI...
1•signalhound•5m ago•0 comments

You can code only 4 hours per day. Here's why

https://newsletter.techworld-with-milan.com/p/you-can-code-only-4-hours-per-day
2•birdculture•7m ago•0 comments

Bun: New Markdown parser has builtin support for rendering to React

https://twitter.com/jarredsumner/status/2016733616743862462
1•tosh•7m ago•0 comments

Open Gaming Collective

https://opengamingcollective.org/
1•pentagrama•7m ago•0 comments

Gourmand Syndrome

https://en.wikipedia.org/wiki/Gourmand_syndrome
1•ag8•7m ago•0 comments

Building the Skyline: The Birth and Growth of Manhattan's Skyscrapers

https://buildingtheskyline.org/
1•danielam•10m ago•0 comments

Apple acquires Israeli audio AI startup Q.ai

https://www.reuters.com/business/apple-acquires-audio-ai-startup-qai-2026-01-29/
6•tosh•11m ago•1 comments

Ask HN: Who here works with just a laptop?

1•hu3•12m ago•0 comments

We Ran 11 AI PR Bots in Production

https://loganharless.com/blog/ai-pr-bot-rankings
1•myhandleisbest•12m ago•1 comments

Students Are Finding New Ways to Cheat on the SAT

https://www.nytimes.com/2026/01/28/us/politics/how-the-online-sat-may-be-vulnerable-to-cheating.html
3•JumpCrisscross•13m ago•1 comments

Show HN: Transcribee: YouTube transcriber that builds a knowledge base

https://github.com/itsfabioroma/transcribee
1•ofabioroma•13m ago•0 comments

PlayStation 2 Recompilation Project Is Absolutely Incredible

https://redgamingtech.com/playstation-2-recompilation-project-is-absolutely-incredible/
2•croes•13m ago•0 comments

The dev who asks too many questions is the one you need in your team

https://leadthroughmistakes.substack.com/p/the-teammate-who-asks-too-many-questions
4•birdculture•14m ago•0 comments

Cisco AI Agent Skills Security Scanner

https://github.com/cisco-ai-defense/skill-scanner
2•hsanthan•15m ago•0 comments

SpaceX in Merger Talks with xAI

https://www.reuters.com/world/musks-spacex-merger-talks-with-xai-ahead-planned-ipo-source-says-20...
5•m-hodges•15m ago•0 comments

Magit

https://magit.vc/
2•tosh•16m ago•0 comments

Agent Swarms, like the one Cursor created

https://mrinal.com/articles/agent-swarms-like-the-one-cursor-created/
1•rdegges•16m ago•0 comments

Anna's Archive is sued for $13T

https://www.nme.com/news/music/spotify-major-record-labels-sue-annas-archive-13trillion-allege-br...
8•antonmks•17m ago•4 comments

Light the Nuclear Candle – Incautious Optimism

https://jordanwtaylor2.substack.com/p/light-the-nuclear-candle
1•bilsbie•18m ago•0 comments

We created more tech debt in 6 months than in a 10-year-old system

https://superkacper4.github.io/portfolio-2023/blog/technical-debt-everyday
1•superkacper4•18m ago•0 comments

Show HN: GodScore CI – a CI gate that blocks risky changes before production

https://github.com/willshacklett/godscore-ci
1•PapaShack45•19m ago•0 comments

Exploring single-cell biosynthetic noise for enhanced production in E. coli

https://www.nature.com/articles/s41467-025-67733-1
1•PaulHoule•19m ago•0 comments

Show HN: VCluster Free – Free K8s Multi-Tenancy with Virtual Clusters

https://www.vcluster.com/blog/launching-vcluster-free-get-enterprise-features-at-no-cost
6•gentele•20m ago•1 comments

County pays $600k to pentesters it arrested for assessing courthouse security

https://arstechnica.com/security/2026/01/county-pays-600000-to-pentesters-it-arrested-for-assessi...
5•MBCook•20m ago•0 comments

Ask HN: Junior getting lost

5•TheRegularOne•21m ago•6 comments

Password is Too Damn Short (2015)

https://blog.codinghorror.com/your-password-is-too-damn-short/
1•giancarlostoro•22m ago•0 comments

Ask HN: Opinion on self driving cars breaking the law?

1•socalgal2•22m ago•0 comments

Founding Is a Snowball

https://blog.bawolf.com/p/founding-is-a-snowball
1•bryantwolf•22m ago•0 comments

My Mom and Dr. DeepSeek (2025)

https://restofworld.org/2025/ai-chatbot-china-sick/
28•kieto•23m ago•1 comments

I Stress-Tested Cube's New AI Analytics Agent. Here's What Happened [video]

https://www.youtube.com/watch?v=p3frGJOUl1E&list=PLtdXl_QTQjpZ0f_OHi2yMLTeH1n5cLZGF
1•fromthegut•23m ago•0 comments