ERCP: Self-Correcting LLM Reasoning Using NLI-Based Neuro-Symbolic Constraints

https://zenodo.org/records/17602891
1•hemanm•1h ago

Comments

hemanm•1h ago
I'm sharing a summary of my recent research on a method for controlling large language models (LLMs) called Evo-Recursive Constraint Prompting (ERCP). We achieved a 20% absolute accuracy gain on the PIQA commonsense reasoning task. This approach goes beyond simple prompting; it involves a neuro-symbolic optimization loop designed to enforce logical consistency.

*Key Results on PIQA:*
- *Baseline Accuracy:* 70.0%
- *ERCP Final Accuracy:* 90.0%
- *Absolute Gain:* 20.0% (a 28.6% relative boost)
- *Efficiency:* Achieved in an average of 3.9 iterations.
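For readers checking the arithmetic, the absolute and relative gains quoted above follow directly from the two accuracy figures. The short sketch below reproduces them; the function name `accuracy_gains` is only an illustrative helper, not something from the paper.

```python
def accuracy_gains(baseline: float, final: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = final - baseline              # difference between the two accuracies
    relative = absolute / baseline * 100.0   # gain expressed relative to the baseline
    return absolute, relative


# Figures quoted in the post: 70.0% baseline, 90.0% final accuracy.
absolute, relative = accuracy_gains(70.0, 90.0)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
# prints "absolute: 20.0 points, relative: 28.6%"
```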

*Methodology: Self-Correcting Logic*

The core novelty of our approach lies in the use of external symbolic tools to oversee the LLM's neural output:

1. *Diagnosis:* Our system employs a DeBERTa NLI Oracle to autonomously identify logical contradictions and ambiguities within the LLM's reasoning chain.
2. *Constraint Generation:* These detected errors are immediately translated into formal, actionable constraints (the symbolic step).
3. *Refinement:* The LLM is re-prompted to solve the task, explicitly conditioned on these new constraints (the neuro step).

ERCP systematically transforms reasoning errors into performance gains by enabling the model to self-correct based on verifiable logical rules.
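The diagnose, constrain, refine cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the protocol from the whitepaper: `ercp_loop`, `find_contradictions`, `to_constraint`, and the two toy stand-ins for the LLM and the NLI oracle are all hypothetical names, and the real system would call an actual LLM and a DeBERTa NLI model in their place.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
#   reasoner(task, constraints) -> list of reasoning steps
#   oracle(premise, hypothesis) -> "entailment" | "neutral" | "contradiction"
Reasoner = Callable[[str, List[str]], List[str]]
NLIOracle = Callable[[str, str], str]


def find_contradictions(steps: List[str], oracle: NLIOracle) -> List[Tuple[str, str]]:
    """Diagnosis: collect pairs of steps the oracle labels as contradictory."""
    conflicts = []
    for i in range(len(steps)):
        for j in range(i + 1, len(steps)):
            if oracle(steps[i], steps[j]) == "contradiction":
                conflicts.append((steps[i], steps[j]))
    return conflicts


def to_constraint(pair: Tuple[str, str]) -> str:
    """Constraint generation: turn a detected conflict into an explicit rule."""
    first, second = pair
    return f"Do not assert both '{first}' and '{second}'."


def ercp_loop(task: str, reasoner: Reasoner, oracle: NLIOracle,
              max_iters: int = 5) -> Tuple[List[str], List[str]]:
    """Refinement: re-prompt with accumulated constraints until no conflict remains."""
    constraints: List[str] = []
    steps = reasoner(task, constraints)
    for _ in range(max_iters):
        conflicts = find_contradictions(steps, oracle)
        if not conflicts:
            break  # reasoning chain is consistent according to the oracle
        for pair in conflicts:
            rule = to_constraint(pair)
            if rule not in constraints:
                constraints.append(rule)
        steps = reasoner(task, constraints)
    return steps, constraints


# Toy stand-ins so the sketch runs without any model.
def toy_reasoner(task: str, constraints: List[str]) -> List[str]:
    if not constraints:
        return ["The bottle is open.", "The bottle is closed.", "Pour the water out."]
    return ["The bottle is open.", "Pour the water out."]


def toy_oracle(premise: str, hypothesis: str) -> str:
    if {premise, hypothesis} == {"The bottle is open.", "The bottle is closed."}:
        return "contradiction"
    return "neutral"


steps, constraints = ercp_loop("How do you empty a bottle?", toy_reasoner, toy_oracle)
print(steps)
print(constraints)
```

With the toy stand-ins, the first pass contains one contradictory pair, a single constraint is generated, and the second pass comes back consistent, so the loop stops after one refinement.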

*The Real Research Challenge: The Convergence Problem*

While a 90% accuracy rate is strong, our results showed that only 30% of runs fully converged to a high-quality constraint set (Score > 0.8).

- *Initial Constraint Score:* 0.198
- *Final Constraint Score:* 0.377

This indicates that 70% of the successful results were achieved with suboptimal constraint guidance. The next frontier is refining our optimizer to ensure constraint quality and guarantee convergence across all runs.
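To make the convergence criterion concrete, the sketch below computes the share of runs whose final constraint score exceeds the 0.8 threshold mentioned above. The helper `convergence_rate` is a hypothetical name, and the ten scores are placeholder values chosen for illustration only; they are not data from the paper.

```python
from typing import Sequence

CONVERGENCE_THRESHOLD = 0.8  # from the post: a run converges when Score > 0.8


def convergence_rate(final_scores: Sequence[float],
                     threshold: float = CONVERGENCE_THRESHOLD) -> float:
    """Fraction of runs whose final constraint score exceeds the threshold."""
    if not final_scores:
        raise ValueError("need at least one run")
    converged = sum(1 for score in final_scores if score > threshold)
    return converged / len(final_scores)


# Placeholder scores for ten hypothetical runs; NOT results from the paper.
example_scores = [0.85, 0.91, 0.82, 0.40, 0.15, 0.22, 0.30, 0.10, 0.05, 0.12]
rate = convergence_rate(example_scores)
print(f"converged: {rate:.0%}")
```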

The whitepaper detailing the full protocol is linked in the submission. I look forward to hearing your thoughts on building truly robust, self-correcting LLM systems with this level of precision.

Show HN: Stickerbox, a Kid-Safe, AI-Powered Voice to Sticker Printer

https://stickerbox.com/
1•spydertennis•28s ago•0 comments

Waymo to Go Driverless in Miami, Houston, Dallas, San Antonio, and Orlando

https://www.fastcompany.com/91443871/waymo-fully-autonomous-miami-orlando-dallas-houston-san-antonio
1•proudestmonkey•32s ago•0 comments

Housing and Affordability – No Easy Solutions

https://www.nominalnews.com/p/housing-and-affordability
1•MPLan•2m ago•1 comments

'You can't make this stuff up': Jordan Orelli on lobste.rs admin and McSweeney's

https://bsky.app/profile/jordan.orel.li/post/3m5tw7fdd7k2k
1•vintagedave•6m ago•2 comments

Ask HN: Reason for the DDoS attacks on DALnet circa 2002?

1•Meekro•7m ago•0 comments

Resiliency and Scale

https://stratechery.com/2025/resiliency-and-scale/
1•kaonwarb•7m ago•0 comments

WhatsApp Owns India

https://newsletter.theindianotes.com/p/whatsapp-owns-india
1•ilamont•8m ago•0 comments

Show HN: GitBalance – Daily health commits for developers

https://gitbalance.com
4•windystockholm•9m ago•0 comments

Enumerating 3B Accounts for Security and Privacy [pdf]

https://github.com/sbaresearch/whatsapp-census/blob/main/Hey_there_You_are_using_WhatsApp.pdf
3•staticBr•9m ago•1 comments

Watt v3.18 Unlocks Next.js 16's Revolutionary 'use cache' Directive with

https://blog.platformatic.dev/watt-v318-unlocks-nextjs-16s-revolutionary-use-cache-directive-with...
1•feross•9m ago•0 comments

Journalism in an Age of Authoritarianism

https://www.noemamag.com/journalism-in-an-age-of-authoritarianism/
1•Brajeshwar•9m ago•1 comments

Show HN: I made a human-first livechat

https://heyo.so?src=hackernews
1•Jeannen•10m ago•0 comments

Google DeepMind won Nobel Prize for AI: can it produce next big breakthrough?

https://www.nature.com/articles/d41586-025-03713-1
2•bookofjoe•11m ago•0 comments

Show HN: Murmurs – an app to discover local events and quickly plan with friends

https://murmurs.us
1•ameenba•11m ago•0 comments

Microsoft, Nvidia and Anthropic Announce Strategic Partnerships

https://blogs.nvidia.com/blog/microsoft-nvidia-anthropic-announce-partnership/
3•saanville•12m ago•0 comments

Microsoft, Nvidia and Anthropic announce strategic partnerships

https://www.anthropic.com/news/microsoft-nvidia-anthropic-announce-strategic-partnerships
1•kerim-ca•13m ago•0 comments

Python's Truthiness: A Code Smell Worth Sniffing

https://owl.billpg.com/pythons-truthiness-a-code-smell-worth-sniffing/
1•billpg•13m ago•0 comments

Scientists uncover an ant assassination scheme for a queen's rise to power

https://www.cnn.com/2025/11/17/science/parasitic-ant-queen-workers-kill-mother
1•Brajeshwar•14m ago•0 comments

Nvidia's new AI physics model can help design chips

https://www.computerworld.com/article/4091444/nvidias-new-ai-physics-model-can-help-design-chips-...
1•Brajeshwar•14m ago•0 comments

GLP-1 weight-loss treatment is being used for alcohol and drug addiction

https://www.washingtonpost.com/health/2025/11/16/glp1-weight-loss-addiction-drug-alcohol/
1•randycupertino•17m ago•1 comments

Gemini 3 Pro Preview Live in AI Studio

https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview
29•preek•17m ago•6 comments

Show HN: MCP Server for OpenTelemetry

https://github.com/traceloop/opentelemetry-mcp-server
7•GalKlm•18m ago•0 comments

Requesting design feedback for this page? Would you use it in the current design

https://thebreathingexercises.com/breathing-exercises/4-7-8-breathing
2•naveen-zerocool•19m ago•0 comments

Gemini-3-pro-preview available on AI Studio

https://aistudio.google./
2•Topfi•19m ago•3 comments

If It Were My Home (country comparison tool)

https://www.ifitweremyhome.com/
2•andrewstetsenko•19m ago•0 comments

Valve unveils new console to rival Xbox and PlayStation

https://www.bbc.com/news/articles/cd679n9lnx5o
2•taylorbuley•20m ago•0 comments

Lix 2.94 "Açaí na tigela"

https://lix.systems/blog/2025-11-18-lix-2.94-release/
1•todsacerdoti•20m ago•0 comments

The 70-year anniversary of two important IBM Selectric patents

https://sharktastica.co.uk/articles/selectric-golfball-70
2•tart-lemonade•20m ago•0 comments

Gemini 3 Pro is now live on Google AI Studio

https://aistudio.google.com/app/prompts/new_chat?model=gemini-3-pro-preview
2•mil22•20m ago•0 comments

Anthropic to buy $30B in Azure capacity in deal with Microsoft, Nvidia

https://www.cnbc.com/2025/11/18/anthropic-ai-azure-microsoft-nvidia.html
2•stygiansonic•21m ago•0 comments