V1: Unifying Generation and Self-Verification for Parallel Reasoners (ArXiv)

https://arxiv.org/abs/2603.04304
2•harman2607•1h ago

Comments

harman2607•1h ago
Hi HN, I’m one of the authors.

This paper studies how LLMs self-verify candidate solutions when doing test-time scaling (parallel reasoning / Best-of-N style generation).

We found that models are often much better at pairwise comparisons, where the model scores two solutions (A and B) jointly, than at assigning absolute scores to their own solutions independently.

The paper introduces:

• Pairwise self-verification instead of pointwise scoring

• V1-Infer, a ranking algorithm that selects good candidates efficiently

• V1-PairRL, RL training where generation and verification co-evolve to produce stronger self-verifiers
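
To make the pairwise idea in the list above concrete, here is a minimal sketch of picking one candidate from pairwise judgments. It is not V1-Infer (see the paper for that algorithm); it is a plain round-robin that sums preferences over all pairs. The names `select_by_pairwise_wins` and `toy_compare` are hypothetical, and `toy_compare` is only a stand-in for a model call that judges two solutions side by side.

```python
from itertools import combinations
from typing import Callable, Sequence


def select_by_pairwise_wins(
    candidates: Sequence[str],
    compare: Callable[[str, str], float],
) -> int:
    """Return the index of the candidate with the highest total pairwise score.

    compare(a, b) is an assumed verifier call returning a value in [0, 1]:
    the preference for `a` over `b` when both are judged jointly.
    """
    wins = [0.0] * len(candidates)
    # Round-robin: every unordered pair is compared exactly once.
    for i, j in combinations(range(len(candidates)), 2):
        p = compare(candidates[i], candidates[j])
        wins[i] += p
        wins[j] += 1.0 - p
    return max(range(len(candidates)), key=wins.__getitem__)


def toy_compare(a: str, b: str) -> float:
    # Stand-in for an LLM comparing two solutions; the longer string
    # simply wins so the example runs without a model.
    if len(a) == len(b):
        return 0.5
    return 1.0 if len(a) > len(b) else 0.0


if __name__ == "__main__":
    solutions = ["x", "xxx", "xx"]
    print(select_by_pairwise_wins(solutions, toy_compare))
```

A full round-robin needs N(N-1)/2 comparisons for N candidates, which is the cost a more efficient ranking algorithm would aim to reduce.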

Across coding and reasoning benchmarks, we observe improved verification accuracy, and results keep improving as the verification compute budget increases.

One motivation is that many recent test-time scaling approaches (for example RSA: https://arxiv.org/abs/2509.26626 ) rely on sequential aggregation loops. Pairwise verification enables a more parallel form of selection, which may reduce latency in deep thinking pipelines and scaffolds.
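
As a rough illustration of why pairwise selection parallelizes: each comparison depends only on its own pair, so all of them can be dispatched at once rather than in a sequential loop. The sketch below assumes a hypothetical thread-safe `toy_judge` function in place of a real model call, and uses Python's standard `ThreadPoolExecutor`. It is not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def compare_all_pairs_parallel(
    candidates: Sequence[str],
    judge: Callable[[str, str], float],
    max_workers: int = 8,
) -> List[Tuple[int, int, float]]:
    """Run every pairwise comparison concurrently.

    Returns (i, j, score) triples, where score is judge(candidates[i],
    candidates[j]). The comparisons are independent of each other, so no
    call waits on the result of another.
    """
    pairs = list(combinations(range(len(candidates)), 2))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `pairs`.
        scores = list(
            pool.map(lambda ij: judge(candidates[ij[0]], candidates[ij[1]]), pairs)
        )
    return [(i, j, s) for (i, j), s in zip(pairs, scores)]


def toy_judge(a: str, b: str) -> float:
    # Placeholder for a model call: prefers the longer string.
    return 1.0 if len(a) > len(b) else 0.0


if __name__ == "__main__":
    for i, j, score in compare_all_pairs_parallel(["a", "bbb", "cc"], toy_judge):
        print(i, j, score)
```

With a real model behind `judge`, wall-clock time is bounded by the slowest batch of comparisons rather than by the number of aggregation rounds.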

Happy to answer questions.

"I'm obviously taking a risk here by advertising emoji directly."

https://unsung.aresluna.org/im-obviously-taking-a-risk-here-by-advertising-emoji-directly/
1•tobr•7m ago•0 comments

C++ Performance Improvements in MSVC Build Tools v14.51

https://devblogs.microsoft.com/cppblog/c-performance-improvements-in-msvc-build-tools-v14-51/
1•pjmlp•8m ago•0 comments

Ladybird browser update (February 2026) [video]

https://www.youtube.com/watch?v=Y3tteHSrJlY
1•radikalerludwig•8m ago•1 comments

JSR: The open-source package registry for modern JavaScript and TypeScript

https://jsr.io/
1•maxloh•12m ago•0 comments

DTOs at the Speed of Plain PHP

https://www.dereuromark.de/2026/03/02/dtos-at-the-speed-of-plain-php/
1•that_guy_iain•12m ago•0 comments

Show HN: I measured my context switching by scanning Git commits

https://github.com/MuhammadBaibarsZainUlAbideen/context-tracker
1•muhammadbaibars•13m ago•1 comments

Show HN: Introducing Kite AI Agent: Conversational Operations for Kubernetes

https://github.com/kite-org/kite/discussions/409
1•xdasf•13m ago•0 comments

Online harassment is entering its AI era

https://www.technologyreview.com/2026/03/05/1133962/online-harassment-is-entering-its-ai-era/
1•joozio•14m ago•0 comments

Cursor is now available in IntelliJ and other JetBrains IDEs through ACP

https://cursor.com/blog/jetbrains-acp#coding-with-cursor-in-jetbrains-ides
1•saharshpruthi•16m ago•1 comments

Show HN: Claude Code for iPad – Agentic AI coding tool with file ops, Git, shell

1•reviewpulse•19m ago•0 comments

How to Survive Your Project's First 100k Lines

https://verdagon.dev/blog/first-100k-lines
2•randomrainbow•19m ago•0 comments

Unlimited users, free and ad-free remote employee management tool

1•chronotrigger•20m ago•1 comments

Don't Be a Wrapper, Be a Container

https://www.hopsworks.ai/post/coding-agents-inside-data-platforms
1•LexSiga•22m ago•0 comments

A claudeism that I want to confirm if anyone else is experiencing

1•ramenprofitable•22m ago•0 comments

Show HN: Desktop Automation with Codex

https://github.com/nickbarth/closedbots/
1•nicbarth•24m ago•0 comments

Linux Mint is getting a new Wayland-compatible screensaver

https://www.neowin.net/news/linux-mint-is-getting-a-new-wayland-compatible-screensaver/
1•bundie•25m ago•0 comments

Fortify your app: Essential strategies to strengthen security – Meet with Apple [video]

https://www.youtube.com/watch?v=UZeSyodAszc
2•pjmlp•25m ago•0 comments

The Ugliest Beautiful Codebase

https://jimmyhmiller.com/ugliest-beautiful-codebase
1•harperlee•26m ago•0 comments

Show HN: Making remote MCP servers handle local files and generated artifacts

https://github.com/aakashh242/remote-mcp-adapter/
1•aakashh242•29m ago•0 comments

Show HN: Koshei AI – a voice-native AI language university (A1 to D2)

https://github.com/Bugsbuny24/Koshe-Al-
1•bugsbuny24•30m ago•0 comments

Federated torrent tracker based on Nostr

https://ygg.gratis/
1•routeroff•31m ago•0 comments

Towards Self-Replication: Claude Opus Designs Hardware to Run Itself

https://cpldcpu.github.io/smollm.c/
1•cpldcpu•32m ago•0 comments

California's Problematic Attempt to Add Age-Verification to Software

https://hackaday.com/2026/03/05/californias-problematic-attempt-to-add-age-verification-to-software/
2•beardyw•37m ago•0 comments

Founding Engineer (Equity) – Real Infrastructure/Real World Data Systems -Haplon

1•achittil•38m ago•0 comments

Show HN: Detecting problem–market drift with an OpenClaw agent

https://github.com/thomasbln/openclaw-marketing-agent
1•thomasBln•39m ago•0 comments

Iran hits Amazon data centres

https://www.ft.com/content/09fa5c20-2c8f-4f41-9d91-c78476eaac20
1•KnuthIsGod•41m ago•0 comments

Show HN: Codebase-md – Creates Claude.md, .cursorrules, AGENTS.md from any repo

https://github.com/sauravanand542/codebase-md
1•anandsaurav668•42m ago•0 comments

Faecal transplants–a treatment for bipolar disorder?

https://economist.com/science-and-technology/2026/03/05/faecal-transplants-a-treatment-for-bipola...
1•uxhacker•49m ago•1 comments

Kuberna Labs: AI's Economic Engine

https://github.com/kawacukennedy/kuberna-labs
1•n3on250•49m ago•0 comments

Microsoft kicked off a Copilot revolt by banning the word "Microslop" on Discord

https://www.windowscentral.com/artificial-intelligence/microsoft-copilot/microsoft-accidentally-k...
2•classified•55m ago•1 comments