Absolute Zero: Reinforced Self-Play Reasoning with Zero Data

https://arxiv.org/abs/2505.03335
3•sinuhe69•7h ago

Comments

sinuhe69•7h ago
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of high-quality, human-produced examples raises concerns about the long-term scalability of relying on human supervision, a challenge already evident in the domain of language model pretraining. Furthermore, in a hypothetical future where AI surpasses human intelligence, tasks provided by humans may offer limited learning potential for a superintelligent system. To address these concerns, we propose a new RLVR paradigm called Absolute Zero, in which a single model learns to propose tasks that maximize its own learning progress and improves reasoning by solving them, without relying on any external data. Under this paradigm, we introduce the Absolute Zero Reasoner (AZR), a system that self-evolves its training curriculum and reasoning ability by using a code executor to both validate proposed code reasoning tasks and verify answers, serving as a unified source of verifiable reward to guide open-ended yet grounded learning. Despite being trained entirely without external data, AZR achieves overall SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples. Furthermore, we demonstrate that AZR can be effectively applied across different model scales and is compatible with various model classes.
sinuhe69•6h ago
This approach aims to train reasoning models without relying on human-curated data, allowing models to learn by proposing tasks, solving them, and learning from both stages through self-play with the aid of an environment.

The core of this research is the Absolute Zero Reasoner (AZR), which focuses on proposing and solving coding tasks, utilizing a code executor for verifiable feedback.
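
As a rough illustration of the propose, validate, solve, verify cycle described above, here is a toy sketch in Python. It is not the AZR implementation: the proposer and solver are hard-coded stand-ins for the language model, every name (`Task`, `validate`, `verify`, `self_play_step`) is hypothetical, the binary reward is an assumption, and `exec` is used without any sandboxing, which a real system would need.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class Task:
    program: str     # source code that defines a function f(x)
    task_input: Any  # input chosen by the proposer
    expected: Any    # gold output, produced by the executor (no human label)


def run_program(program: str, task_input: Any) -> Any:
    """Execute `program` and return f(task_input). Toy only: NOT sandboxed."""
    namespace: dict = {}
    exec(program, namespace)
    return namespace["f"](task_input)


def validate(program: str, task_input: Any) -> Optional[Task]:
    """Accept a proposed task only if it runs and gives the same output twice."""
    try:
        first = run_program(program, task_input)
        second = run_program(program, task_input)
    except Exception:
        return None  # crashes, syntax errors, or a missing f are rejected
    if first != second:
        return None  # non-deterministic programs cannot be verified
    return Task(program, task_input, first)


def verify(task: Task, answer: Any) -> float:
    """Binary verifiable reward (an assumption made for this sketch)."""
    return 1.0 if answer == task.expected else 0.0


def self_play_step(
    propose: Callable[[], Tuple[str, Any]],
    solve: Callable[[str, Any], Any],
) -> Optional[float]:
    """One round: propose a task, validate it, solve it, score the answer."""
    program, task_input = propose()
    task = validate(program, task_input)
    if task is None:
        return None  # invalid proposal, nothing for the solver to do
    answer = solve(task.program, task.task_input)
    return verify(task, answer)


# Hard-coded stand-ins for the model's two roles (hypothetical).
def propose_reverse_sort() -> Tuple[str, Any]:
    return "def f(x):\n    return sorted(x)[::-1]", [3, 1, 2]


def propose_broken() -> Tuple[str, Any]:
    return "def f(x):\n    return x / 0", 1


def good_solver(program: str, task_input: Any) -> Any:
    return sorted(task_input, reverse=True)


def bad_solver(program: str, task_input: Any) -> Any:
    return sorted(task_input)


if __name__ == "__main__":
    print(self_play_step(propose_reverse_sort, good_solver))
    print(self_play_step(propose_reverse_sort, bad_solver))
    print(self_play_step(propose_broken, good_solver))
```

The point of the sketch is that the executor plays both roles mentioned in the abstract: it filters out proposals that cannot be run, and it supplies the reference output against which the solver's answer is checked, so no human-written answer key is involved.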

Key Findings and Contributions:

    State-of-the-Art Performance: AZR achieves overall state-of-the-art performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples.
    Enhanced Reasoning Capabilities: The study suggests that coding capabilities developed through AZR training may amplify overall improvements in reasoning. Models trained with AZR showed stronger gains in generalized reasoning compared to those trained with expert code.
    Scalability: The performance improvements observed with AZR appear to scale with the size of the model.
    Cognitive Behaviors: AZR exhibits emergent cognitive behaviors such as step-by-step reasoning and trial-and-error. The research also noted that token counts grow with training and vary depending on the type of task.
(Summarized by Gemini)

Tim O'Reilly/O'Reilly Media now wants every human programmer replaced by Gen AI

https://old.reddit.com/r/programming/comments/1kimr4a/warning_tim_oreilly_of_oreilly_media_now_wants/
1•Crowgirl•31s ago•1 comment

Microwriter – Computer Ads from the Past

https://computeradsfromthepast.substack.com/p/microwriter
1•rbanffy•1m ago•0 comments

A founder's guide to moving abroad

https://medium.com/@bradhe/a-founders-guide-to-moving-aboard-32584b29f50f
1•bradhe•1m ago•0 comments

TikTok trend sees kids setting Chromebooks on fire; at least one hospitalized

https://arstechnica.com/gadgets/2025/05/tiktok-trend-sees-kids-setting-chromebooks-on-fire-at-least-one-kid-hospitalized/
1•rntn•2m ago•0 comments

Escaping the SES Sandbox: An Adventure in Sunk Costs

https://alex-dawkins.com/posts/2025/05/09/simple-email-service.html
1•ouked•3m ago•0 comments

A Soviet-era spacecraft built to land on Venus is falling to Earth instead

https://arstechnica.com/space/2025/05/a-soviet-era-spacecraft-built-to-land-on-venus-is-falling-to-earth-instead/
1•voxadam•4m ago•0 comments

From Budapest to Hanoi: Comparing the COE and UN Cybercrime Conventions

https://www.lawfaremedia.org/article/from-budapest-to-hanoi--comparing-the-coe-and-un-cybercrime-conventions
2•hn_acker•7m ago•0 comments

The Anarchitecture Group

https://www.spatialagency.net/database/the.anarchitecture.group
3•jruohonen•7m ago•0 comments

Launch HN: Nao Labs (YC X25) – Cursor for Data

5•ClaireGz•8m ago•0 comments

Newark Airport Suffers Another Tech Outage, FAA Says

https://www.wsj.com/business/airlines/newark-airport-radar-outage-faa-b52da5e5
2•thm•9m ago•0 comments

Why Do Americans Pay More for Prescription Drugs?

https://www.propublica.org/article/why-americans-pay-more-for-prescription-drugs
2•hn_acker•9m ago•0 comments

Show HN: UpToTrial – OSS AI agent for clinicaltrials.gov that streams custom UI

https://uptotrial.com
2•ivalm•10m ago•0 comments

The FCC Must Reject Efforts to Lock Up Public Airwaves

https://www.eff.org/deeplinks/2025/05/fcc-must-reject-broadcast-drm
2•hn_acker•12m ago•0 comments

A Decade of Employment

https://blakewatson.com/journal/a-decade-of-employment/
1•blakewatson•12m ago•1 comment

Infinity AI – AI Coin on Ton

https://bitcointalk.org/index.php?topic=5542281.0
1•haghiri•13m ago•0 comments

High-Performance Pure-Red Perovskite LEDs via 3D Intragrain Heterostructure

https://en.ustc.edu.cn/info/1011/5047.htm
1•gnabgib•14m ago•0 comments

Show HN: Oliphaunt – A Native Mastodon Client for macOS

https://testflight.apple.com/join/Epq1P3Cw
9•anosidium•15m ago•2 comments

Show HN: BlenderQ – A TUI for managing multiple Blender renders

https://github.com/KyleTryon/BlenderQ
7•TechSquidTV•19m ago•0 comments

Distributed NER model training and inference at scale using Accelerate

https://medium.com/walmartglobaltech/distributed-ner-model-training-inference-at-scale-using-accelerate-16b2428fe86b
1•gray_amps•22m ago•0 comments

Rollstack (YC W23) Is Hiring TypeScript Engineers (Remote US/CA)

https://www.ycombinator.com/companies/rollstack-2/jobs/QPqpb1n-software-engineer-typescript-us-canada
1•yjallouli•22m ago•0 comments

On Lighter Bows

https://acoup.blog/2025/05/09/fireside-friday-may-9-2025-on-lighter-bows/
1•Tomte•23m ago•0 comments

Coinbase Machine Learning and Blockchain Research Summit [video]

https://vimeo.com/1077316333
1•dubrado•25m ago•0 comments

Sen. Cotton's bill would require AI chips to track location, curbing Chinese use

https://www.reuters.com/world/us/us-senator-introduces-bill-calling-location-tracking-ai-chips-limit-china-access-2025-05-09/
3•byte-bolter•27m ago•1 comment

Past, Present, and Future of Sorbet Type Syntax

https://blog.jez.io/history-of-sorbet-syntax/
11•PaulHoule•27m ago•0 comments

The Physical Turing Test: Jim Fan on Nvidia's Roadmap for Embodied AI [video]

https://www.youtube.com/watch?v=_2NijXqBESI
1•abetaha•28m ago•0 comments

National Snow and Ice Data Center changes service level to key sea ice datasets

https://nsidc.org/data/user-resources/data-announcements/user-notice-level-service-update-data-products
2•waterthrowaway•29m ago•1 comment

PyRoki: A Modular Toolkit for Robot Kinematic Optimization

https://pyroki-toolkit.github.io/
1•abetaha•30m ago•0 comments

Valibot v1.1

https://valibot.dev/blog/valibot-v1.1-release-notes/
1•bpierre•33m ago•0 comments

Newark Airport Hit by 90-Second Radar Outage, the Second in Weeks

https://www.bloomberg.com/news/articles/2025-05-09/newark-airport-hit-by-90-second-radar-radio-outage-on-friday
3•marc__1•34m ago•1 comment

Google inks deal to develop 1.8 GW of advanced nuclear power

https://techcrunch.com/2025/05/09/google-inks-deal-to-develop-1-8-gw-of-advanced-nuclear-power/
3•mikece•34m ago•0 comments