
Reinforcement Pre-Training

https://arxiv.org/abs/2506.08007
70•frozenseven•8mo ago

Comments

hzia•8mo ago
This is very exciting! Existing data will become a lot more valuable, and it brings LLM training one step closer to how we learn as humans!

The downside is that this is going to be extremely expensive, so the dataset used for RL will need to be carefully curated.

watsonmusic•8mo ago
Can't wait to see how it goes beyond the current LLM training pipeline.
nsagent•8mo ago
It's clear that you're either one of the authors or a friend of theirs. You created this account 8 months ago to comment on another paper [1] that was released by the same authors.

[1]: https://news.ycombinator.com/item?id=41776324

dgshsg•8mo ago
I notice that you can do this recursively to arbitrary depth. The cost is terrible though.
watsonmusic•8mo ago
It could be adaptive: only high-value tokens would be allocated more compute.
babelfish•8mo ago
So marginally better (and occasionally worse) performance for an order of magnitude larger training costs…?
watsonmusic•8mo ago
The 14B model performs comparably with the 32B one; the improvement is huge.
85392_school•8mo ago
Are we only comparing them in terms of text-completion accuracy? Does it also improve performance on benchmarks?
watsonmusic•8mo ago
A new scaling paradigm has finally arrived!
beauzero•8mo ago
Interesting
NotAnOtter•8mo ago
I'm interested in how an innovation like this affects the business prospects.

Let's assume this is a paradigm shift on the scale of Transformers / `Attention is all you need`. Companies build out new models and pump another $100 Billion through it. And then a year from now, another innovation comes out. Same circus. And again.

No one wants to be left behind but trying to keep up will sink smaller companies.

curious_cat_163•8mo ago
I am not sure why this ought to require "pump another $100 Billion". Could you elaborate?

Yes, the more recent generation of GPUs optimize for attention math. But they are still fairly "general-purpose" accelerators as well. So when I see papers like this (interesting idea, btw!), my mental model for costs suggests that the CapEx to buy up the GPUs and build out the data centers would get re-used for this and 100s of other ideas and experiments.

And then the hope is that the best ideas will occupy more of the available capacity...

gessha•8mo ago
Sir, this is an arxiv paper
NotAnOtter•8mo ago
So true, just like this one: https://arxiv.org/abs/1706.03762
Imnimo•8mo ago
This is an interesting way of squeezing extra feedback from raw text, but I'm a little skeptical that it's the best way to spend training flops. It feels like most "next tokens" are pretty low information (even after filtering for entropy like they do). Does it make sense to spend a bunch of compute on a reasoning trace on them? Maybe if you're harshly data limited, but not compute limited?
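For concreteness, the entropy-filtering idea mentioned above can be sketched in a few lines: score each position by the Shannon entropy of its next-token distribution and only spend the expensive reasoning rollout on positions above a threshold. This is just a toy NumPy sketch of that selection step, not the paper's implementation; the function names and the threshold value are made up.

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy (in bits) of a next-token distribution."""
    p = probs[probs > 0]
    return float(-(p * np.log2(p)).sum())

def select_high_entropy_positions(dists, threshold=1.0):
    """Return indices of positions whose next-token distribution exceeds
    the entropy threshold -- the 'hard' tokens that would get a reasoning
    rollout; low-entropy (easy) positions are skipped to save compute."""
    return [i for i, d in enumerate(dists) if token_entropy(d) > threshold]

# Toy distributions over a 4-token vocabulary:
peaked = np.array([0.97, 0.01, 0.01, 0.01])   # low entropy: easy token
uniform = np.array([0.25, 0.25, 0.25, 0.25])  # high entropy: hard token
positions = select_high_entropy_positions([peaked, uniform, peaked])  # -> [1]
```

With a real model the distributions would come from the LM head's softmax, and the threshold becomes a compute-budget knob: raising it concentrates the rollout budget on fewer, harder tokens.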
rafaelero•8mo ago
This should be used for high entropy tokens during pre-training.
ntonozzi•8mo ago
Is there any work related to using some kind of soft tokens for reasoning? It seems so inefficient to encode so much information down into a single token for the next pass of the model. Outputting a large vector at each forward pass would give a drastically larger working memory/scratchpad and much higher bandwidth for the model to pass information forward to the next token call. If a single token has 17 bits of information, a vector of 1024 floats could have 32,768 bits.
ntonozzi•8mo ago
I just found a recent paper about this: https://arxiv.org/abs/2505.15778. It's really thoughtful and well written. They mix the different token outputs together.
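The mixing step described there can be illustrated with a toy sketch: instead of committing to one sampled token and feeding back its embedding, feed back a probability-weighted mixture of all token embeddings, so the full distribution (not just ~log2(V) bits) flows into the next step. This is only a minimal NumPy illustration of the idea under made-up names and sizes; the actual method runs inside a transformer's decoding loop.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 8, 16
embedding = rng.normal(size=(vocab_size, d_model))  # toy token embedding table

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def hard_token_input(logits):
    """Standard decoding: commit to one token, feed back its embedding."""
    return embedding[int(np.argmax(logits))]

def soft_token_input(logits):
    """Soft decoding: feed back a probability-weighted mixture of all
    embeddings, preserving the whole next-token distribution."""
    return softmax(logits) @ embedding

logits = rng.normal(size=vocab_size)
hard = hard_token_input(logits)  # one row of the embedding table
soft = soft_token_input(logits)  # convex combination of all rows
```

Both feedback vectors have the same shape, so a model could in principle swap one for the other; the soft version just carries far more information per step.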
