Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

https://arxiv.org/abs/2412.15287
69•mfiguiere•9mo ago

Comments

justanotheratom•9mo ago
Is Best-of-N sampling standard practice in inference these days? It sounds expensive on the face of it. I'm surprised, because I thought the trend was toward cheaper inference.
diwank•9mo ago
For reasoning models, this would actually improve exploration efficiency and hence possibly allow higher performance for the same compute budget. That is, if you want to sample multiple rollouts for the same prompt, it's more efficient if the model can produce diverse lines of thought and weigh them to find the best response, as opposed to going down similar trajectories and wasting compute.
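To make the exploration point concrete, here is a minimal Python sketch of Best-of-N selection, assuming a scalar reward function. The `toy_reward` verifier and both candidate lists are hypothetical. With N=8 near-identical rollouts the extra samples buy nothing; the same budget spent on diverse rollouts can surface a better answer.

```python
from typing import Callable, List


def best_of_n(candidates: List[str], reward: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward (ties go to the earliest one)."""
    return max(candidates, key=reward)


def toy_reward(answer: str) -> float:
    # Hypothetical verifier: 1.0 for the correct answer, 0.0 otherwise.
    return 1.0 if answer == "42" else 0.0


# Collapsed sampler: all N rollouts go down the same trajectory.
collapsed = ["41"] * 8
# Diverse sampler: the same N rollouts explore different answers.
diverse = ["41", "40", "42", "43", "41", "39", "44", "42"]

print(best_of_n(collapsed, toy_reward))
print(best_of_n(diverse, toy_reward))
```

Both calls cost the same eight samples; only the diverse set gives the selector anything to choose between.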
codelion•9mo ago
Not standard, but it's one of several techniques; you can see them in our open-source inference proxy: https://github.com/codelion/optillm

Cerebras has used optillm to optimise inference with techniques like CePO and LongCePO.

peepeepoopoo114•9mo ago
Almost all of the efficiency gains have come from shedding bit precision, but the problem is that AI labs are now running out of bits to shed. The move to reduced-precision inference has been masking the insane unsustainability of compute scaling as a model-improvement paradigm.
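As a rough illustration of what "shedding bits" costs, the sketch below round-trips a toy Gaussian weight vector through symmetric uniform integer quantization with a single per-tensor scale. This is one common scheme chosen for simplicity, not necessarily what any particular lab uses, and the weight distribution is an assumption. Each bit removed roughly halves the number of levels, so reconstruction error climbs quickly at the low end.

```python
import random


def quantize_dequantize(weights, bits):
    """Round-trip weights through symmetric uniform quantization with one per-tensor scale."""
    assert bits >= 2, "symmetric scheme needs at least one positive level"
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    peak = max(abs(w) for w in weights)
    if peak == 0.0:
        return list(weights)
    scale = peak / qmax
    # Round to the nearest integer code, then clamp into [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [c * scale for c in codes]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]   # toy "layer"
errors = {bits: mse(weights, quantize_dequantize(weights, bits)) for bits in (8, 4, 3, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit reconstruction MSE: {err:.2e}")
```

Real deployments use finer-grained scales (per channel or per block) to push this curve down, which is part of why the "running out of bits" question is contested below.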
nullc•9mo ago
Is there really a limit on bits to shed? I suspect not.

Take N gates, normalize them, represent them as points on the surface of a hypersphere. Quantize the hypersphere as coarsely as you need to get the precision you want. Want less precision but your quantization is getting too coarse? Increase N.

Fast algebraic codes exist to convert positions on hypersphere-like surfaces to indexes and vice versa.

Perhaps spherical VQ isn't ideal (though I suspect it is, since groups of weights often naturally act as rotations), but if not, some other geometry should work well.
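A minimal sketch of the spherical VQ idea, reading "gates" as weights: split a group of N weights into its norm and its direction on the unit hypersphere, then store only the index of the nearest codeword. The codebook here is K random unit vectors, a hypothetical stand-in for the fast algebraic codes mentioned above (it needs an O(K) search and a stored table, which those codes avoid). It shows the arithmetic behind "increase N": with K fixed, bits per weight is log2(K)/N.

```python
import math
import random


def make_codebook(k, dim, seed=0):
    """Hypothetical codebook: k random points on the unit hypersphere in `dim` dimensions."""
    rng = random.Random(seed)
    codebook = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        n = math.sqrt(sum(x * x for x in v))
        codebook.append([x / n for x in v])
    return codebook


def encode(group, codebook):
    """Split a group of weights into (norm, index of the nearest codeword direction)."""
    norm = math.sqrt(sum(x * x for x in group))
    if norm == 0.0:
        return 0.0, 0  # an all-zero group decodes to zeros regardless of index
    direction = [x / norm for x in group]
    # For unit vectors, nearest in Euclidean distance == largest dot product.
    idx = max(range(len(codebook)),
              key=lambda i: sum(a * b for a, b in zip(direction, codebook[i])))
    return norm, idx


def decode(norm, idx, codebook):
    """Rebuild the group as norm * codeword."""
    return [norm * c for c in codebook[idx]]


def bits_per_weight(k, dim):
    """Index bits per weight; ignores the per-group norm, whose cost is amortized over dim weights."""
    return math.log2(k) / dim


codebook = make_codebook(k=256, dim=8)
group = [0.3, -0.1, 0.05, 0.2, -0.4, 0.0, 0.1, -0.2]
norm, idx = encode(group, codebook)
approx = decode(norm, idx, codebook)
print(f"index {idx}, {bits_per_weight(256, 8)} bits/weight for dim 8")
print(f"{bits_per_weight(256, 16)} bits/weight for dim 16 with the same codebook size")
```

The trade-off is that a fixed number of codewords covers a higher-dimensional sphere more sparsely, so directional error grows with N unless the code is well structured.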

karmasimida•9mo ago
Isn't the BoN RL formulation similar to DeepSeek's GRPO algorithm? Doesn't the latter already capture this implicitly?
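For comparison, here is a sketch of the group-relative advantage GRPO computes: each sampled response's reward standardized against the other responses for the same prompt (population standard deviation plus a small epsilon; implementations vary on these details). The point of contrast, which doesn't settle the question, is that this update pushes up every above-average sample, whereas Best-of-N inference only ever uses the single best one.

```python
import statistics


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: each reward standardized against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def best_of_n_reward(rewards):
    """What Best-of-N inference returns: only the maximum reward matters."""
    return max(rewards)


group = [0.0, 0.0, 1.0, 0.0]   # rewards for 4 sampled responses to one prompt
advantages = grpo_advantages(group)
print([round(a, 3) for a in advantages])
print(best_of_n_reward(group))
```

Note that if every sample in a group earns the same reward, all advantages are zero and the group contributes no learning signal.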
Johnyhar•9mo ago
Wouldn't RL training, with the goal of aligning the LLM with the reward function R(x, y), result in the trained LLM's outputs already maximizing that reward function? How different are the rewards of the N outputs in BoN sampling, and is the difference large enough to justify the cost?
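One way to frame the question, in a deliberately simplified setting: assume a binary reward, a perfect verifier, and N independent samples that are each correct with probability p. Best-of-N then succeeds with probability 1 - (1 - p)^N, so its gain over a single sample is large when p is moderate and shrinks as RL pushes p toward 1. Independence is the optimistic case; if the fine-tuned model collapses to near-identical outputs, success stays at p no matter how large N is.

```python
def bon_success_independent(p, n):
    """P(at least one of n independent samples is correct), assuming a perfect verifier."""
    return 1.0 - (1.0 - p) ** n


for p in (0.2, 0.5, 0.9):
    # Gain of Best-of-N over a single sample at several N.
    gains = [round(bon_success_independent(p, n) - p, 3) for n in (1, 4, 16)]
    print(f"p={p}: BoN gain over one sample for N=1,4,16 -> {gains}")
```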
padolsey•9mo ago
I wish they had included some example completions in the paper, not just eval results. It would be really useful to see whether there are any emergent linguistic tilts in the newly diverse responses...
vessenes•9mo ago
Nice idea. Essentially, adding differentiability to the best-of-N choice lets them encourage models to add some diversity “naturally”. The Gemma 2B results indicate it’s probably worth trying this on larger models.

That said, I’m unclear how much this helps in practice; we don’t usually parse through, say, 32 responses from our 2B-parameter models. I guess this might be helpful if you instrumented parallel reasoning processes in a batch. Perhaps that’s what o1-pro is doing in the background, actually.

Anyway, this one seems to me like it might make its way onto the “good idea” list when RL is available in the training pipeline.
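On the "differentiability" point: the paper's exact construction isn't reproduced here. As a generic illustration of how an argmax-style selection can be made smooth, a common relaxation replaces the hard max over the N rewards with a temperature-scaled softmax average. The function name and temperatures below are illustrative assumptions, not the paper's method.

```python
import math


def soft_best_of_n(rewards, temperature):
    """Softmax-weighted average of rewards: a smooth stand-in for max(rewards).

    As temperature -> 0 this approaches the hard Best-of-N choice; larger
    temperatures spread weight across all N candidates.
    """
    m = max(rewards)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp((r - m) / temperature) for r in rewards]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, rewards)) / total


rewards = [0.1, 0.7, 0.4, 0.65]
for t in (1.0, 0.1, 0.01):
    print(t, round(soft_best_of_n(rewards, t), 4))
```

Unlike `max`, this is smooth in every reward, so a training signal reaches all N candidates rather than only the winner.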
