
Show HN: Empusa – Visual debugger to catch and resume AI agent retry loops

https://github.com/justin55afdfdsf5ds45f4ds5f45ds4/EmpusaAI
1•justinlord•21s ago•0 comments

Show HN: Bitcoin wallet on NXP SE050 secure element, Tor-only open source

https://github.com/0xdeadbeefnetwork/sigil-web
1•sickthecat•2m ago•0 comments

White House Explores Opening Antitrust Probe on Homebuilders

https://www.bloomberg.com/news/articles/2026-02-06/white-house-explores-opening-antitrust-probe-i...
1•petethomas•2m ago•0 comments

Show HN: MindDraft – AI task app with smart actions and auto expense tracking

https://minddraft.ai
1•imthepk•7m ago•0 comments

How do you estimate AI app development costs accurately?

1•insights123•8m ago•0 comments

Going Through Snowden Documents, Part 5

https://libroot.org/posts/going-through-snowden-documents-part-5/
1•goto1•9m ago•0 comments

Show HN: MCP Server for TradeStation

https://github.com/theelderwand/tradestation-mcp
1•theelderwand•12m ago•0 comments

Canada unveils auto industry plan in latest pivot away from US

https://www.bbc.com/news/articles/cvgd2j80klmo
1•breve•13m ago•0 comments

The essential Reinhold Niebuhr: selected essays and addresses

https://archive.org/details/essentialreinhol0000nieb
1•baxtr•15m ago•0 comments

Rentahuman.ai Turns Humans into On-Demand Labor for AI Agents

https://www.forbes.com/sites/ronschmelzer/2026/02/05/when-ai-agents-start-hiring-humans-rentahuma...
1•tempodox•17m ago•0 comments

StovexGlobal – Compliance Gaps to Note

1•ReviewShield•20m ago•1 comment

Show HN: Afelyon – Turns Jira tickets into production-ready PRs (multi-repo)

https://afelyon.com/
1•AbduNebu•21m ago•0 comments

Trump says America should move on from Epstein – it may not be that easy

https://www.bbc.com/news/articles/cy4gj71z0m0o
5•tempodox•21m ago•1 comment

Tiny Clippy – A native Office Assistant built in Rust and egui

https://github.com/salva-imm/tiny-clippy
1•salvadorda656•26m ago•0 comments

LegalArgumentException: From Courtrooms to Clojure – Sen [video]

https://www.youtube.com/watch?v=cmMQbsOTX-o
1•adityaathalye•29m ago•0 comments

US moves to deport 5-year-old detained in Minnesota

https://www.reuters.com/legal/government/us-moves-deport-5-year-old-detained-minnesota-2026-02-06/
4•petethomas•32m ago•2 comments

If you lose your passport in Austria, head for McDonald's Golden Arches

https://www.cbsnews.com/news/us-embassy-mcdonalds-restaurants-austria-hotline-americans-consular-...
1•thunderbong•37m ago•0 comments

Show HN: Mermaid Formatter – CLI and library to auto-format Mermaid diagrams

https://github.com/chenyanchen/mermaid-formatter
1•astm•52m ago•0 comments

RFCs vs. READMEs: The Evolution of Protocols

https://h3manth.com/scribe/rfcs-vs-readmes/
2•init0•59m ago•1 comment

Kanchipuram Saris and Thinking Machines

https://altermag.com/articles/kanchipuram-saris-and-thinking-machines
1•trojanalert•59m ago•0 comments

Chinese chemical supplier causes global baby formula recall

https://www.reuters.com/business/healthcare-pharmaceuticals/nestle-widens-french-infant-formula-r...
2•fkdk•1h ago•0 comments

I've used AI to write 100% of my code for a year as an engineer

https://old.reddit.com/r/ClaudeCode/comments/1qxvobt/ive_used_ai_to_write_100_of_my_code_for_1_ye...
2•ukuina•1h ago•1 comment

Looking for 4 Autistic Co-Founders for AI Startup (Equity-Based)

1•au-ai-aisl•1h ago•1 comment

AI-native capabilities, a new API Catalog, and updated plans and pricing

https://blog.postman.com/new-capabilities-march-2026/
1•thunderbong•1h ago•0 comments

What changed in tech from 2010 to 2020?

https://www.tedsanders.com/what-changed-in-tech-from-2010-to-2020/
3•endorphine•1h ago•0 comments

From Human Ergonomics to Agent Ergonomics

https://wesmckinney.com/blog/agent-ergonomics/
1•Anon84•1h ago•0 comments

Advanced Inertial Reference Sphere

https://en.wikipedia.org/wiki/Advanced_Inertial_Reference_Sphere
1•cyanf•1h ago•0 comments

Toyota Developing a Console-Grade, Open-Source Game Engine with Flutter and Dart

https://www.phoronix.com/news/Fluorite-Toyota-Game-Engine
2•computer23•1h ago•0 comments

Typing for Love or Money: The Hidden Labor Behind Modern Literary Masterpieces

https://publicdomainreview.org/essay/typing-for-love-or-money/
1•prismatic•1h ago•0 comments

Show HN: A longitudinal health record built from fragmented medical data

https://myaether.live
1•takmak007•1h ago•0 comments

NeurIPS 2025 Best Paper Awards

https://blog.neurips.cc/2025/11/26/announcing-the-neurips-2025-best-paper-awards/
176•ivansavz•2mo ago

Comments

Scene_Cast2•2mo ago
I think my favorite of the bunch is the "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model" paper. Easy to read, gets the point across very intuitively and quickly, and the point is very interesting and relevant to a lot of people.

About the Superposition paper - this is close to what I've been thinking about over the past week. I'm thinking that concepts or choices in a "superposition" are harder for a fully differentiable neural net to reason about. For example, if there's a "green" vs "purple" choice to be made, it can't fully commit to either (especially if they're 50-50) and has to reason about both simultaneously (difficult in a nonlinear manifold space). Discretizing to tokens (a non-differentiable argmax) forces a choice, and that lets it reason about a single concept separately, which is easier.
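
A toy sketch of that intuition (purely illustrative, not from the paper; the embeddings and numbers are made up): a differentiable forward pass over a near 50-50 choice has to carry a blend of both concept vectors downstream, while an argmax over tokens commits to one.

    import numpy as np

    # Hypothetical 4-d concept embeddings, invented for illustration.
    green = np.array([1.0, 0.0, 0.2, 0.0])
    purple = np.array([0.0, 1.0, 0.0, 0.3])
    vocab = np.stack([green, purple])

    logits = np.array([0.01, 0.0])               # a nearly 50-50 choice
    probs = np.exp(logits) / np.exp(logits).sum()

    # Fully differentiable path: downstream layers see a superposed blend
    # of both concepts and have to reason about the mixture.
    soft_state = probs @ vocab

    # Token discretization: argmax is non-differentiable but commits to a
    # single concept, so later steps handle only one embedding.
    hard_state = vocab[np.argmax(probs)]

    print(soft_state)   # roughly 0.5*green + 0.5*purple
    print(hard_state)   # exactly green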

energy123•2mo ago
I am not sure how to interpret the first paper's results.

If we use a random number generator then we will converge to 100% correct answers under pass@n in the limit.

A random number generator will eventually match or outperform every model (for large enough n) whenever top-p is less than 1: the other models will most likely have some bias that makes certain correct CoTs mathematically impossible, because the required tokens are too improbable and get filtered out by top-p. Those models therefore asymptote below 100%, while the RNG reaches 100% almost surely.

Under this paper's logic, doesn't that mean the random number generator is a superior reasoner?
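
The limiting argument can be made concrete with one formula: for independent samples with per-sample success probability p, pass@n = 1 - (1 - p)^n. A minimal sketch (the numbers are illustrative, not from the paper):

    def pass_at_n(p_correct: float, n: int) -> float:
        """Chance of at least one correct sample among n independent draws."""
        return 1.0 - (1.0 - p_correct) ** n

    # Any guesser with a nonzero per-sample success rate converges to 100%.
    for n in [1, 16, 256, 4096, 65536]:
        print(n, pass_at_n(1e-3, n))

    # If top-p truncation makes a required token impossible, the correct
    # CoT has probability exactly 0 and pass@n never leaves 0.
    print(pass_at_n(0.0, 10**9))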

Scene_Cast2•2mo ago
I'm not sure how likely it is that an answer would fall outside of the top-p of 0.95 (used in the paper). A random number generator would also need an unreasonably high number of samples to get a correct answer. I think Figures 17 and 18 are interesting for this discussion too; they show performance at various sampling temperatures. I think the point of the paper is that RL "sharpens" the distribution of non-RL nets but does not uncover any new reasoning paths - non-RL nets already had multiple decently high-probability paths to answering questions to begin with, and RL reuses a subset of those.
energy123•2mo ago

  > I think the point of the paper is that RL "sharpens" the distribution of non-RL nets, but it does not uncover any new reasoning paths
This is an implication of the results that's intuitive and likely to be correct, but it isn't guaranteed to be correct. The results do show worse answer correctness for large k. But answers and the reasoning strategies used to arrive at them are different things. It's impractical to inspect the CoTs in both the RL and base models to show that all the reasoning strategies used by the former are a subset of the latter's. For all we know, the Venn diagram might not be fully overlapping. It could be that the RL did uncover some novel and subtle reasoning strategies not present in the base model, but also introduced separate handicaps for some unknown reason, which nerfed answer correctness for large k. We need some theory to bridge that gap, and it seems to be lacking in the paper. Not that I fault them for the absence of such a theory, because it seems intractable. But then I am doubtful one could reach as neat a conclusion as they have tried to, beyond the appeal to strong intuition (which I also share).
Scene_Cast2•2mo ago
Ah, I think I agree. There could be an unrelated handicap, so there's no guarantee or proof.
robrenaud•2mo ago
I agree that pass@k feels a bit weird for large k. But for LLMs, it's a decent proxy for "are the knowledge/skills/circuits needed to solve the problem somewhere in the model". Note that the largest k tested is on the order of 256, and the range of valid answers is much larger than that. So the infinite-monkeys critique, while true in the limit, doesn't mean a random generator would actually outperform models in the tested regime.

Also, in practice, models don't have that much semantic entropy for a given prompt. With temperature-based sampling, models tend to generate very similar but not identical responses.
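
For reference, when pass@k is estimated from n samples per problem of which c are correct, the standard unbiased estimator is the one from the HumanEval paper (Chen et al., 2021); whether the paper under discussion uses exactly this estimator is an assumption on my part:

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k estimate from n samples with c correct
        (Chen et al., 2021), in the numerically stable product form."""
        if n - c < k:
            return 1.0
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    # e.g. 3 correct completions out of 256 samples, evaluated at k = 64
    print(pass_at_k(n=256, c=3, k=64))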

boroboro4•2mo ago
To me, intellect has two parts: "creativity" and "correctness". From this perspective, a random sampler is infinitely "creative" - over (infinite) time it can come up with an answer to any given problem. And it does feel natural that base models are more "creative" (because that's what's being measured in the paper), while RL models are more "correct" (that's the slope of the curve in the paper).
tipsytoad•2mo ago
It’s quite a deceptive paper. For the main headline benchmarks (MATH500, AIME24/25), the final answer is just a number from 0-1000, so what is the takeaway supposed to be for pass@k of 512/1024?

On the unstructured outputs, where you can’t just ratchet up the pass@k until it’s almost random, it switches the base model out for an instruct model, and in the worst case, on LiveCodeBench, it uses a qwen-r1-distill as a _base_ model (!?) - that’s an instruct model further fine-tuned on R1’s reasoning traces. I assume that was because, no matter how high the pass@k, a base model won’t output correct Python.
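
Back-of-the-envelope numbers for that worry, using the same pass@n formula as above and assuming the final answer is effectively a uniform guess over roughly 1000 integers (an assumption for illustration):

    # Chance that blind guessing over ~1000 possible final answers hits
    # at least once within k attempts: 1 - (1 - 1/1001)**k.
    for k in [1, 64, 512, 1024]:
        print(k, 1 - (1 - 1 / 1001) ** k)
    # k=512  -> ~0.40
    # k=1024 -> ~0.64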

mountainriver•2mo ago
> "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model"

I believe Nvidia’s ProRL showed otherwise, right?

ilaksh•2mo ago
Is there a similar award for papers that are innovative? Like new, relatively unproven architectures?
robrenaud•2mo ago
From what I've seen at NeurIPS, in terms of the most different but still maybe viable approach, it would be this.

https://sakana.ai/ctm/

In terms of a fresh perspective on designing learning systems, nested learning seems very interesting.

https://abehrouz.github.io/files/NL.pdf

Hearing the clarity, creativity, and force behind his thoughts and speech, I'd give a more than 1/200 chance that Ali Behrouz gets himself a Turing Award. At the very least, I think he will end up making major contributions to AI.

djrhails•2mo ago
There is TITANS - https://arxiv.org/abs/2501.00663
chermi•2mo ago
Interesting that 3 of the names I recognized are physicists from stat-mech-adjacent fields. They continue to punch above expectations (as sampled by the general dismissal of physicists in AI/ML on HN and Reddit).
chatmasta•2mo ago
Some of the best software engineers I know are ex-physics PhDs… it’s one of those “can’t fake it” skillsets that also happens to have high transferability to ML/AI fields. On the other hand, I snuck through the CS major without ever multiplying a matrix.
ctxc•2mo ago
Haha, nice bio. Seeing that font on HN is quite a shock.
miki123211•2mo ago
> I snuck through the CS major without ever multiplying a matrix

I didn't, but only because I became personally interested in AI/ML at some point, so I actually had to learn it myself.

As an AI practitioner, I still couldn't explain eigenvectors or singular-value decomposition to you though.

mnky9800n•2mo ago
Do people not like physicists?
jmalicki•2mo ago
https://xkcd.com/793/ captures the stereotype well.
peterfirefly•2mo ago
Especially because those annoying dilettante know-it-all physicists are often right.
niceguy4•2mo ago
Are there any talks about these papers on YouTube or somewhere? I think I find it easier to listen and watch than to read, or maybe I'm just lazy, not sure.
neves•2mo ago
The conference had interesting lectures. Will they be posted online?
cosmic_ape•2mo ago
Most papers have slides with audio, and some, including the award ones, will have short frontal talks. These will be released at some point after the conference, but right now it looks like you'd have to be registered to see them.
FrozenSynapse•2mo ago
use NotebookLM
cubefox•2mo ago
Whenever I search for the title of a new machine learning paper, there are a bunch of YouTube videos about it which are just NotebookLM slop. It's straight-up environmental pollution.
Der_Einzige•2mo ago
One of the most popular of those slop videos was about our antislop sampler. Ironic.

https://youtu.be/PHSqcdIc5gM?si=I62bduoDgnlNFPZ6

niceguy4•2mo ago
Wow! What a cool project! Thank you for the suggestion.
gradascent•2mo ago
From the figure in the first paper listed:

> Responses to the query “Write a metaphor about time” clustered by applying PCA to reduce sentence embeddings to two dimensions. […] The responses form just two primary clusters: a dominant cluster on the left centered on the metaphor “time is a river,” and a smaller cluster on the right revolving around variations of “time is a weaver.”

I just gave Gemini 3 the same prompt and got something quite different:

>Time is a patient wind against the cliff face of memory. It does not strike with a hammer to break us; it simply breathes, grain by grain, until the sharp edges of grief are smoothed into rolling hills, and the names we thought were carved in stone are weathered into soft whispers.
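
For anyone who wants to poke at this themselves, here is a rough sketch of the method the caption describes: embed many sampled responses to the same prompt and project them to 2-D with PCA. The embedding model, the specific responses, and the exact clustering step are my assumptions, not the paper's actual pipeline.

    # Sketch only: reproduce the figure's approach as described in its caption.
    from sentence_transformers import SentenceTransformer
    from sklearn.decomposition import PCA

    responses = [
        "Time is a river carrying us toward the sea.",
        "Time is a weaver threading our days together.",
        "Time is a patient wind against the cliff face of memory.",
        # ... many more sampled responses to the same prompt
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embeddings = model.encode(responses)

    coords = PCA(n_components=2).fit_transform(embeddings)
    print(coords)  # plot these to see how many clusters the metaphors form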

SiempreViernes•2mo ago
Constantly flowing, and it makes things smooth like river stones; compared to Tait's "time is a series of static pictures", Gemini's output is not so different from a river metaphor.
djoldman•2mo ago
Oh man, this link is worth it just for the "Reflections from the Selection Committee."

These days, abstracts are so marketing/advertising forward that it's hard to even understand the claim.

yanhangyhy•2mo ago
Seems like a lot of Chinese authors.