Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

https://arxiv.org/abs/2509.02522
38•getnormality•4mo ago

Comments

getnormality•4mo ago
I stumbled across this AI paper just now. It sounds intimidatingly technical, but if you read the abstract and look at Figures 1 and 2 and Equation 6, I think it's got some neat and accessible conceptual ideas.

Supervised learning is a much more mature technology than reinforcement learning, so it seems like a good thing to leverage that.

yorwba•4mo ago
I think you meant to link to

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR https://arxiv.org/abs/2509.02522

not

Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline https://arxiv.org/abs/2507.15855

dang•4mo ago
We've changed the top link to that from https://arxiv.org/abs/2507.15855. Thanks!
getnormality•4mo ago
Ack, thank you.
impossiblefork•4mo ago
That paper is really cool too though. I'm happy that your comment sort of records the old link, because I only saw the right paper.
anfego•4mo ago
Is this DPO?
getnormality•4mo ago
I have no idea. My understanding of this entire field is extremely superficial. I only posted this because I was able to sort of understand the paper despite that.

I can tell you that they cite the DPO paper right before Equation 8.

impossiblefork•4mo ago
59.76% on AIME is really appealing. I haven't had time to understand the method or determine whether it's useful, but I see this number as a sign that it could be a stepping stone in something like the o1-to-DeepSeek-R1 progression for thinking, where open source models eventually figured out how o1 worked. This time the target would be less well-defined than o1: whatever Google achieved, and OpenAI may have achieved, on the 2025 IMO problems.
radarsat1•4mo ago
> By treating the outcome reward as a predictable label, we reformulate the RLVR problem into a supervised learning task over a score function parameterized by the policy model and optimized using cross-entropy loss.
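
To make the quoted sentence concrete, here is a minimal sketch of "reward as a label, cross-entropy as the loss". It is not the paper's Equation 6. The `score` function below (a scaled log-probability ratio against a reference model) and the `beta` parameter are assumptions chosen for illustration, as is the use of a sigmoid to turn the score into a probability.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score(logp_policy: float, logp_ref: float, beta: float = 1.0) -> float:
    """Hypothetical score function: a scaled log-probability ratio.

    This form is an assumption for illustration only; the paper defines
    its own score function in terms of the policy model.
    """
    return beta * (logp_policy - logp_ref)


def bce_loss(scores: list[float], rewards: list[int]) -> float:
    """Binary cross-entropy between sigmoid(score) and 0/1 outcome rewards."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for s, r in zip(scores, rewards):
        p = sigmoid(s)
        total += -(r * math.log(p + eps) + (1 - r) * math.log(1.0 - p + eps))
    return total / len(scores)


# Two sampled responses: the first was verified correct (reward 1),
# the second incorrect (reward 0).
scores = [score(-3.0, -5.0), score(-6.0, -4.0)]  # [2.0, -2.0]
loss = bce_loss(scores, [1, 0])
```

The loss falls when responses with reward 1 receive high scores and responses with reward 0 receive low scores, which is the sense in which the verifiable outcome acts as a supervised label.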

Isn't this how the Decision Transformer works? I don't see it in the references, so I'll be curious to compare the papers in more depth.

https://arxiv.org/abs/2106.01345

> By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return.
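
For comparison, a rough sketch of the conditioning signal the Decision Transformer quote describes. The model is fed a "return-to-go" at each step, meaning the sum of rewards from that step to the end of the trajectory. The function name and the example trajectory are hypothetical. Note the difference in role: here the return is an input the model conditions on, while the RLVR quote above treats the reward as a label to predict.

```python
def returns_to_go(rewards: list[float]) -> list[float]:
    """Undiscounted sum of rewards from each timestep to the end."""
    rtg: list[float] = []
    running = 0.0
    # Walk backwards so each step adds its own reward to the future total.
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


# Sparse, outcome-style reward: only the final step is rewarded.
sparse = returns_to_go([0.0, 0.0, 1.0])
# Dense reward at every step.
dense = returns_to_go([1.0, 2.0, 3.0])
```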

It has crossed my mind lately that I haven't seen DT brought up much. It seemed really interesting when it was first published, but I haven't read much follow-up work.

lostmsu•3mo ago
Can somebody explain the "Base" model in the charts? Are they saying that the original model (i.e. before they applied either their own or comparable training methods) has better or similar performance on all benchmarks compared with their own result?