frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

The purpose of Continuous Integration is to fail

https://blog.nix-ci.com/post/2026-02-05_the-purpose-of-ci-is-to-fail
1•zdw•2m ago•0 comments

Apfelstrudel: Live coding music environment with AI agent chat

https://github.com/rcarmo/apfelstrudel
1•rcarmo•3m ago•0 comments

What Is Stoicism?

https://stoacentral.com/guides/what-is-stoicism
3•0xmattf•3m ago•0 comments

What happens when a neighborhood is built around a farm

https://grist.org/cities/what-happens-when-a-neighborhood-is-built-around-a-farm/
1•Brajeshwar•3m ago•0 comments

Every major galaxy is speeding away from the Milky Way, except one

https://www.livescience.com/space/cosmology/every-major-galaxy-is-speeding-away-from-the-milky-wa...
2•Brajeshwar•3m ago•0 comments

Extreme Inequality Presages the Revolt Against It

https://www.noemamag.com/extreme-inequality-presages-the-revolt-against-it/
1•Brajeshwar•4m ago•0 comments

There's no such thing as "tech" (Ten years later)

1•dtjb•4m ago•0 comments

What Really Killed Flash Player: A Six-Year Campaign of Deliberate Platform Work

https://medium.com/@aglaforge/what-really-killed-flash-player-a-six-year-campaign-of-deliberate-p...
1•jbegley•5m ago•0 comments

Ask HN: Anyone orchestrating multiple AI coding agents in parallel?

1•buildingwdavid•6m ago•0 comments

Show HN: Knowledge-Bank

https://github.com/gabrywu-public/knowledge-bank
1•gabrywu•12m ago•0 comments

Show HN: The Codeverse Hub Linux

https://github.com/TheCodeVerseHub/CodeVerseLinuxDistro
3•sinisterMage•13m ago•2 comments

Take a trip to Japan's Dododo Land, the most irritating place on Earth

https://soranews24.com/2026/02/07/take-a-trip-to-japans-dododo-land-the-most-irritating-place-on-...
2•zdw•13m ago•0 comments

British drivers over 70 to face eye tests every three years

https://www.bbc.com/news/articles/c205nxy0p31o
14•bookofjoe•13m ago•5 comments

BookTalk: A Reading Companion That Captures Your Voice

https://github.com/bramses/BookTalk
1•_bramses•14m ago•0 comments

Is AI "good" yet? – tracking HN's sentiment on AI coding

https://www.is-ai-good-yet.com/#home
1•ilyaizen•15m ago•1 comments

Show HN: Amdb – Tree-sitter based memory for AI agents (Rust)

https://github.com/BETAER-08/amdb
1•try_betaer•16m ago•0 comments

OpenClaw Partners with VirusTotal for Skill Security

https://openclaw.ai/blog/virustotal-partnership
2•anhxuan•16m ago•0 comments

Show HN: Seedance 2.0 Release

https://seedancy2.com/
2•funnycoding•17m ago•0 comments

Leisure Suit Larry's Al Lowe on model trains, funny deaths and Disney

https://spillhistorie.no/2026/02/06/interview-with-sierra-veteran-al-lowe/
1•thelok•17m ago•0 comments

Towards Self-Driving Codebases

https://cursor.com/blog/self-driving-codebases
1•edwinarbus•17m ago•0 comments

VCF West: Whirlwind Software Restoration – Guy Fedorkow [video]

https://www.youtube.com/watch?v=YLoXodz1N9A
1•stmw•18m ago•1 comments

Show HN: COGext – A minimalist, open-source system monitor for Chrome (<550KB)

https://github.com/tchoa91/cog-ext
1•tchoa91•19m ago•1 comments

FOSDEM 26 – My Hallway Track Takeaways

https://sluongng.substack.com/p/fosdem-26-my-hallway-track-takeaways
1•birdculture•19m ago•0 comments

Show HN: Env-shelf – Open-source desktop app to manage .env files

https://env-shelf.vercel.app/
1•ivanglpz•23m ago•0 comments

Show HN: Almostnode – Run Node.js, Next.js, and Express in the Browser

https://almostnode.dev/
1•PetrBrzyBrzek•23m ago•0 comments

Dell support (and hardware) is so bad, I almost sued them

https://blog.joshattic.us/posts/2026-02-07-dell-support-lawsuit
1•radeeyate•24m ago•0 comments

Project Pterodactyl: Incremental Architecture

https://www.jonmsterling.com/01K7/
1•matt_d•24m ago•0 comments

Styling: Search-Text and Other Highlight-Y Pseudo-Elements

https://css-tricks.com/how-to-style-the-new-search-text-and-other-highlight-pseudo-elements/
1•blenderob•26m ago•0 comments

Crypto firm accidentally sends $40B in Bitcoin to users

https://finance.yahoo.com/news/crypto-firm-accidentally-sends-40-055054321.html
1•CommonGuy•26m ago•0 comments

Magnetic fields can change carbon diffusion in steel

https://www.sciencedaily.com/releases/2026/01/260125083427.htm
1•fanf2•27m ago•0 comments
Open in hackernews

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

https://arxiv.org/abs/2509.02522
38•getnormality•4mo ago

Comments

getnormality•4mo ago
I stumbled across this AI paper just now. It sounds intimidatingly technical, but if you read the abstract and look at Figures 1 and 2 and Equation 6, I think it's got some neat and accessible conceptual ideas.

Supervised learning is a much more mature technology than reinforcement learning, so it seems like a good thing to leverage that.

yorwba•4mo ago
I think you meant to link to

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR https://arxiv.org/abs/2509.02522

not

Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline https://arxiv.org/abs/2507.15855

dang•4mo ago
We've changed the top link to that from https://arxiv.org/abs/2507.15855. Thanks!
getnormality•4mo ago
Ack, thank you.
impossiblefork•4mo ago
That paper is really cool too though. I'm happy that your comment sort of records the old link, because I only saw the right paper.
anfego•4mo ago
Is this DPO?
getnormality•4mo ago
I have no idea. My understanding of this entire field is extremely superficial. I only posted this because I was able to sort of understand the paper despite that.

I can tell you that they cite the DPO paper right before Equation 8.

impossiblefork•4mo ago
59.76% on AIME is really appealing. Without having had time to understand it and determine whether it's useful or not, I see this number as indicating that this could be a stepping stone on something like the o1-to-DeepSeek-R1 progression for thinking, where open source models eventually figured out how o1 worked, only for the less definite 'o1' and instead what Google achieved and OpenAI may have achieved on the 2025 IMO problems.
radarsat1•4mo ago
> By treating the outcome reward as a predictable label, we reformulate the RLVR problem into a supervised learning task over a score function parameterized by the policy model and optimized using cross-entropy loss.

Isn't this how the Decision Transformer works? I don't see it in the references, so I'll be curious to compare the papers in more depth.

https://arxiv.org/abs/2106.01345

> By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return.

Lately it has crossed my mind that I haven't seen DT brought up much lately, it seemed really interesting when it was first published but I haven't read much follow-up work.

lostmsu•3mo ago
Can somebody explain "Base" model in the charts? Are they saying that the original model (e.g. before they applied either their or comparable training methods) has better or similar performance on all benchmarks vs their own result?