frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Ask HN: How much of your token use is fixing the bugs Claude Code causes?

1•laurex•1m ago•0 comments

Show HN: Agents – Sync MCP Configs Across Claude, Cursor, Codex Automatically

https://github.com/amtiYo/agents
1•amtiyo•2m ago•0 comments

Hello

1•otrebladih•4m ago•0 comments

FSD helped save my father's life during a heart attack

https://twitter.com/JJackBrandt/status/2019852423980875794
1•blacktulip•6m ago•0 comments

Show HN: Writtte – Draft and publish articles without reformatting, anywhere

https://writtte.xyz
1•lasgawe•8m ago•0 comments

Portuguese icon (FROM A CAN) makes a simple meal (Canned Fish Files) [video]

https://www.youtube.com/watch?v=e9FUdOfp8ME
1•zeristor•10m ago•0 comments

Brookhaven Lab's RHIC Concludes 25-Year Run with Final Collisions

https://www.hpcwire.com/off-the-wire/brookhaven-labs-rhic-concludes-25-year-run-with-final-collis...
2•gnufx•12m ago•0 comments

Transcribe your aunts post cards with Gemini 3 Pro

https://leserli.ch/ocr/
1•nielstron•16m ago•0 comments

.72% Variance Lance

1•mav5431•17m ago•0 comments

ReKindle – web-based operating system designed specifically for E-ink devices

https://rekindle.ink
1•JSLegendDev•19m ago•0 comments

Encrypt It

https://encryptitalready.org/
1•u1hcw9nx•19m ago•1 comments

NextMatch – 5-minute video speed dating to reduce ghosting

https://nextmatchdating.netlify.app/
1•Halinani8•20m ago•1 comments

Personalizing esketamine treatment in TRD and TRBD

https://www.frontiersin.org/articles/10.3389/fpsyt.2025.1736114
1•PaulHoule•21m ago•0 comments

SpaceKit.xyz – a browser‑native VM for decentralized compute

https://spacekit.xyz
1•astorrivera•22m ago•0 comments

NotebookLM: The AI that only learns from you

https://byandrev.dev/en/blog/what-is-notebooklm
1•byandrev•22m ago•1 comments

Show HN: An open-source starter kit for developing with Postgres and ClickHouse

https://github.com/ClickHouse/postgres-clickhouse-stack
1•saisrirampur•23m ago•0 comments

Game Boy Advance d-pad capacitor measurements

https://gekkio.fi/blog/2026/game-boy-advance-d-pad-capacitor-measurements/
1•todsacerdoti•23m ago•0 comments

South Korean crypto firm accidentally sends $44B in bitcoins to users

https://www.reuters.com/world/asia-pacific/crypto-firm-accidentally-sends-44-billion-bitcoins-use...
2•layer8•24m ago•0 comments

Apache Poison Fountain

https://gist.github.com/jwakely/a511a5cab5eb36d088ecd1659fcee1d5
1•atomic128•26m ago•2 comments

Web.whatsapp.com appears to be having issues syncing and sending messages

http://web.whatsapp.com
1•sabujp•26m ago•2 comments

Google in Your Terminal

https://gogcli.sh/
1•johlo•27m ago•0 comments

Shannon: Claude Code for Pen Testing: #1 on Github today

https://github.com/KeygraphHQ/shannon
1•hendler•28m ago•0 comments

Anthropic: Latest Claude model finds more than 500 vulnerabilities

https://www.scworld.com/news/anthropic-latest-claude-model-finds-more-than-500-vulnerabilities
2•Bender•32m ago•0 comments

Brooklyn cemetery plans human composting option, stirring interest and debate

https://www.cbsnews.com/newyork/news/brooklyn-green-wood-cemetery-human-composting/
1•geox•32m ago•0 comments

Why the 'Strivers' Are Right

https://greyenlightenment.com/2026/02/03/the-strivers-were-right-all-along/
1•paulpauper•34m ago•0 comments

Brain Dumps as a Literary Form

https://davegriffith.substack.com/p/brain-dumps-as-a-literary-form
1•gmays•34m ago•0 comments

Agentic Coding and the Problem of Oracles

https://epkconsulting.substack.com/p/agentic-coding-and-the-problem-of
1•qingsworkshop•35m ago•0 comments

Malicious packages for dYdX cryptocurrency exchange empties user wallets

https://arstechnica.com/security/2026/02/malicious-packages-for-dydx-cryptocurrency-exchange-empt...
1•Bender•35m ago•0 comments

Show HN: I built a <400ms latency voice agent that runs on a 4gb vram GTX 1650"

https://github.com/pheonix-delta/axiom-voice-agent
1•shubham-coder•35m ago•0 comments

Penisgate erupts at Olympics; scandal exposes risks of bulking your bulge

https://arstechnica.com/health/2026/02/penisgate-erupts-at-olympics-scandal-exposes-risks-of-bulk...
4•Bender•36m ago•0 comments
Open in hackernews

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

https://arxiv.org/abs/2509.02522
38•getnormality•4mo ago

Comments

getnormality•4mo ago
I stumbled across this AI paper just now. It sounds intimidatingly technical, but if you read the abstract and look at Figures 1 and 2 and Equation 6, I think it's got some neat and accessible conceptual ideas.

Supervised learning is a much more mature technology than reinforcement learning, so it seems like a good thing to leverage that.

yorwba•4mo ago
I think you meant to link to

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR https://arxiv.org/abs/2509.02522

not

Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline https://arxiv.org/abs/2507.15855

dang•4mo ago
We've changed the top link to that from https://arxiv.org/abs/2507.15855. Thanks!
getnormality•4mo ago
Ack, thank you.
impossiblefork•4mo ago
That paper is really cool too though. I'm happy that your comment sort of records the old link, because I only saw the right paper.
anfego•4mo ago
Is this DPO?
getnormality•4mo ago
I have no idea. My understanding of this entire field is extremely superficial. I only posted this because I was able to sort of understand the paper despite that.

I can tell you that they cite the DPO paper right before Equation 8.

impossiblefork•4mo ago
59.76% on AIME is really appealing. Without having had time to understand it and determine whether it's useful or not, I see this number as indicating that this could be a stepping stone on something like the o1-to-DeepSeek-R1 progression for thinking, where open source models eventually figured out how o1 worked, only for the less definite 'o1' and instead what Google achieved and OpenAI may have achieved on the 2025 IMO problems.
radarsat1•4mo ago
> By treating the outcome reward as a predictable label, we reformulate the RLVR problem into a supervised learning task over a score function parameterized by the policy model and optimized using cross-entropy loss.

Isn't this how the Decision Transformer works? I don't see it in the references, so I'll be curious to compare the papers in more depth.

https://arxiv.org/abs/2106.01345

> By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return.

Lately it has crossed my mind that I haven't seen DT brought up much lately, it seemed really interesting when it was first published but I haven't read much follow-up work.

lostmsu•3mo ago
Can somebody explain "Base" model in the charts? Are they saying that the original model (e.g. before they applied either their or comparable training methods) has better or similar performance on all benchmarks vs their own result?