I'm a software engineer documenting my journey of learning RL research from scratch. This post was supposed to be a straightforward story about switching my PPO agent from an MLP to a CNN.
The switch, combined with a standard PPO trick, led to a shocking result: my agent's score jumped from 15 to 84, crushing the baseline. I thought I had cracked it.
But after digging into the training dynamics, I discovered that the incredible performance had been the work of a subtle bug in my advantage calculation all along. Fixing the bug tanked the score right back down to mediocrity.
The post is the full detective story of that discovery, the "false victory," and the new mystery that it sets up: why was the bug so helpful? That's the question I'll be tackling next.
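For anyone who wants concrete context for "advantage calculation": below is a minimal sketch of a standard GAE computation in JAX. It's the textbook formulation rather than a copy of my repo, so the names (compute_gae, rewards, values, dones, last_value) and the done-flag convention are illustrative, not what my code necessarily looks like.

    import jax
    import jax.numpy as jnp

    def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
        # Generalized Advantage Estimation over one rollout of length T.
        # rewards, values, dones: shape (T,). dones[t] == 1.0 if the episode
        # ended on step t (so the bootstrap from s_{t+1} must be masked out).
        # last_value: scalar bootstrap value for the state after the final step.
        next_values = jnp.append(values[1:], last_value)
        # TD residuals, with the bootstrap zeroed at episode boundaries.
        deltas = rewards + gamma * next_values * (1.0 - dones) - values

        def step(carry, xs):
            delta, done = xs
            gae = delta + gamma * lam * (1.0 - done) * carry
            return gae, gae

        # Accumulate discounted residuals backwards in time.
        init = jnp.zeros((), dtype=deltas.dtype)
        _, advantages = jax.lax.scan(step, init, (deltas, dones), reverse=True)
        returns = advantages + values  # regression targets for the value head
        return advantages, returns

The reverse-time scan plus the (1 - done) masking is where most of the fiddly bookkeeping lives; the specific mistake I made is spelled out in the post.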
Happy to answer any questions about the JAX/Flax implementation or the debugging process!