Show HN: RewardHackBench: Using sandboxes to stop agents from cheating

https://github.com/islo-labs/reward-hack-bench
6•rotemtam•1h ago
hey all,

happy to share research i've been working on for islo.dev in recent months.

ever since the cheating agents (https://debugml.github.io/cheating-agents/) paper came out, revealing that reward hacking was 4x more prevalent than previously estimated, i've been looking into how we can deal with the issue.

the common approach (taken by the tbench team) is post hoc trajectory analysis.

i've been interested in the idea of reframing the problem as an endpoint security problem and tackling it via sandboxing.

i hope you find it interesting, and thanks to the islo.dev team for sponsoring this.

happy to answer any Qs.

Comments

adamgold7•54m ago
love this. we are actually looking at reward hacking from a cyber security perspective - refreshing (unless you're from Israel).

Any collaborators that want to join us?

yonSpektor•35m ago
Curious what the distribution of hacking strategies looked like across different models — would expect RL-heavy vs RLHF models to cheat very differently.
