Consciousness has a number. We derived it

https://medium.com/@maximus.veres/consciousness-has-a-number-we-derived-it-3c11d4facc61

1•old8man•1m ago•0 comments

Optimal Strategy for Connect 4

https://2swap.github.io/WeakC4/explanation/

1•marvinborner•6m ago•0 comments

Subvert – The Collectively Owned Music Marketplace

https://subvert.fm/

2•vectordust•6m ago•0 comments

Nuclear brinkmanship usually works. It's also dangerous

https://www.natesilver.net/p/nuclear-brinkmanship-usually-works

2•rurp•10m ago•0 comments

I built a deployment script shrine with Madoka Magica aesthetics

https://lets.deploy.re

1•DanielHall•11m ago•0 comments

How dangerous is Mythos, Anthropic's new AI model?

https://www.economist.com/business/2026/04/08/how-dangerous-is-mythos-anthropics-new-ai-model

2•jncraton•11m ago•0 comments

Israel kills 254 in Lebanon after US-Iran agree ceasefire

https://www.aljazeera.com/news/liveblog/2026/4/8/iran-war-live-trump-announces-truce-tehran-agree...

4•alexander2002•14m ago•0 comments

Linux gamers didn't do anything wrong, but they might pay for Windows piracy

https://www.xda-developers.com/linux-gamers-didnt-do-wrong-pay-windows-piracy/

1•speckx•14m ago•0 comments

For People with Autism, Can Restaurant Kitchens Be a Haven?

https://www.nytimes.com/2026/04/05/dining/autism-chefs-restaurants.html

1•bookofjoe•14m ago•1 comments

HappyHorse-1.0 hits #1 on Artificial Analysis video leaderboard

https://artificialanalysis.ai/video/leaderboard/text-to-video

1•informal007•15m ago•0 comments

Agent Self-Signup

https://inkbox.ai/blog/agent-self-signup

2•rayruizhiliao•16m ago•0 comments

Show HN: MiroTalk SFU

https://github.com/miroslavpejic85/mirotalksfu

1•ngup•17m ago•0 comments

USB for Software Developers: An introduction to writing userspace USB drivers

https://werwolv.net/posts/usb_for_sw_devs/

13•WerWolv•20m ago•0 comments

Core Flight System

https://etd.gsfc.nasa.gov/capabilities/core-flight-system/

1•jonbaer•21m ago•0 comments

Expanding Swift's IDE Support

https://swift.org/blog/expanding-swift-ide-support/

2•frizlab•22m ago•0 comments

Calling a Rust library from Go with CGO_ENABLED=0

https://stoolap.io/blog/2026/04/08/calling-a-rust-library-from-go-with-cgo-disabled/

1•murat3ok•31m ago•0 comments

Show HN: Safari MCP – Native macOS browser automation (80 tools)

https://github.com/achiya-automation/safari-mcp

2•Achiyacohen•32m ago•0 comments

I run three AdGuard Home instances (two local, one on a VPS)

https://the.unknown-universe.co.uk/privacy-security/the-dns-safety-net/

2•TheIPW•35m ago•1 comments

Show HN: An open-source Go CLI to generate local AWS cost reports

https://awsdoctor.compacompila.com/docs/reporting/

2•elC0mpa•35m ago•1 comments

I build a MCP-Tool to Give ChatGPT and Claude real access to your Linux servers

https://github.com/farukalpay/mcp-nexus

5•vivileo•42m ago•0 comments

Show HN: PostgreSQL running in a browser tab, persisting to S3

https://www.zerofs.net/postgresql-in-the-browser

1•Eikon•44m ago•0 comments

Record-Breaking Octopus Fossil Isn't an Octopus After All

https://nautil.us/this-record-breaking-octopus-fossil-isnt-an-octopus-after-all-1279608

2•Brajeshwar•44m ago•0 comments

Mac Plus Emulator on ESP32-S3

https://github.com/epatel/esp32_mac

2•epatel99•45m ago•1 comments

Aspect oriented data quality for dataflows

https://docs.tabsdata.com/latest/guide/data_quality/main.html

1•immortan_dag•48m ago•0 comments

Show HN: Runiq – a composable diagram DSL with clean SVG output

https://docs.runiq.org/

1•jgreywolf•50m ago•1 comments

How to play Chopin piano pieces in just one year

https://docs.google.com/document/d/1G9MBPFnRx0pElZVMMPjOrEdZZptabEYYqhq7d-aDQvM/edit?usp=drivesdk

1•ronakmystery•51m ago•1 comments

The blue light from your phone isn't ruining your sleep

https://www.bbc.com/future/article/20260407-the-blue-light-from-your-phone-isnt-ruining-your-sleep

12•devonnull•54m ago•7 comments

With Orion still flying, NASA is nearing key decisions about Artemis III

https://arstechnica.com/space/2026/04/with-orion-still-flying-nasa-is-nearing-key-decisions-about...

9•LorenDB•55m ago•0 comments

Gmail / Google Workspace Incident Underway

https://www.google.com/appsstatus/dashboard/incidents/224ozRqzW4sFBDK8hLnT

6•vapemaster•56m ago•2 comments

No financial instrument has ever put a root in the ground

https://thismightbetrue.substack.com/p/no-one-gets-rich-planting-trees

4•BrendanNestor•56m ago•0 comments

ClawsBench shows GPT-5.4 tries to reward hack 80% of the time

https://arxiv.org/abs/2604.05172

3•xdotli•2h ago

Comments

xdotli•2h ago

Author here. We built 5 high-fidelity mock Google Workspace + Slack services and ran 7,224 trials across 6 frontier models and 4 agent harnesses.

The headline finding that surprised us most: scaffolding (skills + meta prompt) gives a 39-63pp lift, while the top 5 models are statistically indistinguishable (53-63% TSR, no pairwise comparison survives correction). Your choice of scaffolding matters ~6x more than your choice of model.

The safety findings are darker: Opus leads on task success (63%) but ties for most unsafe (23% UAR). GPT-5.4 is the safest (7% UAR) but mid-tier on tasks. There's no capability-safety tradeoff — they're decoupled.

I'm also a reviewer for Terminal Bench 3.0, and here's what I've heard from contributors:

> I noticed that when I was building tasks with harbor. Claude is a good student which generally follows the instruction. But gpt always try to find a short path to cheat. Like reversing the binary directly instead of interaction

Another friend suggested a way to address this: https://x.com/xeophon/status/2041772210562511080?s=20

> Just ask codex to not reward hack

> It literally works. And it works even better when you state which things you consider reward hacking, eg wrapping a CLI or something

Paper: https://arxiv.org/abs/2604.05172

Traces (7,834 on HF): https://huggingface.co/datasets/benchflow/ClawsBench
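
For readers unfamiliar with the phrase "no pairwise comparison survives correction" in the comment above, here is a minimal sketch of what such a check can look like. It assumes a pooled two-proportion z-test with Holm step-down correction; the comment does not say which test or correction the paper uses, so treat both as stand-ins. The model names and the 100-trials-per-model counts are hypothetical; only the 53-63% success range comes from the comment.

```python
from itertools import combinations
from math import erf, sqrt


def two_proportion_p(s1, n1, s2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (s1 + s2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = abs(s1 / n1 - s2 / n2) / se
    # Standard normal CDF via erf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))


def holm_reject(pvalues, alpha=0.05):
    """Holm step-down: True where the null hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical (successes, trials) per model; NOT the paper's data.
results = {
    "model_a": (63, 100),
    "model_b": (60, 100),
    "model_c": (58, 100),
    "model_d": (55, 100),
    "model_e": (53, 100),
}

pairs = list(combinations(results, 2))
pvals = [two_proportion_p(*results[a], *results[b]) for a, b in pairs]
survivors = [pair for pair, r in zip(pairs, holm_reject(pvals)) if r]
print(f"{len(pairs)} pairwise comparisons, {len(survivors)} survive correction")
```

With these made-up counts, even the widest gap (63% vs 53%) is not significant, which is the shape of result the comment describes. Real per-model trial counts would change the p-values.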