
I want to share a new dataset of 331 reward-hackable environments. These are real environments used in Terminal Bench and adjacent benchmarks. I first got interested in this because, as a reviewer for Terminal Bench, I noticed that a lot of our tasks were hackable. I also noticed that many people contribute to the benchmark because it gives them credibility when selling environments to labs. Hence, TBench tasks are, in my opinion, held to a higher quality standard than the tasks being used for RL today: no one is spending hours manually reviewing the $1B in tasks being purchased by major labs. As far as I understand, everyone knows environments are hackable, but nobody has released hundreds of "realistic" hackable environments.

Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments
