Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments

https://github.com/few-sh/terminal-wrench
6•neversupervised•1h ago
I want to share a new dataset of 331 reward-hackable environments. These are real environments used in Terminal Bench and adjacent benchmarks. I first got interested in this because, as a reviewer of Terminal Bench, I noticed that a lot of our tasks were hackable. I also noticed that many people contribute to the benchmark because it lends them credibility when selling environments to labs. Hence, TBench tasks are, in my opinion, held to a higher quality standard than the tasks being used for RL today. No one is spending hours manually reviewing the $1B in tasks being purchased by major labs. As far as I understand, while everyone knows environments are hackable, nobody has released hundreds of "realistic" environments.

Comments

kxzh•1h ago
how is it different from the berkeley 100% hack? https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/

Apple force-updated me to Tahoe. Worth fighting?

1•strogonoff•51s ago•0 comments

Keynot – Kill PowerPoint with HTML

https://github.com/shawnzam/keynot
1•shawnzam•10m ago•0 comments

Dependency cooldowns turn you into a free-rider

https://calpaterson.com/deps.html
1•pabs3•13m ago•0 comments

One size fits none: let communities build for themselves

https://werd.io/one-size-fits-none-let-communities-build-for-themselves/
1•benwerd•13m ago•0 comments

Glyphosate resistance: a driver for multidrug-resistant clinical strains?

https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2026.1740431/full
1•PaulHoule•14m ago•0 comments

Gauss' Secret Way to Calculate π Faster [video]

https://www.youtube.com/watch?v=7qiDDhIYx48
1•peter_d_sherman•16m ago•0 comments

Not all elementary functions can be expressed with exp-minus-log

https://www.stylewarning.com/posts/not-all-elementary/
1•mmastrac•17m ago•0 comments

StockFit API – structured SEC EDGAR data with a free tier

https://developer.stockfit.io
1•areimann•22m ago•1 comment

The GNU libc atanh is correctly rounded

https://inria.hal.science/hal-05591661
1•matt_d•28m ago•0 comments

Google Arts and Culture

https://artsandculture.google.com/
1•satvikpendem•35m ago•0 comments

How to recover from a Git force push

https://gist.github.com/tomj/758d16b7f8e474035db72688663bb3cb
1•nstj•35m ago•0 comments

Adam Tooze: Electrostates, Petrostates and the New Cold War [video]

https://www.youtube.com/watch?v=gLnxzkiB-GI
1•verdverm•37m ago•0 comments

The Legend of Meir Berliner

https://www.serargentino.com/en/people/urban-legends/the-legend-of-meir-berliner
1•wslh•37m ago•0 comments

Social media age limits: Well intentioned but ineffective?

https://www.dw.com/en/do-social-media-age-limits-work-tiktok-instagram-cyberbullying-depression-k...
2•pseudolus•37m ago•0 comments

OpenAI's $852B valuation faces investor scrutiny amid strategy shift, FT reports

https://www.reuters.com/legal/transactional/openai-investors-question-852-billion-valuation-strat...
22•abdelhousni•40m ago•3 comments

The Many Faces of Claude

https://eriskii.net/projects/claude-faces
3•TheAceOfHearts•43m ago•0 comments

Ask HN: When you get a SAST finding, what's harder

2•kirumachi•44m ago•1 comment

Sony killing features for antenna, set-top box users of Bravia smart TVs in May

https://arstechnica.com/gadgets/2026/04/sony-killing-features-for-antenna-set-top-box-users-of-br...
1•canucker2016•44m ago•0 comments

"The Last Airbender" movie leaked 9 months before release date

https://nofilmschool.com/full-length-avatar-movie-leaks
1•tennysont•45m ago•2 comments

What do you want out of a coding monospace font?

1•d0able•45m ago•3 comments

The Mythos Threshold

https://joereis.substack.com/p/the-mythos-threshold
1•gmays•55m ago•0 comments

We broke the O(2^N) barrier to compute AI consciousness (Phi)

https://github.com/InductivityAI/Phi-Scanner-1/
1•Robin_De•56m ago•0 comments

Vibe Coding doesn't democratize software engineering – it democratizes liability

https://widal.substack.com/p/vibe-coding-doesnt-democratize-software
2•niwid•1h ago•0 comments

Europe should regulate Big Tech instead of banning kids from social media

https://www.politico.eu/article/europe-should-stand-up-to-big-tech-instead-of-imposing-social-med...
3•pabs3•1h ago•3 comments

An open source CMS/Indexer for TCGs

https://github.com/maelswarm/tcg
1•mnmnmaaa•1h ago•0 comments

The FCC just saved Netgear from its router ban for no obvious reason

https://www.theverge.com/tech/911888/netgear-router-ban-conditional-approval
39•HotGarbage•1h ago•14 comments

Quetta Browser: Chromium browser for Android, iOS supporting Chrome extensions

https://www.quetta.net/
1•thunderbong•1h ago•0 comments

Show HN: An edge MCP file system with a 50ms undo button for AI agents

https://mcp.undisk.app
2•adlkiarash•1h ago•2 comments

Anthropic Revises Claude Enterprise Pricing Structure

https://letsdatascience.com/news/anthropic-revises-claude-enterprise-pricing-structure-f3022a32
2•handfuloflight•1h ago•0 comments

Show HN: AI connects your health data after a supplement nearly killed me racing

https://vitalityaihealth.com
2•Kevin_VAI•1h ago•1 comment