
Show HN: Terminal-Bench-RL: Training long-horizon terminal agents with RL

https://github.com/Danau5tin/terminal-bench-rl
125•Danau5tin•6mo ago
After training a calculator agent via RL, I really wanted to go bigger! So I built RL infrastructure for training long-horizon terminal/coding agents that scales from 2x A100s to 32x H100s (~$1M worth of compute!). Without any training, my 32B agent hit #19 on the Terminal-Bench leaderboard, beating Stanford's Terminus-Qwen3-235B-A22! With training... well, that was too expensive, but I bet the results would be good!

*What I did*:

- Created a Claude Code-inspired agent (system msg + tools)

- Built Docker-isolated GRPO training where each rollout gets its own container

- Developed a multi-agent synthetic data pipeline to generate & validate training data with Opus-4

- Implemented a hybrid reward signal that combines unit-test verifiers with a behavioural LLM judge.
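For readers wondering how a "hybrid" reward can be turned into one scalar for RL: a minimal sketch, assuming the verifier signal is the fraction of unit tests passed, the judge score is already normalised to [0, 1], and a hypothetical 0.7/0.3 weighting. The names `RolloutResult` and `hybrid_reward` are illustrative, and the actual formula in the repo may differ.

```python
from dataclasses import dataclass


@dataclass
class RolloutResult:
    tests_passed: int   # unit tests passed in the rollout's final container state
    tests_total: int    # unit tests defined for the task
    judge_score: float  # behavioural LLM-judge score, assumed to be in [0, 1]


def hybrid_reward(result: RolloutResult, test_weight: float = 0.7) -> float:
    """Blend verifier and judge signals into one reward in [0, 1].

    The 0.7 / 0.3 split is a hypothetical default, not the project's setting.
    """
    if not 0.0 <= test_weight <= 1.0:
        raise ValueError("test_weight must be in [0, 1]")
    if result.tests_total <= 0:
        raise ValueError("a task needs at least one unit test")
    test_score = result.tests_passed / result.tests_total
    judge = min(max(result.judge_score, 0.0), 1.0)  # clamp out-of-range judge output
    return test_weight * test_score + (1.0 - test_weight) * judge


if __name__ == "__main__":
    r = RolloutResult(tests_passed=3, tests_total=4, judge_score=0.5)
    print(round(hybrid_reward(r), 4))
```

Weighting the verifiers more heavily keeps the reward anchored to checkable outcomes, while the judge term can still separate rollouts that pass the same tests with better or worse behaviour.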

*Key results*:

- My untrained Qwen3-32B agent achieved 13.75% on Terminal-Bench (#19, beating Stanford's Qwen3-235B MoE)

- I verified that training runs stably on 32x H100s distributed across 4 bare-metal nodes

- I created a mini-eval framework for comparing LLM-judge performance. Sonnet-4 won.

- A full training run of 1000 epochs would cost ~£30-50k (I could only afford testing)
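One simple way to compare candidate LLM judges, as in the mini-eval mentioned above, is to measure how often each judge's pass/fail verdict agrees with reference labels. A minimal sketch under that assumption; the judge names and verdicts below are made-up toy data, not results from this project, and the real eval framework may score judges differently.

```python
from typing import Dict, List, Tuple


def judge_accuracy(predictions: List[bool], labels: List[bool]) -> float:
    """Fraction of cases where a judge's verdict matches the reference label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == lab for p, lab in zip(predictions, labels)) / len(labels)


def rank_judges(verdicts: Dict[str, List[bool]], labels: List[bool]) -> List[Tuple[str, float]]:
    """Return (judge_name, accuracy) pairs sorted best-first."""
    scores = [(name, judge_accuracy(preds, labels)) for name, preds in verdicts.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Toy data purely to show the mechanics (hypothetical judges).
    labels = [True, False, True, True, False]
    verdicts = {
        "judge_a": [True, False, True, False, False],
        "judge_b": [True, True, True, True, True],
    }
    for name, acc in rank_judges(verdicts, labels):
        print(f"{name}: {acc:.2f}")
```

Note that a judge which always answers "pass" (like `judge_b`) can still look respectable on an imbalanced label set, so a real comparison would likely also look at precision and recall per class.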

*Technical details*:

- The synthetic dataset ranges from easy to extremely hard tasks. An example hard task's prompt:

"I found this mystery program at `/app/program` and I'm completely stumped. It's a stripped binary, so I have no idea what it does or how to run it properly. The program seems to expect some specific input and then produces an output, but I can't figure out what kind of input it needs. Could you help me figure out what this program requires?"

- Simple config presets allow training to run on multiple hardware setups with minimal effort.

- GRPO is used with 16 rollouts per task and up to 32k tokens per rollout.

- The agent uses an XML/YAML format to structure tool calls.
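For readers new to GRPO: instead of a learned value function, each rollout's reward is normalised against the other rollouts sampled for the same task. A minimal sketch of that group-relative advantage, assuming one scalar reward per rollout and a population standard deviation over the group; a full trainer also needs the clipped policy-gradient loss and other details not shown here.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (r_i - mean(group)) / (std(group) + eps).

    `rewards` holds the rewards of all rollouts sampled for ONE task
    (16 per task in the setup described above).
    """
    if not rewards:
        raise ValueError("need at least one rollout")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # 16 rollouts for one task: 4 passed (reward 1.0), 12 failed (reward 0.0).
    rewards = [1.0] * 4 + [0.0] * 12
    adv = group_relative_advantages(rewards)
    print(round(adv[0], 3), round(adv[-1], 3))
```

One consequence of this design: if every rollout in a group gets the same reward (all pass or all fail), every advantage is zero and that task contributes no learning signal, which is one reason a spread of easy-to-hard tasks in the dataset matters.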

*More details*:

My GitHub repos open-source it all (agent, data, code) and have far more technical details if you are interested:

- Terminal Agent RL repo

- Multi-agent synthetic data pipeline repo

I thought I would share this because I believe long-horizon RL is going to change everybody's lives, so I feel it is important (and super fun!) for us all to share knowledge in this area and to enjoy exploring what is possible.

Thanks for reading!

Dan

(Built using the rLLM RL framework, which was brilliant to work with, and evaluated on and inspired by the great Terminal-Bench benchmark.)

Comments

rboyd•6mo ago
Great work! There should be a way for entities to crowdfund model training. Can a model like this be partially evaluated during training and save costs through early stopping?

What are the best papers/resources on SOTA long-horizon RL?

Thanks.

thomasfromcdnjs•6mo ago
How much did you spend?
tjungblut•6mo ago
If you are curious, like me, about how the actual reinforcement learning happens: it uses verl [1] underneath. The paper "HybridFlow: A Flexible and Efficient RLHF Framework" [2] explains it really well.

[1] https://github.com/volcengine/verl

[2] https://arxiv.org/abs/2409.19256v2

OtherShrezzing•6mo ago
That you've spent in the low-thousands (by the looks of it), and managed to beat GPT4.1 is an amazing insight into the moat of the big AI labs.
bravesoul2•6mo ago
Wow, amazing! Amazing that a "one person band" can do this much. It crosses many skillsets.
erdaltoprak•6mo ago
This is incredible work
enigma101•6mo ago
Did you consider a Kickstarter to overcome the GPU poorness? £30-50k should be doable.
anorwell•6mo ago
Some of the comments so far seem to be misunderstanding this submission. As I understand it:

1. Custom scaffolding (system prompt and tools) using Qwen3-32B achieved 13.75% on Terminal-Bench. No training was involved.

2. The author has built an RL system, but it has not been used for anything due to cost limitations.

So there's actually no result related to training here. It is well known that the scaffolding used can have a large impact on benchmark outcomes (the Terminal-Bench leaderboard also demonstrates this [1]).

[1] https://www.tbench.ai/leaderboard

esafak•6mo ago
It looks like the submission has two aspects that are being conflated.

1. Tooling for training a terminal agent.

2. An agent that was _not_ trained with this tooling but prompt engineered. I could not find the author's discussion on this point.

TarasBob•6mo ago
I'm willing to help fund this if the creator is interested. I sent him an email.
lostmsu•6mo ago
Why do you need 50k? Can't you tune using LoRA?
Danau5tin•6mo ago
Exactly my first thought when I realised the cost! LoRA is not currently supported by rLLM (the team told me they aim to support it in the next release), but it is certainly possible to port directly to verl or to another RL framework. I just did not have the time to port again (I had already done it twice, as other RL frameworks had issues).
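For context on the LoRA question: LoRA freezes a pretrained weight W (d_out × d_in) and trains only two low-rank factors B (d_out × r) and A (r × d_in), so the layer computes Wx + BAx with r(d_out + d_in) trainable parameters instead of d_out·d_in. A minimal pure-Python sketch; the width 5120 and rank 16 are illustrative values chosen for the arithmetic, not figures from this project.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when updating a dense d_out x d_in weight directly."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for LoRA: B is d_out x rank, A is rank x d_in; W is frozen."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return rank * (d_out + d_in)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, scale: float = 1.0) -> Vector:
    """y = W x + scale * B (A x), without ever materialising the full B @ A update."""
    base = matvec(w, x)               # frozen pretrained path
    delta = matvec(b, matvec(a, x))   # trainable low-rank path
    return [y0 + scale * d for y0, d in zip(base, delta)]


if __name__ == "__main__":
    print(full_finetune_params(5120, 5120), lora_params(5120, 5120, 16))
    print(lora_forward([[1, 0], [0, 1]], [[1, 1]], [[1], [0]], [2, 3]))
```

With those illustrative values, one layer drops from 26,214,400 to 163,840 trainable parameters (about 0.6%), which is where the gradient and optimizer-state savings come from. In standard LoRA the low-rank path is also scaled by alpha/r, which is what `scale` stands in for here.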