frontpage.

Scaling Coding-Agent RL to 32x H100s. 160% Improvement on Stanford's TBench

https://github.com/Danau5tin/Orca-Agent-RL
2•Danau5tin•6h ago

Comments

Danau5tin•6h ago
My RL-trained multi-agent coding model, Orca-Agent-v0.1-14B, reached a 167% higher relative score than its base model on Stanford's TerminalBench. I've open-sourced everything.

*What I did:*

- I trained a 14B orchestrator model to better coordinate explorer & coder subagents (subagents are tool calls for orchestrator)
- Scaled to 32x H100s that were pushed to their limits across 4 bare-metal nodes
- Scaled to 256 Docker environments rolling out simultaneously, automatically distributed across the cluster
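To illustrate what "subagents are tool calls" can look like, here is a minimal sketch of an orchestrator-side dispatcher. Everything in it is hypothetical (the handler names, the `tool_call` dictionary shape, and the stub return strings); it is not taken from the repo, where each subagent would presumably run its own model loop.

```python
from typing import Callable, Dict

# Hypothetical subagent handlers. In a real system each one would run its
# own model loop; here they are stubs that just echo the task.
def explorer(task: str) -> str:
    return f"[explorer] inspected: {task}"

def coder(task: str) -> str:
    return f"[coder] implemented: {task}"

# The orchestrator only sees subagents as named tools.
SUBAGENT_TOOLS: Dict[str, Callable[[str], str]] = {
    "explorer": explorer,
    "coder": coder,
}

def dispatch(tool_call: dict) -> str:
    """Route one orchestrator tool call to the matching subagent."""
    name = tool_call["name"]
    if name not in SUBAGENT_TOOLS:
        # Return the error as text so the orchestrator can recover.
        return f"error: unknown tool {name!r}"
    return SUBAGENT_TOOLS[name](tool_call["arguments"]["task"])
```

The point of the design is that the orchestrator's policy is the only thing being trained to choose between tools; the result of each call comes back as plain text in its context.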

*Key results:*

- Qwen3-14B jumped from *7% → 18.25%* on TerminalBench after training
- Model now within striking distance of Qwen3-Coder-480B (19.7%)
- Training was stable with smooth entropy decrease and healthy gradient norms
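For readers checking the arithmetic, the sketch below computes the relative improvement and the remaining gap from the rounded scores quoted above. With those rounded inputs the result is roughly 160.7%, in line with the 160% in the title; the 167% in the opening sentence presumably comes from an unrounded base score, which is an assumption on my part and not something the post states.

```python
def relative_improvement(base: float, trained: float) -> float:
    """Relative gain of `trained` over `base`, in percent."""
    return (trained - base) / base * 100.0

# Rounded scores as quoted in the post (percent of tasks solved).
rel = relative_improvement(7.0, 18.25)

# Absolute gap to Qwen3-Coder-480B, in percentage points.
gap = 19.7 - 18.25

print(f"{rel:.1f}% relative improvement, {gap:.2f} points behind")
```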

*Training approach:*

Reward design (and my biggest learning): I kept it simple, using *just unit tests*. Every "smart" reward signal I tried to craft led to policy collapse.
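A unit-test-only reward can be as small as the sketch below. It assumes an all-or-nothing score; the post does not say whether the actual reward is binary or proportional to the number of passing tests, and the function name is hypothetical.

```python
from typing import Sequence

def unit_test_reward(test_results: Sequence[bool]) -> float:
    """Return 1.0 only if every unit test passed, else 0.0.

    Assumption: binary scoring. An empty result list counts as a failure,
    so a rollout cannot earn reward by producing no test results.
    """
    if not test_results:
        return 0.0
    return 1.0 if all(test_results) else 0.0
```

There is nothing here for the policy to game except the tests themselves, which may be part of why simpler signals held up better than hand-crafted ones.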

Curriculum learning:
- Stage-1: Tasks where base model succeeded 1-2/3 times (41 tasks)
- Stage-2: Tasks where Stage-1 model succeeded 1-4/5 times
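The stage selection described above amounts to filtering tasks by how often the current model solves them. A minimal sketch follows, with hypothetical task names and a hypothetical `select_curriculum` helper; the motivation in the comments (tasks that are always or never solved give little signal) is my reading, not a claim from the post.

```python
from typing import Dict, List

def select_curriculum(success_counts: Dict[str, int], lo: int, hi: int) -> List[str]:
    """Keep tasks whose success count falls in [lo, hi].

    Tasks the model never solves or always solves are dropped, on the
    assumption that they provide little contrast between rollouts.
    """
    return sorted(task for task, count in success_counts.items() if lo <= count <= hi)

# Stage-1 style filter: successes out of 3 attempts, keep 1 or 2.
stage1 = select_curriculum(
    {"task_a": 0, "task_b": 1, "task_c": 2, "task_d": 3}, lo=1, hi=2
)
print(stage1)
```

For a Stage-2 style filter the same helper would be called with counts out of 5 attempts and `lo=1, hi=4`.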

Dataset: Used synthetically generated RL environments and unit tests

*More details:*

I have added many more details in the repo linked to this submission, including training code, model weights, and datasets.

Huge thanks to:
- Tara for providing the compute
- Prime Intellect team for building prime-rl and dealing with my endless questions
- Alex Dimakis for the conversation that sparked training the orchestrator model

Thanks for reading!

Dan

(Evaluated on the excellent TerminalBench benchmark by Stanford & Laude Institute)

Workflows: Durable Execution with Just Postgres

https://lucumr.pocoo.org/2025/11/3/absurd-workflows/
1•janpio•1m ago•0 comments

Every Country Has One

https://everycountry.us/
1•lukeigel•2m ago•0 comments

Show HN: MyPCOptimizer – AI-powered PC hardware upgrade advisor

https://www.mypcoptimizer.com/
1•Arnaus•3m ago•0 comments

Show HN: Vayno – AI Email Sequence Generator from Any Landing Page

https://vaynoai.lovable.app/
1•ahemx_•4m ago•0 comments

Show HN: Vayno – AI Email Sequence Generator from Any Landing Page

1•ahemx_•5m ago•0 comments

Better authentication with workload identity federation

https://tailscale.com/blog/workload-identity-beta
2•tabletcorry•5m ago•0 comments

Debugging Playwright Timeouts: A Practical Checklist

https://currents.dev/posts/debugging-playwright-timeouts
1•waltergalvao•5m ago•0 comments

Understanding H-1B Visa Changes

https://www.richmondfed.org/publications/research/economic_brief/2025/eb_25-39
1•andrewstetsenko•5m ago•1 comment

Online OR1K Emulator Running Linux

https://github.com/s-macke/jor1k
1•gurjeet•6m ago•0 comments

Google is showing ads to people who are about to buy [video]

https://www.youtube.com/watch?v=BfNIRyPi5QA
1•potamic•6m ago•0 comments

Homotopy Type Theory for Dummies

http://www.chriswarbo.net/blog/2015-09-11-hott_for_dummies.html
1•fanf2•6m ago•0 comments

Writing 30 posts in 30 days

https://psychotechnology.substack.com/p/i-am-writing-30-posts-in-30-days
1•eatitraw•7m ago•0 comments

Equifax Plans to Profit from Medicaid Cuts

https://www.nytimes.com/2025/11/03/health/medicaid-cuts-equifax-data.html
2•JumpCrisscross•7m ago•0 comments

Bessent: Broader recession possible without more rate cuts

https://www.axios.com/2025/11/02/recession-bessent-fed-interest-rates
3•stopbulying•8m ago•1 comment

$50 PlanetScale Metal

https://planetscale.com/blog/50-dollar-planetscale-metal
6•ianl•9m ago•0 comments

The Stallman Paradox: How Web3 Became the Ultimate Open Source Theater

https://paragraph.com/@holonic-horizons/the-stallman-paradox-how-web3-became-the-ultimate-open-so...
2•nabla9•10m ago•0 comments

rm -rf / remains

https://www.lambdaops.com/posts/rm-rf-remains
1•gurjeet•11m ago•1 comment

2025 Quantum Open Source Software Survey Results

https://unitary.foundation/posts/2025_survey_results/
1•EvgeniyZh•12m ago•0 comments

Is the Internet Making Culture Worse?

https://asteriskmag.com/issues/12-books/is-the-internet-making-culture-worse
11•Luc•15m ago•0 comments

OpenAI signs $38B cloud computing deal with Amazon

https://www.theguardian.com/technology/2025/nov/03/openai-cloud-computing-deal-amazon-aws-datacen...
8•jethronethro•17m ago•0 comments

Show HN: Word Wolfer, Number Wolfer – educational games inspired by Munchers

https://memalign.github.io/m/wolfer/index.html
1•memalign•18m ago•0 comments

Install script does rm -RF /usr for Ubuntu

https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/issues/123
2•oli5679•18m ago•0 comments

Gallery of wonderful drawings our little thermal printer received

https://guestbook.goodenough.us
8•busymom0•21m ago•0 comments

Mechanochemical Approach to Upcycling of Fluoride from PTFE into Fine Chemicals

https://pubs.acs.org/doi/10.1021/jacs.5c14052
2•PaulHoule•21m ago•0 comments

Blood, Brick and Legend: The Chemistry of Dracula's Castle

https://news.research.gatech.edu/2025/10/31/blood-brick-and-legend-chemistry-draculas-castle
1•dhfbshfbu4u3•21m ago•0 comments

Antarctic glacier shows fastest retreat in modern history

https://www.science.org/content/article/antarctic-glacier-shows-fastest-retreat-modern-history
1•bikenaga•21m ago•1 comment

We spent 47k running AI agents in production

https://pub.towardsai.net/we-spent-47-000-running-ai-agents-in-production-heres-what-nobody-tells...
3•datadrivenangel•22m ago•1 comment

Ask HN: Freelancer? Seeking freelancer? (November 2025)

4•Grosvenor•23m ago•7 comments

The Americas, led by Canada, is on brink of losing measles-elimination status

https://www.statnews.com/2025/11/03/measles-elimination-status-canada-united-states-mexico/
9•divbzero•24m ago•0 comments

Ask HN: Are social bonds bad for independent thought?

4•amichail•27m ago•3 comments