frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Start all of your commands with a comma

https://rhodesmill.org/brandon/2009/commands-with-comma/
58•theblazehen•2d ago•11 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
638•klaussilveira•13h ago•188 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
936•xnx•18h ago•549 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
35•helloplanets•4d ago•31 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
113•matheusalmeida•1d ago•28 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
13•kaonwarb•3d ago•12 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
45•videotopia•4d ago•1 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
222•isitcontent•13h ago•25 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
214•dmpetrov•13h ago•106 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
324•vecti•15h ago•142 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
374•ostacke•19h ago•94 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
479•todsacerdoti•21h ago•238 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
359•aktau•19h ago•181 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
279•eljojo•16h ago•166 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
407•lstoll•19h ago•273 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
17•jesperordrup•3h ago•10 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
85•quibono•4d ago•21 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
58•kmm•5d ago•4 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
27•romes•4d ago•3 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
245•i5heu•16h ago•193 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
14•bikenaga•3d ago•2 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
54•gfortaine•11h ago•22 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
143•vmatsiiako•18h ago•65 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1061•cdrnsf•22h ago•438 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
179•limoce•3d ago•96 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
284•surprisetalk•3d ago•38 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
137•SerCe•9h ago•125 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
70•phreda4•12h ago•14 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
29•gmays•8h ago•11 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
63•rescrv•21h ago•23 comments
Open in hackernews

Show HN: ART – a new open-source RL framework for training agents

https://github.com/OpenPipe/ART
116•kcorbitt•9mo ago
Hey HN, I wanted to share a new project we've been working on for the last couple of months called ART (https://github.com/OpenPipe/ART).

ART is a new open-source framework for training agents using reinforcement learning (RL). RL allows you to train an agent to perform better at any task whose outcome can be measured and quantified.

There are many excellent projects focused on training LLMs with RL, such as GRPOTrainer (https://huggingface.co/docs/trl/main/en/grpo_trainer) and verl (https://github.com/volcengine/verl). We've used these frameworks extensively for customer-facing projects at OpenPipe, but grew frustrated with some key limitations:

- Multi-turn workflows, where the agent calls a tool, gets a response, and calls another, are not well supported. This makes them a non-starter for any task that requires an agent to perform a sequence of actions.

- Other frameworks typically have low GPU efficiency. They may require multiple H100 GPUs just to train a small 7B parameter model, and aren't able to keep the GPUs busy consistently during both the "rollout" and "training" phases of the training loop.

- Existing frameworks are typically not a convenient shape for integrating with existing agentic codebases. Existing trainers expect you to call raw text completion endpoints, and don't automatically provide industry-standard chat completion APIs.

ART is designed to address these limitations and make it easy to train high-quality agents. We've also shared many details and practical lessons learned is in this post, which walks through a demo of training an email research agent that outperforms o3 (https://openpipe.ai/blog/art-e-mail-agent). You can also find out more about ART's architecture in our announcement post (https://openpipe.ai/blog/art-trainer-a-new-rl-trainer-for-ag...).

Happy to answer any questions you have!

Comments

kcorbitt•9mo ago
Figured now was a good time to post this since we recently got surprisingly good results on training an email research agent. Link is above, but will put it here as well since I think it's a good example of RL's promise: https://openpipe.ai/blog/art-e-mail-agent
bradhilton•9mo ago
Contributor here, we developed the Agent Reinforcement Trainer (ART) library to make it easy to train LLMs for anything.

No callbacks or straitjacket flows. Instead we serve an OpenAI API-compatible endpoint that you can use as a drop-in replacement for any proprietary APIs you may be hitting.

After collecting responses from the inference API, you can tune the model with your own custom rewards and repeat the process as long as you like, until performance converges. We believe this level of flexibility will make it easier for you to train state-of-the-art models for your own use cases, much like Kyle's new email agent[1].

Also happy to answer any questions you have about the framework.

[1] https://openpipe.ai/blog/art-e-mail-agent

tcdent•9mo ago
I really like this concept.

Do you have documentation for the API response from the `/_train_model` endpoint?

bradhilton•9mo ago
Hi, we don't have reliable documentation for the HTTP API endpoints yet, mostly as they are still subject to change.

However, to briefly provide some context, `/_train_model` returns a stream of line delimited JSON objects for each gradient step as the model trains on the provided trajectories so the client can monitor progress. The final version of this endpoint may provide the option for both streaming & non-streaming responses, and/or potentially return a "training job" that can be polled instead.

someguy101010•9mo ago
Thanks for sharing this! A couple of questions come to mind:

- How does training with RL differ from fine tuning?

- When would it make sense to fine tune instead of using RL?

kcorbitt•9mo ago
Ok good questions here.

By fine-tuning in this context I assume you mean "supervised fine-tuning", or SFT. SFT trains a model to produce a specific string of output tokens, given an input. With SFT, if you were trying to train an assistant to solve math problems using a code interpreter, you might train it on a dataset that looks like:

    input: 'What is 934+1208'  
    output: `print(934+1208)`

    input: 'how many "r"s in strawberry'
    output: `print(len([l for l in "strawberry" if l == 'r'])`
etc, etc.

RL, on the other hand, just means training a model not to produce a concrete string of output tokens, but rather to create an output that maximizes some reward function (you get to decide on the reward).

For the example above, you might create the following dataset for RL training:

    input: 'What is 934+1208'
    ground_truth: 2142

    input: 'how many "r"s in strawberry'
    ground_truth: 3
You would then train the model to write python code that produces the ground_truth output. Your training code would take the model's output, run the python it produced, and then check whether the output matches the expected ground_truth. Importantly, this doesn't require you actually writing the code to solve the problem (you don't even have to know if it's solvable, technically!). Over time, the training loop would make the model more likely to produce outputs that get high rewards, which hopefully means it gets better at producing valid and applicable python.

This is useful in lots of domains where it's easier to check the answer than actually produce it. In the blog post[1] linked above, we train the agent to effectively use keyword search to try to find the correct emails in an inbox. As the model trainer, I didn't actually know what the right strategy was to choose keywords that would most quickly find the relevant email, but through training with RL, the model was able to figure it out on its own!

[1]: https://openpipe.ai/blog/art-e-mail-agent?refresh=1746030513...

someguy101010•9mo ago
Thank you for the detailed response!
jeffchuber•9mo ago
the table with comparable models is a really great way to show off things here
pama•9mo ago
Was the name influenced by the ship in the murderbot diaries?
schainks•9mo ago
Seconded.
gitroom•9mo ago
Perfect, I've always wanted an easier way to mess with RL frameworks. Gonna mess around with this asap.
bradhilton•9mo ago
Awesome! If you run into any problems or have questions feel free to open an issue or drop by the discord [1] server.

[1] https://discord.gg/zbBHRUpwf4