
Launch HN: RunRL (YC X25) – Reinforcement learning as a service

https://runrl.com
20•ag8•1h ago
Hey HN, we’re Andrew and Derik at RunRL (https://runrl.com/). We've built a platform to improve models and agents with reinforcement learning. If you can define a metric, we'll make your model or agent better, without you having to think about managing GPU clusters.

Here's a demo video: https://youtu.be/EtiBjs4jfCg

I (Andrew) was doing a PhD in reinforcement learning on language models, and everyone kept...not using RL because it was too hard to get running. At some point I realized that someone's got to sit down and actually write a good platform for running RL experiments.

Once the platform existed, people started using it for antiviral design, formal verification, browser agents, and a bunch of other cool applications, so we decided to make a startup out of it.

How it works:

- Choose an open-weight base model (weights are necessary for RL updates; Qwen3-4B-Instruct-2507 is a good starting point)

- Upload a set of initial prompts ("Generate an antiviral targeting SARS-CoV-2 protease", "Prove this theorem", "What's the average summer high in Windhoek?")

- Define a reward function, using Python, an LLM-as-a-judge, or both

- For complex settings, you can define an entire multi-turn environment

- Watch the reward go up!
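
As a rough illustration of the reward-function step above, here is a minimal Python sketch for a prompt with a numeric answer. The function name `reward`, its signature, the tolerance, and the partial-credit rule are all assumptions made for this example; the post does not describe RunRL's actual interface.

```python
import re

# Matches integers and decimals, optionally negative.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def reward(completion: str, reference: float, tolerance: float = 1.0) -> float:
    """Score a completion in [0, 1] against a numeric reference answer.

    Hypothetical signature: a real platform may pass different arguments.
    """
    numbers = NUMBER.findall(completion)
    if not numbers:
        return 0.0  # no numeric answer at all
    answer = float(numbers[-1])  # treat the last number as the final answer
    error = abs(answer - reference)
    if error <= tolerance:
        return 1.0  # close enough counts as fully correct
    # Partial credit that falls linearly to 0 at 10x the tolerance.
    return max(0.0, 1.0 - error / (10 * tolerance))
```

Giving partial credit instead of a pure 0/1 score is a design choice: it gives the policy a gradient to follow early in training, when exact answers are rare.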

For most well-defined problems, a small open model + RunRL outperforms frontier models. (For instance, we've seen Qwen-3B do better than Claude 4.1 Opus on antiviral design.) This is because LLM intelligence is notoriously "spiky": models are often decent but not great at common-sense knowledge, randomly good at a few domains, and error-prone on lots of other tasks. RunRL creates spikes precisely on the tasks where you need them.

Pricing: $80/node-hour. Most models up to 14B parameters fit on one node (0.6-1.2 TB of VRAM). We do full fine-tuning, at the cost of parameter-efficiency (with RL, people seem to care a lot about the last few percent gains in e.g. agent reliability).
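
To make the pricing concrete, a back-of-the-envelope estimate can be sketched as below. Only the $80/node-hour rate comes from the post; the function name and the example run length are hypothetical, and actual billing granularity is not stated.

```python
NODE_HOUR_USD = 80.0  # rate quoted in the post


def run_cost(nodes: int, hours: float) -> float:
    """Estimated cost in USD, assuming simple linear billing (an assumption)."""
    return nodes * hours * NODE_HOUR_USD


# Hypothetical run: one node for 2.5 hours.
print(run_cost(nodes=1, hours=2.5))
```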

Next up: continuous learning; tool use. Tool use is currently in private beta, which you can join here: https://forms.gle/D2mSmeQDVCDraPQg8

We'd love to hear any thoughts, questions, or positive or negative reinforcement!

Comments

nextworddev•53m ago
Is there any credence to the view that these startups are basically DSPy wrappers?
-_-•35m ago
DSPy is great for prompt optimization but not so much for RL fine-tuning (their support is "extremely EXPERIMENTAL"). The nice thing about RL is that the exact prompts don't matter so much. You don't need to spell out every edge case, since the model will get an intuition for how to do its job well via the training process.
nextworddev•10m ago
Isn’t the latest trend in RL mostly about prompt optimization as opposed to full fine-tuning?

Federal Reserve cuts interest rates by quarter point

https://www.ft.com/content/f1d4522b-331e-45d5-b676-24dc5b8e3c92
1•alephnerd•29s ago•0 comments

Social Security admin denies DB data leak, DOGEs questions about a copy

https://www.theregister.com/2025/09/17/ssa_denies_doge_whistleblower_claim/
2•rntn•1m ago•0 comments

Show HN: LLMyourself.com – Type a name. Get a report.

https://www.llmyourself.com/
1•AlexNicita•5m ago•0 comments

GCJ-02, China's "Mars Coordinates"

https://steemit.com/china/@randyw/chinese-coordinates
2•brendanashworth•5m ago•0 comments

Fed Cuts Rates by Quarter Point and Signals More Are Likely

https://www.wsj.com/economy/central-banking/fed-cuts-rates-by-quarter-point-and-signals-more-are-...
2•thm•9m ago•0 comments

Federal Reserve cuts interest rates by a quarter point

https://www.federalreserve.gov/newsevents/pressreleases/monetary20250917a.htm
4•impish9208•10m ago•2 comments

Janet for Mortals

https://janet.guide/
1•Karrot_Kream•11m ago•0 comments

Anthropic irks White House with limits on models’ use

https://www.semafor.com/article/09/17/2025/anthropic-irks-white-house-with-limits-on-models-uswhi...
4•mindingnever•14m ago•0 comments

Gen Z Leads Biggest Drop in FICO Scores Since Financial Crisis

https://www.bloomberg.com/news/articles/2025-09-17/fico-scores-fall-at-fastest-rate-since-financi...
2•petethomas•14m ago•2 comments

Facing the possibility of consciousness in human brain organoids

https://www.cell.com/patterns/fulltext/S2666-3899(25)00213-2
3•XzetaU8•15m ago•0 comments

Google Open-Sources "Codegen Scorer" to Improve AI-Generation for Web Frameworks

https://blog.angular.dev/beyond-the-horizon-how-angular-is-embracing-ai-for-next-gen-apps-7a7ed70...
3•mgechev•16m ago•0 comments

Glue teams vs. back-office teams

https://newsletter.posthog.com/p/glue-teams-vs-back-office-teams
1•Twixes•17m ago•0 comments

Why Love Generative Art

https://www.artnome.com/news/2018/8/8/why-love-generative-art
1•shreyas_p_238•19m ago•0 comments

Who controls the Internet and How it works?

https://binaryigor.com/who-controls-the-internet-and-how-it-works.html
1•BinaryIgor•20m ago•0 comments

FTC Launches Inquiry into AI Chatbots Acting as Companions

https://www.ftc.gov/news-events/news/press-releases/2025/09/ftc-launches-inquiry-ai-chatbots-acti...
3•mooreds•21m ago•0 comments

The snake-killer trial that led to California's last hanging

https://www.latimes.com/california/story/2025-09-17/states-struggle-for-an-answer-is-there-any-go...
1•axiomdata316•23m ago•0 comments

Supplementary Information for the DeepSeek R1 paper [pdf]

https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-025-09422-z/MediaObjects/41586_202...
1•pr337h4m•25m ago•0 comments

Using a maintenance mode primitive to shard Postgres with zero downtime

https://gadget.dev/blog/sharding-our-core-postgres-database-without-any-downtime
2•draward•26m ago•0 comments

Redesigning Data Systems to Be Agent-First

http://muratbuffalo.blogspot.com/2025/09/supporting-our-ai-overlords-redesigning.html
2•KraftyOne•28m ago•0 comments

Revisiting the IPIP-NEO personality hierarchy with taxonomic graph analysis

https://journals.sagepub.com/doi/10.1177/08902070251352590
1•PaulHoule•29m ago•0 comments

Communications Is So Big

https://heidiwaterhouse.com/communications-is-so-big/
1•mooreds•29m ago•0 comments

Tesla's 'self-driving' software fails at train crossings

https://www.nbcnews.com/tech/elon-musk/tesla-full-self-driving-fails-train-crossings-drivers-warn...
12•Veserv•30m ago•5 comments

Lomuto's Comeback for Quicksort Partitions

https://dlang.org/blog/2020/05/14/lomutos-comeback/
1•fanf2•30m ago•0 comments

Don't Take the Auditor to the Strip Club

https://www.bloomberg.com/opinion/newsletters/2025-09-17/don-t-take-the-auditor-to-the-strip-club
5•ioblomov•30m ago•1 comments

Take Home Interviews in the Era of Claude

https://blog.reffie.me/take-home-interviews-in-the-era-of-claude/
2•SoylentOrange•30m ago•1 comments

Learning the natural history of human disease with generative transformers

https://www.nature.com/articles/s41586-025-09529-3
1•bookofjoe•31m ago•0 comments

Why random lines of video game dialogue get stuck in our heads

https://www.theguardian.com/games/2025/sep/17/video-game-dialogue-pushing-buttons
1•n1b0m•31m ago•0 comments

The Debian-based version and Linux Mint 22.3 will be appearing by year's end

https://www.zdnet.com/article/just-got-linux-mint-22-2-two-more-versions-are-coming-soon-and-they...
1•CrankyBear•32m ago•0 comments

Viaduct, Five Years On: Modernizing the Data-Oriented Service Mesh

https://medium.com/airbnb-engineering/viaduct-five-years-on-modernizing-the-data-oriented-service...
2•mooreds•32m ago•0 comments

Say Goodbye to Node.js HTTP. Meet Brahma-JS an Ultra HTTP

https://github.com/Shyam20001/rsjs
1•StellaMary•32m ago•1 comments