Show HN: RULER – Easily apply RL to any agent

https://openpipe.ai/blog/ruler
29•kcorbitt•3h ago
Hey HN, Kyle here, one of the co-founders of OpenPipe.

Reinforcement learning is one of the best techniques for making agents more reliable, and has been widely adopted by frontier labs. However, adoption outside those labs has been slow because RL is so hard to implement.

One of the biggest challenges when adapting RL to a new task is the need for a task-specific "reward function" (a way of measuring success). This is often difficult to define, and requires high-quality labeled data, significant domain expertise, or both.

RULER is a drop-in reward function that works across different tasks without any of that complexity.

It works by showing N trajectories to an LLM judge and asking it to rank them relative to each other. This sidesteps the calibration issues that plague most LLM-as-judge approaches. Combined with GRPO (which only cares about relative scores within groups), it just works (surprisingly well!).
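
To illustrate why relative scores are enough, here is a minimal sketch of group-relative normalization in the style GRPO is usually described with. It is not taken from the ART repo: the function name `group_relative_advantages`, the `eps` parameter, and the judge scores are all hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-8):
    """Center and scale scores within one group of trajectories.

    Assumption: advantages are (score - group mean) / group std, the
    common textbook form of GRPO's within-group normalization.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    # eps guards against division by zero when all scores are equal.
    return [(s - mu) / (sigma + eps) for s in scores]


# Hypothetical judge scores for N=4 trajectories of the same task.
lenient = [0.9, 0.7, 0.8, 0.6]
# A harsher judge that shifts every score down keeps the same ranking and gaps.
harsh = [s - 0.5 for s in lenient]

adv_lenient = group_relative_advantages(lenient)
adv_harsh = group_relative_advantages(harsh)

for a, b in zip(adv_lenient, adv_harsh):
    print(f"{a:+.3f} {b:+.3f}")
```

Under these assumptions the two judges produce the same advantages up to floating-point error, so a constant offset in the judge's absolute scores does not change the training signal. Only the relative placement of trajectories within the group does.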

We have a full writeup on the blog, including results on 4 production tasks. On all 4 tasks, small Qwen 2.5 models trained with RULER+GRPO beat the best prompted frontier model, despite being significantly smaller and cheaper to run. Surprisingly, they even beat models trained with hand-crafted reward functions on 3/4 tasks! https://openpipe.ai/blog/ruler

Repo: https://github.com/OpenPipe/ART

Comments

someoneontenet•1h ago
Love these write ups!
kcorbitt•33m ago
Thanks! If there are any topics that you'd find particularly interesting, let me know and I can try to find time. :)
sadiq•32m ago
Excellent, look forward to giving this a go.

I was looking at: https://arxiv.org/abs/2506.18254 but your approach is even more general.

TSMC to cease GAN foundry production in 2027 due to Chinese competition

https://www.semiconductor-today.com/news_items/2025/jul/tsmc-030725.shtml
1•ilamont•1m ago•0 comments

LLMs for coding (+ free workflow templates)

https://blog.n8n.io/best-llm-for-coding/
1•based2•2m ago•0 comments

Engine fuel switches cut off before Air India crash, preliminary report finds

https://www.theguardian.com/world/2025/jul/11/engine-fuel-switches-cut-off-before-air-india-crash-that-killed-260-report-finds
1•pseudolus•4m ago•0 comments

OpenAI's Windsurf deal is off – and its CEO is going to Google

https://www.theverge.com/openai/705999/google-windsurf-ceo-openai
3•rcchen•4m ago•0 comments

PocketBase – open-source back end in 1 file

https://pocketbase.io/
1•EPendragon•9m ago•0 comments

Phoenix Framework

https://www.phoenixframework.org/
1•gjvc•13m ago•0 comments

ZKsync's Airbender ZkVM Proves Ethereum Blocks in 35 Seconds

https://www.coindesk.com/tech/2025/06/24/zksyncs-airbender-zkvm-proves-ethereum-blocks-in-35-seconds
1•PaulHoule•20m ago•0 comments

Branchy – A Infinte Generative Outline

https://branchy.co/
2•dennishansen•21m ago•0 comments

America is coming after Chinese it accuses of hacking

https://www.economist.com/china/2025/07/10/america-is-coming-after-chinese-it-accuses-of-hacking
1•bookofjoe•22m ago•1 comments

Texas ignores climate change at our own peril

https://www.expressnews.com/opinion/commentary/article/climate-change-kerrville-flood-20763416.php
1•trauco•22m ago•1 comments

Apple set to land US F1 streaming rights in $150M+ deal

https://9to5mac.com/2025/07/11/report-apple-set-to-land-us-f1-streaming-rights-in-150-million-deal/
2•mgh2•23m ago•0 comments

Grok 4 CLI – a terminal-based LLM tool using xAI's Grok 4 model via their API

https://github.com/ComposioHQ/grok-cli
1•nailer•26m ago•1 comments

AI coding tools are perhaps our new terminal emulators

https://ghuntley.com/vt100/
1•ghuntley•27m ago•0 comments

Salesforce blocks AI rivals from using Slack data

https://www.reuters.com/business/salesforce-blocks-ai-rivals-using-slack-data-information-reports-2025-06-11/
1•rcchen•27m ago•0 comments

Air India 171: Preliminary report confirms movement of fuel control switches

https://theaircurrent.com/dispatches/?entry=30057
2•dondraper36•29m ago•2 comments

Goodnight, Pocket. Hello, Folio

https://savewithfolio.com/blog/goodnight-pocket-hello-folio/
2•Aldo_MX•29m ago•0 comments

aileaks.dev - Community driven database of AI related security incidents

https://www.aileaks.dev/
1•norcalkc•34m ago•0 comments

Wikimedia Foundation considers closing Wikinews

https://thedesk.net/2025/07/wikimedia-foundation-closing-wikinews/
1•kurtreed2•35m ago•0 comments

3I/Atlas: Observing and Modeling an Interstellar Newcomer

https://www.centauri-dreams.org/2025/07/10/3i-atlas-observing-and-modeling-an-interstellar-newcomer/
1•JPLeRouzic•35m ago•0 comments

Ask HN: Persisting LLM token streams through a page refresh?

1•spruce_tips•37m ago•0 comments

Who's Hiring: Curated Jobs by Category and Location – Updated Daily

1•jobswithgptcom•38m ago•0 comments

Exclusivity on OpenAI's $3B acquisition for Windsfurf has expired

https://fortune.com/2025/07/11/the-exclusivity-on-openais-3-billion-acquisition-for-coding-startup-windsfurf-has-expired/
1•granto•38m ago•0 comments

Humans overrely on overconfident language models, across languages

https://arxiv.org/abs/2507.06306
1•softwaredoug•39m ago•0 comments

X-Plane 12.2.1 beta: Increased demo time from 15 to 60 minutes

https://www.x-plane.com/kb/x-plane-12-2-1-release-notes/
1•amichail•40m ago•0 comments

Writable Instruction Set Computer

https://wiki.c2.com/?WritableInstructionSetComputer
1•gjvc•41m ago•0 comments

Timepiece linked to Darwin voyage cannot leave UK

https://www.bbc.com/news/articles/cvg6kz54kz7o
1•Bluestein•41m ago•0 comments

How to Beat Creative Block

https://substack.com/home/post/p-168013503
1•mwidell•42m ago•0 comments

Air India Flight 171 Accident Preliminary Report [pdf]

https://aaib.gov.in/What%27s%20New%20Assets/Preliminary%20Report%20VT-ANB.pdf
21•ummonk•47m ago•14 comments

Psilocybin Delays Aging

https://news.emory.edu/stories/2025/07/hs_psilocybin_aging_study_10-07-2025/story.html
1•vo2maxer•48m ago•0 comments

Photos: The Scale of China's Solar-Power Projects

https://www.theatlantic.com/photography/archive/2025/07/photos-china-solar-power-energy/683488/
2•coloneltcb•48m ago•0 comments