
Sims with verifiable rewards for web agent benchmarking and RL

https://halluminate.ai/blog/westworld
1•wujerry2000•2mo ago

Comments

wujerry2000•2mo ago
Hi all! Sharing some of our recent work around building RL envs and sims for agent training.

There are a lot more technical details on building the benchmark in the post. If you are interested in more RL/Post-Training, I'd highly recommend reading this super in-depth blog from our partners at Yutori: https://yutori.com/blog/introducing-navigator

Some more casual thoughts and lessons:

1) A shortage of high-quality RL environments / sims at volume remains one of the largest blockers to training frontier agents, especially as labs/enterprises shift towards creating increasingly specialized AI coworkers that can do real work.

2) Building an RL env is VERY different from building a high-quality dataset. While the primary inputs for dataset creation are specialized human annotators and clear rubrics, the inputs to building a great RL env involve humans, engineers, product, data, and an orchestration of everything together. There are a lot of greenfield problems when you move from building singular environments to SCALING by 1-3 orders of magnitude.

3) There is a constant push/pull between building tasks that are easily verifiable and building tasks that are realistic. It's sort of like a 2x2 grid. The best (and most valuable) tasks are both realistic and verifiable. There are constant tradeoffs being made, and we often find ourselves limited in the types of realistic tasks we can make because they lack a clear verifier. I'm reminded of Jason Wei's post here: https://www.jasonwei.net/blog/asymmetry-of-verification-and-...

4) When it comes to building browser sims, we found that the hardest challenges come NOT from mimicking the frontend components but rather from creating a realistic distribution of data for them to sit on top of. Although not immediately obvious, this makes a lot of sense. For example, when building Noodle Flights, the frontend UI was (although non-trivial) manageable to create, but modeling the distribution of complex flight data was far harder.

5) It's an iterative process. Building a perfect sim / verifier out of the gate is very difficult, and a large part of the RL process is shepherding / QA of specific tasks and verifiers. The best way to do this is by constantly reviewing trajectories and spotting false positives/negatives. This is tedious work, but it is often front-loaded - until you see smooth gains :)
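
To make point 5 a bit more concrete, here is a rough sketch of what that trajectory review loop could look like in code. It is not our actual tooling: the `ReviewedTrajectory` record, the task ids, and the 10% threshold are all made up for illustration. The idea is just to compare the automated verifier's verdict against a human reviewer's verdict and flag tasks where they disagree too often.

```python
from dataclasses import dataclass


@dataclass
class ReviewedTrajectory:
    """One agent rollout that a human has reviewed (hypothetical record)."""
    task_id: str
    verifier_passed: bool  # verdict from the automated verifier
    human_passed: bool     # verdict from a human reading the trajectory


def verifier_error_report(reviews):
    """Count verifier false positives and false negatives per task."""
    report = {}
    for r in reviews:
        stats = report.setdefault(
            r.task_id, {"n": 0, "false_pos": 0, "false_neg": 0}
        )
        stats["n"] += 1
        if r.verifier_passed and not r.human_passed:
            stats["false_pos"] += 1  # reward given for a bad trajectory
        elif r.human_passed and not r.verifier_passed:
            stats["false_neg"] += 1  # good trajectory got no reward
    return report


def tasks_needing_attention(report, max_error_rate=0.1):
    """Return task ids whose verifier disagrees with humans too often.

    The 0.1 default is an arbitrary illustrative threshold.
    """
    flagged = []
    for task_id, s in report.items():
        error_rate = (s["false_pos"] + s["false_neg"]) / s["n"]
        if error_rate > max_error_rate:
            flagged.append(task_id)
    return sorted(flagged)


# Made-up review data for two hypothetical tasks.
reviews = [
    ReviewedTrajectory("book_flight", True, True),
    ReviewedTrajectory("book_flight", True, False),
    ReviewedTrajectory("book_flight", False, False),
    ReviewedTrajectory("cancel_flight", True, True),
    ReviewedTrajectory("cancel_flight", False, False),
]
report = verifier_error_report(reviews)
flagged = tasks_needing_attention(report)
print(flagged)
```

Tracking false positives and false negatives separately seems worthwhile because they fail differently during training: a false positive rewards bad behavior, while a false negative withholds reward from a correct trajectory.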

I have lots more thoughts, but these were just top of mind today. If this work is interesting to you, we're always happy to chat (we're also hiring)!