
Launch HN: RunRL (YC X25) – Reinforcement learning as a service

https://runrl.com
23•ag8•2h ago
Hey HN, we’re Andrew and Derik at RunRL (https://runrl.com/). We've built a platform to improve models and agents with reinforcement learning. If you can define a metric, we'll make your model or agent better, without you having to think about managing GPU clusters.

Here's a demo video: https://youtu.be/EtiBjs4jfCg

I (Andrew) was doing a PhD in reinforcement learning on language models, and everyone kept...not using RL because it was too hard to get running. At some point I realized that someone's got to sit down and actually write a good platform for running RL experiments.

Once this happened, people started using it for antiviral design, formal verification, browser agents, and a bunch of other cool applications, so we decided to make a startup out of it.

How it works:

- Choose an open-weight base model (weights are necessary for RL updates; Qwen3-4B-Instruct-2507 is a good starting point)

- Upload a set of initial prompts ("Generate an antiviral targeting SARS-CoV-2 protease", "Prove this theorem", "What's the average summer high in Windhoek?")

- Define a reward function, using Python, an LLM-as-a-judge, or both

- For complex settings, you can define an entire multi-turn environment

- Watch the reward go up!
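
To make the "define a reward function, using Python" step concrete, here is a minimal sketch of what such a function could look like for a prompt with a numeric answer. The function names, the signature, and the idea of passing a `reference` value alongside each prompt are assumptions for illustration only; RunRL's actual interface may look different.

```python
import re
from typing import Optional


def extract_number(text: str) -> Optional[float]:
    """Return the last number mentioned in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None


def reward(completion: str, reference: float, tolerance: float = 5.0) -> float:
    """Hypothetical reward: 1.0 for an exact answer, falling linearly to 0.0
    once the answer is `tolerance` or more away from the reference."""
    value = extract_number(completion)
    if value is None:
        return 0.0  # the model gave no numeric answer at all
    error = abs(value - reference)
    return max(0.0, 1.0 - error / tolerance)
```

A graded score like this gives the model partial credit for near misses, which tends to be a friendlier training signal than a strict pass/fail check.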

For most well-defined problems, a small open model + RunRL outperforms frontier models. (For instance, we've seen Qwen-3B do better than Claude Opus 4.1 on antiviral design.) This is because LLM intelligence is notoriously "spiky"; often models are decent-but-not-great at common-sense knowledge, are randomly good at a few domains, but make mistakes on lots of other tasks. RunRL creates spikes precisely on the tasks where you need them.

Pricing: $80/node-hour. Most models up to 14B parameters fit on one node (0.6-1.2 TB of VRAM). We do full fine-tuning, at the cost of parameter-efficiency (with RL, people seem to care a lot about the last few percent gains in e.g. agent reliability).
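
As a rough back-of-the-envelope check on the pricing and sizing above, the sketch below estimates the cost of a run and the memory that full fine-tuning state might take. Only the $80/node-hour price comes from the post. The 16 bytes per parameter figure is an assumption based on the common mixed-precision Adam setup (2 bytes for weights, 2 for gradients, 12 for optimizer state and master weights), not a statement about how RunRL trains.

```python
PRICE_PER_NODE_HOUR = 80.0  # USD, as quoted in the post

# Assumption (not from the post): mixed-precision Adam keeps roughly
# 16 bytes of training state per parameter.
BYTES_PER_PARAM = 16


def run_cost(nodes: int, hours: float) -> float:
    """Total price in USD for a run of `hours` on `nodes` nodes."""
    return nodes * hours * PRICE_PER_NODE_HOUR


def training_state_gb(params_billion: float) -> float:
    """Approximate GB of weights, gradients, and optimizer state.

    1e9 parameters * bytes per parameter / 1e9 bytes per GB.
    """
    return params_billion * BYTES_PER_PARAM
```

Under that assumption, a 14B model needs about 224 GB of training state, and the rest of a node's memory would go to activations and to generating rollouts. A 10-hour run on one node would cost $800.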

Next up: continuous learning; tool use. Tool use is currently in private beta, which you can join here: https://forms.gle/D2mSmeQDVCDraPQg8

We'd love to hear any thoughts, questions, or positive or negative reinforcement!

Comments

nextworddev•1h ago
Is there any credence to the view that these startups are basically DSPy wrappers?
-_-•53m ago
DSPy is great for prompt optimization but not so much for RL fine-tuning (their support is "extremely EXPERIMENTAL"). The nice thing about RL is that the exact prompts don't matter so much. You don't need to spell out every edge case, since the model will get an intuition for how to do its job well via the training process.
nextworddev•28m ago
Isn’t the latest trend in RL mostly about prompt optimization, as opposed to full fine-tuning?
ag8•14m ago
Prompt optimization is very cool, and we use it for certain problems! The main goal with this launch is to democratize access to "the real thing"; in many cases, full RL allows you to get the last few percent in reliability for things like complex agentic workflows where prompt optimization doesn't quite get you far enough.

There are also lots of interesting possibilities, such as RLing a model on a bunch of environments and then prompt-optimizing it on each specific one, which seems way better than, like, training and hot-swapping many LoRAs. In any case, _someone_ ought to provide a full RL API, and we're here to do that well!

nextworddev•11m ago
Thanks. Is this mainly for verifiable tasks, or for any general task?
-_-•1m ago
There needs to be some way of automatically assessing performance on the task, though this could be with a Python function, another LLM as a judge, or a combination of the two!
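
One way to picture the "combination" case is a cheap programmatic check that gates a more expensive judge score. The sketch below is purely illustrative: `combined_reward`, `has_answer_tag`, and `stub_judge` are hypothetical names, and the judge is a stand-in stub rather than a real LLM call.

```python
from typing import Callable

# A scorer maps a completion to a score in [0, 1].
Scorer = Callable[[str], float]


def combined_reward(completion: str, check: Scorer, judge: Scorer,
                    judge_weight: float = 0.5) -> float:
    """Blend a programmatic check with a judge score (hypothetical design)."""
    hard = check(completion)
    if hard == 0.0:
        return 0.0  # fail fast and skip the judge call entirely
    soft = judge(completion)
    return (1.0 - judge_weight) * hard + judge_weight * soft


def has_answer_tag(completion: str) -> float:
    """Programmatic check: the output must contain an 'ANSWER:' marker."""
    return 1.0 if "ANSWER:" in completion else 0.0


def stub_judge(completion: str) -> float:
    """Stand-in for an LLM-as-a-judge call; always returns a fixed score."""
    return 0.5
```

Gating on the programmatic check first means malformed outputs never reach the judge, which keeps judge costs down and makes the format requirement non-negotiable.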