I think of RL as a method that produces its own training data from the model's predictions. This directly lets the model extend its output range, because the data becomes more diverse. Fundamentally, however, RL relies on bootstrapping and has a moving-target problem, and these are the reasons for its poor stability. One of the most tractable methods for approximating the value function is temporal-difference (TD) learning, which suffers from sampling noise, function-approximation error, and moving targets. I argue that we need to extend pure RL theory at the level of the Bellman equation to achieve more stable RL. Consequently, we need both a better mathematical foundation for value functions and a tractable approximation method, aligned with each other and free from these problems.
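To make the bootstrapping and moving-target points concrete, here is a minimal sketch of tabular TD(0), which I am using as the simplest instance of TD. The environment is an assumption chosen for illustration: a 5-state random walk that starts in the middle, pays reward 1 for exiting on the right and 0 for exiting on the left. The names `td0_update` and `run_random_walk` are hypothetical. There is no function approximator here, so only sampling noise and the moving target show up.

```python
import random


def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=1.0):
    """Apply one TD(0) step to the tabular estimate V (a list of floats)."""
    # Bootstrapped target: built from the current estimate V[s_next],
    # not from the true value. Since V keeps changing, the target moves.
    target = r + (0.0 if done else gamma * V[s_next])
    td_error = target - V[s]
    V[s] += alpha * td_error
    return td_error


def run_random_walk(episodes=2000, alpha=0.05, n_states=5, seed=0):
    """Estimate state values of a symmetric random walk with TD(0)."""
    rng = random.Random(seed)
    V = [0.5] * n_states
    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))  # sampled transition: source of noise
            done = s_next < 0 or s_next >= n_states
            r = 1.0 if s_next >= n_states else 0.0
            # Clamp the index only so it stays valid; it is unused when done.
            clamped = min(max(s_next, 0), n_states - 1)
            td0_update(V, s, r, clamped, done, alpha)
            if done:
                break
            s = s_next
    return V


if __name__ == "__main__":
    print([round(v, 2) for v in run_random_walk()])
```

For this walk the true values are (i + 1) / 6 for state i, so the estimates should hover around those numbers without settling exactly, because a constant step size keeps reacting to each noisy sample and to its own shifting targets.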

My Opinion on RL
