My Opinion on RL

3•umjunsik132•7h ago
I think of RL as a method that produces its own training data from the model's predictions. This directly leads the model to extend its output range, because the data becomes more diverse. Fundamentally, however, RL relies on bootstrapping and has a moving-target problem, and these are the reasons for its poor stability. One of the most tractable methods for approximating the value function is temporal-difference (TD) learning, which suffers from sample noise, function-approximation error, and moving targets. I argue that we need to extend pure RL theory at the level of the Bellman equation to achieve more stable RL. Consequently, we need both a better mathematical foundation for value functions and a tractable approximation method, aligned with each other and free from these problems.
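
To make the bootstrapping and moving-target point concrete, here is a minimal sketch of a tabular TD(0) update. The 5-state random-walk task, the function name `td0_random_walk`, and the parameters `alpha`, `gamma`, and `seed` are assumptions chosen for illustration, not something taken from a specific system.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) on a hypothetical 5-state random walk.

    States 1..5 are non-terminal; 0 and 6 are terminal.
    Reward is 1 when the walk exits on the right, 0 otherwise.
    """
    rng = random.Random(seed)
    n = 5
    V = [0.0] * (n + 2)  # terminal entries stay at 0
    for _ in range(episodes):
        s = (n + 1) // 2  # start in the middle state
        while 0 < s < n + 1:
            s_next = s + rng.choice((-1, 1))  # sampled transition: source of sample noise
            r = 1.0 if s_next == n + 1 else 0.0
            # Bootstrapped target: it is built from the current estimate V[s_next],
            # so the target itself shifts every time V is updated (moving target).
            target = r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])
            s = s_next
    return V[1:n + 1]


if __name__ == "__main__":
    print(td0_random_walk())
```

In this toy setting the estimates should drift toward the true values 1/6, 2/6, ..., 5/6, but each update chases a target that depends on another estimate. With a function approximator in place of the table, approximation error would be added on top of the sample noise.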

Comments

ctenb•37m ago
It's not good practice to use acronyms without introducing them. From the title alone it's unclear what this is about, and even with the text I was still guessing for a while.
