Show HN: Poddley.com – Follow people, not podcasts

https://poddley.com/guests/ana-kasparian/episodes
1•onesandofgrain•3m ago•0 comments

Layoffs Surge 118% in January – The Highest Since 2009

https://www.cnbc.com/2026/02/05/layoff-and-hiring-announcements-hit-their-worst-january-levels-si...
2•karakoram•3m ago•0 comments

Papyrus 114: Homer's Iliad

https://p114.homemade.systems/
1•mwenge•4m ago•1 comments

DicePit – Real-time multiplayer Knucklebones in the browser

https://dicepit.pages.dev/
1•r1z4•4m ago•1 comments

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

https://arxiv.org/abs/2601.14340
2•PaulHoule•5m ago•0 comments

Show HN: AI Agent Tool That Keeps You in the Loop

https://github.com/dshearer/misatay
2•dshearer•7m ago•0 comments

Why Every R Package Wrapping External Tools Needs a Sitrep() Function

https://drmowinckels.io/blog/2026/sitrep-functions/
1•todsacerdoti•7m ago•0 comments

Achieving Ultra-Fast AI Chat Widgets

https://www.cjroth.com/blog/2026-02-06-chat-widgets
1•thoughtfulchris•9m ago•0 comments

Show HN: Runtime Fence – Kill switch for AI agents

https://github.com/RunTimeAdmin/ai-agent-killswitch
1•ccie14019•11m ago•1 comments

Researchers surprised by the brain benefits of cannabis usage in adults over 40

https://nypost.com/2026/02/07/health/cannabis-may-benefit-aging-brains-study-finds/
1•SirLJ•13m ago•0 comments

Peter Thiel warns the Antichrist, apocalypse linked to the 'end of modernity'

https://fortune.com/2026/02/04/peter-thiel-antichrist-greta-thunberg-end-of-modernity-billionaires/
1•randycupertino•14m ago•2 comments

USS Preble Used Helios Laser to Zap Four Drones in Expanding Testing

https://www.twz.com/sea/uss-preble-used-helios-laser-to-zap-four-drones-in-expanding-testing
2•breve•19m ago•0 comments

Show HN: Animated beach scene, made with CSS

https://ahmed-machine.github.io/beach-scene/
1•ahmedoo•20m ago•0 comments

An update on unredacting select Epstein files – DBC12.pdf liberated

https://neosmart.net/blog/efta00400459-has-been-cracked-dbc12-pdf-liberated/
2•ks2048•20m ago•0 comments

Was going to share my work

1•hiddenarchitect•23m ago•0 comments

Pitchfork: A devilishly good process manager for developers

https://pitchfork.jdx.dev/
1•ahamez•23m ago•0 comments

You Are Here

https://brooker.co.za/blog/2026/02/07/you-are-here.html
3•mltvc•28m ago•1 comments

Why social apps need to become proactive, not reactive

https://www.heyflare.app/blog/from-reactive-to-proactive-how-ai-agents-will-reshape-social-apps
1•JoanMDuarte•28m ago•1 comments

How patient are AI scrapers, anyway? – Random Thoughts

https://lars.ingebrigtsen.no/2026/02/07/how-patient-are-ai-scrapers-anyway/
1•samtrack2019•29m ago•0 comments

Vouch: A contributor trust management system

https://github.com/mitchellh/vouch
2•SchwKatze•29m ago•0 comments

I built a terminal monitoring app and custom firmware for a clock with Claude

https://duggan.ie/posts/i-built-a-terminal-monitoring-app-and-custom-firmware-for-a-desktop-clock...
1•duggan•30m ago•0 comments

Tiny C Compiler

https://bellard.org/tcc/
2•guerrilla•31m ago•0 comments

Y Combinator Founder Organizes 'March for Billionaires'

https://mlq.ai/news/ai-startup-founder-organizes-march-for-billionaires-protest-against-californi...
2•hidden80•32m ago•2 comments

Ask HN: Need feedback on the idea I'm working on

1•Yogender78•32m ago•0 comments

OpenClaw Addresses Security Risks

https://thebiggish.com/news/openclaw-s-security-flaws-expose-enterprise-risk-22-of-deployments-un...
2•vedantnair•33m ago•0 comments

Apple finalizes Gemini / Siri deal

https://www.engadget.com/ai/apple-reportedly-plans-to-reveal-its-gemini-powered-siri-in-february-...
1•vedantnair•33m ago•0 comments

Italy Railways Sabotaged

https://www.bbc.co.uk/news/articles/czr4rx04xjpo
13•vedantnair•34m ago•3 comments

Emacs-tramp-RPC: high-performance TRAMP back end using MsgPack-RPC

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•fanf2•35m ago•0 comments

Nintendo Wii Themed Portfolio

https://akiraux.vercel.app/
2•s4074433•39m ago•2 comments

"There must be something like the opposite of suicide "

https://post.substack.com/p/there-must-be-something-like-the
1•rbanffy•41m ago•1 comments

Reinforcement learning, explained with a minimum of math and jargon

https://www.understandingai.org/p/reinforcement-learning-explained
192•JnBrymn•7mo ago

Comments

mnkv•7mo ago
reasonable post with a decent analogy explaining on-policy learning, only major thing I take issue with is

> Reinforcement learning is a technical subject—there are whole textbooks written about it.

and then linking to the still-WIP RLHF book instead of the book on RL: Sutton & Barto.

dawnofdusk•7mo ago
Haha, that's crazy. I'm so used to reading RL papers that when the blog linked to a textbook about RL, I just filled in Sutton & Barto without clicking on the link or thinking any further about the matter.

I think the other criticism I have is that the historical importance of RLHF to ChatGPT is sort of sidelined, and the author at the beginning pinpoints something like the rise of agents as the beginning of the influence of RL in language modelling. In fact, the first LLM that attained widespread success was ChatGPT, and the secret sauce was RLHF... no need to start the story so late in 2023-2024.

Peteragain•7mo ago
Reinforcement Learning is basically sticks and carrots and the problem is credit assignment. Did I get hit with the stick because I said 5 plus 3 is 8? Or because I wrote my answers in green ink? Or... That used to be what RL was. S&B talk about "modern reinforcement learning" and introduce "Temporal Difference Learning", but imo the book is a bit of a rummage through GOFAI. Is the recent innovation with LLMs to perhaps use feedback to generate prompts? Talking about RL in this context does seem to be an attempt to freshen up interest. "Look! LLMs version 4.0! Now with added Science!"
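
For anyone who wants to see what temporal difference learning does about credit assignment, here is a minimal sketch of tabular TD(0) on a small random walk. The environment, the function name `td0_random_walk`, and the hyperparameters are all illustrative assumptions, not something taken from the article or from this thread. The point is only that each state's value estimate is nudged toward the reward plus the estimate of the next state, so credit for a reward at the end of an episode leaks backward one step at a time.

```python
import random


def td0_random_walk(n_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values with TD(0) on a hypothetical random walk.

    States are 0..n_states-1. Each step moves left or right with equal
    probability. Falling off the left end gives reward 0, falling off the
    right end gives reward 1, and either one ends the episode.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # initial guess for every state
    for _ in range(episodes):
        state = n_states // 2  # always start in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD error: how much better or worse things went than predicted.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = next_state
    return values


values = td0_random_walk()
```

Under these assumptions the estimates should drift toward 1/6, 2/6, ..., 5/6 from left to right, even though the only nonzero reward ever seen is at the right edge.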
vonnik•7mo ago
Another RL explainer:

https://wiki.pathmind.com/deep-reinforcement-learning

lsorber•7mo ago
For those who want to dive deeper, here’s a 300 LOC implementation of GRPO in pure NumPy: https://github.com/superlinear-ai/microGRPO

The implementation learns to play Battleship in about 2000 steps, pretty neat!
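
As a rough illustration of the core idea in GRPO (this is not code from the linked repository, and the function name is made up for the example): several attempts are sampled for the same prompt, each gets a reward, and every attempt is scored relative to the rest of its group instead of against a learned value function. A minimal sketch of that normalization step, assuming the usual mean and standard deviation form:

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one group's rewards to zero mean and unit spread.

    `rewards` holds the scalar reward of each sampled attempt for a single
    prompt. `eps` (an assumed small constant) avoids division by zero when
    every attempt in the group got the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four attempts at the same prompt: two succeeded, two failed.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Attempts that beat the group average get a positive advantage and are made more likely; those below average get a negative one. If every attempt scores the same, all advantages are zero and that group contributes no learning signal.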

jekwoooooe•7mo ago
I don’t think it’s useful to explain things that are fundamentally mathematical by leaving out the math and tech. It’s a good article though
chrisweekly•7mo ago
(caveat: I haven't yet read the article)

Huh? Your 2nd sentence seems to contradict your 1st. Or is the article somehow "good" without being "useful"?

jekwoooooe•7mo ago
It was a good read on the concept but I’m left unsatisfied by hand waving all the stuff. Like how, physically, is the reinforcement actually saved? Is it a number in a file? What is the math behind the reward mechanism? What variables are changed and saved? What is the literal deliverable when you serve this to a client?
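
One way to make those questions concrete is a toy example. The sketch below is an assumption-laden illustration, not how any production LLM is trained: a two-armed bandit with a softmax policy, updated with a REINFORCE-style rule and a running-average baseline. All names (`train_bandit`, `reward_probs`, the JSON layout) are hypothetical. What it shows is that the "reinforcement" ends up stored as ordinary numeric parameters, and the thing you would ship is just those numbers.

```python
import json
import math
import random


def softmax(logits):
    """Turn raw scores into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(reward_probs, steps=5000, lr=0.1, seed=0):
    """REINFORCE on a bandit; returns the learned logits (the 'weights')."""
    rng = random.Random(seed)
    logits = [0.0] * len(reward_probs)  # the only variables that get changed
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. each logit: one_hot - probs.
        for a in range(len(logits)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * advantage * grad
    return logits


logits = train_bandit([0.2, 0.8])
# The "deliverable" in this toy setting: the parameters, serialized.
saved = json.dumps({"logits": logits})
```

In a real model the list of two logits is replaced by billions of network weights, but the picture is the same: rewards change the weights through gradient updates, and the saved weights file is what gets served.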
littlestymaar•7mo ago
> Huh? Your 2nd sentence seems to contradict your 1st. Or is the article somehow "good" without being "useful"?

The article isn't what the title says it is, so it's still good despite the title's claim being questionable.

jxjnskkzxxhx•7mo ago
I would encourage everyone to read Sutton and Barto directly. Best technical book I've read in the past year. Though if you're trying to minimize math, the first edition is significantly simpler.
ivanbelenky•7mo ago
https://github.com/ivanbelenky/RL one of the great pleasures of my life was implementing almost all of this book
jxjnskkzxxhx•7mo ago
Pretty cool, thank you for sharing. How long did this take you?