Reinforcement learning, explained with a minimum of math and jargon

https://www.understandingai.org/p/reinforcement-learning-explained
192•JnBrymn•7mo ago

Comments

mnkv•7mo ago
reasonable post with a decent analogy explaining on-policy learning, only major thing I take issue with is

> Reinforcement learning is a technical subject—there are whole textbooks written about it.

and then linking to the still-WIP RLHF book instead of the book on RL: Sutton & Barto.

dawnofdusk•7mo ago
Haha, that's crazy. I'm so used to reading RL papers that when the blog linked to a textbook about RL, I just filled in Sutton & Barto without clicking on the link or thinking any further about the matter.

I think the other criticism I have is that the historical importance of RLHF to ChatGPT is sort of sidelined, and the author at the beginning pinpoints something like the rise of agents as the beginning of the influence of RL in language modelling. In fact, the first LLM that attained widespread success was ChatGPT, and the secret sauce was RLHF... no need to start the story so late in 2023-2024.

Peteragain•7mo ago
Reinforcement Learning is basically sticks and carrots and the problem is credit assignment. Did I get hit with the stick because I said 5 plus 3 is 8? Or because I wrote my answers in green ink? Or... That used to be what RL was. S&B talk about "modern reinforcement learning" and introduce "Temporal Difference Learning", but imo the book is a bit of a rummage through GOFAI. Is the recent innovation with LLMs to perhaps use feedback to generate prompts? Talking about RL in this context does seem to be an attempt to freshen up interest. "Look! LLMs version 4.0! Now with added Science!"
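
To make the credit assignment point concrete, here is a minimal sketch of tabular TD(0), the simplest form of temporal difference learning. The environment is a toy random-walk chain chosen purely for illustration, and the function name `td0_chain` and its parameters are hypothetical. Each state's value estimate is nudged toward the reward plus the next state's estimate, so credit for the final reward flows backward one step at a time.

```python
import random

def td0_chain(n_states=5, episodes=20000, alpha=0.01, gamma=1.0, seed=0):
    """Estimate state values on a random-walk chain with tabular TD(0).

    Toy setup (an assumption for illustration): states 0..n_states-1,
    each step moves left or right with equal probability. Stepping off
    the right end pays +1, stepping off the left end pays 0.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states  # one learned number per state
    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, values[s_next], False
            # TD error: how much better or worse things went than predicted.
            td_error = reward + gamma * v_next - values[s]
            values[s] += alpha * td_error
            if done:
                break
            s = s_next
    return values

if __name__ == "__main__":
    for state, value in enumerate(td0_chain()):
        print(f"state {state}: estimated value {value:.3f}")
```

The reward only ever arrives at the right edge, yet states further left still end up with sensible estimates, because each update borrows from the neighbouring state's estimate.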
vonnik•7mo ago
Another RL explainer:

https://wiki.pathmind.com/deep-reinforcement-learning

lsorber•7mo ago
For those who want to dive deeper, here’s a 300 LOC implementation of GRPO in pure NumPy: https://github.com/superlinear-ai/microGRPO

The implementation learns to play Battleship in about 2000 steps, pretty neat!
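
For a sense of what GRPO computes, here is a rough sketch of its group-relative advantage step only. It is not taken from the microGRPO repository, the function name `group_relative_advantages` is hypothetical, and the clipped policy update and KL term are left out. The idea is to sample several attempts at the same prompt and score each one against the group's mean reward instead of against a learned value function.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Using the population standard deviation is an assumption of this
    sketch; eps avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

if __name__ == "__main__":
    # Four hypothetical attempts at the same task: two succeeded, two failed.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Attempts that beat the group average get a positive advantage and are made more likely, and the rest are made less likely. When every attempt scores the same, all advantages are zero and that group contributes no learning signal.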

jekwoooooe•7mo ago
I don’t think it’s useful to explain things that are fundamentally mathematical by leaving out the math and tech. It’s a good article, though.
chrisweekly•7mo ago
(caveat: I haven't yet read the article)

Huh? Your 2nd sentence seems to contradict your 1st. Or is the article somehow "good" without being "useful"?

jekwoooooe•7mo ago
It was a good read on the concept, but I’m left unsatisfied by all the hand-waving. Like how, physically, is the reinforcement actually saved? Is it a number in a file? What is the math behind the reward mechanism? What variables are changed and saved? What is the literal deliverable when you serve this to a client?
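
As one hedged answer to those questions, here is a toy sketch: a two-armed bandit trained with the REINFORCE policy-gradient rule. It is nothing like an LLM in scale, and the names `train_bandit`, `reward_probs`, and `prefs` are made up for illustration. It does show the mechanics, though: the learned state is a list of numbers, rewards change those numbers through a gradient step, and the "deliverable" is those numbers written to a file. For an LLM the same role is played by the network's weights.

```python
import json
import math
import random

def softmax(prefs):
    """Turn raw preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train_bandit(reward_probs=(0.2, 0.8), steps=5000, lr=0.1, seed=0):
    """Learn which arm pays off more often, using REINFORCE."""
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_probs)  # the only learned parameters
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Gradient of log pi(action) w.r.t. prefs[i] is 1[i == action] - probs[i].
        for i in range(len(prefs)):
            grad_log = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * reward * grad_log
    return prefs

prefs = train_bandit()
saved = json.dumps({"prefs": prefs})  # what would be written to disk

if __name__ == "__main__":
    print(saved)
    print("action probabilities:", softmax(prefs))
```

A rewarded action gets its preference pushed up in proportion to the reward, and an unrewarded action leaves the parameters unchanged. That multiplication of the reward by the gradient of the log-probability is the math behind the reward mechanism in this simplest case.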
littlestymaar•7mo ago
> Huh? Your 2nd sentence seems to contradict your 1st. Or is the article somehow "good" without being "useful"?

The article isn't what the title says it is, so it's still good despite the title's claim being questionable.

jxjnskkzxxhx•7mo ago
I would encourage everyone to read Sutton and Barto directly. Best technical book I've read in the past year. Though if you're trying to minimize math, the first edition is significantly simpler.
ivanbelenky•7mo ago
https://github.com/ivanbelenky/RL One of the great pleasures in my life was implementing almost all of this book.
jxjnskkzxxhx•7mo ago
Pretty cool, thank you for sharing. How long did this take you?