frontpage.


EVs Are a Failed Experiment

https://spectator.org/evs-are-a-failed-experiment/
1•ArtemZ•3m ago•0 comments

MemAlign: Building Better LLM Judges from Human Feedback with Scalable Memory

https://www.databricks.com/blog/memalign-building-better-llm-judges-human-feedback-scalable-memory
1•superchink•4m ago•0 comments

CCC (Claude's C Compiler) on Compiler Explorer

https://godbolt.org/z/asjc13sa6
1•LiamPowell•6m ago•0 comments

Homeland Security Spying on Reddit Users

https://www.kenklippenstein.com/p/homeland-security-spies-on-reddit
2•duxup•9m ago•0 comments

Actors with Tokio (2021)

https://ryhl.io/blog/actors-with-tokio/
1•vinhnx•10m ago•0 comments

Can graph neural networks for biology realistically run on edge devices?

https://doi.org/10.21203/rs.3.rs-8645211/v1
1•swapinvidya•22m ago•1 comments

Deeper into the sharing of one air conditioner for 2 rooms

1•ozzysnaps•24m ago•0 comments

Weatherman introduces fruit-based authentication system to combat deep fakes

https://www.youtube.com/watch?v=5HVbZwJ9gPE
2•savrajsingh•25m ago•0 comments

Why Embedded Models Must Hallucinate: A Boundary Theory (RCC)

http://www.effacermonexistence.com/rcc-hn-1-1
1•formerOpenAI•27m ago•2 comments

A Curated List of ML System Design Case Studies

https://github.com/Engineer1999/A-Curated-List-of-ML-System-Design-Case-Studies
3•tejonutella•31m ago•0 comments

Pony Alpha: New free 200K context model for coding, reasoning and roleplay

https://ponyalpha.pro
1•qzcanoe•35m ago•1 comments

Show HN: Tunbot – Discord bot for temporary Cloudflare tunnels behind CGNAT

https://github.com/Goofygiraffe06/tunbot
1•g1raffe•38m ago•0 comments

Open Problems in Mechanistic Interpretability

https://arxiv.org/abs/2501.16496
2•vinhnx•43m ago•0 comments

Bye Bye Humanity: The Potential AMOC Collapse

https://thatjoescott.com/2026/02/03/bye-bye-humanity-the-potential-amoc-collapse/
2•rolph•48m ago•0 comments

Dexter: Claude-Code-Style Agent for Financial Statements and Valuation

https://github.com/virattt/dexter
1•Lwrless•49m ago•0 comments

Digital Iris [video]

https://www.youtube.com/watch?v=Kg_2MAgS_pE
1•vermilingua•54m ago•0 comments

Essential CDN: The CDN that lets you do more than JavaScript

https://essentialcdn.fluidity.workers.dev/
1•telui•55m ago•1 comments

They Hijacked Our Tech [video]

https://www.youtube.com/watch?v=-nJM5HvnT5k
1•cedel2k1•59m ago•0 comments

Vouch

https://twitter.com/mitchellh/status/2020252149117313349
34•chwtutha•59m ago•5 comments

HRL Labs in Malibu laying off 1/3 of their workforce

https://www.dailynews.com/2026/02/06/hrl-labs-cuts-376-jobs-in-malibu-after-losing-government-work/
4•osnium123•1h ago•1 comments

Show HN: High-performance bidirectional list for React, React Native, and Vue

https://suhaotian.github.io/broad-infinite-list/
2•jeremy_su•1h ago•0 comments

Show HN: I built a Mac screen recorder Recap.Studio

https://recap.studio/
1•fx31xo•1h ago•1 comments

Ask HN: Codex 5.3 broke toolcalls? Opus 4.6 ignores instructions?

1•kachapopopow•1h ago•0 comments

Vectors and HNSW for Dummies

https://anvitra.ai/blog/vectors-and-hnsw/
1•melvinodsa•1h ago•0 comments

Sanskrit AI beats CleanRL SOTA by 125%

https://huggingface.co/ParamTatva/sanskrit-ppo-hopper-v5/blob/main/docs/blog.md
1•prabhatkr•1h ago•1 comments

'Washington Post' CEO resigns after going AWOL during job cuts

https://www.npr.org/2026/02/07/nx-s1-5705413/washington-post-ceo-resigns-will-lewis
4•thread_id•1h ago•1 comments

Claude Opus 4.6 Fast Mode: 2.5× faster, ~6× more expensive

https://twitter.com/claudeai/status/2020207322124132504
1•geeknews•1h ago•0 comments

TSMC to produce 3-nanometer chips in Japan

https://www3.nhk.or.jp/nhkworld/en/news/20260205_B4/
3•cwwc•1h ago•0 comments

Quantization-Aware Distillation

http://ternarysearch.blogspot.com/2026/02/quantization-aware-distillation.html
2•paladin314159•1h ago•0 comments

List of Musical Genres

https://en.wikipedia.org/wiki/List_of_music_genres_and_styles
1•omosubi•1h ago•0 comments

Richard Sutton – Father of Reinforcement Learning thinks LLMs are a dead end

https://www.dwarkesh.com/p/richard-sutton
12•RyeCombinator•4mo ago

Comments

YeGoblynQueenne•4mo ago
Sutton's alternative to LLMs is RL obviously, I mean duh. He says an alternative theory for the foundation of intelligence is "sensation, action, reward", that animals do this throughout their lives and that intelligence is about figuring out what actions to take to increase the rewards.

Well, I have a problem with that, with all due respect to Richard Sutton, who is one of the AI gods. I don't think his Skinnerian behaviourist paradigm is realistic, and I don't think "sensation, action, reward" works in physical reality, because in the real world there are situations where pursuing your goals does not increase your reward.

Here's an example of what I mean. Imagine the "reward" that an animal will get from not falling down a cliff and dying. If the animal falls down the cliff and dies, reward is probably negative (maybe even infinitely negative: it's game over, man). But if the animal doesn't fall down the cliff and die, what is the reward?

There's no reward. If there was any reward for not falling down a cliff and dying, then all animals would ever do would be to sit around not falling down cliffs and dying, and just increasing their reward for free. That wouldn't lead to the development of intelligence very fast.

You can try to argue that an animal will obtain a positive reward from just not dying, but that doesn't work: for RL to enforce some behaviour P, it is P itself that has to be rewarded, not just staying alive in general. Deep RL systems don't learn to play chess by refusing to play.

For RL to work, agents must constantly maximise their reward, not just increase it or keep it from going infinitely negative. And you just cannot do that in the physical world, because there are situations where doing the wrong thing kills you and doing the right thing does not increase your reward.

Digital RL agents can avoid this kind of zero-gains scenario because they can afford to act randomly until they hit a reward, so e.g. an RL chess player can afford to play at random until it figures out how to play. But that doesn't work in the real world, where acting at random has a very high chance of killing an animal. Imagine an animal that randomly jumps off cliffs: game over, man. In the real world if you chase reward without already knowing where it comes from, you better have a very large number of lives [1].
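The "acting at random until you hit a reward" point can be made concrete with a toy sketch (my own illustration, not anything from the interview): model an agent as a symmetric random walk between a cliff one step to its left and the only reward ten steps to its right. The specific distances are arbitrary assumptions, but by the gambler's-ruin argument such a walker dies with probability 10/11 before ever seeing a single reward:

```python
import random

def episode(rng, cliff=-1, goal=10, max_steps=10_000):
    """One 'life': start at 0, step +/-1 at random until the cliff or the goal."""
    pos = 0
    for _ in range(max_steps):
        pos += rng.choice((-1, 1))
        if pos == cliff:
            return "dead"      # terminal, unrecoverable: game over, man
        if pos == goal:
            return "rewarded"  # the only positive reward in this environment
    return "timeout"

rng = random.Random(0)
outcomes = [episode(rng) for _ in range(10_000)]
death_rate = outcomes.count("dead") / len(outcomes)
# Gambler's ruin: starting 1 step from the cliff and 10 from the goal,
# a symmetric random walker dies with probability 10/11 ~ 0.909
# before it ever collects reward.
print(f"died before any reward in {death_rate:.1%} of lives")
```

A digital chess agent can rerun this loop millions of times and eventually learn from the rare rewarded episodes; an animal gets one life, which is the asymmetry the comment is pointing at.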

So reward is not all you need. There may be cases where animals use a reward system to guide their behaviours, just like there are cases where humans learn by imitation, but in the general case they don't. It doesn't work. RL doesn't work in the real world and it's not how animals developed intelligence.

__________________

[1] Support for the theory that all animals are descended from cats?