RL algorithms are less bitter-lesson-pilled than 2015-era deep learning

1•rajap•9h ago
The real issue isn't reward shaping or curriculum learning - everyone complains about those. The deeper problem is that we're hardcoding the credit assignment timescale into our algorithms.

Discount factors (γ), n-step returns, GAE λ parameters - these are human priors about temporal abstraction baked directly into the learning signal. In PPO, γ tells the algorithm how far into the future consequences should count, and GAE's λ sets how many steps of observed reward to trust before falling back on the value estimate. We're not learning either of these, we're imposing them. Different domains need different values. That's manual feature engineering, RL-style.
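
To make that concrete, here is a minimal sketch of the standard GAE recursion in plain Python. The function name and the toy numbers are mine, not from any particular library; the point is only that γ and λ enter as fixed constants, not as anything learned.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    `values` must hold len(rewards) + 1 entries: one value estimate per
    visited state plus a bootstrap value for the state after the last step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors, decay gamma * lam.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Toy case: a single reward arrives at the last step, value estimates are zero.
rewards = [0.0, 0.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]
print(gae_advantages(rewards, values, gamma=1.0, lam=1.0))  # [1.0, 1.0, 1.0]
print(gae_advantages(rewards, values, gamma=1.0, lam=0.0))  # [0.0, 0.0, 1.0]
```

With λ = 1 every earlier step gets credit for the delayed reward; with λ = 0 only the final step does. Whoever picks λ decides how far back credit flows.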

Biological learning doesn't have a global discount factor slider. Dopamine and temporal difference learning in the brain operate at multiple timescales simultaneously - the brain learns which timescales matter for which situations. Our algorithms? They get a single γ parameter tuned by grad students.
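
For illustration only, the simplest version of "multiple timescales at once" is to run tabular TD(0) for several discount factors in parallel on the same experience. This is a toy sketch under my own assumptions (the function name, the 3-state chain, and the learning rate are all made up), and it still doesn't learn which timescale matters, which is the hard part.

```python
def td0_multi_gamma(episodes, gammas, n_states, alpha=0.1):
    """Tabular TD(0) with one value table per discount factor.

    Each episode is a list of (state, reward, next_state) transitions;
    next_state is None when the episode terminates.
    """
    values = {g: [0.0] * n_states for g in gammas}
    for episode in episodes:
        for state, reward, next_state in episode:
            for g in gammas:
                v = values[g]
                bootstrap = 0.0 if next_state is None else v[next_state]
                v[state] += alpha * (reward + g * bootstrap - v[state])
    return values


# Hypothetical chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step.
chain = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
tables = td0_multi_gamma([chain] * 2000, gammas=[0.5, 0.9], n_states=3)
for g, v in tables.items():
    print(g, [round(x, 3) for x in v])
```

The same trajectory yields a different value for state 0 under each γ (it approaches γ² here), so the tables disagree about how much the delayed reward is worth. In current practice a human picks which table to act on.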

Even worse: exploration strategies are domain-specific hacks. ε-greedy for Atari, continuous noise processes for robotics, count-based bonuses for sparse rewards. We're essentially doing "exploration engineering" for each domain, like it's pre-2012 computer vision all over again.
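
Here is a rough sketch of those three recipes side by side, assuming a discrete-action Q-table for ε-greedy, an Ornstein-Uhlenbeck-style process for continuous control, and a 1/sqrt(n) count bonus. Function names and default constants are illustrative, not taken from any specific codebase.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Discrete actions: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def ou_noise_step(x, rng, theta=0.15, sigma=0.2, mu=0.0):
    """Continuous control: one step of temporally correlated noise."""
    return x + theta * (mu - x) + sigma * rng.gauss(0.0, 1.0)


def count_bonus(visit_count, beta=0.1):
    """Sparse rewards: intrinsic bonus that shrinks as a state is revisited."""
    return beta / math.sqrt(visit_count + 1)


rng = random.Random(0)
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng))
print(ou_noise_step(0.0, rng))
print(count_bonus(visit_count=3))
```

Three domains, three mechanisms, and at least four hand-set constants (epsilon, theta, sigma, beta) before any learning happens.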

Compare this to supervised learning circa 2015: we stopped engineering features and just scaled deep networks. The architecture learned what mattered. RL in 2025? Still tweaking γ, λ, exploration coefficients, entropy bonuses for every new task.

True bitter-lesson compliance would mean learning your own temporal abstractions (dynamic γ), learning how to explore (meta-RL over exploration strategies), and learning credit assignment windows (adaptive eligibility traces). Some promising directions exist - the options framework, meta-RL, world models with learned abstraction - but they're not mainstream because they're compute-hungry and unstable. We keep returning to human priors because they're cheaper. That's the opposite of the bitter lesson.

The irony is stark: RL researchers talk about "end-to-end learning" while manually tuning the most fundamental learning signal parameters. Imagine if vision researchers were still manually setting feature detector orientations in 2025. That's where RL is.

My prediction: the next major RL breakthrough won't come from better policy gradient estimators. It'll come from algorithms that discover their own temporal abstractions and exploration strategies through meta-learning at scale. Only then will RL be bitter-lesson-pilled.

The Learning Curve Moat in Data Systems

https://www.typedef.ai/blog/the-learning-curve-moat-in-data-systems
1•cpard•37s ago•1 comments

How Trump Is Using Generated Imagery to Attack Enemies and Rouse Supporters

https://www.nytimes.com/interactive/2025/10/21/business/media/trump-ai-truth-social-no-kings.html
1•pretext•1m ago•0 comments

Forget SEO. Welcome to the World of Generative Engine Optimization

https://www.wired.com/story/goodbye-seo-hello-geo-brandlight-openai/
1•quapster•2m ago•0 comments

Spin LLM questions, get better answers

https://home.alles-tools.com/blog/2025/10/21/spin-llm-questions-get-better-answers/
1•Antitoxic6185•5m ago•0 comments

Building Instructions

https://obsolescence.dev/pidp8-get-one.html
1•jruohonen•5m ago•0 comments

Public Trust Demands Open-Source Voting Systems

https://www.voting.works/news/public-trust-demands-open-source-voting-systems
1•philips•6m ago•0 comments

Gitpod is now Ona, moving beyond the IDE

https://ona.com/stories/gitpod-is-now-ona
1•janpio•6m ago•0 comments

Is Sora the Beginning of the End for OpenAI?

https://calnewport.com/is-sora-the-beginning-of-the-end-for-openai/
1•warrenm•6m ago•1 comments

Unify Your Security Stack with Socket Basics

https://socket.dev/blog/socket-basics
1•feross•7m ago•0 comments

Microspeak: The Hockey Stick on Wheels

https://devblogs.microsoft.com/oldnewthing/20251021-00/?p=111710
1•MattSayar•7m ago•0 comments

Tokeko: Interactive LR parsers visualization to learn compilers

https://tokeko.specy.app/
1•andromedaM31•7m ago•0 comments

Show HN: Gnoke Station – A WebOS for Industrial and IoT Dashboards

1•edmundsparrow•8m ago•0 comments

Show HN: FastQR – A Fast C++ QR Code Generator Supporting Batch Processing

1•tranhuucanh•9m ago•0 comments

Amazon's Silent Sacking (2023)

https://justingarrison.com/blog/2023-12-30-amazons-silent-sacking/
1•softwaredoug•9m ago•0 comments

GPU-accelerated agent sandboxes using gaming streaming tech

https://blog.helix.ml/p/gpu-accelerated-ai-agent-sandboxes
2•quesobob•9m ago•0 comments

Taiwan should build a space-enabled kill web, not big warships

https://spacenews.com/taiwan-should-build-a-space-enabled-kill-web-not-big-warships/
3•warrenm•10m ago•0 comments

Ask HN: Our AWS account got compromised after their outage

3•kinj28•12m ago•0 comments

Apple alerts exploit developer that his iPhone was targeted with gov spyware

https://techcrunch.com/2025/10/21/apple-alerts-exploit-developer-that-his-iphone-was-targeted-wit...
4•speckx•15m ago•2 comments

The Lovable Shopify Integration

https://lovable.dev/blog/shopify-integration
2•mattew•16m ago•1 comments

Foreign hackers breached a US nuclear weapons plant via SharePoint flaws

https://www.csoonline.com/article/4074962/foreign-hackers-breached-a-us-nuclear-weapons-plant-via...
4•zdw•16m ago•2 comments

Fork Buckets Like You Fork Code

https://www.tigrisdata.com/blog/fork-buckets-like-code/
1•ryuuseijin•17m ago•0 comments

Random Thoughts About AI

https://dinosaurseateverybody.com/blog/random-thoughts-about-ai
1•dorkrawk•17m ago•1 comments

Bachata Music Explorator

https://www.emusicality.co.uk/home
1•agnishom•18m ago•0 comments

Putin's Mesmeric Sway on Trump

https://www.ft.com/content/7debcf11-5213-44ac-96ff-f18525bc42b5
6•zerosizedweasle•18m ago•1 comments

20,858 Public Domain Audio Books

https://librivox.org/search
2•smooke•19m ago•0 comments

Krea Realtime 14B: an autoregressive real-time video model

https://twitter.com/krea_ai/status/1980358158376988747
1•dvrp•19m ago•1 comments

5 years of no social media

https://raahel.bearblog.dev/5-years-of-no-social-media/
3•speckx•21m ago•0 comments

SynTesla Giorgio III, a monster modular Synthesizer custom-built for Hans Zimmer

https://synthanatomy.com/2025/10/syntesla-giorgio-iii-a-monster-modular-synthesizer-custom-built-...
1•consumer451•22m ago•0 comments

Show HN: Realizing Karpathy's Prediction for Natural Language Programming

3•amthewiz•23m ago•1 comments

UI Library from Font Awesome, Called Web Awesome

https://webawesome.com/
2•sieep•24m ago•1 comments