frontpage.

The Big Hunger by Walter J Miller, Jr. (1952)

https://lauriepenny.substack.com/p/the-big-hunger
1•shervinafshar•41s ago•0 comments

The Genus Amanita

https://www.mushroomexpert.com/amanita.html
1•rolph•5m ago•0 comments

We have broken SHA-1 in practice

https://shattered.io/
1•mooreds•6m ago•1 comments

Ask HN: Was my first management job bad, or is this what management is like?

1•Buttons840•7m ago•0 comments

Ask HN: How to Reduce Time Spent Crimping?

1•pinkmuffinere•8m ago•0 comments

KV Cache Transform Coding for Compact Storage in LLM Inference

https://arxiv.org/abs/2511.01815
1•walterbell•13m ago•0 comments

A quantitative, multimodal wearable bioelectronic device for stress assessment

https://www.nature.com/articles/s41467-025-67747-9
1•PaulHoule•15m ago•0 comments

Why Big Tech Is Throwing Cash into India in Quest for AI Supremacy

https://www.wsj.com/world/india/why-big-tech-is-throwing-cash-into-india-in-quest-for-ai-supremac...
1•saikatsg•15m ago•0 comments

How to shoot yourself in the foot – 2026 edition

https://github.com/aweussom/HowToShootYourselfInTheFoot
1•aweussom•15m ago•0 comments

Eight More Months of Agents

https://crawshaw.io/blog/eight-more-months-of-agents
3•archb•17m ago•0 comments

From Human Thought to Machine Coordination

https://www.psychologytoday.com/us/blog/the-digital-self/202602/from-human-thought-to-machine-coo...
1•walterbell•17m ago•0 comments

The new X API pricing must be a joke

https://developer.x.com/
1•danver0•18m ago•0 comments

Show HN: RMA Dashboard fast SAST results for monorepos (SARIF and triage)

https://rma-dashboard.bukhari-kibuka7.workers.dev/
1•bumahkib7•19m ago•0 comments

Show HN: Source code graphRAG for Java/Kotlin development based on jQAssistant

https://github.com/2015xli/jqassistant-graph-rag
1•artigent•24m ago•0 comments

Python Only Has One Real Competitor

https://mccue.dev/pages/2-6-26-python-competitor
3•dragandj•25m ago•0 comments

Tmux to Zellij (and Back)

https://www.mauriciopoppe.com/notes/tmux-to-zellij/
1•maurizzzio•26m ago•1 comments

Ask HN: How are you using specialized agents to accelerate your work?

1•otterley•27m ago•0 comments

Passing user_id through 6 services? OTel Baggage fixes this

https://signoz.io/blog/otel-baggage/
1•pranay01•28m ago•0 comments

DavMail Pop/IMAP/SMTP/Caldav/Carddav/LDAP Exchange Gateway

https://davmail.sourceforge.net/
1•todsacerdoti•29m ago•0 comments

Visual data modelling in the browser (open source)

https://github.com/sqlmodel/sqlmodel
1•Sean766•31m ago•0 comments

Show HN: Tharos – CLI to find and autofix security bugs using local LLMs

https://github.com/chinonsochikelue/tharos
1•fluantix•31m ago•0 comments

Oddly Simple GUI Programs

https://simonsafar.com/2024/win32_lights/
1•MaximilianEmel•31m ago•0 comments

The New Playbook for Leaders [pdf]

https://www.ibli.com/IBLI%20OnePagers%20The%20Plays%20Summarized.pdf
1•mooreds•32m ago•1 comments

Interactive Unboxing of J Dilla's Donuts

https://donuts20.vercel.app
1•sngahane•33m ago•0 comments

OneCourt helps blind and low-vision fans to track Super Bowl live

https://www.dezeen.com/2026/02/06/onecourt-tactile-device-super-bowl-blind-low-vision-fans/
1•gaws•35m ago•0 comments

Rudolf Vrba

https://en.wikipedia.org/wiki/Rudolf_Vrba
1•mooreds•35m ago•0 comments

Autism Incidence in Girls and Boys May Be Nearly Equal, Study Suggests

https://www.medpagetoday.com/neurology/autism/119747
1•paulpauper•36m ago•0 comments

Wellness Hotels Discovery Application

https://aurio.place/
1•cherrylinedev•37m ago•1 comments

NASA delays moon rocket launch by a month after fuel leaks during test

https://www.theguardian.com/science/2026/feb/03/nasa-delays-moon-rocket-launch-month-fuel-leaks-a...
1•mooreds•38m ago•0 comments

Sebastian Galiani on the Marginal Revolution

https://marginalrevolution.com/marginalrevolution/2026/02/sebastian-galiani-on-the-marginal-revol...
2•paulpauper•41m ago•0 comments

Recursive Language Models: the paradigm of 2026

https://www.primeintellect.ai/blog/rlm
5•skhameneh•1mo ago

Comments

obiefernandez•1mo ago
The RLM framing basically turns long-context handling into an RL problem over what to remember and where to route it: the main model's context vs Python vs sub-LLMs. That's a nice instantiation of The Bitter Lesson, but it also means performance is now tightly coupled to whatever reward signal you happen to define in those environments. Do you have any evidence yet that policies learned on DeepDive / Oolong-style tasks transfer to "messy" real workloads (multi-week code refactors, research over evolving corpora, etc.), or are we still in the "per-benchmark policy" regime?
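
For concreteness, the kind of routing loop I have in mind looks roughly like this (entirely my own sketch, with made-up names and a trivial stand-in policy, not the post's code):

    # Entirely my sketch -- the policy, REPL, and sub-LLM below are dumb stand-ins,
    # not anything from the post or its code.
    def run_in_repl(snippet: str) -> str:
        return f"<repl result over {len(snippet)} chars>"     # stand-in for the Python env

    def sub_llm(prompt: str) -> str:
        return prompt[:80] + "..."                            # stand-in for a cheap model call

    def toy_policy(context: list[str], chunk: str) -> str:
        # The trained policy would make this call per chunk; here it's a length rule.
        if len(chunk) < 500:
            return "keep"
        return "python" if "table" in chunk else "sub_llm"

    def controller_step(context: list[str], pending_chunks: list[str]) -> list[str]:
        for chunk in pending_chunks:
            action = toy_policy(context, chunk)
            if action == "keep":
                context.append(chunk)                         # spends main-model tokens
            elif action == "python":
                context.append(run_in_repl(chunk))            # aggregate outside the LLM
            else:
                context.append(sub_llm("Summarize for the task: " + chunk))  # compressed view
        return context

    print(controller_step([], ["short note", "x" * 2000, "row,col\n" * 300 + "table"]))

The reward signal decides everything about how that per-chunk choice is made, which is exactly why the transfer question matters.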

The split between main model tokens and sub-LLM tokens is clever for managing cost and context rot, but it also hides the true economic story. For many users the cost that matters is total tokens across all calls, not just the controller's context. Some of your plots celebrate higher "main model token efficiency" while total tokens rise substantially. Do you have scenarios where the RLM is strictly more cost-efficient at equal or better quality, or is the current regime basically "pay more total tokens to get around context limits"?
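
A toy back-of-the-envelope of what I mean, with numbers and prices I made up; the point is that total tokens and total dollars can easily move in different directions depending on the sub-model's price:

    # Toy accounting with made-up numbers and prices, just to make the question concrete.
    def cost(tokens: int, usd_per_million_tokens: float) -> float:
        return tokens / 1_000_000 * usd_per_million_tokens

    # Plain long-context call: everything flows through the big model once.
    plain_total_tokens = 200_000

    # RLM-style run: small controller context, but many sub-LLM calls over chunks.
    rlm_main_tokens = 30_000                    # looks great on a "main model tokens" plot
    rlm_sub_tokens = 15 * 20_000                # 15 sub-calls x 20k tokens each
    rlm_total_tokens = rlm_main_tokens + rlm_sub_tokens          # 330k total vs 200k

    print("plain $", cost(plain_total_tokens, 10.0))             # big model only
    print("rlm   $", cost(rlm_main_tokens, 10.0) + cost(rlm_sub_tokens, 1.0))

That's why I'd like to see total tokens and total dollars reported alongside the main-model efficiency plots.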

math-python is the most damning data point: same capabilities, but the RLM harness makes models worse and slower. That feels like a warning that a more flexible scaffold is not automatically a win; you're introducing an extra layer of indirection that the model has not been optimized for. The claim that RL training over the RLM will fix this is plausible, but also unfalsifiable until you actually show a model that beats a strong plain-tool baseline on math with less wall-clock time and fewer tokens.
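
The comparison that would settle it for me is something like the following, with both harnesses run on the same problems and accuracy, total tokens, and wall-clock reported together (solvers stubbed out here; the shape of the comparison is what matters):

    import time

    def plain_tool_solve(problem: str) -> tuple[bool, int]:
        return True, 3_000                      # stub: (correct?, tokens spent)

    def rlm_solve(problem: str) -> tuple[bool, int]:
        return True, 9_000                      # stub

    def bench(solver, problems):
        start, correct, tokens = time.perf_counter(), 0, 0
        for p in problems:
            ok, spent = solver(p)
            correct += ok
            tokens += spent
        return correct / len(problems), tokens, time.perf_counter() - start

    problems = [f"problem {i}" for i in range(50)]
    for name, solver in [("plain + tools", plain_tool_solve), ("rlm harness", rlm_solve)]:
        acc, tok, secs = bench(solver, problems)
        print(f"{name}: acc={acc:.2f} tokens={tok} wall={secs:.3f}s")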

Oolong and verbatim-copy are more encouraging: the controller treating large inputs as opaque blobs and then using Python + sub-LLMs to scan/aggregate is exactly the kind of pattern humans write by hand in agents today. One thing I’d love to see is a comparison vs a well-engineered non-RL agent baseline that does essentially the same thing but with hand-written heuristics (chunk + batch + regex/SQL/etc.). Right now the RLM looks like a principled way to let the model learn those heuristics, but the post doesn’t really separate “benefit from architecture” vs “benefit from just having more structure/tools than a vanilla single call.”
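
Roughly this shape, with the sub-LLM stubbed out and names that are mine, not yours:

    # Rough shape of the hand-engineered baseline I mean (chunk -> cheap filter ->
    # map a sub-LLM -> reduce); `sub_llm` is a stand-in for any cheap model call.
    import re

    def sub_llm(prompt: str) -> str:
        return prompt[:200]                     # stub so the sketch runs

    def handwritten_agent(big_text: str, question: str, chunk_size: int = 4000) -> str:
        chunks = [big_text[i:i + chunk_size] for i in range(0, len(big_text), chunk_size)]
        # Heuristic pre-filter on question keywords: costs zero LLM tokens.
        keywords = [w for w in re.findall(r"\w+", question.lower()) if len(w) > 3]
        hits = [c for c in chunks if any(k in c.lower() for k in keywords)] or chunks
        # Map: one sub-LLM call per surviving chunk.
        partials = [sub_llm(f"Using only this text, answer '{question}':\n{c}") for c in hits]
        # Reduce: one final call over the partial answers.
        return sub_llm(f"Combine these partial answers to '{question}':\n" + "\n".join(partials))

    print(handwritten_agent("lorem " * 5000 + "the launch slipped to March", "When did the launch slip?"))

If the RLM can't clearly beat that kind of baseline at comparable cost, the learned routing isn't buying much over structure and tools alone.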

On safety / robustness: giving the model a persistent Python REPL and arbitrary pip installs is powerful, but it also dramatically expands the attack surface if this ever runs on untrusted inputs. Are you treating the RLM as strictly a research/eval harness, or do you envision it being exposed in production agent systems? If the latter, sandboxing guarantees and resource controls probably matter as much as reward curves.
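
Even as a research harness I'd want at least OS-level resource limits around that REPL. A minimal sketch of what I mean (Linux-only, my own names; rlimits alone don't restrict network access, so this is a floor, not real isolation -- a container/jail and an egress policy still belong on top):

    import resource
    import subprocess
    import sys

    def _limits() -> None:
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))                     # 5s of CPU
        resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))  # 512 MB address space
        resource.setrlimit(resource.RLIMIT_NOFILE, (32, 32))                # cap open file descriptors

    def run_untrusted(code: str) -> str:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],        # -I: isolated mode, ignore user site/env
            capture_output=True, text=True, timeout=10, preexec_fn=_limits,
        )
        return proc.stdout + proc.stderr

    print(run_untrusted("print(sum(range(10)))"))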