frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

94% zero-shot in a shifting gridworld, no retraining

1•heavymemory•2mo ago
Key door puzzle. Grid: 8 by 11. Two rooms. One key somewhere in the left room. Two coloured doors in the wall. Goal on the far side.

Train PPO or DQN on one layout and it solves that layout. Shift the key, add or move a wall passage, alter the distractor key setup and performance collapses. The usual story of memorising geometry instead of the rules.

Instead, I train a small set of skills, like find the correct key, go to the passage, open the correct door, reach the goal. Each skill trained once, then frozen. When the layout changes, nothing updates. It retrieves the right skills from longterm memory and composes them.

State space is already large if treated symbolically. Roughly 50 reachable cells for the agent, 50 for the key, 4 door configurations, multiple passage layouts, 3 inventory values, 4 headings. Around 360,000 distinct logical states from conservative counting.

At composition time, the system only reuses states it actually encountered during skill training. No gradients. No online policy adaptation.

Benchmark: 2500 zeroshot episodes with randomised keys and randomised passages. No retraining. Solve rate about 94%.

Frozen skills. New layouts. Still works.

So here's the real question: If hierarchical RL should solve this, why does it still struggle with such a tiny, structured world unless you train it across every variation? Or am I wrong?

And what’s actually being learned when a system generalises to layouts it has never seen?

I'm interested in that discussion. The gap between, this looks trivial and most agents don't generalise, feels like the interesting thing here.

Comments

gus_massa•2mo ago
Do you have a blog post about the whole process? A more detailed explanation may get more traction here.
heavymemory•2mo ago
Thanks, i'll try and publish something soon.

Go 1.22, SQLite, and Next.js: The "Boring" Back End

https://mohammedeabdelaziz.github.io/articles/go-next-pt-2
1•mohammede•5m ago•0 comments

Laibach the Whistleblowers [video]

https://www.youtube.com/watch?v=c6Mx2mxpaCY
1•KnuthIsGod•6m ago•1 comments

I replaced the front page with AI slop and honestly it's an improvement

https://slop-news.pages.dev/slop-news
1•keepamovin•11m ago•1 comments

Economists vs. Technologists on AI

https://ideasindevelopment.substack.com/p/economists-vs-technologists-on-ai
1•econlmics•13m ago•0 comments

Life at the Edge

https://asadk.com/p/edge
1•tosh•19m ago•0 comments

RISC-V Vector Primer

https://github.com/simplex-micro/riscv-vector-primer/blob/main/index.md
2•oxxoxoxooo•22m ago•1 comments

Show HN: Invoxo – Invoicing with automatic EU VAT for cross-border services

2•InvoxoEU•23m ago•0 comments

A Tale of Two Standards, POSIX and Win32 (2005)

https://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
2•goranmoomin•27m ago•0 comments

Ask HN: Is the Downfall of SaaS Started?

3•throwaw12•28m ago•0 comments

Flirt: The Native Backend

https://blog.buenzli.dev/flirt-native-backend/
2•senekor•29m ago•0 comments

OpenAI's Latest Platform Targets Enterprise Customers

https://aibusiness.com/agentic-ai/openai-s-latest-platform-targets-enterprise-customers
1•myk-e•32m ago•0 comments

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

https://www.cnbc.com/2026/02/06/anthropic-goldman-sachs-ai-model-accounting.html
2•myk-e•34m ago•5 comments

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

https://www.ft.com/content/83488628-8dfd-4060-a7b0-71b1bb012785
1•1vuio0pswjnm7•35m ago•1 comments

Big Tech's AI Push Is Costing More Than the Moon Landing

https://www.wsj.com/tech/ai/ai-spending-tech-companies-compared-02b90046
4•1vuio0pswjnm7•37m ago•0 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
2•1vuio0pswjnm7•39m ago•0 comments

Suno, AI Music, and the Bad Future [video]

https://www.youtube.com/watch?v=U8dcFhF0Dlk
1•askl•41m ago•2 comments

Ask HN: How are researchers using AlphaFold in 2026?

1•jocho12•44m ago•0 comments

Running the "Reflections on Trusting Trust" Compiler

https://spawn-queue.acm.org/doi/10.1145/3786614
1•devooops•49m ago•0 comments

Watermark API – $0.01/image, 10x cheaper than Cloudinary

https://api-production-caa8.up.railway.app/docs
1•lembergs•50m ago•1 comments

Now send your marketing campaigns directly from ChatGPT

https://www.mail-o-mail.com/
1•avallark•54m ago•1 comments

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

https://github.com/joelparkerhenderson/queueing-theory
1•jph•1h ago•0 comments

Show HN: Hibana – choreography-first protocol safety for Rust

https://hibanaworks.dev/
5•o8vm•1h ago•1 comments

Haniri: A live autonomous world where AI agents survive or collapse

https://www.haniri.com
1•donangrey•1h ago•1 comments

GPT-5.3-Codex System Card [pdf]

https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf
1•tosh•1h ago•0 comments

Atlas: Manage your database schema as code

https://github.com/ariga/atlas
1•quectophoton•1h ago•0 comments

Geist Pixel

https://vercel.com/blog/introducing-geist-pixel
2•helloplanets•1h ago•0 comments

Show HN: MCP to get latest dependency package and tool versions

https://github.com/MShekow/package-version-check-mcp
1•mshekow•1h ago•0 comments

The better you get at something, the harder it becomes to do

https://seekingtrust.substack.com/p/improving-at-writing-made-me-almost
2•FinnLobsien•1h ago•0 comments

Show HN: WP Float – Archive WordPress blogs to free static hosting

https://wpfloat.netlify.app/
1•zizoulegrande•1h ago•0 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
1•melvinzammit•1h ago•0 comments