During wake, facts from conversation are injected directly into MLP weights via MEMIT (a single forward pass, instant recall). During sleep, the system audits which memories degraded, refreshes them with null-space constraints (guaranteeing orthogonality to working memories), then progressively transfers knowledge into LoRA — like biological memory consolidation from hippocampus to neocortex.
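The null-space constraint can be illustrated in a few lines of PyTorch: project the refresh update so it is orthogonal to the key directions of existing working memories, which guarantees the refresh cannot disturb them. This is a toy linear-algebra sketch under invented dimensions, not the project's actual MEMIT code; the name `nullspace_project` is illustrative.

```python
import torch

def nullspace_project(update: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """Remove from `update` any component lying in the span of `keys`,
    so the refreshed edit is orthogonal to existing memory directions."""
    Q, _ = torch.linalg.qr(keys.T)        # orthonormal basis of the key directions
    return update - update @ Q @ Q.T      # subtract the projection onto that basis

keys = torch.randn(3, 16)                 # 3 existing memory key vectors (dim 16)
update = torch.randn(16)                  # a proposed refresh direction
safe = nullspace_project(update, keys)
print(torch.allclose(safe @ keys.T, torch.zeros(3), atol=1e-4))  # True
```

The projected update has zero inner product with every existing key, which is the sense in which orthogonality is "guaranteed."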
The key problem was a hard capacity ceiling: the 8B model sustains 0.92 recall up to 13 facts, then crashes to 0.57 at fact 14 — a sharp phase transition, not gradual decay. And LoRA consolidation was blocked by what I call the "alignment tax": RLHF training fights back against injected knowledge (a 37% recall loss on the 8B model from a single LoRA pass).
The fix: per-fact graduated consolidation. Each fact independently tracks its own stage and advances only when LoRA proves it has absorbed that specific fact. A dissolution schedule (1.0 → 0.5 → 0.1 → 0.0) gradually removes the MEMIT edit as LoRA takes over. And cumulative fusing — training each cycle on the already-fused model — reduces the alignment tax from catastrophic to negligible (starting training loss drops from 2.91 to 0.62 by cycle 2).
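The per-fact bookkeeping can be sketched in plain Python. The dissolution schedule (1.0 → 0.5 → 0.1 → 0.0) comes from the text; everything else — the `Fact` class, the `try_advance` signature, and the absorption check being passed in as a boolean — is an invented simplification (a real system would probe LoRA recall for the specific fact).

```python
from dataclasses import dataclass

# MEMIT edit strength at each consolidation stage (from the schedule above)
DISSOLUTION = [1.0, 0.5, 0.1, 0.0]

@dataclass
class Fact:
    text: str
    stage: int = 0  # index into DISSOLUTION; advances independently per fact

    @property
    def memit_strength(self) -> float:
        return DISSOLUTION[self.stage]

    def try_advance(self, lora_absorbed: bool) -> None:
        """Advance only when LoRA has proven it absorbed *this* fact."""
        if lora_absorbed and self.stage < len(DISSOLUTION) - 1:
            self.stage += 1

facts = [Fact("Aria lives in Portland"), Fact("The cafe opens at 7am")]
facts[0].try_advance(lora_absorbed=True)
print(facts[0].memit_strength)  # 0.5: edit half-dissolved as LoRA takes over
print(facts[1].memit_strength)  # 1.0: unabsorbed fact stays fully injected
```

Because each fact carries its own stage, a stubborn fact can sit at full MEMIT strength for several cycles while its neighbors dissolve on schedule — which is exactly what makes the buffer renewable.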
Results on Llama 3.1 8B (4-bit, 2×H100):

- 100% advancement rate at 5/10/15/20 facts
- 1.00 chat recall at all scales
- MEMIT edits dissolve on schedule, making the buffer renewable
- Effective lifetime capacity: unbounded
There's also a biological curiosity: individual facts consolidate at different rates. One synthetic fact ("Aria lives in Portland") is consistently the hardest across every run — some memories are just harder to absorb, same as in biological systems.
Six papers document the full journey from the initial LoRA prototype to this result: https://doi.org/10.5281/zenodo.18779159
Built with: Python, PyTorch, PEFT, BitsAndBytes, Llama 3.1. Runs on a MacBook Air (3B) or an H100 (8B/70B).