keep (https://github.com/hughpyle/keep/, MIT-licensed) is a skills practice wrapped around an implementation of "memory for AI agents".
The practice is this: repeated reflection on means and outcomes, so that skillful action improves over time. But the raw implementation of memory is its foundation. Without working memory, you can't iterate.
Similarly, without benchmarks, you can't tell what works. Today we're publishing results for the LoCoMo benchmark.
Scores: 76.2% overall (weighted average)
- Single-hop: 86.2% (841 questions)
- Temporal: 68.5% (321 questions)
- Multi-hop: 64.2% (282 questions)
- Open-domain: 50.0% (96 questions)
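As a sanity check, the overall number is consistent with a question-count-weighted mean of the four category scores:

```python
# Per-category LoCoMo scores from the run above: (percent, question count)
scores = {
    "single-hop":  (86.2, 841),
    "temporal":    (68.5, 321),
    "multi-hop":   (64.2, 282),
    "open-domain": (50.0,  96),
}

total_q = sum(n for _, n in scores.values())
overall = sum(pct * n for pct, n in scores.values()) / total_q
print(f"{overall:.1f}% over {total_q} questions")  # -> 76.2% over 1540 questions
```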
The linked blog post has more detail, including industry comparisons, plus links to full repro steps and the result data.
This run used local models for embeddings and analysis (nomic-embed-text and llama3.2:3b), and gpt-4o-mini for the query and judge.
I think this is a proof point that a *local-only* LLM-assisted memory system can achieve solid benchmark scores.
Background:
`keep` started with my experience of using forgetful agents, and identifying the need for a skill that implements "reflective" memory (not itself new; see e.g. Shinn et al., https://arxiv.org/abs/2303.11366). Here the reflection practice is quite opinionated, saying effectively: what you do is what you become.
Whether *this* works is not a subject of the benchmark.
Docs: copious documentation at https://docs.keepnotes.ai/guides/