Show HN: Benchmarking the Keep memory system with LoCoMo

https://keepnotes.ai/blog/2026-02-28-benchmark/
1•inguz•1h ago
Summary:

keep (https://github.com/hughpyle/keep/, MIT-licensed) is a skills practice wrapped around an implementation of "memory for AI agents".

The practice is this: repeated reflection on means and outcomes, so that skillful action improves over time. But the raw implementation of memory is its foundation. Without working memory, you can't iterate.

Similarly, without benchmarks, you can't tell what works. Today we're publishing results for the LoCoMo benchmark.

Scores: 76.2% overall (weighted average)

Single-hop 86.2% (841 questions)
Temporal 68.5% (321 questions)
Multi-hop 64.2% (282 questions)
Open-domain 50.0% (96 questions)
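
For anyone checking the arithmetic: the overall figure is consistent with a mean of the four category scores weighted by question count. A minimal sketch, assuming that is the weighting used (the names `results` and `weighted_overall` are mine, not from keep or the LoCoMo tooling):

```python
# Category scores (%) and question counts as reported in the post.
results = {
    "single-hop": (86.2, 841),
    "temporal": (68.5, 321),
    "multi-hop": (64.2, 282),
    "open-domain": (50.0, 96),
}


def weighted_overall(results):
    """Mean of category scores weighted by question count (assumed weighting)."""
    total_questions = sum(n for _, n in results.values())
    weighted_sum = sum(score * n for score, n in results.values())
    return weighted_sum / total_questions


total = sum(n for _, n in results.values())
print(f"{weighted_overall(results):.1f}% over {total} questions")
# prints "76.2% over 1540 questions"
```

Single-hop makes up more than half of the questions, so it dominates the overall number; an unweighted mean of the four categories would come out noticeably lower.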

The linked blog post has more detail, including industry comparisons, and links to full repro steps and result data.

This run used local models for embeddings and analysis (nomic-embed-text and llama3.2:3b), and gpt-4o-mini for the query and judge.

Proof point: I think this shows that a *local-only* LLM-assisted memory system can achieve solid benchmark results.

Background:

`keep` started with my experience using forgetful agents, and identifying a need for a skill that implements "reflective" memory (not itself new; see e.g. Shinn et al., https://arxiv.org/abs/2303.11366) -- here the reflection practice is quite opinionated, saying effectively: what you do is what you become.

Whether *this* works is not a subject of the benchmark.

Docs:

Copious documentation is at https://docs.keepnotes.ai/guides/
