
Show HN: Recall: Give Claude perfect memory with Redis-backed persistent context

https://www.npmjs.com/package/@joseairosa/recall
57•elfenleid•2h ago
Hey HN! I'm José, and I built Recall to solve a problem that was driving me crazy.

The Problem: I use Claude for coding daily, but every conversation starts from scratch. I'd explain my architecture, coding standards, past decisions... then hit the context limit and lose everything. Next session? Start over.

The Solution: Recall is an MCP (Model Context Protocol) server that gives Claude persistent memory using Redis + semantic search. Think of it as long-term memory that survives context limits and session restarts.

How it works:

- Claude stores important context as "memories" during conversations
- Memories are embedded (OpenAI) and stored in Redis with metadata
- Semantic search retrieves relevant memories automatically
- Works across sessions, projects, even machines (if you use cloud Redis)
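The retrieval flow described above can be sketched as follows. This is my own illustrative reconstruction, not Recall's code: a toy bag-of-words embedding stands in for OpenAI's text-embedding-3-small, and an in-memory array stands in for Redis, so only the embed-and-rank logic is shown.

```typescript
// Sketch of embed -> store -> rank-by-similarity (reconstruction, not Recall's code).

type Memory = { id: string; text: string; vector: number[] };

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na * nb);
  return denom === 0 ? 0 : dot / denom;
}

// Toy "embedding": word counts over a tiny fixed vocabulary.
// Real Recall would call the OpenAI embeddings API here instead.
const vocab = ["redis", "tailwind", "rate", "limit", "api"];
function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return vocab.map(v => words.filter(w => w === v).length);
}

// Return the k memories most similar to the query.
function topK(memories: Memory[], query: string, k: number): Memory[] {
  const q = embed(query);
  return [...memories]
    .sort((a, b) => cosine(b.vector, q) - cosine(a.vector, q))
    .slice(0, k);
}

const memories: Memory[] = [
  "We use Tailwind for styling",
  "API rate limit is 1000/min",
  "Cache sessions in Redis",
].map((text, i) => ({ id: String(i), text, vector: embed(text) }));

console.log(topK(memories, "what is our rate limit?", 1)[0].text);
// prints "API rate limit is 1000/min"
```

The point of the design is that only the top-k matches get injected into Claude's context, not the whole store.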

Key Features:

- Global memories: Share context across all projects
- Relationships: Link related memories into knowledge graphs
- Versioning: Track how memories evolve over time
- Templates: Reusable patterns for common workflows
- Workspace isolation: Project A memories don't pollute Project B

Tech Stack:

- TypeScript + MCP SDK
- Redis for storage
- OpenAI embeddings (text-embedding-3-small)
- ~189KB bundle, runs locally

Current Stats:

- 27 tools exposed to Claude
- 10 context types (directives, decisions, patterns, etc.)
- Sub-second semantic search on 10k+ memories
- Works with Claude Desktop, Claude Code, any MCP client

Example Use Case: I'm building an e-commerce platform. I told Claude once: "We use Tailwind, prefer composition API, API rate limit is 1000/min." Now every conversation, Claude remembers and applies these preferences automatically.

What's Next (v1.6.0 in progress):

- CI/CD pipeline with GitHub Actions
- Docker support for easy deployment
- Proper test suite with Vitest
- Better error messages and logging

Try it:

npm install -g @joseairosa/recall
# Add to claude_desktop_config.json
# Start using persistent memory
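For the "add to claude_desktop_config.json" step, an MCP server entry typically looks roughly like this. The server key name ("recall"), the args, and the env variable names (REDIS_URL, OPENAI_API_KEY) are my assumptions, not taken from the package; check the package README for the exact values.

```json
{
  "mcpServers": {
    "recall": {
      "command": "npx",
      "args": ["-y", "@joseairosa/recall"],
      "env": {
        "REDIS_URL": "redis://localhost:6379",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```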

Comments

jcmontx•1h ago
If this delivers, it could be a 100% game changer. I will try it out and give some feedback.
elfenleid•1h ago
I've been using it for a while now, personally. I've found that I have fewer issues with context, and I can easily recall (pun intended) after a context compact.
bryanhogan•1h ago
Why would you not use context files in the form of .md? E.g. how the SpecKit project does it.
elfenleid•1h ago
I still do, but having this allows for strategies like memory decay for older information. It also allows for much more structured searching, instead of opening files, which are less structured.

.md files work great for small projects. But they hit limits:

1. Size - 100KB context.md won't fit in the window
2. No search - Claude reads the whole file every time
3. Manual - You decide what to save, not Claude
4. Static - Doesn't evolve or learn

Recall fixes this:

- Semantic search finds relevant memories only
- Auto-captures context during conversations
- Handles 10k+ memories, retrieves top 5
- Works across multiple projects

Real example: I have 2000 memories. That's 200KB in .md form. Recall retrieves 5 relevant ones = 2KB.

And of course, there's always the option to use both: .md for docs, Recall for dynamic learning.

Does that help?

bryanhogan•1h ago
I'm not sure. You don't use a single context.md file, you use multiple and add them to context when relevant. AIs adjust these as needed, so they do "evolve". So what you're trying to achieve is already solved.

These two videos on using Claude well explain what I mean:

1. Claude Code best practices: https://youtu.be/gv0WHhKelSE

2. Claude Code with Playwright MCP and subagents: https://youtu.be/xOO8Wt_i72s

elfenleid•1h ago
Yeah that's a solid workflow and honestly simpler than what I built - I think Recall makes sense when you hit the scale where managing multiple .md files becomes tedious (like 50+ conversations across 10 projects), but you're right that for most people your approach works great and is way less complex.
steveklabnik•55m ago
Memory features are useful for the same reason that a human would use a database instead of a large .md file: it's more efficient to query for something and get exactly what you want than it is to read through a large, ultimately less structured document.

That said, Claude now has a native memory feature as of the 2.0 release recently: https://docs.claude.com/en/docs/claude-code/memory so the parent's tool may be too late, unless it offers some kind of advantage over that. I don't know how to make that comparison, personally.

atonse•21m ago
It's had native memory in the form of per-directory CLAUDE.md files for a while though. Not just 2.0
ebcode•17m ago
Claude’s memory function adds a note to the file(s) that it reads on startup. Whereas this tool pulls from a database of memories on-demand.
pacoWebConsult•1h ago
Why would you bloat the (already crowded) context window with 27 tools instead of the 2 simplest ones: Save Memory & Search Memory? Or even just search, handling the save process through a listener on a directory of markdown memory files that Claude Code can natively edit?
elfenleid•1h ago
That's a great point. The reality is that context, at least in my personal experience, is brittle and loses precision over time. This is an always-there, persistent way for Claude to access "memories". I've been running with it for about a week now and did not feel that the context got bloated.
fishmicrowaver•59m ago
People are just ricing out AI like they rice out Linux, nvim or any other thing. It's pretty simple to get results from the tech. Use the CLI and know what you're doing.
warthog•1h ago
imo it would be better to carry the whole memory outside of the inference time where you could use an LLM as a judge to track the output of the chat and the prompts submitted

it would sort of work like grammarly itself and you can use it to metaprompt

i find all the memory tooling, even native ones on claude and chatgpt to be too intrusive

elfenleid•1h ago
Totally get what you're saying! Having Claude manually call memory tools mid-conversation does feel intrusive, I agree with that, especially since you need to keep saying Yes to the tool access.

Your approach is actually really interesting, like a background process watching the conversation and deciding what's worth remembering. More passive, less in-your-face.

I thought about this too. The tradeoff I made:

Your approach (judge/watcher):

- Pro: Zero interruption to conversation flow
- Pro: Can use a cheaper model for the judge
- Con: Claude doesn't know what's in memory when responding
- Con: Memory happens after the fact

Tool-based (current Recall):

- Pro: Claude actively uses memory while thinking
- Pro: Can retrieve relevant context mid-response
- Con: Yeah, it's intrusive sometimes

Honestly both have merit. You could even do both, background judge for auto-capture, tools when Claude needs to look something up.

The Grammarly analogy is spot on. Passive monitoring vs active participation.

Have you built something with the judge pattern? I'd be curious how well it works for deciding what's memorable vs noise.

Maybe Recall needs a "passive mode" option where it just watches and suggests memories instead of Claude actively storing them. That's a cool idea.
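The judge/watcher idea discussed above could be plumbed together roughly like this. This is a hypothetical sketch, not anything Recall ships: a real implementation would ask a cheap LLM to score each turn, whereas here a keyword heuristic stands in for the judge so the example is self-contained.

```typescript
// Hypothetical "passive mode" sketch: a background judge scores each
// conversation turn for memorability and suggests candidates, instead
// of the assistant calling memory tools mid-response.

type Turn = { role: "user" | "assistant"; text: string };

// Signals that a turn states a durable fact, decision, or preference.
// A real judge would be an LLM call; this heuristic is a stand-in.
const SIGNALS = [/\bwe use\b/i, /\bdecided\b/i, /\balways\b/i, /\bprefer\b/i, /\brate limit\b/i];

function memorability(turn: Turn): number {
  // Fraction of signal patterns the turn matches.
  const hits = SIGNALS.filter(re => re.test(turn.text)).length;
  return hits / SIGNALS.length;
}

function suggestMemories(conversation: Turn[], threshold = 0.2): string[] {
  // Passive mode: surface candidates for the user to approve,
  // rather than storing them automatically.
  return conversation
    .filter(t => memorability(t) >= threshold)
    .map(t => t.text);
}

const convo: Turn[] = [
  { role: "user", text: "Can you rename this variable?" },
  { role: "user", text: "We use Tailwind and prefer the composition API." },
  { role: "assistant", text: "Done. Renamed it to userId." },
];

console.log(suggestMemories(convo));
// prints ["We use Tailwind and prefer the composition API."]
```

This keeps the conversation uninterrupted (the judge runs after the fact), at the cost that the assistant doesn't know what's in memory while responding, which matches the tradeoff laid out in the thread.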

westurner•1h ago
Is this the/a agent model routing problem? Which agent or subagent has context precedence?

jj autocommits when the working copy changes, and you can manually stage against @-: https://news.ycombinator.com/item?id=44644820

OpenCog differentiates between Experiential and Episodic memory; and various processes rewrite a hypergraph stored in RAM in AtomSpace. I don't remember how the STM/LTM limit is handled in OpenCog.

So the MRU/MFU knapsack problem, and more predictable primacy/recency bias because of context length limits and context compaction?

westurner•22m ago
OpenCogPrime:EconomicAttentionAllocation: https://wiki.opencog.org/w/OpenCogPrime:EconomicAttentionAll... :

> Economic Attention Allocation (ECAN) was an OpenCog subsystem intended to control attentional focus during reasoning. The idea was to allocate attention as a scarce resource (thus, "economic") which would then be used to "fund" some specific train of thought. This system is no longer maintained; it is one of the OpenCog Fossils.

(Smart contracts require funds to execute (redundantly and with consensus), and there are scarce resources).

Now there's ProxyNode and there are StorageNode implementations, but Agent is not yet reimplemented in OpenCog?

ProxyNode implementers: ReadThruProxy, WriteThruProxy, SequentialReadProxy, ReadWriteProxy, CachingProxy

StorageNode > Implementations: https://wiki.opencog.org/w/StorageNode#Implementations

namanyayg•42m ago
I've been building exactly this. Currently a beta feature in my existing product. Can I reach out to you for your feedback on metaprompting/grammarly aspect of it?
tarun_anand•1h ago
Claude introduced its own memories API. Have you had a look?
elfenleid•1h ago
Yes I did. I worked on this a while back, before it was available I believe. I'll have another check. Thanks for the heads up.
mannyv•1h ago
This is excellent for those of us who are building local AIs.
elfenleid•1h ago
That's a great point! It also works really well for shared context between Claude instances. For example, we use that for our business model in the company: all business rules and the model are stored as memories in a central Redis that the MCP connects to. Memories are stored either specific to a folder or globally (similar to CLAUDE.md in the home directory), but with this approach you can have an external Redis where multiple Claudes read and write, as a shared, almost hive-like memory.
otterley•1h ago
Does it work with Valkey as well?
elfenleid•1h ago
Yep! Valkey should work fine.

Recall just uses basic Redis commands - HSET, SADD, ZADD, etc. Nothing fancy.

Valkey is Redis-compatible so all those commands work the same.

I haven't tested it personally but there's no reason it wouldn't work. The Redis client library (ioredis) should connect to Valkey without issues.

If you try it and hit any problems let me know! Would be good to officially support it.
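Because only generic commands are involved, the storage layout can be sketched roughly as below. This is my reconstruction of the idea, not Recall's actual schema, and the key names are made up; a minimal in-memory stand-in replaces the ioredis client (whose real methods return Promises) so the sketch runs without a live Redis or Valkey.

```typescript
// Why Valkey compatibility falls out for free: the store only needs
// generic commands -- HSET for the record, SADD for workspace
// membership, ZADD for a recency index. (Hypothetical schema; sync
// methods here for brevity, real ioredis is async.)

interface RedisLike {
  hset(key: string, fields: Record<string, string>): void;
  sadd(key: string, member: string): void;
  zadd(key: string, score: number, member: string): void;
}

function storeMemory(db: RedisLike, workspace: string, id: string, text: string): void {
  const now = Date.now();
  db.hset(`memory:${id}`, { text, createdAt: String(now) }); // the record itself
  db.sadd(`workspace:${workspace}:memories`, id);            // workspace isolation
  db.zadd(`workspace:${workspace}:recent`, now, id);         // recency index
}

// In-memory fake implementing just the three commands used above.
class FakeRedis implements RedisLike {
  hashes = new Map<string, Record<string, string>>();
  sets = new Map<string, Set<string>>();
  zsets = new Map<string, Map<string, number>>();
  hset(key: string, fields: Record<string, string>): void {
    this.hashes.set(key, { ...(this.hashes.get(key) ?? {}), ...fields });
  }
  sadd(key: string, member: string): void {
    const s = this.sets.get(key) ?? new Set<string>();
    s.add(member);
    this.sets.set(key, s);
  }
  zadd(key: string, score: number, member: string): void {
    const z = this.zsets.get(key) ?? new Map<string, number>();
    z.set(member, score);
    this.zsets.set(key, z);
  }
}

const db = new FakeRedis();
storeMemory(db, "shop", "m1", "API rate limit is 1000/min");
console.log(db.sets.get("workspace:shop:memories")?.has("m1")); // prints true
```

Any server that speaks these commands, Valkey included, should slot in unchanged.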

h1fra•1h ago
I'm not super familiar with context and "memory", but doesn't adding context manually or via memory end up consuming context length either way?
elfenleid•1h ago
Yeah, it still uses context, but way more efficiently: instead of injecting a 50KB context.md every time, Recall searches 10k memories and only injects the top 5 relevant ones (maybe 2KB), so you can store way more total knowledge.
alecco•1h ago
Why not just ask CC to write a prompt or Markdown file to re-start the conversation in a new chat?
elfenleid•1h ago
Yeah, people do that, but it doesn't scale: after a while your "restart prompt" is 50KB and won't fit, plus you're stuck copying stuff manually instead of just asking "what did we say about Redis" and getting the relevant bits automatically.
the_arun•1h ago
I wish there was a way to send compressed context to LLMs instead of plain text. This will reduce token size, performance & operational costs.
joshstrange•44m ago
> This will reduce token size, performance & operational costs.

How? The models aren't trained on compressed text tokens nor could they be if I understand it correctly. The models would have to uncompress before running the raw text through the model.

the_arun•40m ago
That is what I am looking for: a) LLMs trained using compressed text tokens, and b) using compressed prompts. I don't know how, but that is what I was hoping for.
iambateman•1h ago
I’ve started asking Claude to write tutorials that live in a _docs folder alongside my code.

Then it can reference those tutorials for specific things.

Interested in giving this a shot but it feels like a lot of infrastructure.

zzzeek•20m ago
Yeah, this is what I do: you want the knowledge in .md files, but currently you don't want to stuff up the context with everything you know every time. I may be wrong here, but my impression is that the gap between "context", which is special and very limited in size, and "things the LLM is trained on" is still an unsolved problem in getting AI to act like an "assistant", AFAICT.
asdev•1h ago
The problem is you need to prompt Claude to "Store" or "Remember"; if you don't, it will never call the MCP server. Ideally, Claude would have some mechanism to store memories without any explicit prompting, but I don't think that's currently possible.
jMyles•1h ago
Heh, I'm building the same thing this week (albeit with postgres rather than redis). I bet like 15% of the people here are.
iamleppert•1h ago
I'm not seeing how this is any different from a standard vector database MCP tool. It's not like Claude is going to know about any of the things you told it to "remember" unless you explicitly tell it to use its memory tool, as shown in the demo, to recall something you've stored.
bananapub•54m ago
how did you benchmark this against much less convoluted solutions, like "a text file"?

how much better was this to justify all that extra complexity?

datadrivenangel•38m ago
How does Claude know when to try and remember?

Often memory works too well and crowds out new things, so how are you balancing that?

daxfohl•31m ago
I'm surprised Anthropic doesn't offer something like this server-side, with an API to control it. It seems like it'd be a lot more efficient than having the client manually rework the context and upload the whole thing.
ryan29•16m ago
Who should own the context?

Imagine having 20 years of context / memories and relying on them. Wouldn't you want to own that? I can't imagine pay-per-query for my real memories and I think that allowing that for AI assisted memory is a mistake. A person's lifetime context will be irreplaceable if high quality interfaces / tools let us find and load context from any conversation / session we've ever had with an LLM.

On the flip side of that, something like a software project should own the context of every conversation / session used during development, right? Ideally, both parties get a copy of the context. I get a copy for my personal "lifetime context" and the project or business gets a copy for the project. However, I can't imagine businesses agreeing to that.

If LLMs become a useful tool for assisting memory recall there's going to be fighting over who owns the context / memories and I worry that normal people will lose out to businesses. Imagine changing jobs and they wipe a bunch of your memory before you leave.

We may even see LLM context ownership rules in employment agreements. It'll be the future version of a non-compete.

gmerc•20m ago
Every single persistent memory feature is a persistence vector for prompt injection.
