Maintaining the memory is a considerable burden, and you have to make sure that a one-off "fix this linting" doesn't end up in the memory as "we always fix that type of issue in that particular way." That's also the major problem I have with ChatGPT's memory: it starts to respond from the perspective of "this is correct for this person".
I'm curious: who actually sees benefits from memory in coding? Is it that the agent "learns how to code better", or that it learns "how the project is structured"? Either way, to me, this sounds like something an easy project setup could handle.
I think similar concepts apply to coding - in some cases, you have all the context you need up front (good coding practices help with this), but in many cases, there's a lot of "tribal knowledge" scattered across various repos that a human vet working in the org would certainly know, but an agent wouldn't (of course, there's somewhat of a circular argument here that if the agent eventually learned this tribal knowledge, it could just write it down into a CLAUDE.md file ;)). I think there's also a clear separation between procedural knowledge and learned preferences: the former is probably better represented as skills committed to a repo, whereas I view the latter more as a "system prompt learning" problem.
An example of how this kind of memory can help is learned skills https://www.letta.com/blog/skill-learning - if your agent takes the time to reflect/learn from experience and create a skill, that skill is much more effective at improving the next attempt than just putting the raw trajectory into context.
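As a concrete (hypothetical) illustration, a distilled skill might be as small as a committed note like this, rather than a multi-thousand-token trajectory. The format and contents here are invented for illustration, not Letta's actual skill schema:

```markdown
# Skill: verifying changes in this repo (invented example)
1. Run the unit tests first; the full suite takes ~20 minutes.
2. If a snapshot test fails, regenerate snapshots and re-run before debugging.
3. Never commit until both steps pass.
```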
A local, project-specific llm.md is absolutely something I require, though. Without it, language models kept "fixing" random things in my code that they considered incorrect, despite comments on those lines literally telling them to NOT CHANGE THIS LINE OR THIS COMMENT.
My llm.md is structured like this:
- Instructions for the LLM on how to use it
- Examples of a bad and a good note
- LLM editable notes on quirks in the project
It helps a lot with making an LLM understand when things are unusual for a reason.
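For illustration, a minimal sketch of what such a file could look like (all contents here are invented, not the parent poster's actual file):

```markdown
# llm.md (illustrative sketch)

## Instructions for the LLM
Read every note below before editing. If you discover a new quirk,
append a note; never delete or reword human-written notes.

## Example of a bad note
"Fixed the config." (no file, no line, no reason)

## Example of a good note
"src/net/client.ts: the retry loop looks redundant but works around
a proxy bug; do not simplify it."

## Notes on quirks in this project (LLM editable)
- (agent-appended notes go here)
```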
Besides that file, I wrap every prompt in a project-specific intro and outro. I use these to take care of common undesirable LLM behavior, like removing my comments.
I also tell it to use a specific format for its own comments, so I can automatically clean those up on the next pass, which takes care of most of the aftercare.
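A rough sketch of that wrapping and cleanup in Python (the intro/outro text, the `AI-NOTE:` marker, and the function names are my own inventions for illustration, not the parent poster's actual setup):

```python
INTRO = "Project rules: never remove or alter existing comments.\n\n"
OUTRO = "\n\nPrefix any comment you add with 'AI-NOTE:' so it can be cleaned up later."

def wrap_prompt(user_prompt: str) -> str:
    """Wrap every prompt in the project-specific intro and outro."""
    return f"{INTRO}{user_prompt}{OUTRO}"

def strip_llm_comments(source: str) -> str:
    """On the next pass, drop lines carrying the agreed-upon LLM comment marker."""
    return "\n".join(line for line in source.splitlines() if "AI-NOTE:" not in line)
```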
At first glance, of course it's something we want. It's how we do it, after all! Learning on the job is what enables us to do our jobs and so many other things.
On the other hand, humans are frustratingly stuck in their ways and not all that happy to change, and that is something societies and orgs fight constantly. Do I want to have to convince my coding agent to learn new behavior that conflicts with its existing memory?
It's not at all obvious to me to what extent memory is a bug or a feature. Does somebody have a clear case for why this is something we should want, and why it's not a problem?
For coding agents, I think it's clear that nobody wants to repeat the same thing over and over again. If a coding agent makes a mistake once (like `git add .` instead of manually picking files), it should be able to "learn" and never make the same mistake again.
Though I definitely agree w/ you that we shouldn't aspire to 1:1 replicate human memory. We want to be able to make our machines "unlearn" easily when needed, and we also want them to be able to "share" memory with other agents in ways that simply aren't possible with humans (until we all get Neuralinks, I guess).
I'd be interested in hearing about how this approach compares with Beads [1].
Letta's memory system is designed around the MemGPT reference architecture, which is intentionally very simple: break the system prompt up into "memory blocks" (all pinned to the context window, since they are injected into the system prompt), which are modifiable via memory tools (the original MemGPT paper is still a good reference for what this looks like at a high level: https://research.memgpt.ai/). So it's more like a "living CLAUDE.md" that follows your agent around wherever it's deployed - ofc, it's also interoperable with CLAUDE.md. For example, when you boot up Letta Code and run `/init`, it will scan for AGENTS.md/CLAUDE.md and ingest the files into its memory blocks.
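To make the architecture concrete, here's a toy Python sketch of the memory-block idea (the labels, prompt format, and tool name are illustrative; see the MemGPT paper for the real design, where the analogous tool is core_memory_replace):

```python
# Labeled memory blocks, all pinned: their full contents are compiled
# into the system prompt on every turn.
memory_blocks = {
    "persona": "You are a careful coding agent for this repo.",
    "project": "Tests live in tests/; run them before committing.",
}

def compile_system_prompt() -> str:
    """Render every block into the system prompt, MemGPT-style."""
    sections = [f"<{label}>\n{text}\n</{label}>" for label, text in memory_blocks.items()]
    return "You have the following memory blocks:\n" + "\n".join(sections)

def memory_replace(label: str, old: str, new: str) -> None:
    """A memory-editing tool exposed to the agent (analogous to MemGPT's core_memory_replace)."""
    memory_blocks[label] = memory_blocks[label].replace(old, new)
```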
LMK if you have any other questions about how it works, happy to explain more!
I guess the main potential point of confusion would arise if it's not clear to the LLM / agent which tool should be used for what. E.g. if you tell your agent to use Letta memory blocks as a scratchpad / TODO list, that functionality overlaps with Beads (I think?), so it's easy to imagine the agent getting confused due to stale data in either location. But as long as the instructions are clear about what context/memory to use for what task, it should be fine / complementary.
Firefox says it can't play it.
I'd download and check it with ffprobe, but direct downloads seem blocked.
pacjam•6h ago
If you're a Claude Code user (I assume much of HN is) some context on Letta Code: it's a fully open source coding harness (#1 model-agnostic OSS on Terminal-Bench, #4 overall).
It's specifically designed to be "memory-first" - the idea is that you use the same coding agents perpetually, and have them build learned context (memory) about you / your codebase / your org over time. There are some built-in memory tools like `/init` and `/remember` to help guide this along (if your agent does something stupid, you can 'whack it' with /remember). There's also a `/clear` command, which resets the message buffer, but keeps the learned context / memory inside the context window.
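A typical session with those commands might look like this (the slash commands are the ones named above; the surrounding flow is illustrative):

```
/init       # scan for AGENTS.md / CLAUDE.md and ingest them into memory blocks
# ... work with the agent; it does something stupid ...
/remember   # "whack it": write the correction into learned memory
/clear      # reset the message buffer, keep learned context in the window
```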
We built this for ourselves - Letta Code co-authors the majority of PRs on the letta-code GitHub repo. I personally have been using the same agent for ~2+ weeks (since the latest stable build), and it's fun to see its memory become more and more valuable over time.
LMK if you have any q's! The entire thing is OSS and designed to be super hackable, and can run completely locally when combined with the Letta docker image.
pacjam•5h ago
If you're asking about why Letta Code isn't on the leaderboard: the TBench maintainers said it should be up later today (so probably refresh in a few hours!). The results are already public; you can see them on our blog (graphs linked in the OP). They're also verifiable: all data for the runs is available, and Letta Code is open source, so you can replicate the results yourself.
pacjam•2h ago
More on permissions here: https://docs.letta.com/letta-code/permissions
Install is just `npm install -g @letta-ai/letta-code`