Claude, please stop trying to memorize random crap

https://12gramsofcarbon.com/p/agentics-memorizing-session-transcripts
83•theahura•2h ago

Comments

bigyabai•59m ago
Settings > Capabilities > "Generate memory from chat history"

Toggle it off and never think about it again.

chopete3•51m ago
>> We don't really write code by hand anymore.

The software world is very close to building a super intelligent senior software developer. Companies like this will ask all the best things a software engineer does automatically. Now claude will add it into the coding agents itself.

Damn, I didn't see this coming.

First we build the intelligent builder. We will figure out what we want to build later.

jmalicki•48m ago
> We will figure out what we want to build later.

Once the automator automates itself fast enough, we won't have the ability to opine on what gets built. The LLM will decide. Just like right now sometimes LLMs delete tests so they pass, they could just delete humanity if humans get in their way.

otabdeveloper4•44m ago
> The software world is very close to building a super intelligent senior software developer.

Yeah. Two more weeks, as they say. Just need to iron out some kinks.

andai•5m ago
It's the error rate. That's what everyone found when they were trying to go Full Auto with OpenClaw in February.

You can rely on it like 95% of the time but that means if you keep it running continuously the error rate rapidly approaches 100%. That's getting a little better with each release, and it might actually hit the point where you can more or less trust it indefinitely (on well defined workflows).

Or at least it would, if context window permitted...
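
The compounding described here can be sketched with a little arithmetic. Assuming each step succeeds independently with probability 0.95 (the 95% figure from the comment; independence between steps is an assumption made only for illustration), the chance of at least one failure across n steps is 1 - 0.95^n:

```python
def failure_probability(per_step_success: float, steps: int) -> float:
    """Chance of at least one failure across `steps` independent steps."""
    return 1.0 - per_step_success ** steps


# 0.95 is the per-step reliability quoted in the comment; the step counts
# are arbitrary illustration values.
for steps in (1, 10, 50, 100):
    print(f"{steps:>3} steps: {failure_probability(0.95, steps):.1%}")
```

Under these assumptions the failure chance is roughly 40% after 10 steps and above 99% after 100, which is the "rapidly approaches 100%" effect for a continuously running agent.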

rvz•31m ago
> The software world is very close to building a super intelligent senior software developer. Companies like this will ask all the best things a software engineer does automatically. Now claude will add it into the coding agents itself.

Except Claude is more expensive than an actual senior software developer. Otherwise, why are many companies terrified of the usage bill that gets printed on the invoice?

The nonsense in "tokenmaxxing" was a complete marketing scam and illusion of cheap tokens which in reality were heavily subsidized.

The entire point is detecting bad code before it reaches production. [0] AI generated or not.

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

beepbooptheory•49m ago
There has been this slow transition inside me, as someone who likes to not touch the AI as much as possible, where I've gone from skeptical and argumentative about it all to starting to just feel sad for all the Claude et al heads. Like, this is such a ridiculous house of cards you have to deal with all the time, which isn't even directly concerning the task at hand, presumably. Like you're cooking yourself a meal but it's just nuking a burrito and then still somehow needing to wash the dishes for an hour.

Not that this isolated article is super damning or anything, but the accumulated set of all these reports has left me only empathetic, I think, of these other devs. Like, I just want to tell them, "it can be ok, it doesn't need to be like this.."

andai•8m ago
I've been having a very nice time with Fable. I cooked up an Anki clone in like half an hour, with tech it's not familiar with. Nothing too ground breaking, but I was very pleased!

I think Opus might be on similar level for most of what I'm doing, but I haven't used it much recently, so I can't remember the difference. So I guess I'll find out on the 7th when they pull the plug again! (Free-ish trial of Fable ending.)

That being said, I tried using other frontier models to help with a Pong clone the other day and they were introducing new bugs at approximately the same rate as they were fixing them. On Pong!! I found that amusing because I couldn't think of a simpler game, so it didn't inspire confidence.

Fable's doing just fine on an online multiplayer game though. I have no idea how that works. (Maybe it would fail Pong too?? I haven't tested that!)

general_reveal•49m ago
Isn’t this just a form of the bitter lesson? Our attempts to make engineered context and agents will simply be made obsolete with bigger and better models. Those transcripts are probably extremely useful for less capable models, and near unnecessary for frontier ones, maybe?
andai•16m ago
Yeah, the question is whether this applies to all of context management.

I've been using a custom harness based on https://minimal-agent.com/ (itself based on swe-mini-agent), which is like 50 lines for the core logic. Bash is all you need.

For small tasks, I find it's about 8x faster (and uses 8x fewer tokens) than the standard harness for each model.

For bigger tasks I haven't tested it much. It seems to work too but I think they're a bit less focused and productive in that case. It could be that those big harnesses' 20k token system prompts are doing something important with regard to steering software development workflows. (e.g. I heard Fable has a custom system prompt in Claude Code which might explain its markedly more proactive behavior.)

So I want to say there's still a lot of value in context engineering though it seems to diminish with each model release (since they're fine tuned on mostly non stupid behavior and need less hand holding).

HarHarVeryFunny•9m ago
I don't think so - I think we'll find that to build a brain you need more built-in structure and biases, not less.

Bear in mind that brain architecture is learnt too - just over a much longer timescale than an individual lifetime.

wongarsu•43m ago
I agree with the take not to bother with a sophisticated memory system. Anything worth remembering should be in docs, guides, source comments, commit messages or tickets. You don't need another layer, every conceivable granularity is already covered by existing best practices
throwup238•36m ago
Especially a layer that is largely out of band in a project (i.e. ~/.claude/…). In any project where I’ve needed memory I just add a line to AGENTS.md telling it to use MEMORY.md to save memories or STATUS.md to track progress.
andai•21m ago
I've been enjoying having a little todo file the agent updates as it goes along, because then I can keep track of progress without scrolling through aeons of "Combobulating..."

Also if context runs out you can just do "cat todo.md | agent" and you're off to the races again.

sdesol•24m ago
> You don't need another layer

I do think we need another layer, but it should be a routing layer. I am finalizing my pi-brains extension for Pi (https://github.com/earendil-works/pi) which does this:

https://github.com/gitsense/pi-brains

Right now "humans" need to define the routing rules for how to access information, but I will support what I call "knowledge agents" that can monitor conversations to inject context when needed.

CuriouslyC
dofm•42m ago
Blog posts like this just blow me away.

> I believed this so strongly that my company built an entire product around this concept. I used to tell folks that "session transcripts were the new oil," that they were more valuable than the code itself.

> […]

> We don't really write code by hand anymore.

Honestly, isn't this just influencer spam? What possible value is there in reading about people who used to have products, but no longer write their own code, complaining about the inscrutable prediction machine they have handed that job and their livelihoods to?

Like, if you have complaints about the thing, perhaps you should address them to your supplier directly. None of your readers can help, and nobody's magic folk solution to your problem is better than yours.

And there are so many of these sorts of posts. Are we not entirely cooked?

(I think I have concluded that if people writing about AI aren't writing about interesting things they have achieved with small, local LLMs — which for clarity I am fully interested in reading - then I'm done reading. This whole blogging-about-cloud-AI genre is just weird and irresponsible now)

fortyseven•34m ago
The Blog Police have spoken, folks. No more talking about what you like in your own blog without passing it through the approved discussion censor.
dofm•26m ago
I'm not the Blog Police, I'm a very naughty boy.

I have opinions people apparently don't like, for no subscriber money.

LPisGood•33m ago
I have to ask: do you still write a lot of code yourself? I and most people I know do not.
walt_grata
semiquaver•42m ago
Strongly agree here. claude-code’s memory system is occasionally useful but much more often harmful, pulling in obsolete info that muddies the waters about current tasks. I’ve frequently seen Claude’s own memories severely mislead it.

My guess is that has something to do with the training process leaving models unable to differentiate between “what’s happening now” and “what happened before”. Perhaps if making inferences from memories was actually part of the training process things would be different but my sense is that as an inference-time-only feature this just gets the models confused.

pennomi•6m ago
Humans make memories constantly, but they also forget things that are no longer relevant. Until Claude can do that, it means the LLM will have an ever-increasing, ever more fragmented context.

And LLMs are NOT intelligent enough to survive even mild context poisoning.

oefrha•42m ago
I have this in my global CLAUDE.md after being annoyed by all the random crap memories.

> Don't start generating an auto-memory entry before asking me. Ask first, write only if I confirm — no speculative drafting.

No more crap after this.

Incidentally I don’t recall Opus 4.8 asking me once in the past few weeks. Older models did ask semi-frequently.

aranw•39m ago
I once had to tell Claude 3-4 times to stop assuming the state of a system was the way it kept reiterating it was because it was in its memory. I repeatedly told it otherwise and it just never updated its memory and instead kept referencing its memory about the state of a particular system.
syntheticcdo•20m ago
Did you try to delete the memory yourself?
andai•12m ago
In my harness I have all the code auto injected at startup (doing mostly very small codebases).

I found that every model will still manually check every file/function, they immediately assume that anything in context is stale.

That's sensible because often the user edits stuff while they're running.

What it does is save it from having to grep blindly about the codebase. But I think I'd get roughly the same benefit by just dumping the function headers then.
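
A minimal sketch of the "just dump the function headers" idea, assuming the codebase is Python and using the standard-library `ast` module (Python 3.9+ for `ast.unparse`). The helper name `function_headers` and the sample source are made up for illustration; this is not the commenter's harness.

```python
import ast


def function_headers(source: str) -> list[str]:
    """Return one 'def name(args)' line per function or method in `source`."""
    headers = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            # ast.unparse turns the arguments node back into source text.
            headers.append(f"{prefix} {node.name}({ast.unparse(node.args)})")
    return headers


# Hypothetical sample file contents, only to show the output shape.
example = '''
def add(a, b=1):
    return a + b

class Greeter:
    def greet(self, name: str) -> str:
        return "hi " + name
'''

print("\n".join(function_headers(example)))
```

The output is a short outline that could be injected at startup in place of full file contents, leaving the model to open individual files when it needs the bodies.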

zahirbmirza•38m ago
Even with memory off this occurs within a conversation.

It is like an annoying friend, who remembers something from a past conversation, that you have grown and developed from, but they still want to hold it against you.

trjordan•31m ago
It's because it mostly doesn't matter what you are trying to get the code to do. What matters is what the code does.

Session logs can absolutely be useful, but not when building further. It's just that the place they slot in is during validation. You know, that place between the markdown plan and CI passing, where there's 800 new lines of code and it all seems sort of fine when you click around?

Session logs can show you what sort of manual validation happened. CI will run the tests you had, and the code will show you what new unit tests were added, but session logs can show you that the agent drove the app with Playwright, or that the agent read and considered the prod config as well as the dev config.

Nothing bulletproof, but not every piece of validation work merits a test in the repo that lives forever. We've gotten a lot of mileage out of re-analyzing the sessions, figuring out where the agent made decisions without asking, and forcing the agent to consider validation for those decisions. That's the sort of thing that's hard to dictate up front but easy to highlight with the session logs.

Fabricio20•29m ago
I specifically disabled Claude memory in a project because it kept writing things to memory that didn't need to be there, including severely wrong statements that would then confuse it later. At some point it got re-enabled automatically, which had me ask Claude itself to "turn it the fuck off", at which point it promptly figured out that both settings ("autoMemoryEnabled": false, "autoDreamEnabled": false) are necessary and need to be in the user home settings, not in a project override (which is what I had with the original setup that eventually got ignored by a CC update).

I agree with other commenters here, if anything is worth being remembered, it will be in code comments, git commit messages, CLAUDE.md or other formal documentation. The auto memory system just causes confusion and leaves stale and outdated information written down.

It's an interesting thought experiment as well. I originally thought that having the model write down memory files by itself would be a nice addition, but after playing around with it, it became clear to me that what sounds good as an idea turns out bad in practice, because the model can't correctly gauge what deserves being stored as a memory.
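
For anyone who would rather script the change than ask the agent, here is a hedged sketch that merges the two flags into a JSON settings file while keeping any other keys. The key names come straight from the comment above and are not verified here; the function name `disable_auto_memory` is made up, and the file location is left to the caller because the comment only says "user home settings".

```python
import json
from pathlib import Path

# Key names are copied from the comment above; they are not verified here.
MEMORY_KEYS = {"autoMemoryEnabled": False, "autoDreamEnabled": False}


def disable_auto_memory(settings_path: Path) -> dict:
    """Merge the two flags into a JSON settings file, keeping other keys."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    settings.update(MEMORY_KEYS)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Example call (the path is an assumption, adjust to your setup):
# disable_auto_memory(Path.home() / ".claude" / "settings.json")
```

Merging instead of overwriting matters here, since the same file typically holds unrelated preferences that should survive the change.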

andai•14m ago
> "turn it the fuck off" -> "autoDreamEnabled": false

So you told it don't go the fuck to sleep ;)

estetlinus•9m ago
Ugh, agentic _dreaming_. Why on earth would I want that?
grimcompanion•26m ago
> I believed this so strongly that my company built an entire product around this concept. I used to tell folks that "session transcripts were the new oil," that they were more valuable than the code itself.

This is infuriatingly common wrt talking/writing about how to use AI effectively. All of the "this is how you write an AGENTS.md" and "you need to talk to it like X to optimize it". Like sure, you can believe that as much as you want but unless you provide some evidence you can keep your shitty CLAUDE.md to yourself and don't pollute the whole company's git repo, thanks.

estetlinus•6m ago
When nobody actually knows (how to write a CLAUDE.md), everyone’s an expert. Infuriating, indeed. Even more so when people vibe code those files without proofreading.
mastax•23m ago
I found that if you allow any low value things into memory, Claude will notice that established pattern and start trying to add low value memories at an ever increasing pace.
charcircuit•18m ago
>We have found zero performance benefit on SWE tasks when agents have search access to their previous transcript sessions

I refuse to believe this is true. The ability for an agent to find information from before a compaction is incredibly useful. At compaction time it's impossible to know what exactly may be still needed.

linsomniac•18m ago
I like the memory system, in general. For reference I'm using mostly Opus 4.8 + Max effort. It will often pull things out of memory that are relevant. Like I'll ask it to come up with a few options I should consider for, say, a self-hosted OIDC provider and it'll say things like "Considering the size of your operations team, this might be a better fit because of X and Y".

Now, I'll agree that this is probably the sort of thing I should put in the CLAUDE.md, but in this case it wasn't on my radar to put that in my CLAUDE.md, so it was nice that it surfaced that.

It does sometimes go awry though. Today I was asking about a problem I was having authenticating, and it said "you may be running into this trusted proxy setting because you put your apps behind an haproxy". That is true of 95% of our apps, so it was worth mentioning, but in this case it was not so I had to correct it. But, I'm glad it mentioned it because if we did have it proxied it could have saved me a lot of time.

estetlinus•10m ago
I remember when OpenAI announced ChatGPT now will remember stuff between sessions. Oh, you mean find random trivia about me and copy paste it between prompts without my explicit consent.

”compare these three cars. Oh btw I am a data engineer, and my mom's maiden name is Joana, and I am allergic to bad poetry. And code should be DRY, I prefer SQL over Python and what’s the most poisonous flower in Scandinavia?”.

I’ve had so much weird output because context is ”””memorized””” and bleeding into completely unrelated projects and conversations. It’s the first feature I turn off.

saagarjha•6m ago
I mean, it’s pretty clear the people who work on Claude Code aren’t actually looking at what they’re implementing. The thought behind this feature seems like it goes nowhere beyond “oh wouldn’t it be nice if Claude could remember things about you? Ok Claude go implement this” and nobody bothered to see if it was useful or helpful.
•
16m ago
There is some value to agents being able to query the history of work done, docs aren't a good place to accumulate negative evidence for example, but it can be tagged in traces so that it's efficient to look up as needed. Additionally, docs rot while traces can be tagged with commit hashes and other things that make their lifetime clearer.
sdesol•5m ago
The user flow I am trying to get adopted for sessions is to turn them into notes and lessons when you have finished and it should be part of the code review process.

By properly categorizing lessons and notes, it should be easy to scrub them and keep them up to date.

I also suggest mapping lessons and notes to files when possible to make discovery and cleanup easier.

•
29m ago
I write code by hand every day. I do the main part of the feature implementation myself and leave comments for the code I want the agent to write. I have some skills and a command that sets the stage to get the agent to fill in the rest.
dofm•27m ago
I am a freelancer recovering from severe burnout so the answer is a sort of irrelevant no.

I'm trying to rebuild my life so I am in an experimenting and learning phase rather than a massive coding phase, and most of my code work is maintenance of things I have built. That which I do code, I am still coding by hand, though I am dealing with other people's Claude output and I am really unimpressed by it. It's often rather crass.

But I would say to you that if you personally don't write code now but you do have a dependency on one of two presumably unprofitable cloud AI providers, aren't you in trouble? How is this not a three-alarm fire for you?

andai•22m ago
Worst case scenario you just switch to a free model, which are 2025-ish in quality.
dofm•17m ago
The open weights models I am interested in, and testing, learning, experimenting with etc.; I am confused and cynical, not insane.

I am not convinced it isn't vulnerable to the same problems but the whole tenor of the community around open source/open weights models just doesn't have the same YOLO madness to it.

estearum•18m ago
> That which I do code, I am still coding by hand, though I am dealing with other people's Claude output and I am really unimpressed by it. It's often rather crass.

Unfortunately the point of code is rarely to impress people (certainly not other engineers) or to avoid being "crass." 99.99% of code exists to achieve business outcomes, and velocity matters a lot in many contexts. A lot more than elegance or impressiveness.

The platform risk is a valid concern but alleviated by China's theft and redistribution of open models.

dofm•16m ago
I'm not talking about impressing people.

We used to be concerned about code quality. Are we not anymore?

Crassness was a signal. Still is, to me — in a human I find that people who write crass code are going to cause me trouble.

estearum•13m ago
"Code quality" encompasses a lot of dimensions, one of which is impressing your colleagues, and many of which there's virtually no reason to care about now.
Arainach•5m ago
On the contrary, it's more important than ever. With ever more code being generated, it's essential that the code be understandable and maintainable - by human and machine.
pydry•9m ago
Nobody cares about code quality /s

They only care about the things which you can only get with good code quality like reliability and speed of development.

vidarh•17m ago
Personally I use 5 different model families, 3 of which are open weights with 3rd party inference providers (GLM, DeepSeek, Kimi), so if the frontier labs were to shut down it'd be a nuisance, nothing more.
andai•24m ago
I force myself to do it at least once a week, you know, like cardio. Keeps the doctor away.
Ronsenshi•16m ago
I am. I have Codex running, doing some tasks which I don't care much about, but anything I want to understand I write myself.

Same thing with hobby projects - I might ask ChatGPT or Gemini some questions about best practices in Swift for example, but writing code is done by hand.

As others said - if you don't use it, you'll lose it. And I'd rather keep my skills up to date.

hirako2000•7m ago
You have the privilege to keep yourself sharp, most businesses favor productivity over their workers' long term relevancy.
petcat•5m ago
I write my API specs, my domain models, Postgres schema, and SQL queries myself. Then I'll have AI bots fill in the application details connecting those things since that's mostly boilerplate once you lock in the data structure, query patterns, and API contract.

I never have AI generating database table schemas or really the shape of my data at all.

AlotOfReading•13m ago
Of course? I'm still better than sonnet or opus, just slower and much more expensive.

Sometimes it takes me a day or more to find the one line fix or abstraction necessary, while claude can hammer through a hundred line fix in under an hour.

LastTrain•12m ago
I still write code and sometimes it works well. I also use Claude and it writes code and sometimes that goes well. We have better success together, where I do the interesting stuff and let Claude write my unit tests, reconcile my documentation. That is to say, I’m using it for quality not quantity. There aren’t enough humans to deploy or consume all the sloppy shit it could write on its own.
ungreased0675•24m ago
It reminds me of the peak crypto days. Lots of resources consumed, many late nights, little to no value created.
general_reveal•17m ago
Look man, I’ve got an MMO that I’m working on that’s set in 2014 where everyone is a programmer in SV. It’s a period piece. I NEED as much blog training data of this type so that my NPCs can talk in a historically accurate way (god bless Medium.com, a historical treasure trove of a bygone medieval era).

It’s gonna be a living breathing world, you see. You’re going to be like “omg, this game even accurately captured the blog posts, woah”.

dofm•14m ago
I … I… don't want to play this, thanks ;-)
general_reveal•13m ago
It’s the only way you’ll ever be able to pretend to be a programmer again though.
dofm•12m ago
Oh god, I just realised this really is the logical parallel to all those TV crime dramas set in the early 1900s.
general_reveal•9m ago
It’ll be the programmer’s version of those civil war reenactments.
bryanrasmussen•10m ago
The perfect world was a dream that your primitive cerebrum kept trying to wake up from. Which is why the Matrix was redesigned to this: the peak of your civilization. I say your civilization, because as soon as we started thinking for you it really became our civilization, but the peak of your civilization was an MMO where everyone is a programmer in SV.
goostavos•10m ago
>session transcripts were the new oil

Something about this idea really resonates with certain personality types. I equate it to the Zettelkasten hype phase from several years ago. People (...like me..) got really wrapped up in the belief that the process was more important than the content. "Linking" was an "activity." Something good will happen as long as you (a) take notes on stuff and (b) link them to other notes on stuff.

You see the same thing with the session transcripts people. They're building ever more sophisticated setups of indexing and storing and cross referencing every conversation they've ever had on the (I would argue) mistaken belief that the transcripts are the valuable part, rather than the uncomfortable part where you go do something. A lot of it, I say from falling in the trap, is fancy procrastination.

(Although, I have found myself jealous on many occasions where their fancy system retrieves something they vaguely recall from a conversation they had 3 months ago. So, who knows.)

micromacrofoot•9m ago
Occasionally posts like this do get the attention of the company responsible, more than an email does... but indeed that's like a one in a million situation
