
Claude's memory architecture is the opposite of ChatGPT's

https://www.shloked.com/writing/claude-memory
94•shloked•2h ago

Comments

richwater•1h ago
ChatGPT is quickly approaching (perhaps surpassing?) the same concerns that parents, teachers, and psychologists had with traditional social media. It's only going to get worse, but trying to stop technological progress will never work. I'm not sure what the answer is. That they're clearly optimizing for people's attention is even more worrisome.
WJW•1h ago
Seems like either a huge evolutionary advantage for the people who can exploit the (sometimes hallucinating, sometimes not) knowledge machine, or else a huge advantage for the people who are predisposed to avoid the attention-sucking knowledge machine. The ecosystem shifted, adapt or be outcompeted.
aleph_minus_one•47m ago
> Seems like either a huge evolutionary advantage for the people who can exploit the (sometimes hallucinating, sometimes not) knowledge machine, or else a huge advantage for the people who are predisposed to avoid the attention-sucking knowledge machine. The ecosystem shifted, adapt or be outcompeted.

Rather: use your time to learn serious, deep knowledge instead of wasting your time reading (and particularly: spreading) the science-fiction stories the AI bros tell all the time. These AI bros are insanely biased, since they will likely lose a lot of money if these stories turn out to be false, or likely even if people stop believing in these science-fiction fairy tales.

visarga•1h ago
> That they're clearly optimizing for people's attention is more worrisome.

Running LLMs is expensive and we can swap models easily. The fight for attention is on, and it acts like an evolutionary pressure on LLMs. We already saw the sycophancy trend as a result of it.

simonw•1h ago
This post was great, very clear and well illustrated with examples.
qgin•1h ago
I love Claude's memory implementation, but I turned memory off in ChatGPT. I use ChatGPT for too many disparate things and it was weird when it was making associations across things that aren't actually associated in my life.
pityJuke•30m ago
Exactly. The control over when to actually retrieve historical chats is so worthwhile. With ChatGPT, there is some slop from conversations I might have no desire to ever refer to again.
thinkingtoilet•20m ago
It's funny, I can't get ChatGPT to remember basic things at all. I'm using it to learn a language (I tried many AI tutors and just raw ChatGPT was the best by far) and I constantly have to tell it to speak slowly. I will tell it to remember this as a rule and to do this for all our conversations but it literally can't remember that. It's strange. There are other things too.
kiitos•1h ago
> Anthropic's more technical users inherently understand how LLMs work.

good (if superficial) post in general, but on this point specifically, emphatically: no, they do not -- no shade, nobody does, at least not in any meaningful sense

kingkawn•1h ago
Thanks for this generalization, but of course there is a broad range of understanding how to improve usefulness and model tweaks across the meat populace.
omnicognate•1h ago
Understanding how they work in the sense that permits people to invent and implement them, that provides the exact steps to compute every weight and output, is not "meaningful"?

There is a lot left to learn about the behaviour of LLMs, higher-level conceptual models to be formed to help us predict specific outcomes and design improved systems, but this meme that "nobody knows how LLMs work" is out of control.

lukev•1h ago
If we are going to create a binary of "understand LLMs" vs "do not understand LLMs", then one way to do it is as you describe; fully comprehending the latent space of the model so you know "why" it's giving a specific output.

This is likely (certainly?) impossible. So not a useful definition.

Meanwhile, I have observed a very clear binary among people I know who use LLMs; those who treat it like a magic AI oracle, vs those who understand the autoregressive model, the need for context engineering, the fact that outputs are somewhat random (hallucinations exist), setting the temperature correctly...
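
As a rough illustration of the temperature and randomness points above, here is a minimal, self-contained Python sketch (a toy, not any vendor's sampling code) of how temperature rescales the output distribution before a token is sampled:

    import math, random

    def sample_token(logits, temperature=1.0):
        # Temperature < 1 sharpens the distribution (more deterministic),
        # temperature > 1 flattens it (more random).
        scaled = [x / temperature for x in logits]
        # Softmax over the scaled logits.
        m = max(scaled)
        exps = [math.exp(x - m) for x in scaled]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Sample one token index from the resulting distribution.
        return random.choices(range(len(probs)), weights=probs, k=1)[0]

    # Toy run: three candidate tokens with raw scores.
    print(sample_token([2.0, 1.0, 0.1], temperature=0.2))  # almost always index 0
    print(sample_token([2.0, 1.0, 0.1], temperature=2.0))  # noticeably more varied

Greedy decoding corresponds to the limit of temperature approaching zero, where the highest-scoring token is always picked.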

kiitos•54m ago
> If we are going to create a binary of "understand LLMs" vs "do not understand LLMs",

"we" are not, what i quoted and replied-to did! i'm not inventing strawmen to yell at, i'm responding to claims by others!

modeless•1h ago
The link to the breakdown of ChatGPT's memory implementation is broken, the correct link is: https://www.shloked.com/writing/chatgpt-memory-bitter-lesson

This is really cool, I was wondering how memory had been implemented in ChatGPT. Very interesting to see the completely different approaches. It seems to me like Claude's is better suited for solving technical tasks while ChatGPT's is more suited to improving casual conversation (and, as pointed out, future ads integration).

I think it probably won't be too long before these language-based memories look antiquated. Someone is going to figure out how to store and retrieve memories in an encoded form that skips the language representation. It may actually be the final breakthrough we need for AGI.

ornornor•1h ago
> It may actually be the final breakthrough we need for AGI.

I disagree. As I understand them, LLMs right now don’t understand concepts. They actually don’t understand, period. They’re basically Markov chains on steroids. There is no intelligence in this, and in my opinion actual intelligence is a prerequisite for AGI.

SweetSoftPillow•58m ago
What is "actual intelligence" and how are you different from a Markov chain?
sixo•54m ago
Roughly, actual intelligence needs to maintain a world model in its internal representation, not merely an embedding of language, which is a very different data structure and probably will be learned in a very different way. This includes things like:

- a map of the world, or concept space, or a codebase, etc

- causality

- "factoring" which breaks down systems or interactions into predictable parts

Language alone is too blurry to do any of these precisely.

SweetSoftPillow•42m ago
Please check example #2 here: https://github.com/PicoTrex/Awesome-Nano-Banana-images/blob/...

It is not "language alone" anymore. LLMs are multimodal nowadays, and it's still just the beginning.

And keep in mind that these results are produced by a cheap, small and fast model.

coldtea•11m ago
>Roughly, actual intelligence needs to maintain a world model in its internal representation

And how's that not like stored information (memories) and weighted links between them and groups of them?

ornornor•53m ago
What I mean is that the current generation of LLMs don’t understand how concepts relate to one another. Which is why they’re so bad at maths for instance.

Markov chains can’t deduce anything logically. I can.

sindercal•45m ago
You and Chomsky are probably the last two people on earth to believe that.
coldtea•10m ago
It wouldn't matter if they are both right. Social truth is not reality, and scientific consensus is not reality either (just a good proxy for "is this true", but it's been known to be wrong many times).
oasisaimlessly•44m ago
The definition of 'Markov chain' is very wide. If you adhere to a materialist worldview, you are a Markov chain. [Or maybe the universe viewed as a whole is a Markov chain.]
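
For reference, the property being invoked, in a short sketch: a process is Markov when the next state depends only on the current state, and an LLM with a k-token context window satisfies it once the "state" is taken to be the whole window:

    % Markov property
    P(X_{t+1} \mid X_t, X_{t-1}, \dots, X_1) = P(X_{t+1} \mid X_t)

    % LLM as a Markov chain over window-sized states
    s_t = (x_{t-k+1}, \dots, x_t), \qquad x_{t+1} \sim p_\theta(\cdot \mid s_t)

The state space is astronomically large, which is why "Markov chain on steroids" is technically defensible even though it says little about what the transition function is actually computing.
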
ForHackernews•50m ago
For one thing, I have internal state that continues to exist when I'm not responding to text input; I have some (limited) access to my own internal state and can reason about it (metacognition). So far, LLMs do not, and even when they claim to, they are hallucinating: https://transformer-circuits.pub/2025/attribution-graphs/bio...
coldtea•9m ago
>For one thing, I have internal state that continues to exist when I'm not responding to text input

Do you? Or do you just have memory and are run in a short loop?

creata•52m ago
> As I understand them, LLMs right now don’t understand concepts.

In my uninformed opinion it feels like there's probably some meaningful learned representation of at least common or basic concepts. It just seems like the easiest way for LLMs to perform as well as they do.

pontus•51m ago
I'm curious what you mean when you say that this clearly is not intelligence because it's just Markov chains on steroids.

My interpretation of what you're saying is that since the next token is simply a function of the preceding tokens, i.e. a Markov chain on steroids, it can't come up with something novel. It's just regurgitating existing structures.

But let's take this to the extreme. Are you saying that systems that act in this kind of deterministic fashion can't be intelligent? Like if the next state of my system is simply some function of the current state, then there's no magic there, just unrolling into the future. That function may be complex but ultimately that's all it is, a "stochastic parrot"?

If so, I kind of feel like you're throwing the baby out with the bathwater. The laws of physics are deterministic (I don't want to get into a conversation about QM here, there are senses in which that's deterministic too and regardless I would hope that you wouldn't need to invoke QM to get to intelligence), but we know that there are physical systems that are intelligent.

If anything, I would say that the issue isn't that these are Markov chains on steroids, but rather that they might be Markov chains that haven't taken enough steroids. In other words, it comes down to how complex the next token generation function is. If it's too simple, then you don't have intelligence but if it's sufficiently complex then you basically get a human brain.

techbruv•48m ago
I don’t understand the argument “AI is just XYZ mechanism, therefore it cannot be intelligent”.

Does the mechanism really disqualify it from intelligence if behaviorally, you cannot distinguish it from “real” intelligence?

I’m not saying that LLMs have certainly surpassed the “cannot distinguish from real intelligence” threshold, but saying there’s not even a little bit of intelligence in a system that can solve more complex math problems than I can seems like a stretch.

lyime•30m ago
How do you define "LLMs don't understand concepts"?

How do you define "understanding a concept" - what do you get from a system that can "understand" a concept vs. one that can't?

jjice•23m ago
That's a good question. I think I might classify that as solving a novel problem. I have no idea if LLMs can do that consistently currently. Maybe they can.

The idea that "understanding" may be able to be modeled with general purpose transformers and the connections between words doesn't sound absolutely insane to me.

But I have no clue. I'm a passenger on this ride.

coldtea•12m ago
Didn't Apple have a paper proving this very thing, or at least addressing it?
perching_aix•23m ago
They are capable of extracting arbitrary semantic information and generalizing across it. If that is not understanding, I don't know what is.
ornornor•18m ago
To me, understanding the world requires experiencing reality. LLMs don't experience anything. They're just a program. You can argue that living things are also just following a program, but the difference is that they (and I include humans in this) experience reality.
perching_aix•14m ago
But they're experiencing their training data, their pseudo-randomness source, and your prompts?
coldtea•14m ago
>They’re basically Markov chains on steroids. There is no intelligence in this, and in my opinion actual intelligence is a prerequisite for AGI.

This argument is circular.

A better argument should address (given LLMs' successes at many types of reasoning, at passing the Turing test, and thus at producing results that previously required intelligence) why human intelligence might not also just be "Markov chains on even better steroids".

glial•14m ago
Just leaving this here:

https://ai.meta.com/research/publications/large-concept-mode...

codedokode•21m ago
You don't want an AGI. How do you make it obey?
SweetSoftPillow•1h ago
If I remember correctly, Gemini also has this feature? Is it more like Claude's or ChatGPT's?
extr•1h ago
They are changing the way memory works soon, too: https://x.com/btibor91/status/1965906564692541621

Edit: They apparently just announced this as well: https://www.anthropic.com/news/memory

pityJuke•30m ago
Would be very sad if they remove the current memory system for this.
jimmyl02•41m ago
This is awesome! It seems to line up with the idea of agentic exploration versus RAG; I think Anthropic leans toward the agentic exploration side.

It will be very interesting to see which approach is deemed to "win out" in the future
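
A rough sketch of that distinction, with toy stand-ins (embed(), similarity(), and the retrieval rules are illustrative, not either vendor's actual implementation): RAG retrieves up front on every turn, while agentic exploration only reaches into past conversations when the model decides it needs to:

    # Toy memory store and helpers; nothing here is a real API.
    past_conversations = [
        "user asked to rename the config module",
        "user prefers tabs over spaces",
        "user planned a trip to Lisbon",
    ]

    def embed(text):          # stand-in "embedding": bag of lowercase words
        return set(text.lower().split())

    def similarity(a, b):     # stand-in similarity: word overlap
        return len(a & b)

    def rag_memory(query, k=2):
        """RAG style: always retrieve top-k memories before generating."""
        q = embed(query)
        ranked = sorted(past_conversations,
                        key=lambda c: similarity(q, embed(c)), reverse=True)
        return ranked[:k]  # would be prepended to the prompt unconditionally

    def agentic_memory(query):
        """Agentic style: search only when the (simulated) model asks to."""
        if "last time" in query or "earlier" in query:  # stand-in for the model's decision
            q = embed(query)
            return max(past_conversations, key=lambda c: similarity(q, embed(c)))
        return None  # no retrieval; answer from the current context alone

    print(rag_memory("what did the user ask about the config module?"))
    print(agentic_memory("what did we rename last time?"))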

jiri•37m ago
I am often surprised how Claude Code makes efficient and transparent use of memory in the form of "to do lists" in agent mode. I sometimes miss this in the web/desktop app during long conversations.
ankit219•20m ago
The difference in implementation comes down to business goals more than anything.

There is a clear directionality for ChatGPT: at some point they will monetize with ads and affiliate links. Their memory implementation is aimed at creating a user profile.

Claude's memory implementation feels more oriented towards the long-term goal of accessing abstractions and past interactions. It's very close to how humans access memories, albeit with a search feature. Though they have not implemented it yet (afaik), there is a clear path where they leverage the current implementation with RL post-training so that Claude "remembers" the mistakes you pointed out last time. In future iterations it could derive abstractions from a given conversation (e.g. "the user asked me to make xyz changes on this task last time, so maybe the agent can proactively do them, or this was the process the agent used last time").

At the most basic level, ChatGPT wants to remember you as a person, while Claude cares about what your previous interactions were.
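
A crude way to picture what each approach persists (the data below is purely illustrative, not either product's actual schema): a distilled profile that is injected everywhere versus raw transcripts that are only pulled in when searched:

    # ChatGPT-style: a distilled user profile, updated as you chat and
    # injected into every new conversation.
    profile_memory = {
        "name": "Alex",
        "interests": ["rust", "baking"],
        "preferences": ["prefers concise answers"],
    }

    # Claude-style: past conversations kept whole, searched on demand.
    conversation_memory = [
        {"id": 17, "summary": "debugged a flaky CI job", "transcript": "..."},
        {"id": 18, "summary": "reviewed a database migration", "transcript": "..."},
    ]

    def recall(query):
        # Only matching transcripts ever enter the context window.
        return [c for c in conversation_memory if query.lower() in c["summary"]]

    print(recall("migration"))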

threecheese•16m ago
What are the barriers to external memory stores (assuming similar implementations), used via tool calling or MCP? Are the providers RL’ing their way into making their memory implementations better, cementing their usage, similar to what I understand is done wrt tool calling? (“training in” specific tool impls)

I am coming from a data privacy perspective; while I know the LLM is getting it anyway, during inference, I’d prefer to not just spell it out for them. “Interests: MacOS, bondage, discipline, Baseball”
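
One plausible shape for such an external store, sketched generically rather than against any particular MCP SDK (the tool schema and function names here are hypothetical): the data stays in a local database, and the model only ever sees tool descriptions plus whatever it explicitly asks to recall:

    import json, sqlite3

    # Local store the provider never hosts; ":memory:" is just for the demo.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE memories (topic TEXT, note TEXT)")

    def remember(topic, note):
        db.execute("INSERT INTO memories VALUES (?, ?)", (topic, note))

    def recall(topic):
        rows = db.execute("SELECT note FROM memories WHERE topic = ?", (topic,)).fetchall()
        return [note for (note,) in rows]

    # What the model is shown: tool descriptions only, not the contents.
    tool_definitions = [
        {"name": "remember", "description": "Store a note under a topic",
         "parameters": {"topic": "string", "note": "string"}},
        {"name": "recall", "description": "Fetch notes stored under a topic",
         "parameters": {"topic": "string"}},
    ]

    remember("editor", "user prefers Helix keybindings")
    print(json.dumps(tool_definitions, indent=2))
    print(recall("editor"))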

Top model scores may be skewed by Git history leaks in SWE-bench

https://github.com/SWE-bench/SWE-bench/issues/465
186•mustaphah•2h ago•50 comments

Claude's memory architecture is the opposite of ChatGPT's

https://www.shloked.com/writing/claude-memory
102•shloked•2h ago•44 comments

Bulletproof host Stark Industries evades EU sanctions

https://krebsonsecurity.com/2025/09/bulletproof-host-stark-industries-evades-eu-sanctions/
120•todsacerdoti•3h ago•31 comments

Rails on SQLite: new ways to cause outages

https://andre.arko.net/2025/09/11/rails-on-sqlite-exciting-new-ways-to-cause-outages/
38•ingve•2h ago•6 comments

Unusual Capabilities of Nano Banana (Examples)

https://github.com/PicoTrex/Awesome-Nano-Banana-images/blob/main/README_en.md
23•SweetSoftPillow•45m ago•8 comments

NT OS Kernel Information Disclosure Vulnerability

https://www.crowdfense.com/nt-os-kernel-information-disclosure-vulnerability-cve-2025-53136/
85•voidsec•5h ago•21 comments

'Robber bees' invade apiarist's shop in attempted honey heist

https://www.cbc.ca/news/canada/british-columbia/robber-bees-terrace-bc-apiary-1.7627532
76•lemonberry•4h ago•48 comments

Behind the scenes of Bun Install

https://bun.com/blog/behind-the-scenes-of-bun-install
287•Bogdanp•8h ago•87 comments

Making io_uring pervasive in QEMU [pdf]

https://vmsplice.net/~stefan/stefanha-kvm-forum-2025.pdf
29•ingve•2h ago•1 comments

Launch HN: Ghostship (YC S25) – AI agents that find bugs in your web app

26•jessechoe10•2h ago•10 comments

The Helix Text Editor (2024)

https://jonathan-frere.com/posts/helix/
76•gidellav•3d ago•29 comments

Adam (YC W25) Is Hiring to Build the Future of CAD

https://www.ycombinator.com/companies/adam/jobs/q6td4uk-founding-engineer
1•HetengAaronLi•3h ago

Show HN: Making a cross-platform game in Go using WebRTC Datachannels

https://pion.ly/blog/making-a-game-with-pion/
30•valorzard•1d ago•1 comments

AirPods live translation blocked for EU users with EU Apple accounts

https://www.macrumors.com/2025/09/11/airpods-live-translation-eu-restricted/
118•thm•9h ago•112 comments

CRISPR offers new hope for treating diabetes

https://www.wired.com/story/no-more-injections-crispr-offers-new-hope-for-treating-diabetes/
123•manveerc•7h ago•36 comments

A tech-law measurement and analysis of event listeners for wiretapping

https://arxiv.org/abs/2508.19825
51•lapcat•4h ago•5 comments

Conway's Game of Life, but musical

https://www.hudsong.dev/digital-darwin
129•hudsongr•7h ago•26 comments

Adjacency Matrix and std:mdspan, C++23

https://www.cppstories.com/2025/cpp23_mdspan_adj/
18•ashvardanian•3d ago•6 comments

How Palantir Is Mapping Everyone's Data for the Government

https://www.techdirt.com/2025/09/11/how-palantir-is-mapping-everyones-data-for-the-government/
25•mdhb•29m ago•1 comments

Randomly selecting points inside a triangle

https://www.johndcook.com/blog/2025/09/11/random-inside-triangle/
49•ibobev•1h ago•30 comments

GrapheneOS and Forensic Extraction of Data (2024)

https://discuss.grapheneos.org/d/13107-grapheneos-and-forensic-extraction-of-data
275•SoKamil•8h ago•147 comments

ApeRAG: Production-ready GraphRAG with multi-modal indexing and K8s deployment

https://github.com/apecloud/ApeRAG
10•earayu•3d ago•1 comments

From burner phones to decks of cards: NYC teens adjusting to the smartphone ban

https://gothamist.com/news/from-burner-phones-to-decks-of-cards-nyc-teens-are-adjusting-to-the-sm...
109•geox•7h ago•110 comments

An engineering history of the Manhattan Project

https://www.construction-physics.com/p/an-engineering-history-of-the-manhattan
105•rbanffy•8h ago•56 comments

Samsung taking market share from Apple in U.S. as foldable phones gain momentum

https://www.cnbc.com/2025/08/16/samsungs-us-market-share-apple-rivalry-foldable-phones.html
108•mgh2•12h ago•157 comments

Spiral

https://spiraldb.com/post/announcing-spiral
225•jorangreef•5h ago•76 comments

Center for the Alignment of AI Alignment Centers

https://alignmentalignment.ai
113•louisbarclay•9h ago•27 comments

Reshaped is now open source

https://reshaped.so/blog/reshaped-oss
234•michaelmior•11h ago•42 comments

Public Suffix List

https://publicsuffix.org/
42•mooreds•3d ago•10 comments

GrapheneOS accessed Android security patches but not allowed to publish sources

https://grapheneos.social/@GrapheneOS/115164133992525834
223•uneven9434•13h ago•52 comments