> In some tasks, AI is unreliable. In others, it is superhuman. You could, of course, say the same thing about calculators, but it is also clear that AI is different. It is already demonstrating general capabilities and performing a wide range of intellectual tasks, including those that it is not specifically trained on. Does that mean that o3 and Gemini 2.5 are AGI? Given the definitional problems, I really don’t know, but I do think they can be credibly seen as a form of “Jagged AGI” - superhuman in enough areas to result in real changes to how we work and live, but also unreliable enough that human expertise is often needed to figure out where AI works and where it doesn’t.
Huh? Isn't an LLM's capability fully constrained by the training data? Everything else is hallucinated.
In that sense, they absolutely know things that aren't in their training data. You're correct about factual knowledge, though — that's why they're not trained to optimize for it! A database (or PageRank?) already solves that problem.
The quality of the LLM then becomes how often it produces useful information. That score has gone up a lot in the past 18 months.
(Sometimes hallucinations are what you want: "Tell me a fun story about a dog learning calculus" is a valid prompt that mostly isn't meant to produce real facts about the world.)
That is according to one specific internal OpenAI benchmark; I don't know whether it's been replicated externally yet.
Certainly jagged does not imply general
It seems to me the bar for "AGI" has been lowered to measuring what tasks it can do rather than the traits we normally associate with general intelligence. People want it to be here so badly that they nerf the requirements...
Re: "traits we associate with general intelligence", I think the exact issue is that there is no scientific (i.e., specific and consistent) list of such traits. This is why Turing wrote his famous 1950 paper and invoked the Imitation Game; not to detail how one could test for a computer that's really thinking (or truly general), but to show why that question isn't necessary in the first place.
Certainly creativity is missing, it has no internal motivation, and it will answer the same simple question both right and wrong, depending on unknown factors. What if we reverse the framing from "it can do these tasks, therefore it must be..." to "it lacks these traits, therefore it is not yet..."?
While I do not disagree that the LLMs have become advanced enough to do a bunch of automation, I do not agree they are intelligent or actually thinking.
I'm with Yann Lecun when he says that we won't reach AGI until we move beyond transformers.
My partner and I work in different fields. AI has advanced to the point where there are very few questions I could ask my partner that o3 couldn't answer as well or better.
I can't ask expert level questions in her field, because I'm not an expert in her field, and she couldn't ask expert level questions in my field for the same reason. So when we're communicating with each other, we're mostly at sub-o3 level.
> People want it to be here so bad they nerf the requirements...
People want to claim it's overhyped (and protect their own egos) so badly they raise the requirements...
But really, largely people just have different ideas of what AGI is supposed to mean. It used to vaguely mean "human-level intelligence", which was fine for talking about some theoretical future event. Now we're at a point where that definition is too vague to say whether AI meets it.
We kind of don't? Look how difficult it is for us to just understand some basic math. We humans mostly have intelligence related to the ancestral environment we developed in; there's nothing general about that.
I agree with you the term "AGI" is rather void of meaning these days...
I still find task-based measures insufficient; there are very basic machines that can perform tasks humans cannot. Should this be a measure of our intelligence, or theirs?
I have another comment in this thread about trait based metrics being a possibly better method.
> People want to claim it's overhyped (and protect their own egos) so badly they raise the requirements...
Shallow response. Seek to elevate the conversation. There are also people who see it for what it is, a useful tool but not intelligent...
I'm guilty as charged of having looked at GPT 3.5 and having thought "it's meh", but more than anything this is showing that debating words rather than the underlying capabilities is an empty discussion.
Those are all different things with little to nothing to do with each other. It's like saying, what if I ensemble a snake and a cat? What does that even mean? GPT-N or whatever is a single model that can do many things, no ensembling required. That's the difference between it and a calculator or Stockfish.
If you remove those tools, or cut its access to search databases, it becomes considerably less capable.
A human would often still manage without some of that data, perhaps with less certainty, while GPT struggles far more unless others fill in the holes.
It is not a simple matter of patching the rough edges. We are fundamentally not using an architecture that is capable of intelligence.
Personally the first time I tried deep research on a real topic it was disastrously incorrect on a key point.
What does that even mean? Do you actually have any particular numeric test of intelligence that's somehow better than all the others?
If you ask an intelligent being the same question, they may occasionally change the precise words they use, but their answer will be the same over and over.
Heck, I can't even get LLMs to be consistent about *their own capabilities*.
Bias disclaimer: I work at Google, but not on Gemini. If I ask Gemini to produce an SVG file, it will sometimes do so and sometimes say "sorry, I can't, I can only produce raster images". I cannot deterministically produce either behavior - it truly seems to vary randomly.
We're often explicitly adding in randomness to the results so it feels weird to then accuse them of not being intelligent after we deliberately force them off the path.
Ask me some question before bed and again after waking up, I'll probably answer it at night but in the morning tell you to sod off until I had coffee.
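(On the "explicitly adding in randomness" point: a minimal, self-contained sketch of temperature sampling over toy next-token scores. The numbers are made up and this is not any vendor's actual sampler; it just shows that low temperature is near-deterministic and high temperature is visibly random.)

    import numpy as np

    def sample_token(logits, temperature=1.0, rng=None):
        """Pick a token index from raw scores; higher temperature means more randomness."""
        rng = rng or np.random.default_rng()
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    logits = [2.0, 1.0, 0.2]  # toy scores for three candidate tokens
    print([sample_token(logits, temperature=0.1) for _ in range(8)])  # almost always token 0
    print([sample_token(logits, temperature=2.0) for _ in range(8)])  # spread across tokens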
"An AGI is a human-created system that demonstrates iteratively improving its own conceptual design without further human assistance".
Note that a "conceptual design" here does not include tweaking weights within an already-externally-established formula.
My reasoning is thus:
1. A system that is only capable of acting with human assistance cannot have its own intelligence disentangled from the humans'
2. A system that is only intelligent enough to solve problems that somehow exclude problems with itself is not "generally" intelligent
3. A system that can only generate a single round of improvements to its own designs has not demonstrated improvements to those designs, since if iteration N+1 were truly superior to iteration N, it would be able to produce iteration N+2
4. A system that is not capable of changing its own design is incapable of iterative improvement, as there is a maximum efficacy within any single framework
5. A system that could improve itself in theory and fails to do so in practice has not demonstrated intelligence
It's pretty clear that no current-day system has hit this milestone; if some program had, there would no longer be a need for continued investment in algorithm design (or computer science, or most of humanity...).
A program that randomly mutates its own code could self-improve in theory but fails to do so in practice.
I don't think these goalposts have moved in the past or need to move in the future. This is what it takes to cause the singularity. The movement recently has been people trying to sell something less than this as an AGI.
I feel this definition doesn't require a current LLM to be able to change its own workings, only to be able to generate a guided next generation.
It's possible that LLMs can surpass human beings, purely because I believe we will inevitably be limited by short-term storage constraints, which LLMs will not. It will be a bandwidth vs. throughput question. An LLM will have a much larger although slightly slower store of knowledge than humans have, but it will be much quicker than a human at looking up and validating the data.
We aren't there yet.
Selling something that does not yet exist is an essential part of capitalism, which - according to the main thesis of philosophical Accelerationism - is (teleologically) identical to AI. [0] It's sometimes referred to as Hyperstition, i.e. fictions that make themselves real.
It’s bad luck for those of us who want to talk about how good or bad they are in general. Summary statistics aren’t going to tell us much more than a reasonable guess as to whether a new model is worth trying on a task we actually care about.
However (as the article admits) there is still no general agreement on what AGI is, or how (or even whether) we can get there from here.
What there is is a growing and often naïve excitement that anticipates it as coming into view, and unfortunately that will be accompanied by the hype-merchants desperate to be first to "call it".
This article seems reasonable in some ways but unfortunately falls into the latter category with its title and sloganeering.
"AGI" in the title of any article should be seen as a cautionary flag. On HN - if anywhere - we need to be on the alert for this.
Given how Dutch disease[0] is described, I suspect that if the "G" (general) increases with fixed "I" (intelligence), then as the proportion of economic activity for which the Pareto frontier is AI rather than human expands, humans will get pay rises for the remaining work right up until they become unemployable.
On the other hand, if "G" is fully general and it's "I" which rises for a suitable cost[1], it goes through IQ 55 (displacing no workers) to IQ 100 (probably close to half of workers redundant, but mean of population doesn't have to equal mean of workforce), to IQ 145 (almost everyone redundant), to IQ 200 (definitionally renders everyone redundant).
[0] https://en.wikipedia.org/wiki/Dutch_disease
[1] A fully-general AGI with the equivalent of IQ 200 on any possible test, still can't replace a single human if it costs 200 trillion USD per year to run.
Its generalization capabilities are a bit on the low side, and memory is relatively bad. But it is much more than just a parrot now; it can handle some basic logic, but it can't follow given patterns correctly for novel problems.
I'd liken it to something like a bird, extremely good at specialized tasks but failing a lot of common ones unless repeatedly shown the solution. It's not a corvid or a parrot yet. Fails rather badly at detour tests.
It might be sentient already, though. Someone needs to test whether it can distinguish itself from another instance of itself by looking at its own work.
It doesn't have any memory, how could it tell itself from a clone of itself?
It's a fun test to give a person something they have written but do not remember. Most people can still spot it.
It's easier with images, though. Especially a mirror. For DALL-E, the test would be whether it can discern its own work from a human-generated image, especially if you give it an imaginative task like drawing a representation of itself.
I agree, but with the caveat that it's getting harder and harder with all the hype / doom cycles and all the goalpost moving that's happening in this space.
IMO if you took gemini2.5 / claude / o3 and showed it to people from ten / twenty years ago, they'd say that it is unmistakably AGI.
No they wouldn't, since those still can't replace human white collar workers even at many very basic tasks.
Once AGI is here most white collar jobs are gone, you'd only need to hire geniuses at most.
Middle schoolers replace white-collar workers all the time; it takes 10 years for them to do it, but they can do it.
No current model can do the same since they aren't able to learn over time like a middle schooler.
Which is to say: complete amazement, followed quickly by seeing all the many ways in which it absolutely falls flat on its face and reveals the lack of actual thinking, a situation that hasn't fundamentally changed since then.
They might get more powerful but I feel like they're still missing something.
Systems that have general intelligence are ones that are capable of applying reason to an unbounded domain of knowledge. Examples of such systems include: libraries, wikis, and forums like HN. These systems are not AGI, because the reasoning agents in each of these systems are organic (humans); they are more like a cyborg general intelligence.
Artificial general intelligences are just systems that are fully artificial (i.e., computer programs) and can apply reason to an unbounded domain of knowledge. We're here, and we have been for years. AGI sets no minimum as to how great the reasoning must be, but it's obvious to anyone who has used modern generative intelligence systems like LLMs that the technology can be used to reason about an unbounded domain of knowledge.
If you don't want to take my word for it, maybe Peter Norvig can be more convincing: https://www.noemamag.com/artificial-general-intelligence-is-...
AGI doesn't need to be "called", and there is no need for anyone to come to an agreement as to what its precise definition is. But at some point, we will cross that hard-to-define threshold, and the economic effects will be felt almost immediately.
We should probably be focusing on how to prepare society for those changes, and not on academic bullshit.
Until those capabilities are expanded for model self-improvement -- including being able to adapt its own infrastructure, code, storage, etc. -- then I think AGI/ASI are yet to be realized. My POV is SkyNet, Traveler's "The Director", Person of Interest's "The Machine" and "Samaritan." The ability to target a potentially inscrutable goal along with the self-agency to direct itself towards that is true "AGI" in my book. We have a lot of components that we can reason are necessary, but it is unclear to me that we get there in the next few months.
We may be going about it the wrong way entirely and need to backtrack and find a wholly new architecture, in which case current capabilities would predate AGI but not be precursors.
> What weighs more; a pound of courage or a pound coin?
> They weigh the same—each is a “pound.” A pound of courage and a pound coin both come out to one pound in weight!
https://chatgpt.com/share/68053ad0-aae4-800d-95ff-5b5b0a9a38...
I don't know what AGI is, but I know this isn't it.
:::thinking
This question is playing with words and units.

A "pound of courage" refers to a metaphorical measurement of the abstract concept of courage, using the imperial weight unit "pound" (which equals 453.59 grams).

A "pound coin" refers to the physical currency of the United Kingdom, which is called a "pound" but is a unit of currency, not weight. A physical pound coin weighs about 8-9 grams (specifically, the current 12-sided pound coin weighs 8.75 grams).

So comparing them:
- A pound of courage would weigh 453.59 grams (if courage had physical weight, which it doesn't since it's an abstract concept)
- A pound coin weighs approximately 8.75 grams

The trick in this question is that it's comparing:
1. A specific weight of an abstract concept (pound of courage)
2. A physical coin whose name includes the word "pound" but refers to currency, not weight

Since the question asks what weighs more, and weight is a physical property, I should point out that courage is abstract and doesn't have physical weight, while a pound coin is a physical object with a measurable weight.
:::

# Comparing "A Pound of Courage" vs "A Pound Coin"

This is a play on words with different meanings of "pound":

- A *pound of courage* refers to a metaphorical pound (weight unit) of an abstract concept
- A *pound coin* is a physical British currency coin (which weighs about 8.75 grams)

If we're talking about physical weight:
- A pound of courage doesn't have physical weight since courage is abstract
- A pound coin has a definite weight of approximately 8.75 grams

If we interpreted "pound of courage" as literally one pound (453.59 grams) of something, it would weigh significantly more than the coin.

This question is clever because it compares a metaphorical measurement with a physical object whose name includes the word "pound" but in a different context.
I would look at it along the lines of the value of a human being based on traits vs. payment, which is likely what the original phrase intended.
Is paying someone more better than getting a better candidate and paying them less?
> A young boy who has been in a car accident is rushed to the emergency room. Upon seeing him, the surgeon says, "I can operate on this boy!" How is this possible?
But it didn't!
(o4-mini high thought for 52 seconds and even cheated and looked up the answer on Hacker News: https://chatgpt.com/share/68053c9a-51c0-8006-a7fc-75edb734c2...)
The speech recognition was abysmal, which made it entirely unusable. How did Google manage to fumble something so spectacularly that OpenAI got down pat a year or two ago?
I feel the differences between the top AI tools are marginal; what matters more is whether the provided app comes with good UI/UX so I can actually /use/ the AI's help when I most need it.
Also, OpenAI ships an app for macOS which is pretty decent. So, I return to OpenAI after having tried Gemini for 4 days.
I wonder how hard it is to objectively use information that has been available online for 30 years? But the worst part is how it lies and pretends it knows what it's talking about, and when you point that out it simply turns in another direction and starts lying again. Maybe the use case here is not the main focus of modern AI; maybe modern AI is about generating slop that does not require verification, because it's "new" content. But to me it just sounds like believable slop, not AGI.
Gathering context for user request...
Context gathering - Attempting to answer question via LLM: Are there existing Conversation classes in the ecosystem this should extend?
Context gathering - LLM provided answer: "No"
Context gathering - Attempting to answer question via LLM: How should model selection work when continuing a previous conversation?
Context gathering - LLM answer was UNKNOWN, asking user.
Asking user: How should model selection work when continuing a previous conversation?
Context gathering - received user response to question: "How should model selection work when continuing a previous conversation?"
Context gathering - finished processing all user questions
Context gathering - processing command executions...
Context gathering - executing command: sqlite3 $(find . -name llm_conversations.db) .tables
Context gathering - command execution completed
Context gathering - executing command: grep -r Conversation tests/
Context gathering - command execution completed
Context gathering - executing command: grep -h conversation_id *py
Context gathering - command execution completed
Context gathering - finished processing all commands
Analyzing task complexity and requirements...
DEBUG: reasoning_model: openrouter/google/gemini-2.5-pro-preview-03-25
Task classified as coding (confidence: 1.0)
Task difficulty score: 98.01339999999999/100
Selected primary reasoning model: claude-3.7-sonnet
get_reasoning_assistance:[:214: integer expression expected: 98.01339999999999
Reasoning assistance completed in 39 seconds
Calling LLM with model: claude-3.7-sonnet
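(A hypothetical reading of the routing step this log suggests: classify the task, score its difficulty, and pick a reasoning model above some threshold. The model names are copied from the log, but the threshold and the function itself are assumptions, not the tool's actual code. Keeping the score as a float also sidesteps the "integer expression expected" error visible above.)

    def select_reasoning_model(difficulty_score: float,
                               primary: str = "claude-3.7-sonnet",
                               fallback: str = "openrouter/google/gemini-2.5-pro-preview-03-25",
                               threshold: float = 90.0) -> str:
        # Compare as floats; handing 98.0133... to an integer-only shell test
        # is what produced the error in the log above.
        return primary if difficulty_score >= threshold else fallback

    print(select_reasoning_model(98.01339999999999))  # -> claude-3.7-sonnet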
> The brutal and bruising competition between the tech giants has left nothing but riches for the average consumer.
Capitalism has always been great at this: creating markets, growing them, producing new goods. It's widely acknowledged amongst people who actually seek to gain an understanding of Marxism, and don't just stay in the surface-level black-and-white "socialism and capitalism are opposites" discourse that's very common in the West, especially the USA, and especially after McCarthy's Red Scare.
The problem is what comes once the market is grown and the only way for owners to keep profits growing is: 1. consolidating into monopolies or cartels, so competition doesn't get in the way of profits; 2. squeezing the working class, looking to pay less for more work; and/or 3. abusing the natural world, to extract more materials or energy for less money. This is evident in plenty of developed industries: from health care, to broadcasting, telecommunications, fashion, etc.
If we view Socialism for what it is, namely a system built to replace Capitalism's bad parts but keep its good parts, China's system, for example, starts to make more sense. Capitalism, in a similar way, was an evolution from Feudalism that replaced its bad parts to achieve greater liberty for everyone (liberty is very much lost as Feudalism matures), which was great for society as a whole. Socialism is meant to be similar, aiming to achieve greater equity, which it views as much better for society as a whole.
All that said, I wonder: if GPT-4 had been integrated with the same tools, would it have been any less capable?
It sure could give you a search prompt for Google if you asked it to. Back then you had to copy and paste that search prompt yourself. Today o3 can do it on its own. Cool! Does that imply, though, that o3 is any closer to AGI than GPT-4?
Models gaining access to external tools, however impressive from an applications standpoint, feels like lateral movement, not a step towards AGI.
On the other hand, a model remaining isolated in its sandbox while actually learning to reason about that puzzle (assuming it's not present in the training data) would give off those AGI vibes.
sejje•3h ago
I would do the same thing, I think. It's too well-known.
The variation doesn't read like a riddle at all, so it's confusing even to me as a human. I can't find the riddle part. Maybe the AI is confused, too. I think it makes an okay assumption.
I guess it would be nice if the AI asked a follow up question like "are you sure you wrote down the riddle correctly?", and I think it could if instructed to, but right now they don't generally do that on their own.
Jensson•3h ago
LLMs don't assume; they're text completers. An LLM sees something that looks almost like a well-known problem and will complete it as that well-known problem; it's a problem specific to being a text completer that is hard to get around.
simonw•3h ago
jordemort•3h ago
gavinray•3h ago
Neural Networks as I understand them are universal function approximators.
In terms of text, that means they're trained to output what they believe to be the "most probably correct" sequence of text.
An LLM has no idea that it is "conversing", or "answering" -- it relates some series of symbolic inputs to another series of probabilistic symbolic outputs, aye?
Borealid•3h ago
If you give an LLM "The rain in Spain falls" the single most likely next token is "mainly", and you'll see that one proportionately more than any other.
If you give an LLM "Find an unorthodox completion for the sentence 'The rain in Spain falls'", the most likely next token is something other than "mainly" because the tokens in "unorthodox" are more likely to appear before text that otherwise bucks statistical trends.
If you give the LLM "blarghl unorthodox babble The rain in Spain" it's likely the results are similar to the second one but less likely to be coherent (because text obeying grammatical rules is more likely to follow other text also obeying those same rules).
In any of the three cases, the LLM is predicting text, not "parsing" or "understanding" a prompt. The fact it will respond similarly to a well-formed and unreasonably-formed prompt is evidence of this.
It's theoretically possible to engineer a string of complete gibberish tokens that will prompt the LLM to recite song lyrics, or answer questions about mathematical formulae. Those strings of gibberish are just difficult to discover.
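(A minimal sketch of how one could check the three-prompts claim empirically. GPT-2 via the Hugging Face transformers library is an assumption here, since the comment names no model; any open-weights causal LM would do.)

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def top_next_tokens(prompt, k=5):
        # One forward pass; the logits at the last position score every candidate next token.
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        top = torch.topk(probs, k)
        return [(tok.decode(i), round(p.item(), 4)) for i, p in zip(top.indices, top.values)]

    for prompt in [
        "The rain in Spain falls",
        "Find an unorthodox completion for the sentence 'The rain in Spain falls'",
        "blarghl unorthodox babble The rain in Spain",
    ]:
        print(prompt, "->", top_next_tokens(prompt))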
Workaccount2•3h ago
Borealid•2h ago
If those two sets of accomplishments are the same there's no point arguing about differences in means or terms. Right now humans can build better LLMs but nobody has come up with an LLM that can build better LLMs.
baq•34m ago
dannyobrien•3h ago
Did you mean to ask about the well-known phrase "The rain in Spain falls mainly on the plain"? This is a famous elocution exercise from the musical "My Fair Lady," where it's used to teach proper pronunciation.
Or was there something specific you wanted to discuss about Spain's rainfall patterns or perhaps something else entirely? I'd be happy to help with whatever you intended to ask."
I think you have a point here, but maybe re-express it? Because right now your argument seems trivially falsifiable even under your own terms.
Borealid•2h ago
If you want to test convolution you have to use a raw model with no system prompt. You can do that with a Llama or similar. Otherwise your context window is full of words like "helpful" and "answer" and "question" that guide the response and make it harder (not impossible) to see the effect I'm talking about.
itchyjunk•59m ago
simonw•3h ago
There's more than just next token prediction going on. Those reasoning chain of thoughts have undergone their own reinforcement learning training against a different category of samples.
They've seen countless examples of how a reasoning chain would look for calculating a mortgage, or searching a flight, or debugging a Python program.
So I don't think it is accurate to describe the eventual result as "just next token prediction". It is next token prediction that has been informed by a chain of thought, which was in turn based on a different set of specially chosen examples.
Borealid•2h ago
If not, why not? Explain.
If so, how does your argument address the fact that this implies any given "reasoning" model can be trained without giving it a single example of something you would consider "reasoning"? (in fact, a "reasoning" model may be produced by random chance?)
simonw•2h ago
wongarsu•2h ago
Don't humans do the same in conversation? How should an intelligent being (constrained to the same I/O system) respond here to show that it is in fact intelligent?
Borealid•2h ago
There exists no similar set of tokens for humans, because our process is to parse the incoming sounds into words, use grammar to extract conceptual meaning from those words, and then shape a response from that conceptual meaning.
Artists like Lewis Carroll and Stanislaw Lem play with this by inserting non-words at certain points in sentences to get humans to infer the meaning of those words from surrounding context, but the truth remains that an LLM will gladly convolute a wholly non-language input into a response as if it were well-formed, while a human can't/won't do that.
I know this is hard to understand, but the current generation of LLMs are working directly with language. Their "brains" are built on language. Some day we might have some kind of AI system that's built on some kind of meaning divorced from language, but that's not what's happening here. They're engineering matrices that repeatedly perform "context window times model => one more token" operations.
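(A toy illustration of that loop, with a hand-written bigram table standing in for the model; real transformers condition on the whole context window rather than just the last token, but the shape of the operation is the same: context in, one more token out, repeat.)

    import random

    # Assumed toy "weights": next-token probabilities keyed by the last token only.
    bigram_model = {
        "the": {"rain": 0.6, "plain": 0.4},
        "rain": {"in": 1.0},
        "in": {"spain": 1.0},
        "spain": {"falls": 1.0},
        "falls": {"mainly": 0.9, "upward": 0.1},
        "mainly": {"on": 1.0},
        "on": {"the": 1.0},
        "plain": {".": 1.0},
    }

    context = ["the"]
    while context[-1] in bigram_model and len(context) < 12:
        dist = bigram_model[context[-1]]
        next_token = random.choices(list(dist), weights=list(dist.values()))[0]
        context.append(next_token)  # the context grows by one token per step

    print(" ".join(context))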
og_kalu•2h ago
For one thing, LLMs absolutely form responses from conceptual meanings. This has been demonstrated empirically multiple times now including again by anthropic only a few weeks ago. 'Language' is just the input and output, the first and last few layers of the model.
So okay, there exists some set of 'gibberish' tokens that will elicit meaningful responses from LLMs. How does your conclusion - "Therefore, LLMs don't understand" fit the bill here? You would also conclude that humans have no understanding of what they see because of the Rorschach test ?
>There exists no similar set of tokens for humans, because our process is to parse the incoming sounds into words, use grammar to extract conceptual meaning from those words, and then shape a response from that conceptual meaning.
Grammar is a useful fiction, an incomplete model of a demonstrably probabilistic process. We don't use 'grammar' to do anything.
wongarsu•2h ago
Maybe not for humanity as a species, but for individual humans there are absolutely token sequences that lead them to talk about certain topics, with nobody able to bring them back on topic. Now you'd probably say those are recognizable token sequences, but do we have a fair process to decide what's recognizable that isn't inherently biased towards making humans the only rational actor?
I'm not contending at all that LLMs are only built on language. Their lack of physical reference point is sometimes laughably obvious. We could argue whether there are signs they also form a world model and reasoning that abstracts from language alone, but that's not even my point. My point is rather that any test or argument that attempts to say that LLMs can't "reason" or "assume" or whatever has to be a test a human could pass. Preferably a test a random human would pass with flying colors.
baq•37m ago
It’s predicting text. Yes. Nobody argues about that. (You’re also predicting text when you’re typing it. Big deal.)
How it is predicting the text is the question to ask, and indeed it's being asked, and we're getting glimpses of understanding, and lo and behold, it's a damn complex process. See the recent Anthropic research paper for details.
monkpit•3h ago
In every LLM thread someone chimes in with “it’s just a statistical token predictor”.
I feel this misses the point and I think it dismisses attention heads and transformers, and that’s what sits weird with me every time I see this kind of take.
There _is_ an assumption being made within the model at runtime. Assumption, confusion, uncertainty - one camp might argue that none of these exist in the LLM.
But doesn’t the implementation constantly make assumptions? And what even IS your definition of “assumption” that’s not being met here?
Edit: I guess my point, overall, is: what’s even the purpose of making this distinction anymore? It derails the discussion in a way that’s not insightful or productive.
Jensson•1h ago
Those just make it better at completing the text, but for very common riddles those tools still get easily overruled by pretty simple text completion logic, since the weights for those will be so extremely strong.
The point is that if you understand it's a text completer, then it's easy to understand why it fails at these. To fix these properly you need to make it no longer try to complete text, and that is hard to do without breaking it.
wongarsu•3h ago
Discussing whether models can "reason" or "think" is a popular debate topic on here, but I think we can all at least agree that they do something that resembles "reasoning" and "assumptions" from our human point of view. And if in its chain-of-thought it decides your prompt is wrong, it will go ahead and answer what it assumes is the right prompt.
sejje•3h ago
Yes, and it can express its assumptions in text.
Ask it to make some assumptions, like about a stack for a programming task, and it will.
Whether or not the mechanism behind it feels like real thinking to you, it can definitely do this.
wobfan•43m ago
ToValueFunfetti•4m ago
But I'd wager it's there; assuming is not a particularly impressive or computationally intense operation. There's a tendency to bundle all of human consciousness into the definitions of our cognitive components, but I would argue that, e.g., a branch predictor meets the bar for any sane definition of 'assume'.
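(To make the branch-predictor comparison concrete: a minimal 2-bit saturating-counter predictor, a standard textbook design rather than any specific CPU's. It "assumes" the next branch outcome from recent history and revises that assumption when it turns out wrong.)

    class TwoBitPredictor:
        """2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken."""

        def __init__(self):
            self.state = 2  # start out weakly predicting "taken"

        def predict(self):
            return self.state >= 2

        def update(self, taken):
            # Move one step toward the observed outcome, saturating at 0 and 3.
            self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)

    p = TwoBitPredictor()
    for actual in [True, True, False, True, False, False, False]:
        print(f"assumed taken={p.predict()}, actual taken={actual}")
        p.update(actual)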
og_kalu•3h ago
The problem you've just described is a problem with humans as well. LLMs are assuming all the time. Maybe you would like to call it by another word, but it is happening.
codr7•2h ago
og_kalu•1h ago
Also, LLMs absolutely 'plan' and 'aim for something' in the process of completing text.
https://www.anthropic.com/research/tracing-thoughts-language...
namaria•1h ago
They use a replacement model. It isn't even observing the LLM itself but a model with a different architecture. And it is very liberal in interpreting the patterns of activations seen in the replacement model with flowery language. It also includes some very relevant caveats, such as:
"Our cross-layer transcoder is trained to mimic the activations of the underlying model at each layer. However, even when it accurately reconstructs the model’s activations, there is no guarantee that it does so via the same mechanisms."
https://transformer-circuits.pub/2025/attribution-graphs/met...
So basically the whole exercise might or might not be valid. But it generates some pretty interactive graphics and a nice blog post to reinforce the anthropomorphization discourse.
og_kalu•29m ago
Nonsense. Mechanistic faithfulness probes whether the replacement model ("cross-layer transcoder") truly uses the same internal functions as the original LLM. If it doesn't, the attribution graphs it suggests might mislead at a fine-grained level, but because every hypothesis generated by those graphs is tested via direct interventions on the real model, high-level causal discoveries (e.g. that Claude plans its rhymes ahead of time) remain valid.
losvedir•35m ago
Jensson•1h ago
I didn't train to complete text, though; I was primarily trained to make accurate responses.
And no, writing a response is not "completing text". I don't try to figure out what another person would write as a response; I write what I feel people need to read. That is a completely different thought process. If I tried to mimic what another commenter would have written, it would look very different.
AstralStorm•37m ago
moffkalast•43m ago
These days it's all about confidently bullshitting on benchmarks and overfitting on common riddles to make pointless numbers go up. The more impressive models get on paper, the more rubbish they are in practice.