We're reaching scaling limits with transformers. The number of parameters in our largest transformers, N, is now on the order of trillions, which is about as much as we can usefully train given the total number of tokens of training data available worldwide, D, also on the order of trillions. That gives a compute budget C = 6N × D, which is on the order of D². OpenAI and Google were the first to show these transformer "scaling laws." We cannot usefully add compute to a given budget C without also increasing the data D to maintain the relationship. As the OP puts it, if we want to increase the number of GPUs by 2x, we must also increase the number of parameters and training tokens by 1.41x each, but... we've already run out of training tokens.
We must either (1) discover new architectures with different scaling laws, and/or (2) compute new synthetic data that can contribute to learning (akin to dreams).
To be clear I also agree with your (1) and (2).
What exactly are you considering a "data point"?
Are you assuming one model = one agent instance?
I am pretty sure that there is more information (molecular structure) and functional information (I(Ex)) just in the room I am sitting in than all the unique, useful, digitized information on earth.
Having said that, I tend to agree that having AI interact with the world may be key: for one thing, I'm not sure whether there is any sense in which LLMs understand that most of the information content of language is about an external world.
But the networking potential of digital compute is a fundamentally different paradigm than living systems. The human brain is constrained in size by the width of the female pelvis.
So while it's expensive, we can trade scope-constrained robustness (replication and redundancy at many levels of abstraction), for broader cognitive scale and fragility (data centers can't repair themselves and self-replicate).
Going to be interesting to see it all unfold... my bet is on stacking S-curves all the way.
I believe you but I would love to know where this number came from just so I can read more about it
# If there are any bioinformaticians around please come eviscerate or confirm this calc #
Then I compared it to the TSMC 2nm research macro figure (38.1 Mbit/mm^2), normalized to cell scale: 0.00019 Mbit/μm³.
Living Cells: 1–10 Mbit/μm³
Current best chips: 0.00019 Mbit/μm³
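For concreteness, here's roughly how the cell-side number pencils out; the genome sizes, bits per base, and cell volumes below are my own rough assumptions, not exact figures:

    # Crude back-of-envelope for the "Mbit per cubic micron" numbers above.
    # Genome sizes and cell volumes are rough, illustrative assumptions.

    def density_mbit_per_um3(genome_bp, cell_volume_um3, bits_per_base=2):
        """Information density, counting a single genome copy per cell."""
        return genome_bp * bits_per_base / 1e6 / cell_volume_um3

    print(density_mbit_per_um3(genome_bp=4.6e6, cell_volume_um3=1.0))     # E. coli: ~9 Mbit/um^3
    print(density_mbit_per_um3(genome_bp=3.1e9, cell_volume_um3=2000.0))  # typical human cell: ~3 Mbit/um^3

    chip = 0.00019  # the normalized TSMC figure quoted above, in Mbit/um^3
    print(density_mbit_per_um3(4.6e6, 1.0) / chip)  # cells come out ~4-5 orders of magnitude denser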
I was also taking the information from a genome and then dividing it by the volume of a cell, but there are many instances of the genome in a cell. I didn't count all instances because they aren't unique.
There's a lot to unpack with this comparison and my approximation was crude, but the more I've dug into it, the more apparent it becomes how incredibly efficient life is at managing, processing, and storing information. Especially if you also consider the amino acids, proteins, etc. as information. No matter how you slice it, life seems orders of magnitude more efficient by every metric.
I'd like to think there's a paper somewhere where someone has carefully unpacked this and formally quantified it all.
Well, it _was_ until recently.
In what sense is this true? That sounds suspiciously like saying a cubic meter of dirt is more advanced than an iPhone because there are 6-7 orders of magnitude more atoms in the dirt.
Functional information is basically the amount of data (bits) necessary to explain all the possible functions matter can perform based on its unique configuration (in contrast to a random one). I am sure I partially butchered this explanation... but hopefully it's close enough to catch my drift.
Life is optimized to process and learn from the real world, and it is insanely efficient at it and functional-information dense. (It might even be at the theoretical limit) Our most advanced technology is still 4-5 orders of magnitude behind it.
The capabilities of your iPhone are extremely narrow when compared to a handful of dirt. To you it may seem the opposite, but you are probably mixing up utility to you with functional capability. Your iPhone has more functional utility to you, but the same amount of dirt has way more general functional utility. (Your iPhone isn't capable of self-replication, self-repair, and self-nonself distinction, aka autopoiesis.)
I think this is an old belief that isn't supported by modern research.
From what I can tell, science used to point to this as the only/primary limit to human-brain size, but more recently the picture seems a lot less clear, with some indications that pelvis size doesn't place as hard of a constraint as we thought and there are other constraints such as metabolic (how many calories the mother can sustain during pregnancy and lactation).
So overall I'd say you are technically correct, even though this doesn't really materially change the point I was making, which is that the size of the human brain is constrained in ways that the size of data centers is not.
https://en.wikipedia.org/wiki/Obstetrical_dilemma
While the width is constrained by bipedal locomotion.
Of course we can, this is a non issue.
See e.g. AlphaZero [0] that's 8 years old at this point, and any modern RL training using synthetic data, e.g. DeepSeek-R1-Zero [1].
Yes, distillation is a thing but that is more about compression and filtering. Distillation does not produce new data in the same way that chess games produce new positions.
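A toy way to see the difference (not AlphaZero, obviously -- just random play in a trivial game -- but it shows how self-play keeps visiting states that were never in the "human" corpus, while distillation can only re-encode what's already there):

    # Toy sketch: even random self-play in tic-tac-toe keeps producing board
    # positions that never appeared in the "human corpus" it started from.

    import random

    def random_game():
        """Play out one game with random moves; return the board states visited."""
        board, states, player = [" "] * 9, [], "X"
        while " " in board:
            empty = [i for i, c in enumerate(board) if c == " "]
            board[random.choice(empty)] = player
            states.append("".join(board))
            player = "O" if player == "X" else "X"
        return states

    human_corpus = {s for _ in range(100) for s in random_game()}  # stand-in for human games
    novel = set()
    for _ in range(10_000):                                        # "self-play"
        novel.update(s for s in random_game() if s not in human_corpus)

    print(f"positions never seen in the human corpus: {len(novel)}")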
To me, the answer is clearly no. There is no new information content in the generated data. It's just a remix of what already exists.
But generally the idea is that you need some notion of reward, verifiers, etc.
Works really well for maths, algorithms, and many things actually.
See also this very short essay/introduction: https://www.jasonwei.net/blog/asymmetry-of-verification-and-...
That's why we have IMO-gold-level models now, and I'm pretty confident we'll have superhuman models for mathematics, algorithms, etc. before long.
Now domains which are very hard to verify - think e.g. theoretical physics etc - that's another story.
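To make "verifiable rewards" concrete, a minimal sketch; the sampled answers below stand in for whatever an LLM would actually generate, and real setups verify with code execution, proof checkers, unit tests, etc.:

    # Minimal sketch of reward-from-a-verifier for a math-style task.

    def verifier(problem, answer):
        """Cheap, exact check: here the 'problem' is just an arithmetic expression."""
        try:
            return float(answer) == eval(problem)
        except (ValueError, SyntaxError):
            return False

    def reward_samples(problem, sampled_answers):
        """1.0 if the verifier accepts the sample, else 0.0 -- these rewards,
        not human labels, are what the policy gets trained against."""
        return [1.0 if verifier(problem, a) else 0.0 for a in sampled_answers]

    print(reward_samples("17 * 23", ["391", "401", "17*23=391"]))  # [1.0, 0.0, 0.0]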
I don't think you're getting the point he's making.
You make models talk to each other, create puzzles for each other to solve, ask each other to make cases and evaluate how well they were made.
Will some of it look like ramblings of pre-scientific philosophers? (or modern ones because philosophy never progressed after science left it in the dust)
Sure! But human culture was once there too. And we pulled ourselves out of this nonsense by the bootstraps. We didn't need to be exposed to 3 alien internets with higher truth.
It's really a miracle that AIs got as much as they did from the purely human-generated, mostly-garbage text we cared to write down.
Human progress was promoted by having to interact with a physical world that anchored our ramblings and gave us a reward function for coherence and cooperation. LLMs would need some analogous anchoring for it to progress beyond incoherent babble.
Yeah. I tried to be funny. It's not that easy. However, AI people have already started doing it, and perhaps the AI gains of the last year come mostly from this approach.
> For example, if you just get two LLMs setting each other puzzles and scoring the others solutions how do you stop this just collapsing into nonsense?
That's the trillion dollar question. I wonder how people are doing it. Maybe through economy? You ultimately need to sell your ramblings to somebody to sustain yourself. If you can't, you starve.
Maybe that's enough for AI as well? Companies with AIs that descended into nonsense won't have any more money to train them further. Maybe companies will need to set up their internal ecosystems of competing AI training organizations and split the budget based on how useful they are becoming?
Phrasing this in the terminology of "truth" is probably counterproductive because there's no truth. There's only what sells. If you have customers in manufacturing, the things that sell will probably coincide with some physical truths, but this is emergent, not the goal or even part of the process of acquiring capabilities.
That's option (2) in the parent comment: synthetic data.
If C = D^2, and you double compute, then 2C ==> 2D^2. How do you and the original author get 1.41D from 2D^2?
You're solving for the new D, not scaling the old one: if the new budget is 2C and N and D are scaled together, then D'² = 2D², so D' = √2·D ≈ 1.414·D. In other words, the required amount of data scales with the square root of the compute: if you double the compute, you need roughly 1.414 times more data (and 1.414 times more parameters).
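In code, with the C ≈ 6·N·D rule of thumb and a fixed tokens-per-parameter ratio (the 20:1 ratio here is just a Chinchilla-style assumption):

    # Compute-optimal scaling arithmetic under C ~ 6*N*D with D = ratio*N.

    def optimal_n_and_d(compute, ratio=20.0):
        """Solve C = 6*N*D with D = ratio*N for the compute-optimal N and D."""
        n = (compute / (6.0 * ratio)) ** 0.5   # C = 6*ratio*N^2  =>  N = sqrt(C / (6*ratio))
        return n, ratio * n

    c = 6 * 1e12 * 20e12                  # illustrative budget: 1T params, 20T tokens
    n1, d1 = optimal_n_and_d(c)
    n2, d2 = optimal_n_and_d(2 * c)       # double the compute

    print(f"N grows {n2 / n1:.3f}x, D grows {d2 / d1:.3f}x")  # both ~1.414x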
When Dennard scaling (the driver of single-core performance gains) started to fail in the mid-2000s, I don't think the sentiment was "how stupid was it to believe in such scaling at all?"
Sure, people complained (and we still meme about running Crysis), but in the end the discussion resulted in "no more free lunch": progress in one direction had hit a bottleneck, so it was time to choose some other direction to improve on (and multi-threading has since become mostly the norm).
I don't really see much of a difference?
We probably don’t have most of the Arabic corpus either — and barely any Sanskrit. Classical Chinese is probably also lacking — only about 1% of it is translated to English.
Marsilio Ficino was hired by the Medici to translate Plato and other classical Greek works into Latin. He directly taught DaVinci, Raphael, Michelangelo, Toscanelli, etc. I mean to say that his ideas and perspectives helped spark the renaissance.
Insofar as we hope for an AI renaissance and not an AI apocalypse, it might benefit us to have the actual renaissance in the training data.
If you make a cursory search you can also find other translations of his works, various biographies, and a wide range of commentary and criticism by later authors.
Many of Ficino's originals are also in the corpus of scanned and OCRed or recently republished texts. I'm sure there are archives here or there with additional materials which have not been digitized, but it seems questionable whether those would make any significant difference to a process as indiscriminate and automatic as LLM training.
And he is one of the most central figures of the renaissance. Less than 20% of Neo-Latin has been digitized, let alone translated.
It is fine to question whether including Neo-Latin, Arabic, or Sanskrit in AI training will make AI better.
But for me, it is a core part of humanism that would be a shame to neglect.
So now, in order to progress further, we either have to improve the methods, or synthetically generate more training data, or both.
As you suggest, this costs lots of time and compute. But it's produced breakthroughs in the past (see AlphaGo Zero self-play) and is now supposedly a standard part of model post-training at the big labs.
But if we've already fed them all the data, and we don't have AGI (which we manifestly don't), then there's no way to get to AGI with LLMs and the tech/VC industry is about to have a massive, massive problem justifying all this investment.
Advances in architecture and training protocols can and will easily dwarf "more data". I think that is quite obvious from the fact that humans learn to be quite intelligent using only a fraction of the data available to current LLMs. Our advantage is a very good pre-baked model, and feedback-based training.
Even when written down, without the ability to interact with and probe the world like you did growing up, it's not possible to meaningfully tell the difference between 9/11 hoaxers and everyone else save for how frequently the relevant texts appear. They don't have the ability to meaningfully challenge their world model, and that makes the current breadth of written content even less useful than it might otherwise appear.
No, the paths forward are: better design, training, feeding in more video, audio, and general data from the outside world. The web is just a small part of our experience. What about apps, webcam streams, radio from all over the world in its many forms, OTA TV, interacting with streaming content via remote, playing every video game, playing board games with humans, feeds and data from robots LLMs control, watching everyone via their phones and computers, car cameras, security footage and CCTV, live weather and atmospheric data, cable television, stereoscopic data, ViewMaster reels, realtime electrical input from various types of brains while interacting with their attached creatures, touch and smell, understanding birth, growth, disease, death, and all facets of life as an observer, observing those as a subject, expanding to other worlds, solar systems, galaxies, etc., affecting time and space, search and communication with a universal creator, and finally understanding birth and death of the universe.
As someone who didn't go to expensive maths club, the way people who did talk about maths is disgraceful imho. Consider the equation in this article:
(C ~ 6 N⋅D)
I can look up the symbol for "roughly equals"; that was super cool and is a great part of curiosity. But this _implied_ multiplication between the 6 and the N, combined with using a fucking diamond symbol (which I already despise given how long it took me to figure out the first time I encountered it), is just gross. I figured it was likely that, but then I was like: "but why not just 6ND? Maybe there's a reason why N⋅D but 6 N? Does that mean there's a difference between those operations?"
Thankfully I can use gippity these days to get by, but before gippity I had to look up an entire list of maths symbols to find the diamond symbol and work out what it meant. It's why I love code: there's considerably less implicit behaviour once you slap down the formula into code, and you can play with the input/output.
I don't think mathsy people realise how exclusionary their communication is, but it's so frustrating when I end up fumbling around in slow-mo when the maths kicks in, because "oh the /2 when discussing logarithms in comp sci is _obvious_, so we just don't put it in the equation" just kills me. Idiot me, staring at the equation thinking it actually makes sense, without knowing the special maths convention of implication that means it doesn't actually solve as it reads on the page. Unless of course you went to expensive maths club where they tell you all this.
What drives me nuts is that every time I spend ages finally grokking something, I realise how obvious it is and how non-trivial it is to explain it simply. Comp sci isn't much better to be honest, where we use CQRS instead of "read here, write there". Which results in thousands of newbies trying to parse the unfathomable complexity of "Command Query Responsibility Segregation" and spending as much time staring at its opaqueness as I did the opening sentence of the wikipedia article on logarithms.
Idk what my point is, I just don't understand what's wrong with 6⋅N⋅D or 6*N*D. Do mathematicians feel ugly if they write something down like that or smth?
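For what it's worth, here's the formula the way I'd rather see it written, with every operator spelled out so there's nothing implied (variable names are mine):

    # The article's scaling-law formula with explicit operators:
    # "6 N" and "N.D" are both just multiplication.

    def compute_budget(n_params, n_tokens):
        return 6 * n_params * n_tokens    # C ~ 6*N*D

    print(compute_budget(1e12, 20e12))    # play with inputs/outputs to build intuition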
> That never happened
Bit harsh, I don't see the need for gaslighting. Sure I might be losing my mind but I specifically remember it because it took me so long to find a symbol that matched it online.
To clarify, if it read:
C ~ X N⋅D
you'd be as confused as me? It's because it's a number that it has special implied mechanics where we can skip operators, because it's "obvious".
I don't think of it as eliding obvious operators. Rather in mathematics juxtaposition is used as an operator to represent multiplication. You would never elide an addition operator.
So X next to D still means multiplication as long as you can tell that X and D are separate entities.
I would wonder why they switched conventions in the middle of an expression though.
Sure, but it's not clear to me. I'm just cross about implied convention in maths.
I think most people that read this would be confused and try to find out why there are some undisclosed vectorial operations applied to what looks like scalar numbers.
And yeah, mathematical notation is ugly and confusing. But the fix is not as simple as you think it is.
Apropos CQRS, it's a marketing name. It's hard to understand on purpose. Actual CS-made names tend to be easier.
One of the best things I've read in a while about AI.
The premise of this article, that data is more important than compute, has been obvious to people who are paying attention.
Sorry, but the unnecessary sensationalism in this article was mildly annoying to me. Like the author discovered some novel new insight. A bit like that doctor who published a "novel" paper about how to find the area under a curve.
Well, forgive me but I feel that the article is a much-needed injection of context into my thinking around the Bitter Lesson. I like the imperative to preface compute requests with data roadmaps.
I'm not an AI guy. Not an ML engineer. I've been studiously avoiding the low-level stuff, actually, because I didn't want to half-ass it when off-the-shelf solutions were still providing tremendous novelty and value for my customers.
So, for most of my career, "compute" has been practically irrelevant! RAM and disk constraints presented more frequent obstacles than processor cycles did. I would have easily told you that data presents more of a bottleneck to value than CPU. But that's just the era of computing I came up in.
The last few years have been different. Suddenly compute is at a premium, again. So it's easy to think, "if only I had more," and "line goes up!" and forget about s-curves and logarithmic scaling.
Is the article unnecessarily sensationalist? I don't know, maybe you've been overestimating how much the rest of us are "paying attention."[0]
Reading through the comments, I think there's one key point that might be getting lost: this isn't really about whether scaling is "dead" (it's not), but rather how we continue to scale for language models at the current LM frontier — 4-8h METR tasks.
Someone commented below about verifiable rewards and IMO that's exactly it: if you can find a way to produce verifiable rewards about a target world, you can essentially produce unlimited amounts of data and (likely) scale past the current bottleneck. Then the question becomes, working backwards from the set of interesting 4-8h METR tasks, what worlds can we make verifiable rewards for and how do we scalably make them? [1]
Which is to say, it's not about more data in general, it's about the specific kind of data (or architecture) we need to break a specific bottleneck. For instance, real-world data is indeed verifiable and will be amazing for robotics, etc. but that frontier is further behind: there are some cool labs building foundational robotics models, but they're maybe ~5 years behind LMs today.
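To make the "verifiable rewards => essentially unlimited data" point concrete, a toy sketch; a real setup would swap the toy arithmetic generator for code execution, simulators, proof checkers, and so on:

    # A generator plus a checker can mint as many verified training pairs
    # as you have compute for -- the bottleneck moves to "what can we verify?"

    import random

    def make_verified_example():
        a, b = random.randint(2, 999), random.randint(2, 999)
        prompt = f"What is {a} * {b}?"
        answer = str(a * b)               # the verifier/oracle defines the label
        return prompt, answer

    dataset = [make_verified_example() for _ in range(100_000)]  # limited only by compute
    print(dataset[0])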
[1] There's another path with better design, e.g. CLIP that improves both architecture and data, but let's leave that aside for now.
I feel like there's an interesting symmetry here between the pre- and post-LLM worlds, where I've always found that organisations over-optimise for things they can measure (e.g. balance sheets) and under-optimise for things they can't (e.g. developer productivity), which explains why it's so hard to keep a software product up to date in an average org, as the natural pressure is to run it into the ground until a competitor suddenly displaces it.
So in a post LLM world, we have this gaping hole around things we either lack the data for, or as you say: lack the ability to produce verifiable rewards for. I wonder if similar patterns might play out as a consequence and what unmodelled, unrecorded, real-world things will be entirely ignored (perhaps to great detriment) because we simply lack a decent measure/verifiable-reward for it.
Recently it doesn't seem to be playing out as such. The current best LLMs I find marvelously impressive (despite their flaws), and yet... where are all the awesome robots? Why can't I buy a robot that loads my dishwasher for me?
Last year this really started to bug me, and after digging into it with some friends I think we collectively realized something that may be a hint at the answer.
As far as we know, it took roughly 100M-1B years to evolve human-level "embodiment" (evolving from single-celled organisms to humans), but it only took around ~100k-1M years for humanity to evolve language, knowledge transfer, and abstract reasoning.
So it makes me wonder, is embodiment (advanced robotics) 1000x harder than LLMs from an information processing perspective?
I think it's a degrees-of-freedom question. Given the (relatively) low conditional entropy of natural language, there aren't actually that many degrees of (true) freedom. On the other hand, in the real world, there are massively more degrees of freedom both in general (3 dimensions, 6 degrees of movement per joint, M joints, continuous vs. discrete space, etc.) and also given the path dependence of actions, the non-standardized nature of actuators, kinematics, etc.
All in, you get crushed by the curse of dimensionality. Given N degrees of true freedom, you need O(exp(N)) data points to achieve the same performance. Folks do a bunch of clever things to address that dimensionality explosion, but I think the overly reductionist point still stands: although the real world is theoretically verifiable (and theoretically could produce infinite data), in practice we currently have exponentially less real-world data for an exponentially harder problem.
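A toy version of that dimensionality blow-up (the 10 bins per degree of freedom is an arbitrary illustrative resolution):

    # Samples needed to cover a state space at fixed resolution grow
    # exponentially with the number of degrees of freedom.

    def samples_needed(degrees_of_freedom, bins_per_dof=10):
        return bins_per_dof ** degrees_of_freedom

    print(samples_needed(3))    # toy 3-DoF system: 1,000
    print(samples_needed(7))    # one 7-DoF arm: 10,000,000
    print(samples_needed(21))   # three such limbs: 10^21 -- hopeless to cover by brute force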
Real roboticists should chime in...
What I do have deep experience in is market abstractions and jobs to be done theory. There are so many ways to describe intent, and it's extremely hard to describe intent precisely. So in addition to all the dimensions you brought up that relate to physical space, there is also the hard problem of mapping user intent to action with minimal "error", especially since the errors can have big consequences in the physical world. In other words, the "intent space" also has many dimensions to it, far beyond what LLMs can currently handle.
On one end of the spectrum of consequences is the robot loads my dishwasher such that there is too much overlap and a bunch of the dishes don't get cleaned (what I really want is for the dishes to be clean, not for the dishes to be in the dishwasher), and on the other end we get the robot that overpowers humanity and turns the universe into paperclips.
So maybe we have to master LLMs and probably a whole other paradigm before robots can really be general purpose and useful.
Even the existence of most relationships in the physical world can only be inferred, never mind dimensionality. The correlations are often weak unless you are able to work with data sets that far exceed the entire corpus of all human text, and sometimes not even then. Language has relatively unambiguous structure that simply isn't the norm in real space-time data models. In some cases we can't unambiguously resolve causality and temporal ordering in the physical world. Human brains aren't fussed by this.
There is a powerful litmus test for things "AI" can do. Theoretically, indexing and learning are equivalent problems. There are many practical data models for which no scalable indexing algorithm exists in literature. This has an almost perfect overlap with data models that current AI tech is demonstrably incapable of learning. A company with novel AI tech that can learn a hard data model can demonstrate a zero-knowledge proof of capability by qualitatively improving indexing performance of said data models at scale.
Synthetic "world models" so thoroughly nerf the computer science problem that they won't translate to anything real.
In terms of "world building", it makes sense for the "world" to not be dreamed up by an AI, but to have hard deterministic limits to bump up against in training.
I guess what I mean is that humans in the world constantly face a lot of conditions that can lead to undefined behavior as well, but 99% of the time not falling on your face is good enough to get you a job washing dishes.
Even though the system rules and I/O are tightly constrained, they're still struggling to match human performance in an open-world scenario, after a gigantic R&D investment with a crystal clear path to return.
Fifteen years ago I thought that'd be a robustly solved problem by now. It's getting there, but I think I'll still need to invest in driving lessons for my teenage kids. Which is pretty annoying, honestly: expensive, dangerous for a newly qualified driver, and a massive waste of time that could be used for better things. (OK, track days and mountain passes are fun. 99% of driving is just boring, unnecessary suckage).
What's notable: AVs have vastly better sensors than humans, masses of compute, potentially 10X reaction speed. What they struggle with is nuance and complexity.
Also, AVs don't have to solve the exact same problems as a human driver. For example, parking lots: they don't need to figure out echelon parking or multi-storey lots, they can drop their passengers and drive somewhere else further away to park.
Text really does have a lot of degrees of freedom, but it depends on the language, and even more on the type of alphabet. Modern English, with its phonetic alphabet, is the worst choice because it is the simplest: nearly nobody uses second or third hidden meanings (I've heard about 2-3 to 5-6 meanings, depending on the source). Hieroglyphic languages are much more information rich (10-22 meanings), and, interestingly, phonetic languages in totalitarian countries (like Russian) are also much richer (8-12 meanings), because people used them to hide meanings from the government to avoid punishment.
The language difference (more dimensions) could be an explanation for China's current achievements surpassing Western ones, and it could also be a hint for how to boost Western achievements - I mean, use more scientists from Eastern Europe and give more attention to Eastern European languages.
For 3D robots, I see only one way - a computationally simulated environment.
Is that where learning comes in? Any actual AGI machine will be able to learn. We should be able to buy a robot that comes ready to learn and we teach it all the things we want it to do. That might mean a lot of broken dishes at first, but it's about what you would expect if you were to ask a toddler to load your dishes into the dishwasher.
My personal bar for when we reach actual AGI is when it can be put in a robot body that can navigate our world, understand spatial relationships, and can learn from ordinary people.
Essentially, yes, but I would go further in saying that embodiment is harder than intelligence in and of itself.
I would argue that intelligence is a very simple and primitive mechanism compared to the evolved animal body, and the effectiveness of our own intelligence is circumstantial. We manage to dominate the world mainly by using brute force to simplify our environment and then maintaining and building systems on top of that simplified environment. If we didn't have the proper tools to selectively ablate our environment's complexity, the combinatorial explosion of factors would be too much to model and our intelligence would be of limited usefulness.
And that's what we see with LLMs: I think they model relatively faithfully what, say, separates humans from chimps, but they lack the animal library of innate world understanding which is supposed to ground intellect and stop it from hallucinating nonsense. They're trained on human language, which is basically the shadows in Plato's cave. They're very good at tasks that operate in that shadow world, like writing emails, or programming, or writing trite stories, but most of our understanding of the world isn't encoded in language, except very very implicitly, which is not enough.
What trips us up here is that we find language-related tasks difficult, but that's likely because the ability evolved recently, not because they are intrinsically difficult (likewise, we find mental arithmetic difficult, but it is not intrinsically so). As it turns out, language is simple. Programming is simple. I expect that logic and reasoning are also simple. The evolved animal primitives that actually interface with the real world, on the other hand, appear to be much more complicated (but time will tell).
We may have needed a billion years of evolution from a cell swimming around to a bipedal organism. But we are no longer speed limited by evolution. Is there any reason we couldn't teach a sufficiently intelligent disembodied mind the same physics and let it pick up where we left off?
I like the notion of the LLM's understanding being the "shadows on the wall" of the Plato's cave metaphor, and language may be just that. But math and physics can describe the world much more precisely, and if you pair them with the linguistic descriptors, a wall shadow is not very different from what we perceive with our own senses and learn to navigate.
As for limits, in my opinion, there are a few limits human intelligence has that evolution doesn't. For example, intent is a double-edged sword: it is extremely effective if the environment can be accurately modelled and predicted, but if it can't be, it's useless. Intelligence is limited by chaos and the real world is chaotic: every little variation will eventually snowball into large scale consequences. "Eventually" is the key word here, as it takes time, and different systems have different sensitivities, but the point is that every measure has a half-life of sorts. It doesn't matter if you know the fundamentals of how physics work, it's not like you can simulate physics, using physics, faster than physics. Every model must be approximate and therefore has a finite horizon in which its predictions are valid. The question is how long. The better we are at controlling the environment so that it stays in a specific regime, the more effective we can be, but I don't think it's likely we can do this indefinitely. Eventually, chaos overpowers everything and nothing can be done.
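A tiny demo of that finite-horizon point, using the logistic map as a stand-in for any chaotic system (the 1e-9 "measurement error" is an arbitrary choice):

    # In a chaotic system, a tiny measurement error grows until the forecast is useless.

    def logistic(x, r=4.0):
        return r * x * (1 - x)

    x, y = 0.3, 0.3 + 1e-9        # "true" state vs. slightly mis-measured state
    for step in range(100):
        x, y = logistic(x), logistic(y)
        if abs(x - y) > 0.1:      # forecast error is now macroscopic
            print(f"predictions diverged after {step + 1} steps")
            break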
Evolution, of course, having no intent, just does whatever it does, including things no intelligence would ever do because it could never prove to its satisfaction that it would help realize its intent.
Also, isn't that what's actually scary about AI, in a nutshell? The fact that it may radically simplify our world to facilitate e.g. paper clip production?
No, it's more about massive job losses and people left to float alone, mass increase in state control and surveillance, mass brain rot due to AI slop, and full deterioration of responsibility and services through automation and AI as a "responsibility shield".
No, but that's only a small part of what you need to model. It won't help you negotiate a plane-saturated airspace, or avoid missiles being shot at you, for example, but even that is still a small part. Navigation models won't help you with supply chains and acquiring the necessary energy and materials for maintenance. Many things can -- and will -- go wrong there.
> In other words, why are we assuming that agents cannot shape the world
I'm not assuming anything, sorry if I'm giving the wrong impression. They could. But the "shapability" of the world is an environment constraint; it isn't fully under the agent's control. To take the paper clipper example, it's not operating with the same constraints we are. For one, unlike us (notwithstanding our best efforts to do just that), it needs to "simplify" humanity. But humanity is a fast, powerful, reactive, unpredictable monster. We are harder to cut than trees. Could it cull us with a supervirus, or by destroying all oxygen, something like that? Maybe. But it's a big maybe. Such brute force requires a lot of resources, the acquisition of which is something else it has to do, and it has to maintain supply chains without accidentally sabotaging them by destroying too much.
So: yes. It's possible that it could do that. But it's not easy, especially if it has to "simplify" humans. And when we simplify, we use our animal intelligence quite a bit to create just the right shapes. An entity that doesn't have that has a handicap.
Meanwhile, entire civilizations in South America developed with little to no use of wheels, because the terrain was unsuited to roads.
Wheeled vehicles aren't inherently better in a natural environment unless they're more efficient economically than the alternatives: pack animals, people carrying cargo, boats, etc.
South America didn't have good draft animals and lots of Africa didn't have the proper economic incentives: Sahara had bad surfaces where camels were absolutely better than carts and sub Saharan Africa had climate, terrain, tsetse flies and whatnot that made standard pack animals economically inefficient.
Humans are smart and lazy; they will do the easiest thing that lets them achieve their goals. This sometimes leads them to local maxima. That's why many "obvious" inventions took thousands of years to create (the cotton gin, for example).
You cannot separate the mind and the body. They are the same physiological and material entity. Trying anyway is of course classic western canon.
Nature didn't make decisions about anything.
But it also absolutely didn't "all evolved at the same time, in conjunction" (if by that you mean all features, regarding body and intelligence, at the same rate).
>You cannot separate the mind and the body. They are the same physiological and material entity
The substrate is. Doesn't mean the nature of abstract thinking is the same as the nature of the body, in the same way the software as algorithm is not the same as hardware, even if it can only run on hardware.
But to the point: this is not about separating the "mind and the body". It's about how you can have the humanoid form and all the typical human body functions for millions of years before you get human-level intelligence, which arrived through much later evolution.
>Trying anyway is of course classic western canon.
It's also classic eastern canon, and several others besides.
In this you are positing the existence of a _soul_ that exists separately from the body and is portable amongst bodies, analogous to how an algorithm (disembodied software) exists outside of the hardware and is portable amongst it (by being embodied as software).
I do not agree with that at all, and it's impossible to know if you're right, but I can at least understand why you have a hard time with my argument and the east-west difference if the tradition of the existence of a soul is that "obvious" to you.
The argument is that whether consciousness is independent of a specific body or not, it's still of a different nature.
The consciousness part uses the body (e.g. the nervous system, neurons, etc.), but its nature is informational exchange, and its essence is not in the construction of the body as a physical machine (though that's its base) but in the stored "weights" encoding memories and world-knowledge.
Much as, with a CPU, the specific program it runs is not defined by the CPU but by the memory contents (data, variables, and logic code). It might as well run on an abstract CPU, or one made of water tubes or billiard balls.
Of course in our case, the consciousness runs on a body - and only a specific body - and can't exist without one (the same way a program can't exist as a running program without a CPU). But that doesn't mean it's of the same nature as the body - just that the body is its substrate.
Human level intelligence is, otoh, qualitatively and quantitatively a bigger deal.
We're better than most animals because we have tools. We have great tools because we have hands.
I think you and I are using different definitions of intelligence. I'm bought into Karl Friston's free energy principle and think it's intelligence all the way down. There is no separating embodiment and intelligence.
The LLM distinction is intelligence via symbols as opposed to embodied intelligence, which is why I really like your shadow world analogy. Without getting caught up in subtle differences in our ontologies, I agree wholeheartedly.
There are basically two approaches to defining intelligence, I think. You can either define it in terms of capability, in which case a system that has no intent and does not plan can be more intelligent than one that does, simply by virtue of being more effective. Or you can define it in terms of mechanism: something is intelligent if it operates in a specific way. But it may then turn out to be the case that some non-intelligent systems are more effective than some intelligent systems. Or you can do both and assume that there is some specific mechanism (human intelligence, conveniently) that is intrinsically better than the others, which is a mistake people commonly make and is the source of a lot of confusion.
I tend to go for the second approach because I think it's a more useful framing to talk about ourselves, but the first is also consistent. As long as we know what the other means.
In either case, the smallest unit of intelligence could be seen as a component of a two-field or particle interaction, where information is exchanged and an outcome is determined. Scaled up, these interactions generate emergent properties, and at each higher level of abstraction, new layers of intelligence appear that drive increasing complexity. Under such a view, a less intelligent system might still excel in a narrow domain, while a more intelligent system, effective across a broader range, might perform worse in that same narrow context.
Depending on the context of the conversation, I might go along with some cut-off on the scale, but I don't see why the scale isn't continuous. Maybe it has stacked s-curves though...
We just happen to exist at an interesting spot on the fractal that's currently the highest point we can see. So it makes sense we would start with our own intelligence as the idea of intelligence itself.
Ken Goldberg shows that getting robots to operate in the real world using the methods that have been successful at getting LLMs to do things we consider smart -- huge amounts of training data -- seems unlikely to work. The vast gap between what little data a company like Physical Intelligence has vs. what GPT-5 uses is shown here (84 seconds): https://drive.google.com/file/d/16DzKxYvRutTN7GBflRZj57WgsFN...
Ken advocates plenty of Good Old-Fashioned Engineering to help close this gap, and worries that demos like Optimus actually set the field back because expectations are set too high. Like the AI researchers who were shocked by LLMs' advances, it's possible something out of left field will close this training gap for robots. I think it'll be at least 5 more years before robots will be among us as useful in-house servants. We'll see if the LLM hype has spilled over too much into the humanoid robot domain soon enough.
That is surely the case on limited scopes. For example the non neural net chess engines are better at chess than any human.
I think that for neural networks to compare with human intelligence in a fair way, we should limit their training to the number of games that human professionals can reasonably play in their lives. AlphaGo won't be much good after playing, let's say, 10 thousand games, even starting from the corpus of existing human games.
And yet whatever IQ you have, it can't make you just play the violin without actually having embodied practice first.
https://en.wikipedia.org/wiki/Allegory_of_the_cave
Also, other than in sculpture/dentistry/medicine, I find "ablation" to not be a particularly insightful metaphor either. Although I see ablation's application to LLMs, I simply had to laugh when I first read about it: I envisioned starting with a Greyhound bus and blowing off parts until it was a Lotus 7 sports car!8-). Good luck with that! Kind of like fixing the TV set by kicking it (but it _does_ work sometimes!).
Perhaps we should refrain somewhat from applying metaphors/simile/allegories to describe LLMs relative to human intelligence unless they provide some insight of significant value.
Different people have different goals. You want some form of minimal bus and I want a Lotus 7. There's no guarantee either of us reach our goal.
Ablation is about disassembling something randomly, whether little by little or on an arbitrary scale until [SOMETHING INTERESTING OR DESIRABLE HAPPENS].
https://en.wikipedia.org/wiki/Ablation_(artificial_intellige...
Ablation is laughable but sometimes useful. It is also easy, mostly brainless, NOT guaranteed to provide any useful information (so you've got an excuse for the wasted resources), and occasionally provides insight. It's a good tool for software engineers who have no (or seek no) understanding of their system, so I think of ablation as a "last resort" solution (another being to randomly modify code until it "works") that I disdain.
But I'm old so I'm probably wrong! Burn those CPU towers down, boys and girls!
Anything can be uninteresting and uninformative when one doesn't see its interestingness or can't grok its information.
It has, however, stood for millennia as a great device for describing multiple layers of abstraction, deeper reality vs. appearance, and so on, with utility as such in countless domains.
coldtea says "...with utility as such in countless domains." So when's the last time you referred to the "Allegory of the cave" in your day, other than on HN?
Several times. But it was with broadly educated people, not over-specialized one-dimensional ones.
This is very interesting and I feel there is a lot to unpack here. Could you elaborate on this theory with a few more paragraphs (or books / blogs that elucidate this)? In what ways do we use brute force to simplify the environment, and are there not ways in which we use highly sophisticated, leveraged methods and tools to simplify our environment? What proper tools allow us to selectively ablate complexity? Why does our intelligence only operate on simplified forms?
Also, what would convince you that symbolic intelligence is actually “harder” than embodied intelligence? To me the natural test is how hard it is for each one to create the other. We know it took a few billion years to go from embodied intelligence (ie organisms that can undergo evolution, with enough diversity to survive nearly any conditions on Earth) to sophisticated symbolic intelligence. What if it turns out that within 100 years, symbolic intelligence (contained in LLM like systems) could produce the insights to eg create new synthetic life from scratch that was capable of undergoing self-sustained evolution in diverse and chaotic environments? Would this convince you that actually symbolic intelligence is the harder problem?
A. Instead of building a house on random terrain with random materials, we prefer to first flatten the site, then use standard materials (e.g. bricks), which were produced from a simple source (e.g. a large and relatively homogeneous deposit of clay).
B. For mental tasks it's usually said that a person can handle only 7 items at a time (if you disagree, multiply by 2-3). But when you ride a bike you process many more inputs at the same time (you hear a car behind you, you see a person on the right, you feel your balance, you anticipate your direction, if you feel strong wind or sun on your face you probably squint your eyes, you take a breath of air). On top of that, all the processes of your body adjust to and support your riding: heart, liver, stomach…
C. “Spherical cows” in physics. (Google this if needed)
Part of the issue with discussing this is that our understanding of complexity is subjective and adapted to our own capabilities. But the gist of it is that the difficulty of modelling and predicting the behavior of a system scales very sharply with its complexity. At the end of the scale, chaotic systems are basically unintelligible. Since modelling is the bread and butter of intelligence, any action that makes the environment more predictable has outsized utility. Someone else gave pretty good examples, but I think it's generally obvious when you observe how "symbolic-smart" people think (engineers, rationalists, autistic people, etc.) They try to remove as many uncontrolled sources of complexity as possible. And they will rage against those that cannot be removed, if they don't flat out pretend they don't exist. Because in order to realize their goals, they need to prove things about these systems, and it doesn't take much before that becomes intractable.
One example of a system that I suspect to be intractable is human society itself. It is made out of intelligent entities, but as a whole I don't think it is intelligent, or that it has any overarching intent. It is insanely complex, however, and our attempts to model its behavior do not exactly have a good record. We can certainly model what would happen if everybody did this or that (aka a simpler humanity), but everybody doesn't do this and that, so that's moot. I think it's an illuminating example of the limitations of symbolic intelligence: we can create technology (simple), but we have absolutely no idea what the long term consequences are (complex). Even when we do, we can't do anything about it. The system is too strong, it's like trying to flatten the tides.
> To me the natural test is how hard it is for each one to create the other.
I don't think so. We already observe that humans, the quintessential symbolic intelligences, have created symbolic intelligence before embodied intelligence. In and of itself, that's a compelling data point that embodied is harder. And it appears likely that if LLMs were tasked to create symbolic intelligences, even assuming no access to previous research, they would recreate themselves faster than they would create embodied intelligences. Possibly they would do so faster than evolution, but I don't see why that matters, if they also happen to recreate symbolic intelligence even faster than that. In other words, if symbolic is harder... how the hell did we get there so quick? You see what I mean? It doesn't add up.
On a related note, I'd like to point out an additional subtlety regarding intelligence. Intelligence (unlike, say, evolution) has goals and it creates things to further these goals. So you create a new synthetic life. That's cool. But do you control it? Does it realize your intent? That's the hard part. That's the chief limitation of intelligence. Creating stuff that is provably aligned with your goals. If you don't care what happens, sure, you can copy evolution, you can copy other methods, you can create literally anything, perhaps very quickly, but that's... not smart. If we create synthetic life that eats the universe, that's not an achievement, that's a failure mode. (And if it faithfully realizes our intent then yeah I'm impressed.)
Compare the economics of purely cognitive AI to in-world robotics AI.
Pure cognitive: Massive scale systems for fast, frictionless and incredibly efficient cognitive system deployment and distribution of benefits are solved. On tap even. Cloud computing and the Internet.
What is the amortized cost per task? Almost nothing.
In-world: The cost of extracting raw resources, parts chain, material process chain, manufacturing, distributing, maintaining, etc.
Then what is the amortized cost per task, for one robot?
Several orders of magnitude more expensive, per task! There is no comparison.
Doing that profitably isn’t going to be the norm for many years.
At what price does a kitchen robot make sense? Not at $1,000,000. “Only $100,000”? “Only $25,000”? “Only $10k”? Lower than that?
Compared to a Claude plan? That many people still turn down just to use free tier?
Long before general house-helper robots make any economic sense, we will have had walking, talking, socializing, profitable-to-build sex robots at higher price points for price-insensitive owners.
There are people who will pay high prices for that, when costs come down.
That will be the canary for general robotic servants or helpers.
The cost isn’t intelligence. There isn’t a particular challenge with in-world information processing and control. It’s the cost of the physical thing that processing happens in.
This is a purely economic problem. Not an AI problem at all.
There is a lot of high quality text from diverse domains, there's a lot of audio or images or videos around. The largest robotics datasets are absolutely pathetic in size compared to that. We didn't collect or stockpile the right data in advance. Embodiment may be hard by itself, but doing embodiment in this data-barren wasteland is living hell.
So you throw everything but the kitchen sink at the problem. You pre-train on non-robotics data to squeeze transfer learning for all its worth, you run hard sims, a hundred flavors of data augmentation, you get hardware and set up actual warehouses with test benches where robots try their hand at specific tasks to collect more data.
And all of that combined only gets you to "meh" real world performance - slow, flaky, fairly brittle, and on relatively narrow tasks. Often good enough for an impressive demo, but not good enough to replace human workers yet.
There's a reason why a lot of those bleeding edge AI powered robots are designed for and ship with either teleoperation capabilities, or demonstration-replay capabilities. Companies that are doing this hope to start pushing units first, and then use human operators to start building up some of the "real world" datasets they need to actually train those robots to be more capable of autonomous operation.
Having to deal with Capital H Hardware is the big non-AI issue. You can push ChatGPT to 100 million devices, as long as you have a product people want to use for the price of "free", and the GPUs to deal with inference demand. You can't materialize 100 million actual physical robot bodies out of nowhere for free, GPUs or no GPUs. Scaling up is hard and expensive.
Sounds like LLMs to me.
I don't think further improvements are impossible, not at all. They're just hard to get at.
We did.
Like, to the point that the AI that radically impacted blue collar work isn't even part of what is considered “AI” any more.
There are endless corners of the physical world right now where it's not worth automating a task if you need to assign an engineer and develop a software competency as a manufacturing or retail company, but would absolutely be worth it if you had a generalizable model that you could point-and-shoot at them.
You need a fairly robust one that needs little maintenance, with a multitude of good sensors and precise actuators to be even remotely useful for sufficiently wide range of tasks (so that you have economy of scales). None of that comes cheap.
The rising tide idea came from a 1997 paper by Moravec. Here's a nice graphic and subsequent history https://lifearchitect.ai/flood/
Interestingly, Moravec also stated: "When the highest peaks are covered, there will be machines that can interact as intelligently as any human on any subject. The presence of minds in machines will then become self-evident." We pretty much have those today, so by 1997 standards, machines have minds, yet somehow we moved the goalposts and decided that doesn't count anymore. Even if LLMs end up being strictly more capable than every human on every subject, I'm sure we'll find some new excuse why they don't have minds or aren't really intelligent.
> We pretty much have those today so by 1997 standards, machines have minds, yet somehow we moved the goalposts and decided that doesn't count anymore
What you describe as "moving the goalposts" could also just be explained as simply not meeting the standard of "as intelligently as any human on any subject".
Even in the strongest possible example of LLMs' strengths, applying their encyclopedic knowledge and (more limited) ability to apply that knowledge to a given subject, I don't think they meet that bar. Especially if we're comparing to a human over a time period greater than 30 minutes or so.
The moment you strip away the magical thinking and the humanization (bugs, not hallucinations), what you realize is that this is just progress. Ford in the 1960's putting in the first robot arms vs. auto manufacturing today. The phone: from switchboard operators, to mechanical switching, to digital, to... (I think the phone is in some odd hybrid era with text, but only time will tell). Draftsmen in the 1970's all replaced by AutoCAD by the 90's. Go further back: in 1920, 30 percent of Americans were farmers; today that's less than 2 percent.
Humans, on very human scales are very good at finding all new ways of making ourselves "busy" and "productive".
But most of these solutions were more crude than they let on, and you wouldn't really know unless you were working in AI already.
Watch John Carmack's recent talk at Upper Bound if you want to see him destroy like a trillion dollars worth of AI hype.
https://m.youtube.com/watch?v=rQ-An5bhkrs&t=11303s&pp=2AGnWJ...
Spoiler: we're nowhere close to AGI
I'm honestly relieved that one of the brightest minds in computing, with all the resources and the desire to create actual super-intelligences, has had to temper his expectations hard.
Same with LLMs. Despite having seen this play out before, and being aware of this, people are falling for it again.
My prediction is a new player will come in who vertically integrates these currently disjoint industries and product. The tableware used should be compatible with the dishwasher, the packaging of my groceries should be compatible with the cooking system. Like a mini-factory.
But current vendors have no financial incentive to do so, because if you take a step back the whole notion of putting one room of your apartment full with random electronics just to cook a meal once in a blue moon is deeply inefficient. End-to-end food automation is coming to the restaurant business, and I hope it pushes prices of meals so far down that having a dedicated room for a kitchen in the apartment is simply not worth it.
That's the "utopia" version of things.
In reality, we see prices for fast food (the most automated food business) going up while quality is going down. Does it make the established players more vulnerable to disruption? I think so.
Not in competition with trash food but with proper food and local ingredients.
You don't use your kitchen? After the rooms we sleep in, the kitchen is probably the most used space in my home. We are planning an upcoming renovation of our home and the kitchen is where we plan on spending the most money.
> The tableware used should be compatible with the dishwasher
Aside from non-dishwasher safe items, what tableware is incompatible with a dishwasher?
With the "tableware" argument I meant something like a standardized (magnetic?) adapter for grabbing plates, forks and knives so they can easily be moved by machines/robots.
I feel a company like Ikea is perfectly set up to make this idea a reality, but they'll never do so because they make much more money when every single household buys all these appliances and items for their own kitchen.
Just from the perspective of a single household in a densely populated city I think it'd be nice to have freshly cooked, reproducibly prepared meals with high-quality ingredients available to me. Like an automated soup kitchen with cleanup. Without all the layers of plastic wrapping needed to move produce from large-scale distributors into single-household fridges and so on.
I'm guessing people mostly overspend on kitchens as well. When our renovation happens, I'm sure we will and I'll feel pretty good about it.
For cars and kitchens, utilization considerations seem to be ranked way, way below things like comfort and convenience and beauty.
Look at how hard it is for us to make reliable laptop hinges, or at the articulated car door handle trend (started by Tesla), where they constantly break.
These are simple mechanisms compared to any animal or human body. Our bodies last up to 80-100 years through not just constant regeneration but organic super-materials that rival anything synthetic in terms of durability within their spec range. Nature is full of this, like spider silk that is much stronger than steel by weight, or joints that can take repeated impacts for decades. This is what hundreds of millions to billions of years of evolution gets you.
We can build robots this good but they are expensive, so expensive that just hiring someone to do it manually is cheaper. So the problem is that good quality robots are still much more expensive than human labor.
The only areas where robots have replaced human labor is where the economics work, like huge volume manufacturing, or where humans can’t easily go or can’t perform. The latter includes tasks like lifting and moving things thousands of times larger than humans can or environments like high temperatures, deep space, the bottom of the ocean, radioactive environments, etc.
I'm not sure where people get this impression from, even back decades ago. Hardware is always harder than software. We had chess engines in the 20th century but a robotic hand that could move pieces? That was obviously not as easy because dealing with the physical world always has issues that dealing with the virtual doesn't.
No one seems to be working on building an AI model that understands, to any real degree, what it's saying or what it's creating. Without this, I don't see how they can even get to AGI.
> this isn't really about whether scaling is "dead"
I think there's a good position paper by Sara Hooker[0] that mentions some of this. The key point is that while the frontier is being pushed by big models with big data, there's a very quiet revolution of models using far fewer parameters (still quite big) and far less data. Maybe "Scale Is All You Need"[1], but that doesn't mean it is practical or even a good approach. It's a shame these research paths have gotten so much pushback, especially given today's concerns about inference costs (and that pushback still doesn't seem to be decreasing).

> verifiable rewards
There's also a current conversation in the community over world models: is it actually a world model if the model does not recover /a physics/[2]? The argument for why they should recover a physics is that this implies a counterfactual model must have been learned (with no guarantee that it is computationally irreducible). A counterfactual model gives far greater opportunities for robust generalization. In fact, you could even argue that the study of physics is the study of compression; in a sense, physics is the study of the computability of our universe[3]. Physics is counterfactual, allowing you to answer questions like "What would the force have been if the mass had been 10x greater?" If it were not counterfactual, we'd require different algorithms for different cases.

I'm in the recovery camp. Honestly, I haven't heard a strong argument against it. Mostly "we just care that things work", which, frankly, isn't that the primary concern of all of us? I'm all for throwing shit at a wall and seeing what sticks (it can be a really efficient method, especially in early exploratory phases), but I doubt it is the most efficient way forward.
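To make the counterfactual point concrete, here's a toy sketch of my own (not from any of the referenced papers): a model that has recovered F = m * a can answer the 10x-mass question directly, while a purely memorized mapping cannot.

    # Toy illustration: a recovered physics vs. a memorized lookup table.

    def force(mass_kg: float, accel_ms2: float) -> float:
        """A recovered 'physics': F = m * a answers counterfactual queries."""
        return mass_kg * accel_ms2

    # A purely associative 'model' that only knows cases it has already seen.
    observed = {(2.0, 9.8): 19.6, (5.0, 3.0): 15.0}

    m, a = 2.0, 9.8
    print(force(10 * m, a))            # "what if the mass were 10x greater?" -> 196.0
    print(observed.get((10 * m, a)))   # the lookup has no answer -> None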
In my experience, as someone who has created models that require orders of magnitude fewer resources for equivalent performance, I cannot stress enough the importance of quality over quantity. The tricky part is defining that quality.
[0] https://arxiv.org/abs/2407.05694
[1] Personally, I'm unconvinced. Despite the success of our LLMs, it's difficult to decouple the other variables.
[2] The "a" is important here. There's not one physics, per se. There are different models. This is a level of metaphysics most people will not encounter, and it has many subtleties.
[3] I must stress that there's a huge difference between the universe being computable and the universe being a computation. The universe being computable does not mean we all live in a simulation.
I wonder if this doesn't reify a particular business model: creating a general model and then renting it out SaaS-style (possibly adapted to largish customers).
It reminds me of the early excitement over mainframes, how their applications were limited by the rarity of access, and how vigorously those trained in those fine arts defended their superiority. They just couldn't compete with the hordes of smaller competitors getting into every niche.
It may instead be that customer data and use cases are both the most relevant and the most profitable. An AI that could adopt a small user model and track and apply user use cases would have entirely different structure, and would have demonstrable price/performance ratios.
This could mean if Apple or Google actually integrated AI into their devices, they could have a decisive advantage. Or perhaps there's a next generation of web applications that model use-cases and interactions. Indeed, Cursor and other IDE companies might have a leg up if they can drive towards modeling the context instead of just feeding it as intention to the generative LLM.
Wouldn't the Bitter Lesson be to invest in those models over trying to be clever about eking out a little more oomph from today's language models (and language-based data)?
Do you mean challenges for which the answer is known?
If I'm right (which I give 50/50 odds), and we can reduce the power draw of LLM computation by 95%, trillions can be saved in power bills, we can break the need for Nvidia and other specialist hardware, and we can get back to general-purpose computation.
I have significant experience modelling the physical world (mostly CFD, but also gamedev, with realistic rigid-body collisions and friction).
I admit there is a domain (a range of parameters) where CFD and game physics work well; a predictable domain (on the borders of the well-working domain) where they work well enough but can show strange behavior; and a domain where you will see lots of bugs.
And current computing power is so cheap that, even at a small-business level (just a median gamer desktop), we could replace more than 90% of real-world tests with simulations in the well-working domain (and simply avoid use cases in the unreliable domains).
So I think the main obstacle is conservative bosses and investors who don't trust engineers and don't understand how to check (and tune) simulations against real-world tests, or what the reliable domain is.
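As a minimal sketch of what I mean by a reliable domain (the parameter names and ranges here are made up purely for illustration):

    # Hypothetical validated envelope for a simulation, established by
    # tuning and checking against a limited set of real-world tests.
    VALIDATED_DOMAIN = {
        "reynolds_number": (1e3, 1e6),
        "impact_speed_ms": (0.0, 50.0),
        "friction_coeff": (0.05, 1.2),
    }

    def in_validated_domain(params: dict) -> bool:
        """Trust the simulation only when every parameter sits inside
        the range that has been verified against physical tests."""
        return all(lo <= params[k] <= hi for k, (lo, hi) in VALIDATED_DOMAIN.items())

    run = {"reynolds_number": 5e4, "impact_speed_ms": 12.0, "friction_coeff": 0.3}
    print("simulate" if in_validated_domain(run) else "do a real-world test")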
Does synthetic data count? What about making several more passes through already available data?
But that isn't how investors operate. They want to know what they will get in exchange for giving a company a billion dollars. If you're running an AI business, you need to set expectations. How do you do that? Go do the thing you know you can do on a schedule, like standing up a new GPU data center.
I don't think the bitter lesson is misunderstood in quite the way the author describes. I think most are well aware we're approaching the data wall within a couple years. However, if you're not in academia you're not trying to solve that problem; you're trying to get your bag before it happens.
That may sound a little flip, but this is yet another incarnation of the hungry beast: https://stvp.stanford.edu/clips/the-hungry-beast-and-the-ugl...
The very existence of OpenAI and Anthropic is proof of it happening.
Imagine you were an investor and you know what you know now (creativity can’t be predicted). How would you then invest in companies? Your answer might converge on existing VC strategies.
This is why I think China will ultimately win the AI race: they will be able to put tens of millions of people on a specific task until there is enough data generated to replace humans on that task in 99.99% of cases, and they have the manufacturing capability to make the millions of IO devices needed for this.
Yes, humanoid robots are a good idea, but only if you can train them with walking data from real people. I think that data will probably translate well enough to most humanoid robots, but ideally you design the physical robot from the ground up to model human movement as closely as possible. You have to accept that if we go the LM route for AI, the optimal hardware behaves like human wetware. The neuromorphic computing people get it; robotics people should too.
Legal AI would be easy if we made our legal code more robust
I don't know about that. LLMs have been trained mostly on text. If you add photos, audio, and video, and later even 3D games or 3D videos, you get massively more data than plain text alone, maybe by many orders of magnitude. And this is certainly something that can improve cognition in general. Getting to AGI without audio, video, and 3D perception seems like a non-starter. And even if we think AGI is not the goal, further improvements from these new training datasets are certainly conceivable.
darknets, the deep web, Usenet, BBS, Internet2, and all other paywalled archives.
The AI companies are not only out of such data but their access to it is shrinking as the people who control the hosting sites wall them off (like YouTube).
I think the next steps will be more along this vein of thinking. Treating all training data the same is a mistake. Some data is significantly more valuable to developing an intelligent model than other training data, even after it passes quality filters. I think we need to revisit how we 'train' these models in the first place, and come up with a more intelligent/interactive system for doing so.
Further, the idea that the order of training matters is novel to me, and it seems so obvious in hindsight.
Maybe both of these points are common knowledge/practice among current leading LLM builders. I don't build LLMs, I build on and with them, so I don't know.
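A toy sketch of both points, assuming each example already carries quality and difficulty scores from some upstream filter (the scores and examples are entirely illustrative):

    # Toy curriculum: filter by a quality score, then order training
    # easy -> hard instead of treating all data the same.
    examples = [
        {"text": "2 + 2 = 4",                  "quality": 0.90, "difficulty": 0.1},
        {"text": "scraped nav-bar junk",       "quality": 0.10, "difficulty": 0.2},
        {"text": "a proof of Stokes' theorem", "quality": 0.95, "difficulty": 0.9},
    ]

    QUALITY_CUTOFF = 0.5
    curriculum = sorted(
        (ex for ex in examples if ex["quality"] >= QUALITY_CUTOFF),
        key=lambda ex: ex["difficulty"],
    )

    for step, ex in enumerate(curriculum):
        print(step, ex["text"])   # feed batches to the trainer in this order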
The author fundamentally misunderstands the bitter lesson.
[0] https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...
A true general method wouldn't rely on humans at all! Human data would be worthless beyond bootstrapping!
There are teenagers who win gold medals at the math olympiad - they've trained on < 1M tokens of math texts, never mind the 70T tokens that GPT-5 appears to be trained on. A difference of nearly eight orders of magnitude.
In other words, data scarcity is not a fundamental problem, just a problem for the current paradigm.
Human students who have only learned some new words, but have not (yet) even begun to really comprehend a subject, will also just throw around random words and sentences that sound great but have no basis in reality.
For the same sentence, for example, "We need to open a new factory in country XY", the internal model lighting up inside the brain of someone who has actually participated when this was done previously will be much deeper and larger than that of someone who only heard about it in their course work. That same depth is zero for an LLM, which only knows the relations between words and has no representation of the world. Words alone cannot even begin to represent what the model builds from real-world sensor data, which, on top of the direct input, also rests on many layers of compounded, already-internalized prior models (nobody establishes that new factory as a newborn baby with a fresh neural net; even the newborn has inherited instincts based on accumulated real-world experience, including the complex structure of the brain itself).
Somewhat similarly: situations reported in comments like this one (a client or manager vastly underestimating the effort required to do something): https://news.ycombinator.com/item?id=45123810

The internal model of a task held by those far removed from actually doing it is very small compared to the internal models of those doing the work, so attempts to gauge the required effort fall short spectacularly if they also lack that awareness.
If we can reduce the precision of the model parameters by 2~32x without much perceptible drop in performance, we are clearly dealing with something wildly inefficient.
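As a rough sketch of what that precision reduction looks like (plain symmetric int8 quantization of a weight tensor, my own toy example; real schemes use per-channel scales and go down to 4 bits or fewer):

    import numpy as np

    # Naive symmetric int8 quantization of a float32 weight tensor:
    # a 4x size reduction relative to float32.
    w = np.random.randn(4, 4).astype(np.float32)

    scale = np.abs(w).max() / 127.0               # one scale for the whole tensor
    w_int8 = np.round(w / scale).astype(np.int8)
    w_restored = w_int8.astype(np.float32) * scale

    print(w.nbytes, "->", w_int8.nbytes)          # 64 -> 16 bytes
    print(float(np.abs(w - w_restored).max()))    # small reconstruction error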
I'm open to the possibility that over-parameterization is essential to the training process, much like how MSAA/SSAA oversample the frame buffer to reduce information aliasing in the final scaled result (also wildly inefficient but generally very effective). However, I think for more exotic architectures (spiking / time-domain) these rules don't work the same way. You can't backpropagate through a recurrent SNN, so much of the prevailing machine-learning mindset doesn't even apply.
And don't forget the noise. If you look at the Anthropic papers it's clear from the examples they give that the dataset is still incredibly noisy even after extensive cleaning efforts. A lot of those parameters are being wasted trying to predict garbage outputs from HTML scraping gone wrong.
Somewhat apples and oranges given billions of years of evolution behind that human. GPT-5 started off as a blank slate.
"How could a telescope see saturn, human eyes have billions of years of evolution behind them, and we only made telescopes a few hundred years ago, so they should be much weaker than eyes"
"How can StockFish play chess better than a human, the human brain has had billions of years of evolution"
Evolution is random, slow, and does not mean we arrive at even a local optimum.
Also consider that during training, LLMs spend much less time processing, say, TAOCP (Knuth), or SICP (Abelson, Sussman, and Sussman), or Probability Theory (Jaynes) than they spend on the entirety of r/Frugal.
20 thick books turn a smart teenager into a graduate with an MSc. That's what, 10 million tokens?
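Back-of-envelope check on that number, using my own rough assumptions about book length and tokens per word:

    # Back-of-envelope: are 20 dense textbooks really ~10 million tokens?
    books = 20
    words_per_book = 250_000      # assumption: a thick technical text
    tokens_per_word = 1.3         # assumption: typical BPE ratio for English

    print(f"{books * words_per_book * tokens_per_word:,.0f} tokens")
    # -> 6,500,000 tokens, the same order of magnitude as 10 million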
When we read difficult, important texts, we reflect on them, make exercises, discuss them, etc. We don't know how to make an LLM do that in a way that improves it. Yet.
But it's indeed apples and oranges. There's no good way to estimate the information encoded by the GPT architecture compared to human DNA. We just have to be empirical and look at what the thing can do.
My brain only needs to get mugged in a dark alley by a guy in a hoodie once to learn something.
Path #2 in TFA.
People are constantly inputting novel data, telling ChatGPT about mistakes it made and suggesting approaches to try, and so on.
For local tools like Claude Code, it feels like there's an even bigger goldmine of data: you can have a user ask Claude Code to do something, and when it fails they do it themselves... and if only Anthropic could slurp up the human-produced correct solution, that would be high-quality training data.
I know paid Claude Code doesn't slurp up local code, and my impression is paid ChatGPT also doesn't use input for training... but perhaps that's the next thing to compromise on in the quest for more data.
The data scraped from the Internet and from scanned books served its purpose: it bootstrapped something that we all love talking to and discussing ANYTHING with. That's the new source of data and intelligence.
We all? Speak for yourself, dude
I'm into advaita vedanta, priority cosmopsychism, open individualism
A baby's brain isn't wired to the entire internet. A 2-year-old has access to at most 2 years of HD video data, plus some other belly-ache and poo-smell stimuli. And a baby's brain has no replay capacity.
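Rough numbers, with an assumed bitrate, just to put a ceiling on it:

    # Rough upper bound on two years of "HD video" as raw bytes.
    seconds = 2 * 365 * 24 * 3600      # two years, awake or not
    bitrate_mbps = 5                   # assumption: compressed 1080p stream
    terabytes = seconds * bitrate_mbps / 8 / 1e6
    print(round(terabytes, 1), "TB")   # ~39 TB of highly redundant video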
That's not a lot to work with.
Yet, a 2-year-old clearly thinks, is conscious, can understand and create sentences, and wants to annihilate everything just as much as Grok.
Sure you can scale data all you want. But there should be enough to work with without scaling like crazy.
Having AI know all CSS tricks out there is one thing that requires a lot of data, AGI is different.
Generating more training data from the same original data should not be fundamentally problematic in that sense.
FloorEgg•2d ago
The opportunity in the market is the gap between what people have been doing and what they are trying to do, and I have developed very specialized approaches to narrow this gap in my niche, and so far customers are loving it.
I seriously doubt that the gap could ever be closed by throwing more data and compute at it. I imagine the outputs of my approach could be used to train a base model to close the gap at a lower unit cost, but I am skeptical that it would be economically worthwhile anytime soon.
stego-tech•2d ago
Throwing more compute and data at the problem won’t magically manifest AGI. To reach those lofty heights, we must first address the gaping wounds holding us back.
FloorEgg•2d ago
Instead I developed a UX that made it as easy as possible for people to explain what they want to be done, and a system that then goes and does that. Then we compare the system's output to their historical data and there is always variance, and when the customer inspects the variance they realize that their data was wrong and the system's output is far more accurate and precise than their process (and ~3 orders of magnitude cheaper). This is around when they ask how they can buy it.
This is the difference between making what people actually want and what they say they want: it's untangling the why from the how.
FloorEgg•2d ago
There are multiple long-form text inputs, one set is provided by User A, and another set by User B. User A inputs act as a prompt for User B, and then User A analyzes User B's input according to the original User A inputs, producing an output.
My system takes the User A and User B inputs and produces the output with more accuracy and precision than the User As do, by a wide margin.
Instead of trying to train a model on all the history of these inputs and outputs, the solution was a combination of goal->job->task breakdown (like a fixed agentic process), and lots of context and prompt engineering. I then test against customer legacy samples, and inspect any variances by hand. At first the variances were usually system errors, which informed improvements to context and prompt engineering, and after working through about a thousand of these (test -> inspect variance -> if system mistake improve system -> repeat) iterations, and benefiting from a couple base-model upgrades, the variances are now about 99.9% user error (bad historical data or user inputs) and 0.1% system error. Overall it took about 9 months to build, and this one niche is worth ~$30m a year revenue easy, and everywhere I look there are market niches like this... it's ridiculous. (and a basic chat interface like ChatGPT doesn't work for these types of problems, no matter how smart it gets, for a variety of reasons)
So to summarize:
Instead of training a model on the historical inputs and outputs, the solution was to use the best base model LLMs, a pre-determined agentic flow, thoughtful system prompt and context engineering, and an iterative testing process with a human in the loop (me) to refine the overall system by carefully comparing the variances between system outputs and historical customer input/output samples.
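If it's useful to anyone, here is a stripped-down sketch of the shape of that flow. All names, prompts, and steps are invented for illustration; the real system is far more involved:

    # Skeleton of a fixed goal -> job -> task flow with a human-in-the-loop
    # check against historical samples. `call_llm` is a stand-in for whatever
    # base-model API you use; the prompts here are placeholders.

    def call_llm(system_prompt: str, user_content: str) -> str:
        # Swap in the real API client here.
        return f"[model output for {system_prompt!r}]"

    def run_pipeline(user_a_inputs: str, user_b_inputs: str) -> str:
        goal = call_llm("Restate the analysis goal.", user_a_inputs)
        jobs = call_llm("Break the goal into jobs and tasks.", goal)
        return call_llm(
            "Execute each task and produce the final analysis.",
            f"jobs: {jobs}\nUser A: {user_a_inputs}\nUser B: {user_b_inputs}",
        )

    def review(samples: list[dict]) -> list[dict]:
        """Compare system output to historical output; a human inspects every
        variance to decide if it's a system error or bad legacy data."""
        variances = []
        for s in samples:
            out = run_pipeline(s["user_a"], s["user_b"])
            if out != s["historical_output"]:
                variances.append({"sample": s, "system_output": out})
        return variances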
simianwords•2d ago
We still got pretty far by scraping internet data, which we all know is not fully trustworthy.