I was reading lots of scifi in 1977, so I may have tried to talk to the pi like Scotty trying to talk to the mouse in Star Trek IV. And since you can run an LLM and text to speech on an RPi5, it might have answered.
Also, in the 80s my friend's father was a driver for a member of the French Consulate in Turkey. His car (a Renault) had speech functionality.
SAM had a basic mode where you just type English, but it also had an advanced phonetic input mode where you could control the sound and stress on every syllable. My favorite thing to do was try to give SAM a British accent.
Yes, and Windows had Narrator. And that's been all, for 20 years.
https://www.autoweek.com/car-life/but-wait-theres-more/a1875...
Not being able to drive a manual would have been a huge deal 20 years ago, but with hybrids and EVs all being automatic it's not much of a downside nowadays, unless you want to buy an old car or borrow a friend's. Most rental fleets have automatics available these days.
Electric vehicles do not have gearboxes, as there are no converters, so there is nothing to shift up or down. The few performance EVs that have been announced (and maybe even released) with a gear stick do so for nostalgic reasons; the gear shift and the accompanying experience are simulated entirely in software.
I certainly don't plan to buy anything but hybrids, until EV prices and ranges are at comparable levels.
Our current hybrid has a six-speed gearbox.
I'd really like to buy that car so I await your response.
The reason I cannot do this today is laws, not technology. My 2c.
The fact that there is no system out there that I can own, jump into the back of while in no condition to drive, and still get to my destination safely defeats that claim. It's not even so mundane that everyone has the anemic Tesla self-driving feature that runs over kids and slams into highway barriers.
It may also be a matter of laws, but the underlying tech is still not there either: even if the laws weren't in place, every current "self driving car" system warns you to pay attention to the road and keep your hands on the wheel.
Could I get behind the wheel of my self driving car, drunk, and make it there safely? No, I definitely couldn't, and I understand why those laws exist with all of the existing failure modes of self driving cars.
People have called the current state of LLMs "sparkling AutoComplete". The current state of "self-driving cars" is "sparkling lane assist" with a chaser of adaptive cruise control.
I don't need my car to constantly whine, needle, harass, demean, and insult me.
My response was: "you obviously haven't used one yet"...
They were already bossy enough... 'MAKE A LEGAL U-TURN NOW!'
Maybe in 100 years. The talking car was more intelligent than Siri, Alexa or Hey Google.
It is not that we are not able to "talk" to computers; it is that we "talk" with computers only so that they can collect more data about us. Their "intelligence" is limited to simple text understanding.
We're talking about Google Gemini or ChatGPT.
A modest PDP-11/34 cluster with AP-120 vector coprocessors might even have served as a cheaper pathfinder in the late 70s for labs and companies who couldn't afford a Cray 1 and its infrastructure.
But we lacked both the data and the concepts. Massive, curated datasets (and backpropagation!) weren’t even a thing until the late 80s or 90s. And even then, they ran on far less powerful hardware than the Crays. Ideas and concepts were the limiting factor, not the hardware.
A "small Large Language Model", you say? So a "Language Model"? ;-)
> Such an LLM could have handled grammar and code autocompletion, basic linting, or documentation queries and summarization.
No, not even close. You're off by 3 orders of magnitude if you want even the most basic text understanding, 4 OOM if you want anything slightly more complex (like code autocompletion), and 5–6 OOM for good speech recognition and generation. Hardware was very much a limiting factor.
https://www.tomshardware.com/tech-industry/artificial-intell...
John Carmack was also hinting at this: we might have had AI decades earlier, obviously not large GPT-4 models but useful language reasoning at a small scale was possible. The hardware wasn't that far off. The software and incentives were.
50 token/s is completely useless if the tokens themselves are useless. Just look at the "story" generated by the model presented in your link: Each individual sentence is somewhat grammatically correct, but they have next to nothing to do with each other, they make absolutely no sense. Take this, for example:
"I lost my broken broke in my cold rock. It is okay, you can't."
Good luck tuning this for turn-based conversations, let alone for solving any practical task. This model is so restricted that you couldn't even benchmark its performance, because it wouldn't be able to follow the simplest of instructions.
Even at that small scale, you can already do useful things like basic code or text autocompletion, and with a few million parameters on a machine like a Cray Y-MP, you could reasonably attempt tasks like summarizing structured or technical documentation. It's constrained in scope, granted, but it's a solid proof of concept.
The fact that a functioning language model runs at all on a Pentium II, with resources not far off from a 1982 Cray X-MP, is the whole point: we weren’t held back by hardware, we were held back by ideas.
Llama 3 8B took 1.3M GPU-hours to train on H100-80GB hardware.
Of course, it didn't take 1.3M wall-clock hours (~150 years). So many machines with 80GB each were used.
Let's do some napkin math: 150 machines with a total of 12TB of VRAM, running for a year.
So, what would be needed to train a 300K parameter model that runs on 128MB RAM? Definitely more, much more than 128MB RAM.
Llama 3 runs on 16GB of VRAM. Let's imagine that's our Pentium II of today. You need at least 750 times the memory needed to run it in order to train it. So you would have needed ~100GB of RAM back then, running for a full year, to get that 300K model.
How many computers with 100GB+ RAM do you think existed in 1997?
Also, I only did RAM. You also need raw processing power and massive amounts of training data.
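Spelling out the same napkin math, in case anyone wants to tweak the assumptions (nothing here is more precise than the round numbers above):

    # Rough napkin math using the figures quoted above.
    gpu_hours = 1.3e6                          # reported H100-80GB hours for Llama 3 8B
    gpus_for_a_year = gpu_hours / (365 * 24)   # ~148, call it 150 GPUs
    train_vram_gb = 150 * 80                   # ~12 TB of VRAM used for training
    infer_vram_gb = 16                         # enough to merely run the model
    ratio = train_vram_gb / infer_vram_gb      # ~750x train-vs-run memory
    ram_needed_gb = 0.128 * ratio              # ~96 GB to train the 300K model
    print(gpus_for_a_year, train_vram_gb, ratio, ram_needed_gb)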
We simply weren’t looking, blinded by symbolic programming and expert systems. This could have been a wake-up call, steering AI research in a completely different direction and accelerating progress by decades. That’s the whole point.
See how silly it is?
Now, focus on the simple question. How would you train the 300K model in 1997? To run it, you need someone to train it first.
Backprop was known. Data was available. Narrow tasks (completion, summarization, categorization) were relevant. The model that runs on a Pentium II could have been trained on a Cray, or over time on any reasonably powerful 90s workstation. That's not fantasy: LeNet-5, with its 65K weights, was trained on a mere Sun workstation in the early 90s.
The limiting factor wasn’t compute, it was the conceptual framing as well as the datasets. No one seriously tried, because the field was dominated by symbolic logic and rule-based AI. That’s the core of the argument.
My dude, you came up with the Wright brothers comparison, not me. If you don't like fallacies, don't use them.
> on any reasonably powerful 90s workstation
https://hal.science/hal-03926082/document
Quoting the paper now:
> In 1989 a recognizer as complex as LeNet-5 would have required several weeks’ training and more data than were available and was therefore not even considered.
Their own words seem to match my assessment.
Training time and data availability determined how much this whole thing could advance, and researchers were aware of those limits.
Given an estimate of 6 FLOPs per token per parameter, training a 7B parameter model would require about 1.26×10^22 FLOPs. That translates to roughly 500 000 years on an 800 MFLOPS X-MP, far too long to be feasible. Training a 100M parameter model would still take nearly 70 years.
However, a 7M-parameter model would only have required about six months of training, and a 14M one about a year, so let’s settle on 10 million. That’s already far more reasonable than the 300K model I mentioned earlier.
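For anyone who wants to poke at that arithmetic, here it is in code. The token counts are my own back-solved assumptions (roughly 20 to 40 times the parameter count), and 800 MFLOPS is the sustained X-MP figure used above:

    # ~6 FLOPs per parameter per training token, ~800 MFLOPS sustained on an X-MP.
    XMP_FLOPS = 800e6
    SECONDS_PER_YEAR = 3.15e7

    def training_years(params, tokens):
        return 6 * params * tokens / XMP_FLOPS / SECONDS_PER_YEAR

    # (params, tokens) pairs are illustrative assumptions, not measured values.
    for params, tokens in [(7e9, 300e9), (100e6, 3e9), (14e6, 300e6), (7e6, 300e6)]:
        print(f"{params:.0e} params, {tokens:.0e} tokens: "
              f"~{training_years(params, tokens):,.1f} years")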
Moreover, a 10M parameter model would have been far from useless. It could have performed decent summarization, categorization, basic code autocompletion, and even powered a simple chatbot with a short context, all that in 1984, which would have been pure sci-fi back in those days. And pretty snappy too, maybe around 10 tokens per second if not a little more.
Too bad we lacked the datasets and the concepts...
I have a Raspberry Pi in a translucent "modular case" from the PiHut.
* https://thepihut.com/products/modular-raspberry-pi-4-case-cl...
It is very close to the same size and appearance as the "key" for Orac in Blake's 7.
I have so far resisted the temptation to slap it on top of a Really Useful Box and play the buzzing noise.
* https://youtube.com/watch?v=XOd1WkUcRzY
Obviously not even Avon figured out that the main box of Orac was a distraction, a fancy base station to hold the power supply, WiFi antenna, GPS receiver, and some Christmas tree lights, and all of the computational power was really in the activation key.
The amusing thing is that that is not the only 1970s SciFi telly prop that could become almost real today. It shouldn't be hard -- all of the components exist -- to make an actual Space 1999 commlock; not just a good impression of one, but a functioning one that could do teleconferencing over a LAN, IR control for doors and tellies and stuff, and remote computer access.
Not quite in time for 1999, alas. (-:
Oh come off it now. This could have been just a good blog post that didn't make me want to throw my phone across the room. GenAI is a hell of a drug. It's shocking how many technical professionals fall into the hype and become irrationally exuberant.
The upper-class "trust me bro"
And some problems are even more complex.
My father spent his career researching coil forms for stellarator fusion reactors. Finding the shapes for their experiments was a huge computational problem using the then state-of-the-art machines (including a Cray for a while), and even today's computing power isn't there yet.
Other problems we now solve regularly on our phones ...
The Cray 1 cost US$7.9 million in 1977 (equivalent to $41 million in 2024) (Source: Wikipedia).
I have no idea what IBM z-series mainframes cost but I think it would be less.
$41 million can buy you one or more thousands of rack-mounted servers and the associated networking hardware.
My rough guess is that the gap between a 2024 iPhone and today's mainframes is an order of magnitude larger than the gap between the Cray and anything else on the market at the time.
It’s also interesting to note how much software has changed. The actual machine code may be less optimized, but we have better algorithms and we have the option of using vast amounts of memory and disk to save cpu time. And that’s before we get into specialized hardware.
Supercomputers were and are beasts not only of computation but of memory size and bandwidth. They're used for tasks where the computation is highly parallel but the memory is not. If you're doing nuclear physics or fluid dynamics, every particle in a simulation has some influence on every other. The more particles, and the more state per particle, you can store and apply to every other particle, the more accurate the simulation.
As supercomputers have improved in memory size and bandwidth, simulation and modeling with them has gotten more accurate and more useful.
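A toy sketch of why that is: in a naive pairwise (n-body style) update, every particle's state has to be read when updating every other particle, so memory capacity and bandwidth matter as much as raw FLOPS. The inverse-square "force" here is just a placeholder, not real physics:

    # O(n^2) pairwise update: each particle reads the state of every other,
    # so memory footprint and traffic grow quickly with particle count.
    def step(pos, vel, dt=1e-3, eps=1e-9):
        n = len(pos)
        acc = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
                r2 = dx * dx + dy * dy + eps
                acc[i][0] += dx / r2
                acc[i][1] += dy / r2
        for i in range(n):
            vel[i][0] += acc[i][0] * dt
            vel[i][1] += acc[i][1] * dt
            pos[i][0] += vel[i][0] * dt
            pos[i][1] += vel[i][1] * dt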
zSeries: z = "zero" downtime
iSeries: i = "integration" (DB2 baked into OS)
pSeries: p = "performance"...software... well, that's a different story.
While a Cray could compute millions of things and did a bunch of usable stuff for the many groups of people who used it back then, a Raspberry Pi today has trouble even properly displaying a weather forecast at acceptable speed, because modern software has become very bloated, and that includes weather forecast sites that somehow have to include an autoplaying video, usually an ad.
I'm sure when the Cray 1 came out, access to it must have been very restricted, and there must have been hordes of scientists clamoring to run their experiments and computations on it. What would have happened if we gave every one of those clamoring scientists an RPi5?
And yes, I know this raises an interface problem of how they would even use one back in the day, but let's put that to the side and assume we figured out how to make an RPi5 behave exactly like a Cray 1 and allowed scientists to use it in a productive way.
When you then explained it was just bit-banging said NTSC output, they'd be amazed even more.
Cray 1 was released 1975, teletypes were old tech at that time.
Scientists then (at least a lot of them) knew what they wanted to do, and it required faster computers rather than more of them. A lot of that Cray power at the national labs was doing fluid simulation (i.e. nuclear explosions), and with the computers they had in the 80s, it was done in one or two dimensions, relying on symmetry. Going from n^2 to n^3 grid cells was the obvious next step, but took a lot more memory and CPU speed.
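Rough sense of that jump, with made-up but plausible numbers (1000 cells per dimension, 8 double-precision state variables per cell):

    n = 1000                # cells per dimension (assumed)
    bytes_per_cell = 8 * 8  # 8 doubles of state per cell (assumed)
    print(f"2D: {n**2 * bytes_per_cell / 1e9:.3f} GB")  # ~0.064 GB
    print(f"3D: {n**3 * bytes_per_cell / 1e9:.1f} GB")  # ~64 GB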
A few niche uses aside (gaming, LLMs), a vaguely modern desktop is good enough regardless of the details.
For parity, you have to move up to a Raspberry Pi Zero 2, which costs $15 and uses about 2W of power.
A million times cheaper than a Cray in 2025 dollars, and quite a bit more capable.
https://www.olimex.com/Products/RaspberryPi/PICO/PICO2-XXL/o...
Another comparison that is equally astonishing to the RPi is that modern GPUs have exceeded Whitted’s prediction. Turner’s paper used 640x480 images. At that resolution, extrapolating the 160 Mflops number, 1 Cray per pixel would be 49 Tera flops. A 4080 GPU has just shy of 50 Tflops peak performance, so it has surpassed what Turner thought we’d need.
Think about that - not just faster than a Cray for a lot less money, but one cheap consumer device is faster than 300,000 Crays.(!) Faster than a whole Cray per pixel. We really have come a long, long way.
The 5090 has over 300 Tflops of ray tracing perf, and the Tensor cores are now in the Petaflops range (with lower precision math), so we’re now exceeding the compute needed for 1 Cray per pixel at 1080p. 1 GPU faster than 2M Crays. Mind blowing.
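The arithmetic behind those figures, using the same 160 Mflops per Cray-1:

    CRAY1_FLOPS = 160e6  # ~160 Mflops, the figure used above
    for label, w, h in [("640x480", 640, 480), ("1080p", 1920, 1080)]:
        pixels = w * h
        total = pixels * CRAY1_FLOPS
        print(f"{label}: {pixels:,} Cray-1s -> {total / 1e12:.0f} Tflops")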
Interesting, wonder how it compares in terms of transistors. How many transistors combined did one Cray have in compute and cache chips?
200k * 300k Cray-1s would be 60B gates, whereas the 4080 actually has 46B transistors. Seems like we’re totally in the right ballpark.
It was actually pretty close to the model of a GPU.
Now let's compare this to the Top 500. (See the point? Do not speak of Moore's law while ignoring the mathematical implications. And yes, 3/1000s is three thousandths.)
Top 500 is 1.7 Exaflops, but by Moore's law should be 4,241Gf or 4.2Xf. So the top 500 is not keeping up with Moore's law.
I don't feel we are getting results that are thousands of times better today.
You are getting results that are far more than thousands of times better. You just aren't aware of where they are showing up.
To give you a glimpse, the same modelling problems which a couple of decades ago took days to reach a crude solution are now being executed inside a loop in optimization problems.
You are also seeing multiphysics and coupling problems showing up in mundane applications. We're talking about problems that augment the same modelling problems that a couple of decades ago took days to solve, with double or triple the degrees of freedom.
Without the availability of these supercomputers the size of credit cards, the whole field of computer-aided engineering would not exist.
Also, to boot, there are indeed diminishing returns. Increasing computational resources unblocks constraints such as being able to use doubles instead of floats. This means that lowering numerical errors by another 3 or 4 decimal places comes for free, at the expense of taking around 4 times longer to solve the same problem.
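To make the single-vs-double point concrete (a generic illustration, not tied to any particular solver):

    import numpy as np
    # float32 carries ~7 significant decimal digits, float64 ~16, so moving to
    # doubles buys several extra digits of accuracy per stored value.
    print(np.float32(1.0) / np.float32(3.0))  # 0.33333334
    print(np.float64(1.0) / np.float64(3.0))  # 0.3333333333333333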
To top things off, do you think the results of two decades ago were possible without employing a great deal of simplifications and crude approximations? As legend has it, the F-117 Nighthawk got its design due to the computational limits of the time. Since then, stealth planes have become more performant and smoother in design. That's what you get when your computational resources are a thousand times better.
The Cray-1 is really a very simple machine, with a small instruction set. It just has 64 of everything. It was built from discrete components, almost the last CPU built that way.
[1] https://www.cpushack.com/2010/09/15/homebrew-cray-1a-1976-vs...
Imagine traveling back to 1977 and explaining to someone that in 2025 we've allocated all that extra computing power to processing javascript bundles and other assorted webshit.
Animats below said that the Cray-1 was made from discrete components. Good luck making an RP2350 from discrete components; it likely wouldn't even function well at the desired frequency due to speed-of-light and RF interference issues, and it would be even worse for the GHz Broadcom chips used in the RPi5. This means that in a post-apocalyptic future you could make another Cray-1 given enough time and resources. In 20 years, when the fabs have stopped making RP2350s, there simply will not be any more of them.
The exciting part back then was that, while computers were never "good enough," they were getting noticeably better every few months. If you were in the market for a computer, you knew you could get a noticeably better one for the same price if you just waited a little while. The next model was exciting, because it was tangibly better. At some point personal computers became "good enough" for most people. Other than compensating for creeping software bloat, there hasn't been much reason for most people to be excited about new computers in a decade or more.
Given a modern flagship phone, in what year would that phone have been equivalent to total world computational power?
For example, based on TFA, a Pi5 represents around a thousand Cray 1 systems in 1977. Based on that, it seems likely that a single Pi5 outstrips total world supercomputer capacity in 1977.
We tossed some numbers around, but the rough consensus was that they were likely to have the 60s covered and most of the 70s as well. Given that this was a decade ago, I expect that we could move forward a few years.
It kinda reminded me of the trash can Mac. I wonder if it was the inspiration for it.
Ironically, the trash can Mac actually looked strikingly similar in size and shape to the actual small trash cans that were all over the Apple campus when I worked there. I'd see them in the cafeteria every day. They were aluminum, but otherwise very similar. I always wondered if they had anything to do with the design of the computer, even if only subconsciously.