But one has to imagine that seeing so many huge datacenters go up and not being able to do training runs etc. is motivating a lot of researchers to try things that are really different. At least I hope so.
It seems pretty short-sighted that the funding numbers for memristor startups (for example) are so low so far.
Anyway, assuming that within the next several years more radically different AI hardware and AI architecture paradigms pay off in efficiency gains, the current situation will change. Fully human-level AI will be commoditized, and training will be well within the reach of small companies.
I think we should anticipate this given the strong level of need to increase efficiency dramatically, the number of existing research programs, the amount of investment in AI overall, and the history of computation that shows numerous dramatic paradigm shifts.
So anyway "the rest of us" I think should be banding together and making much larger bets on proving and scaling radical new AI hardware paradigms.
If the poster's other guesses pay out at the same rate, this will likely never play out.
I think sparsity is a consequence of some other fundamental properties of brain function that we've yet to understand. Just sparsifying the models we've got is not going to lead anywhere, IMO. (For example it's estimated that current AI models are already within 1%-10% of a human brain in terms of "number of parameters" (https://www.beren.io/2022-08-06-The-scale-of-the-brain-vs-ma...).)
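For scale, a back-of-envelope version of that comparison (the synapse counts are rough textbook estimates, and the model size is an assumed round number, not any lab's disclosed figure):

    # ~86e9 neurons with ~1e3-1e4 synapses each (textbook ballpark),
    # vs. an assumed ~1e12-parameter frontier model.
    synapses_low, synapses_high = 86e9 * 1e3, 86e9 * 1e4
    model_params = 1e12
    print(f"{model_params / synapses_high:.2%} to {model_params / synapses_low:.2%}")
    # -> roughly 0.1% to 1.2% of the brain's synapse count,
    #    i.e. the same ballpark as the linked estimate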
Poaching, acqui-hiring, acquisitions, and the myriad modern forms we are seeing today have been the tools, and that will not change.
Owners and beneficiaries of the capital do not change, but that is an artifact of our economic system and a much larger socio-economic discussion, beyond the scope of innovation and research.
This already exists: https://www.cerebras.ai/chip
They claim 44 GB of SRAM at 21 PB/s.
Wafer-scale severely limits bandwidth once you go beyond SRAM, because with far less chip perimeter per unit area there is less room to hook up I/O.
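A quick illustration of the perimeter-to-area argument (the edge lengths below are illustrative round numbers, not exact product dimensions):

    # Edge available for off-chip I/O per mm^2 of silicon, for a square die.
    def beachfront(edge_mm):
        return (4 * edge_mm) / (edge_mm ** 2)  # mm of perimeter per mm^2 of area

    print(beachfront(25))   # ~0.16  -> roughly reticle-sized die
    print(beachfront(215))  # ~0.019 -> wafer-scale die, ~8-9x less edge per mm^2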
But memory-centric compute didn't happen because of Moore's law. (SNNs have the problem that we don't actually know how to use them.) Now that Moore's law is gone, memory-centric compute may have a chance, but it still takes a large amount of money thrown at the idea, and the people with money are so risk-averse that they create entirely new risks for themselves.
Feed-forward neural networks were very lucky that there already existed a mainstream use for the kind of hardware they needed.
I'd rather approach these things from the PoV of: "We use distillation to solve your problems today"
The last sentence kind of says it all: "If you have 30k+/mo in model spend, we'd love to chat."
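For readers unfamiliar with the distillation being pitched above, here is a minimal sketch of the standard Hinton-style distillation loss (an illustration of the general technique, not this company's actual pipeline):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Match the student to the teacher's temperature-softened distribution...
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)  # conventional T^2 scaling
        # ...and keep an ordinary cross-entropy term on the hard labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard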
I still believe this is going to be an embarrassing chapter of the history of AI when we actually do create it. “Humans - with the sort of hubris only a neoliberal post-war boom period could produce - honestly thought their first serious development in computing (silicon-based microprocessors) would lead to Artificial General Intelligence and usher in a utopia of the masses. Instead they squandered their limited resources on a Fool’s Errand, ignoring more important crises that would have far greater impacts on their immediate prosperity in the naive belief they could create a Digital God from Silicon and Electricity alone.”
Energy models and other substrates are going to be key, and it has nothing to do with text at all, as human intelligence existed before language. It's Newspeak to run a chatbot on what is obviously a computer and call it an intelligence like a human. 1984-style dystopian crap.
The GPU cost of the final training run isn't the biggest chunk of the cost, and you can probably replicate the results of models like Llama 3 very cheaply. It's the cost of experiments, researchers, and data collection that brings the overall cost 1 or 2 orders of magnitude higher.
Just imagine his or her 'ChatGPT with 10,000x fewer propagations' Reddit post appearing on a Monday...
...and $3 trillion of Nvidia stock going down the drain by Friday.
There have really been many significant innovations in hardware, model architecture, and software, allowing companies to keep up with soaring demand and expectations.
But that's always how it's been in high technology. You only really hear about the biggest shifts, but the optimizations are continuous.
> Impressively, open source models have been able to quickly catch up to big labs.
And then the beginning of the fourth:
> Open-source has been lagging behind proprietary models for years, but lately this gap has been widening.
Followed by a picture that is more or less inscrutable.
Yeah. Just to make it explicit - that chart has DeepSeek R1 at ... presumably an Elo of 1418 and Gemini Pro at 1463. That is comparable to the gap between Magnus Carlsen and Fabiano Caruana [0]. I don't think it is reasonable to complain about that sort of performance gap in practice - it is a capable model. Looking at the spread of scores, I don't immediately see why someone would even need to use something in the top 10; presumably anything above 1363 would be good enough for business, research, and personal use.
None of these models have even been around that long; DeepSeek was only released in January. The rate of change is massive, and I expect to have access to an open-source model that is better than anything on this leaderboard sometime next year.
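For context, the standard Elo expected-score formula turns that 45-point gap into a head-to-head preference rate (ratings read off the chart, so approximate):

    def elo_expected(r_a, r_b):
        # Probability the higher-rated side is preferred in a pairwise comparison
        return 1 / (1 + 10 ** ((r_b - r_a) / 400))

    print(elo_expected(1463, 1418))  # ~0.56, i.e. preferred about 56% of the time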
Until AGI is achieved no one's really won anything.