Relativity Priority Dispute: https://en.wikipedia.org/wiki/Relativity_priority_dispute
We all stand on the shoulders of giants; things can be invented and reinvented, and the same idea can appear independently more than once.
I personally think that he's not doing himself or his argument any favors by presenting it the way he does. While he basically argues that science should be totally objective and neutral, there's no denying that if you put yourself in a less likeable light, you're not going to make any friends.
On the other hand, he's gone to great lengths to compile detailed references to support his points. I can appreciate that because it makes his argument a lot less hand-wavey: you can go to his blog and compare the cited references yourself. Except that I couldn't, because I'm not an ML expert.
If a bona fide scientist discovered they had missed an attribution, they would correct it as soon as possible. Many, however, would not correct such a re-discovery, because it's embarrassing.
But the worst is when people don't even imagine that anything like what they're working on could already exist, and so don't bother finding and reading related work -- in other words, ignorance. Science deserves better, but there are more and more people around who are happy to ignore all the work before them.
> Many, however, would not correct such a re-discovery, because it's embarrassing.
This is a culture thing that needs to change. I'm a pretty big advocate of open publishing and of avoiding the "review process" as it stands today, because we shouldn't be chasing these notions of novelty and "impact". They are inherently subjective and lead to exactly these disputes over credit. Your work isn't diminished because you independently invented something that already existed; rather, that strengthens your work -- there's more evidence! Everything is incremental, so all this does is make us focus on demonstrating our uniqueness rather than showing our work.

The point of publishing is to communicate. The real peer review only happens after communicating: when people review, replicate, build on, or build against the work. Instead we're creating an overly competitive environment. A re-discovery is only "embarrassing" because it "undermines" the work, and it only "undermines" the work because of how we view credit.
Consider this as a clear example. Suppose you want to revisit a work but just scale it up and run it on modern hardware. You could get state-of-the-art results, but if you admit to such a thing with no claimed changes (let's say you literally just increase the number of layers), you'll never get published. You'll get responses about how we "already knew this" and how "obviously it scales". But no one tested it... right? That's just bad for science. It's bad if we can't do mundane, boring shit.
I have had plenty of ideas in the last few years that I have played with that I have seen published in papers in the following months. Rather than feeling like "I did it first" I feel gratified that not only was I on the right track, but someone else has done the hard slog.
Most papers are not published by people who had the idea the day before. Their work goes back further than that. Refining the idea, testing it and then presenting the results takes time, sometimes years, occasionally decades.
If this happens to you, don't think "Hey! That idea belongs to me!". Thank them for proving you right.
Now if they patent it, that's a different story. I don't think the ideas that sometimes float through my brain belong to me, but I'm not keen on them belonging to someone else either.
Now whether what Schmidhuber claims is what actually happened or not I don't know... but that is his claim and it's fundamentally different from what you are describing.
Schmidhuber sure seems to be a personality, and so far I've mostly heard negative things about his "I invented this" attitude to modern research.
But more seriously, I'm not a fan of Schmidhuber because even if he truly did invent all this stuff early in the 90s, his inability to see its application to modern compute held the field back by years. In principle, we could have had GANs and self-supervised models years earlier if he had "revisited his early work". It's clear to me no one read his early papers when developing GANs/self-supervision/transformers.
There is the whole thing with Damadian claiming to have invented MRI (he didn't) when the Nobel prize went to Mansfield and Lauterbur (see the Nobel prize part of the article). https://en.m.wikipedia.org/wiki/Paul_Lauterbur
And I've seen other less prominent examples.
It's a lot like the difference between ideas and execution and people claiming someone "stole" their idea because they made a successful business from it.
But personal circumstances matter a lot. He was stuck at IDSIA in Lugano, i.e. relatively small and not-so-well funded academia.
He could have done much better in industry, with access to lots of funding, a bigger headcount, and serious infrastructure.
Ultimately, models matter much less than infrastructure. Transformers are not that important; other architectures such as deep SSMs or xLSTM can achieve comparable results.
> if he had "revisited his early work".
Given that you're a researcher yourself I'm surprised by this comment. Have you not yourself experienced the harsh rejection of "not novel"? That sounds like a great way to get stuck in review hell. (I know I've experienced this even when doing novel things, just by relating them too closely to other methodologies when explaining: "oh, it's just ____".) The other part seems weird too. Who isn't upset when their work doesn't get recognized and someone else gets the credit? Are we not all human?
I find Schmidhuber's claim on GANs to be tenuous at best, but his claim to have anticipated modern LLMs is very strong, especially if we are going to be awarding Nobel Prizes for Boltzmann machines. In https://people.idsia.ch/%7Ejuergen/FKI-147-91ocr.pdf, he really does concretely describe a model that unambiguously anticipated modern attention (technically, either an early form of hypernetworks or a more general form of linear attention, depending on which of its proposed update rules you use).
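To make the "linear attention" reading concrete, here's a minimal toy sketch (my own code, not from the paper; the dimensions and variable names are made up) of the equivalence between a 1991-style fast-weight update and unnormalized causal linear attention, along the lines later spelled out in Schlag, Irie & Schmidhuber's "Linear Transformers Are Secretly Fast Weight Programmers":

```python
import numpy as np

d, T = 8, 5                     # toy feature size and sequence length
rng = np.random.default_rng(0)

# Per-step keys, values, queries. In the 1991 framing a "slow" network
# generates these; in a Transformer they are projections of the input.
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))
Q = rng.normal(size=(T, d))

# Fast-weight programmer: a weight matrix "programmed" by outer products.
W = np.zeros((d, d))
fast_out = []
for t in range(T):
    W += np.outer(V[t], K[t])   # write: add v_t k_t^T to the fast weights
    fast_out.append(W @ Q[t])   # read: query the fast weights

# Unnormalized causal linear attention: y_t = sum_{s<=t} v_s (k_s . q_t)
attn_out = [sum(V[s] * (K[s] @ Q[t]) for s in range(t + 1)) for t in range(T)]

assert np.allclose(fast_out, attn_out)   # same outputs, two descriptions
```

Whether that counts as "inventing the transformer" is a separate argument (there's no softmax, no multi-head structure, and no trained-at-scale result), but the write/read mechanism is recognizably the same.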
I also strongly disagree with the idea that his inability to practically apply his ideas held anything back. In the first place, it is uncommon for a discoverer or inventor to immediately grasp all the implications and applications of their work. Secondly, the key limiter was parallel processing power; it's not a coincidence ANNs took off around the same time GPUs were transitioning away from fixed function pipelines (and Schmidhuber's lab were pioneers there too).
In the interim, when most derided neural networks, his lab was one of the few that kept research on neural networks and their application to sequence learning going. Without their contributions, I'm confident Transformers would have happened later.
> It's clear to me no one read his early papers when developing GANs
This is likely true.
> self-supervision/transformers.
This is not true. Transformers came after lots of research on sequence learners, meta-learning, generalizing RNNs and adaptive alignment. For example, Alex Graves' work on sequence transduction with RNNs eventually led to the direct precursor of modern attention. Graves' work was itself influenced by work with and by Schmidhuber.
His "shtick" to me isn't just about him saying "people didn't give me credit" but it seems more "AI people in general haven't credited the history of the field properly." And in many cases he seems to have a point.
He is an academic who cares about understanding where ideas came from. His detractors need to be the smartest people in the room in order to get paid millions and raise billions.
It's not very sexy to say "Oh yes, we are just using an old Soviet learning algorithm on better hardware. Turns out we would have lost the Cold War if the USSR had had access to a 5090", and it won't get you the billions you need to build the supercomputers that push the state of the art today.
It's also funny that we laugh at him when we also have a joke that in AI we just reinvent what people did in the 80's. He's just the person being more specific as to what and who.
Ironically, I think the problem is we care too much about credit. It ends up getting hoarded rather than shared. We then oversell our contributions, because if you make the incremental improvements that literally everyone makes, you get your work rejected for being incremental.
I don't know what it is about CS specifically, but we have a culture problem around attribution and hype. We build on open source -- it's libraries all the way down -- but act like we did it all alone. We jump on bandwagons as if there's one right and immutable way to do certain things, until the bubbles pop and we laugh at how stupid anyone was to ever do such a thing. Yet we don't contribute back to the projects that form our foundation, we laugh at the "theory" we stand on, and we listen to the same hype-train people who got it wrong last time instead of turning to those who got it right. Why? It runs directly counter to the ideals of a group that loves to claim rationalism, "working from first principles", and "I care about what works".
This aspect of the industry really annoys me to no end. People in this field are so allergic to theory (which is ironic, because CS, of all fields, is probably one of the ones in which theoretical investigations are most directly applicable) that they'll smugly proclaim their own intelligence and genius while showing you a pet implementation of ideas that have been around since the 70s or earlier. Sure, most of the time they implement it in a new context, but this leads to a fragmented language in which the same core ideas are implemented N times under everyone's own idiosyncratic terminology (see for example, the wide array of names for basic functional data structure primitives like map, fold, etc. that abound across languages).
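To pick on just that parenthetical for a second, here's a toy Python sketch of the "same primitive, many names" problem; the cross-language names in the comments are from memory, so treat them as approximate:

```python
from functools import reduce

xs = [1, 2, 3, 4]

# map: apply a function to every element.
# Known elsewhere as fmap (Haskell), Select (C#/LINQ), std::transform (C++), etc.
squares = list(map(lambda x: x * x, xs))        # [1, 4, 9, 16]

# fold: combine elements with a binary function and a seed value.
# Known elsewhere as foldl (Haskell), reduce (JS/Clojure), std::accumulate (C++),
# inject (Ruby), etc.
total = reduce(lambda acc, x: acc + x, xs, 0)   # 10

print(squares, total)
```

Same two ideas, half a dozen names, and that's before anyone renames them again in their in-house library.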
People at work have, I think, gotten tired of my rant about how people who are ignorant of the history of their field have a tendency to either re-invent things that already exist, or to be snowed by other people who are re-inventing things that already exist.
I suppose my own belief in the importance of understanding and acknowledging history is one reason I tend to be somewhat sympathetic to Schmidhuber's stance.
I wonder at times if it stems back to flaws in the CS pedagogy. I studied philosophy and literature in which tracing the history of thought is basically the entire game. I wonder if STEM fields, since they have far greater operational emphasis, lose out on some of this.
And to bring this full circle... if you really (really) buy into Schmidhuber's argument, then we should consider the genesis of neural networks to date back to around 1800! I think it's fair to say that that might be a little bit of a stretch, but maybe not that much so.
The problem with these types of interpretations is that they're fundamentally authoritarian, whereas research itself is fundamentally anti-authoritarian. To elaborate: trust, but verify. You trust the results of others, but you replicate and verify. You dig deep and get to the depths (progressive knowledge necessitates higher orders of complexity). If you do not challenge or question results then yes, I'd agree, knowledge harms. But if you're willing to say "okay, it worked in that exact setting, but what about this change?" then there is no problem[1]. In that setting, more reading helps.
I just find these mindsets baffling... Aren't we trying to understand things? You can really only brute force new and better things if you are unable to understand. We can make so much more impact and work so much faster when we let understanding drive as much as outcome.
[0] https://bsky.app/profile/chrisoffner3d.bsky.social/post/3liy...
[1] Other than Reviewer #2
Unfortunately, for most of us, no. We are trying to deliver business units to increase shareholder value.
>> Aren't we trying to understand things? ***You can really only brute force new and better things if you are unable to understand. We can make so much more impact and work so much faster when we let understanding drive as much as outcome.***
I'm arguing that if you want to "deliver business units to increase shareholder value", then this is well aligned with "trying to understand things." Think about it this way:
If you understand things:
You can directly address shareholder concerns and adapt readily to market demands. You do not have to search, you already understand the solution space.
If you do not understand things:
You cannot directly address shareholder concerns and must search over the solution space to meet market demands.
Which is more efficient? It is hard to argue that searching through an unknown solution space is easier than path optimization over a known solution space. Obviously this is the highly idealized case, but this is why I'm arguing that the two are aligned. If you're in the latter situation, you advantage yourself by trying to get to the former; otherwise you are just blindly searching. In that case technical debt becomes inevitable and compounds significantly unless you get lucky, and it becomes extremely difficult to pivot as the environment naturally changes around you. You are only advantaged by understanding, never harmed. Until we realize this we're going to continue to be extremely wasteful, resulting in significantly lower returns for shareholders or any other measure of value.

If you find that you're spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you're spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice.
But yeah, in general I hate how people treat theory, acting as if it has no economic value. Certainly both matter, no one is denying that. But there's a strong bias against theory and I'm not sure why. Let's ask ourselves: what is the economic impact of calculus? What about just the work of Leibniz or Newton? I'm pretty confident that it's significantly north of billions of dollars a year. And we want to... do less of this type of impactful work? It seems a handful of such examples more than covers any money wasted on research that has failed (or "failed").

The problem I see with our field, which leads to a lot of hype, is the belief that everything is simple. This just creates "yes men" and people who do not think. Which I think ends up with people hearing "no" when someone is just acting as an engineer. The job of an engineer is to problem-solve. That means you have to identify problems! Identifying them and presenting solutions is not "no", it is "yes". But for some reason it is interpreted as "no".
> see for example, the wide array of names for basic functional data structure primitives like map, fold, etc. that abound across languages
Don't get me started... but if a PL person goes on a rant here, just know: yes, I upvoted you ;)[0]

[0] You can probably tell I came to CS from "outside". I have a PhD in CS (ML), but my undergrad was in physics. I liked experimental physics because I came to the same conclusion as Knuth: theory and practice drive one another.
https://people.idsia.ch/~juergen/deep-learning-history.html
The history section starts in 1676.
Schmidhuber is nothing but a stickler for backward credit assignment
You may have had many brilliant ideas, but if everyone makes an abrupt 180 when they see the tip of your beard turn the corner at conferences, that can't be a good signal for getting awards.
As an organization, fostering an organically growing environment is like governing a great nation with delicate care. A bottom-up (organic growth) environment is the core condition for sustained innovation and development!
No, Schmidhuber gave birth to the transformer in 1991.
As another commenter said, his misfortune is being in a lab with no industrial affiliation.
I would not be at all surprised if this behavior extended to research papers published by people in industry as opposed to academia. Good citation practice simply does not exist in industry. We're lucky if any of the thousand blog posts that reimplement some idea cranked out ages ago in academic circles are even aware of the original effort, let alone cite it. Citations are few and far between in industry literature generally. Obviously there are exceptions, and this is just my personal observation; I haven't done or found any kind of meta-study of the literature to back it up.