Relativity Priority Dispute: https://en.wikipedia.org/wiki/Relativity_priority_dispute
We all stand on the shoulders of giants; things can be invented and reinvented, and the same idea can appear independently more than once.
I personally think that he's not doing himself or his argument any favors by presenting it the way he does. While he basically argues that science should be totally objective and neutral, there's no denying that if you put yourself in a less likeable light, you're not going to make any friends.
On the other hand, he's gone to great lengths to compile detailed references to support his points. I can appreciate that because it makes his argument a lot less hand-wavey: you can go to his blog and compare the cited references yourself. Except that I couldn't, because I'm not an ML expert.
If a bona fide scientist discovered they had missed an attribution, they would correct it as soon as possible. Many, however, would not correct such a re-discovery, because it's embarrassing.
But the worst is when people don't even imagine that anything like what they're working on could already exist, and so don't bother finding and reading related work -- in other words, ignorance. Science deserves better, but there are more and more people around who are happy to ignore all the work before them.
> Many, however, would not correct such a re-discovery, because it's embarrassing.
This is a culture thing that needs to change. I'm a pretty big advocate of open publishing and of avoiding the "review process" as it stands today, because we shouldn't be chasing these notions of novelty and "impact". They are inherently subjective and lead to exactly these disputes over credit. Your work isn't diminished because you independently invented something that already existed; rather, that strengthens your work -- there's more evidence! Everything is incremental, so all this does is make us focus on demonstrating our uniqueness rather than showing our work.

The point of publishing is to communicate. The real peer review only happens after communicating: when people review, replicate, build on, or build against the work. Instead we're creating an overly competitive environment. A re-discovery is only "embarrassing" because it "undermines" the work, and it only "undermines" the work because of how we view credit.
Consider this as a clear example. Suppose you want to revisit a work but just scale it up and run it on modern hardware. You could get state-of-the-art results, but if you admit to such a thing with no claimed changes (let's say you literally just increase the number of layers), you'll never get published. You'll get responses about how we "already knew this" and how "obviously it scales". But no one tested it... right? That's just bad for science. It's bad if we can't do mundane, boring shit.
I have had plenty of ideas in the last few years that I have played with that I have seen published in papers in the following months. Rather than feeling like "I did it first" I feel gratified that not only was I on the right track, but someone else has done the hard slog.
Most papers are not published by people who had the idea the day before. Their work goes back further than that. Refining the idea, testing it and then presenting the results takes time, sometimes years, occasionally decades.
If this happens to you, don't think "Hey! That idea belongs to me!". Thank them for proving you right.
Now if they patent it, that's a different story. I don't think the ideas that sometimes float through my brain belong to me, but I'm not keen on them belonging to someone else either.
Now whether what Schmidhuber claims is what actually happened or not I don't know... but that is his claim and it's fundamentally different from what you are describing.
Schmidhuber sure seems to be a personality, and so far I've mostly heard negative things about his "I invented this" attitude to modern research.
But more seriously, I'm not a fan of Schmidhuber because even if he truly did invent all this stuff early in the 90s, his inability to see its application to modern compute held the field back by years. In principle, we could have had GANs and self-supervised models years earlier if he had "revisited his early work". It's clear to me no one read his early papers when developing GANs/self-supervision/transformers.
There is the whole thing with Damadian claiming to have invented MRI (he didn't) when the Nobel prize went to Mansfield and Lauterbur (see the Nobel prize part of the article). https://en.m.wikipedia.org/wiki/Paul_Lauterbur
And I've seen other less prominent examples.
It's a lot like the difference between ideas and execution and people claiming someone "stole" their idea because they made a successful business from it.
But personal circumstances matter a lot. He was stuck at IDSIA in Lugano, i.e. relatively small and not-so-well funded academia.
He could have done much better in industry, with access to lots of funding, a bigger headcount, and serious infrastructure.
Ultimately, models matter much less than infrastructure. Transformers are not that important; other architectures such as deep SSMs or xLSTM can achieve comparable results.
> if he had "revisited his early work".
Given that you're a researcher yourself I'm surprised by this comment. Have you not yourself experienced the harsh rejection of "not novel"? That sounds like a great way to get stuck in review hell. (I know I've experienced this even when doing novel things, just by relating them too closely to other methodologies when explaining: "oh, it's just ____".) The other part seems weird too. Who isn't upset when their work doesn't get recognized and someone else gets the credit? Are we not all human?
I find Schmidhuber's claim on GANs to be tenuous at best, but his claim to have anticipated modern LLMs is very strong, especially if we are going to be awarding Nobel Prizes for Boltzmann machines. In https://people.idsia.ch/%7Ejuergen/FKI-147-91ocr.pdf, he really does concretely describe a model that unambiguously anticipated modern attention (technically, either an early form of hypernetworks or a more general form of linear attention, depending on which of its proposed update rules you use).
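To make the "linear attention" reading concrete, here's a minimal toy sketch (my own code, not from the paper; the dimensions and variable names are made up) of the equivalence between a 1991-style fast-weight update and unnormalized causal linear attention, along the lines later spelled out in Schlag, Irie & Schmidhuber's "Linear Transformers Are Secretly Fast Weight Programmers":

```python
import numpy as np

d, T = 8, 5                     # toy feature size and sequence length
rng = np.random.default_rng(0)

# Per-step keys, values, queries. In the 1991 framing a "slow" network
# generates these; in a Transformer they are projections of the input.
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))
Q = rng.normal(size=(T, d))

# Fast-weight programmer: a weight matrix "programmed" by outer products.
W = np.zeros((d, d))
fast_out = []
for t in range(T):
    W += np.outer(V[t], K[t])   # write: add v_t k_t^T to the fast weights
    fast_out.append(W @ Q[t])   # read: query the fast weights

# Unnormalized causal linear attention: y_t = sum_{s<=t} v_s (k_s . q_t)
attn_out = [sum(V[s] * (K[s] @ Q[t]) for s in range(t + 1)) for t in range(T)]

assert np.allclose(fast_out, attn_out)   # same outputs, two descriptions
```

Whether that counts as "inventing the transformer" is a separate argument (there's no softmax, no multi-head structure, and no trained-at-scale result), but the write/read mechanism is recognizably the same.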
I also strongly disagree with the idea that his inability to practically apply his ideas held anything back. In the first place, it is uncommon for a discoverer or inventor to immediately grasp all the implications and applications of their work. Secondly, the key limiter was parallel processing power; it's not a coincidence ANNs took off around the same time GPUs were transitioning away from fixed function pipelines (and Schmidhuber's lab were pioneers there too).
In the interim, when most derided neural networks, his lab was one of the few that kept research on neural networks and their application to sequence learning going. Without their contributions, I'm confident Transformers would have happened later.
> It's clear to me no one read his early papers when developing GANs
This is likely true.
> self-supervision/transformers.
This is not true. Transformers came after lots of research on sequence learners, meta-learning, generalizing RNNs and adaptive alignment. For example, Alex Graves' work on sequence transduction with RNNs eventually led to the direct precursor of modern attention. Graves' work was itself influenced by work with and by Schmidhuber.
His "shtick" to me isn't just about him saying "people didn't give me credit" but it seems more "AI people in general haven't credited the history of the field properly." And in many cases he seems to have a point.
He is an academic who cares about understanding where ideas came from. His detractors need to be the smartest people in the room in order to get paid millions and raise billions.
It's not very sexy to say "Oh yes, we are just using an old Soviet learning algorithm on better hardware. Turns out we would have lost the Cold War if the USSR had had access to a 5090", and it won't get you the billions you need to build the supercomputers that push the state of the art today.
It's also funny that we laugh at him when we also have a joke that in AI we just reinvent what people did in the 80's. He's just the person being more specific as to what and who.
Ironically, I think the problem is we care too much about credit. It ends up getting hoarded rather than shared. We then oversell our contributions, because if you make the incremental improvements that literally everyone makes, you get your work rejected for being incremental.
I don't know what it is about CS specifically, but we have a culture problem around attribution and hype. We build on open source -- it's libraries all the way down -- but act like we did it all alone. We jump on bandwagons as if there's one right and immutable way to do certain things, until the bubbles pop and we laugh at how stupid anyone was to ever do such a thing. Yet we don't contribute back to the projects that form our foundation, we laugh at the "theory" we stand on, and we listen to the same hype-train people who got it wrong last time instead of turning to those who got it right. Why? It runs directly counter to the ideals of a group that loves to claim rationalism, "working from first principles", and "I care about what works".
This aspect of the industry really annoys me to no end. People in this field are so allergic to theory (which is ironic, because CS, of all fields, is probably one of the ones in which theoretical investigations are most directly applicable) that they'll smugly proclaim their own intelligence and genius while showing you a pet implementation of ideas that have been around since the 70s or earlier. Sure, most of the time they implement it in a new context, but this leads to a fragmented language in which the same core ideas are implemented N times under everyone's own idiosyncratic terminology (see for example, the wide array of names for basic functional data structure primitives like map, fold, etc. that abound across languages).
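To pick on just that parenthetical for a second, here's a toy Python sketch of the "same primitive, many names" problem; the cross-language names in the comments are from memory, so treat them as approximate:

```python
from functools import reduce

xs = [1, 2, 3, 4]

# map: apply a function to every element.
# Known elsewhere as fmap (Haskell), Select (C#/LINQ), std::transform (C++), etc.
squares = list(map(lambda x: x * x, xs))        # [1, 4, 9, 16]

# fold: combine elements with a binary function and a seed value.
# Known elsewhere as foldl (Haskell), reduce (JS/Clojure), std::accumulate (C++),
# inject (Ruby), etc.
total = reduce(lambda acc, x: acc + x, xs, 0)   # 10

print(squares, total)
```

Same two ideas, half a dozen names, and that's before anyone renames them again in their in-house library.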
People at work have, I think, gotten tired of my rant about how people who are ignorant of the history of their field have a tendency to either re-invent things that already exist, or to be snowed by other people who are re-inventing things that already exist.
I suppose my own belief in the importance of understanding and acknowledging history is one reason I tend to be somewhat sympathetic to Schmidhuber's stance.
I wonder at times if it stems back to flaws in the CS pedagogy. I studied philosophy and literature in which tracing the history of thought is basically the entire game. I wonder if STEM fields, since they have far greater operational emphasis, lose out on some of this.
And to bring this full circle... if you really (really) buy into Schmidhuber's argument, then we should consider the genesis of neural networks to date back to around 1800! I think it's fair to say that that might be a little bit of a stretch, but maybe not that much so.
The problem with these types of interpretations is that they're fundamentally authoritarian, whereas research itself is fundamentally anti-authoritarian. To elaborate: trust, but verify. You trust the results of others, but you replicate and verify. You dig deep and get to the depths (progressive knowledge necessitates higher orders of complexity). If you do not challenge or question results then yes, I'd agree, knowledge harms. But if you're willing to say "okay, it worked in that exact setting, but what about this change?" then there is no problem[1]. In that setting, more reading helps.
I just find these mindsets baffling... Aren't we trying to understand things? You can really only brute force new and better things if you are unable to understand. We can make so much more impact and work so much faster when we let understanding drive as much as outcome.
[0] https://bsky.app/profile/chrisoffner3d.bsky.social/post/3liy...
[1] Other than Reviewer #2
Unfortunately, for most of us, no. We are trying to deliver business units to increase shareholder value.
>> Aren't we trying to understand things? ***You can really only brute force new and better things if you are unable to understand. We can make so much more impact and work so much faster when we let understanding drive as much as outcome.***
I'm arguing that if you want to "deliver business units to increase shareholder value", then this is well aligned with "trying to understand things." Think about it this way:
If you understand things:
You can directly address shareholder concerns and adapt readily to market demands. You do not have to search, you already understand the solution space.
If you do not understand things:
You cannot directly address shareholder concerns and must search over the solution space to meet market demands.
Which is more efficient? It is hard to argue that searching through an unknown solution space is easier than path optimization over a known solution space. Obviously this is the highly idealized case, but this is why I'm arguing that the two are aligned. If you're in the latter situation, you advantage yourself by trying to get to the former; otherwise you are just blindly searching. In that case technical debt becomes inevitable and compounds significantly unless you get lucky, and it becomes extremely difficult to pivot as the environment naturally changes around you. You are only advantaged by understanding, never harmed. Until we realize this we're going to continue to be extremely wasteful, resulting in significantly lower returns for shareholders or any other measure of value.

If you find that you're spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you're spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice.
But yeah, in general I hate how people treat theory, acting as if it has no economic value. Certainly both matter, no one is denying that. But there's a strong bias against theory and I'm not sure why. Let's ask ourselves: what is the economic impact of calculus? What about just the work of Leibniz or Newton? I'm pretty confident that it's significantly north of billions of dollars a year. And we want to... do less of this type of impactful work? It seems a handful of such examples more than covers any money wasted on research that has failed (or "failed").

The problem I see with our field, which leads to a lot of hype, is the belief that everything is simple. This just creates "yes men" and people who do not think. Which I think ends up with people hearing "no" when someone is just acting as an engineer. The job of an engineer is to problem-solve. That means you have to identify problems! Identifying them and presenting solutions is not "no", it is "yes". But for some reason it is interpreted as "no".
> see for example, the wide array of names for basic functional data structure primitives like map, fold, etc. that abound across languages
Don't get me started... but if a PL person goes on a rant here, just know: yes, I upvoted you ;)[0]

[0] You can probably tell I came to CS from "outside". I have a PhD in CS (ML), but my undergrad was in physics. I liked experimental physics because I came to the same conclusion as Knuth: theory and practice drive one another.
https://people.idsia.ch/~juergen/deep-learning-history.html
The history section starts in 1676.
Schmidhuber is nothing but a stickler for backward credit assignment
You may have had many brilliant ideas, but if everyone makes an abrupt 180 when they see the tip of your beard turn the corner at conferences, that can't be a good signal for getting awards.
As an organization, fostering an organically growing environment is like governing a great nation with delicate care. A bottom-up (organic growth) environment is the core condition for sustained innovation and development!
No, Schmidhuber gave birth to the transformer in 1991.
As another commenter said, his misfortune is being in a lab with no industrial affiliation.
I would not be at all surprised if this behavior extended to research papers published by people in industry as opposed to academia. Good citation practice simply does not exist in industry. We're lucky if any of the thousand blog posts that reimplement some idea cranked out ages ago in academic circles are even aware of the original effort, let alone cite it. Citations are few and far between in industry literature generally. Obviously there are exceptions, and this is just my personal observation; I haven't done or found any kind of meta-study of the literature to back it up.