Many breakthrough, game-changing inventions started this way, with back-of-the-envelope discussions; another popular example is the Ethernet network.
Some good stories about a similar culture at AT&T Bell Labs are well described in Hamming's book [1].
[1] The Art of Doing Science and Engineering (Stripe Press):
https://press.stripe.com/the-art-of-doing-science-and-engine...
According to various stories pieced together, the ideas for four of Pixar's early hits were conceived at or around a single lunch:
A Bug's Life, WALL-E, Monsters, Inc., and Finding Nemo.
I'm wondering how big an impact working from home will really have on humanity in general, when so many of our life-changing discoveries come from the odd chance of two specific people happening to be in the same place at some moment in time.
The agile treadmill (with PMs breathing down our necks) and features getting planned and delivered in two-week sprints have also reduced our ability to just do something we feel needs doing. Today you go to work to feed several layers of incompetent managers - there is no room for play, or for creativity. At least in most orgs I know.
I think innovation (or even the joy of being at work) needs more than just the office, or people, or a canteen; it needs an environment that supports it.
Basically, I set aside as much time as I can to squeeze creativity and real engineering work into the job. Otherwise I'd go crazy from the grind of just cranking out deliverables.
As for agile: I've made it clear to my PMs that I generally plan on a quarterly/half-year basis, and my work and other people's work adhere to that schedule, not weekly sprints (we stay up to date in a Slack channel, no standups).
It seems, in those days, people at Bell Labs did get the best of both worlds: being able to have chance encounters with very smart people while also being able to just be gone for weeks to work undistracted.
A dream job that probably didn’t even feel like a job (at least that’s the impression I get from hearing Thompson talk about that time).
"""Thompson's design was outlined on September 2, 1992, on a placemat in a New Jersey diner with Rob Pike. In the following days, Pike and Thompson implemented it and updated Plan 9 to use it throughout,[11] and then communicated their success back to X/Open, which accepted it as the specification for FSS-UTF.[9]"""
Related thread: https://threadreaderapp.com/thread/1864023344435380613.html
The LLM stack has enough branches of evolution within it for efficiency gains; agent-based work on its own could power a new industrial revolution around white-collar workers, while expanding self-expression and personal fulfillment for everyone else.
Well have fun sir
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
It's like if someone invented the hamburger and every single food outlet decided to only serve hamburgers from that point on, only spending time and money on making the perfect hamburger, rather than spending time and effort on making great meals. Which sounds ludicrously far-fetched, but is exactly what happened here.
I think you just analogously described Sun Microsystems, where Unixes (BSD originally in their case, generalized to an SVR4 (?) hybrid later) worked soooo well that NT was built as a hybridization for the Microsoft user base and Apple reabsorbed the BSD-Mach-DisplayPostscript hybridization spinoff NeXT, while Linux simultaneously thrived.
Realistically, I think the valuable idea is probabilistic graphical models, of which transformers are an example. Combining probability with sequences, or with trees and graphs, is likely to remain a valuable area for research exploration for the foreseeable future.
As if this approach [1] does not exist.
This seems extremely, extremely unlikely for many reasons. The HP model is a simplification of true protein folding/structure adoption, while AlphaFold (and the open source equivalents) works with real proteins. The SAT approach uses little to no prior knowledge about protein structures, unlike AlphaFold (which has basically memorized and generalized the PDB). To express all the necessary details would likely exceed the capabilities of the best SAT solvers.
(don't get me wrong- SAT and other constraint approaches are powerful tools. But I do not think they are the best approach for protein structure prediction).
On the other hand, it's also led to improvements in many places hidden behind the scenes. For example, vision transformers are much more powerful and scalable than many of the other computer vision models, which has probably led to new capabilities.
In general, transformers aren't just "generate text"; they're a new foundational model architecture that enables a leap forward in many things that require modeling!
Like, vision transformers? They seem to work best when they still have a CNN backbone, but the "transformer" component is very good at focusing on relevant information, and doing different things depending on what you want to be done with those images.
And if you bolt that hybrid vision transformer to an even larger language-oriented transformer? That also imbues it with basic problem-solving, world knowledge and commonsense reasoning capabilities - which, in things like advanced OCR systems, are very welcome.
Simultaneously discovering and leveraging the functional nature of language seems like kind of a big deal.
All that remains is to come up with a way to integrate short-term experience into long-term memory, and we can call the job of emulating our brains done, at least in principle. Everything after that will amount to detail work.
...lol. Yikes.
I do not accept your premise. At all.
> use it to compose original works and solve original problems
Which original works and original problems have LLMs solved, exactly? You might find a random article or stealth marketing paper that claims to have solved some novel problem, but if what you're saying were actually true, we'd be flooded with original works and new problems being solved. So where are all these original works?
> All that remains is to come up with a way to integrate short-term experience into long-term memory, and we can call the job of emulating our brains done, at least in principle
What experience do you have that caused you to believe these things?
If anyone still insists on hidden magical components ranging from immortal souls to Penrose's quantum woo, well... let's see what you've got.
The International Math Olympiad qualifies as solving original problems, for example. If you disagree, that's a case you have to make. Transformer models are unquestionably better at math than I am. They are also better at composition, and will soon be better at programming if they aren't already.
Every time a magazine editor is fooled by AI slop, every time an entire subreddit loses the Turing test to somebody's ethically-questionable 'experiment', every time an AI-rendered image wins a contest meant for human artists -- those are original works.
Heck, looking at my Spotify playlist, I'd be amazed if I haven't already been fooled by AI-composed music. If it hasn't happened yet, it will probably happen next week, or maybe next year. Certainly within the next five years.
>If anyone still insists on hidden magical components ranging from immortal souls to Penrose's quantum woo, well... let's see what you've got.
This isn't too far off from the marketing and hypesteria surrounding "AI" companies.
No they don't. Humans also know when they are pretending to know what they are talking about - put said people against the wall and they will freely admit they have no idea what the buzzwords they are saying mean.
Machines possess no such characteristic.
>No they don't.
WTAF? Maybe you're new here, but the term "hallucinate" came from a very human experience, and was only usurped recently by "AI" bros who wanted to anthropomorphize a tin can.
>Humans also know when they are pretending to know what they are talking about - put said people against the wall and they will freely admit they have no idea what the buzzwords they are saying mean.
>Machines possess no such characteristic.
"AI" will say whatever you want to hear to make you go away. That's the extent of their "characteristic". If it doesn't satisfy the user, they try again, and spit out whatever garbage it calculates should make the user go away. The machine has far less of an "idea" what it's saying.
I also do not accept your assertion, at all. Humans largely function on the basis of desire-fulfilment, be that eating, fucking, seeking safety, gaining power, or any of the other myriad human activities. Our brains, and the brains of all the animals before us, have evolved for that purpose. For evidence, start with Skinner or the millions of behavioral analysis studies done in that field.
Our thoughts lend themselves to those activities. They arise from desire. Transformers have nothing to do with human cognition because they do not contain the basic chemical building blocks that precede and give rise to human cognition. They are, in fact, stochastic parrots that can fool others, like yourself, into believing they are somehow thinking.
[1] Libet, B., Gleason, C. A., Wright, E. W., & Pearl, D. K. (1983). Time of conscious intention to act in relation to onset of cerebral activity (readiness-potential). Brain, 106(3), 623-642.
[2] Soon, C. S., Brass, M., Heinze, H. J., & Haynes, J. D. (2008). Unconscious determinants of free decisions in the human brain. Nature Neuroscience, 11(5), 543-545.
[3] Berridge, K. C., & Robinson, T. E. (2003). Parsing reward. Trends in Neurosciences, 26(9), 507-513. (This paper reviews the "wanting" vs. "liking" distinction, where unconscious "wanting" or desire is driven by dopamine).
[4] Kavanagh, D. J., Andrade, J., & May, J. (2005). Elaborated Intrusion theory of desire: a multi-component cognitive model of craving. British Journal of Health Psychology, 10(4), 515-532. (This model proposes that desires begin as unconscious "intrusions" that precede conscious thought and elaboration).
> They are, in fact, stochastic parrots that can fool others, like yourself, into believing they are somehow thinking.
What makes you think you're not arguing with one now?
You are not making an argument; you are just making assertions without evidence and then telling us the burden of proof is on us to show why not.
If you went walking down the streets yelling the world is run by a secret cabal of reptile-people without evidence, you would rightfully be declared insane.
Our feelings and desires largely determine the content of our thoughts and actions. LLMs do not function as such.
Whether I am arguing with a parrot or not has nothing to do with cognition. A parrot being able to usefully fool a human has nothing to do with cognition.
Language is like a disembodied science-fiction narration.
Wegner's The Illusion of Conscious Will
https://www.its.caltech.edu/~squartz/wegner2.pdf
Fedorenko's Language and Thought are Not The Same Thing
edit: post-transformers meaning "in the era after transformers were widely adopted", not some mystical new wave of hypothetical tech to disrupt transformers themselves.
Unless I misinterpreted the post, render me confused.
People who started their NLP work (PhDs etc; industry research projects) before the LLM / transformer craze had to adapt to the new world. (Hence 'post-mass-uptake-of-transformers')
I think this might be the ONLY example that doesn't back up the original claim, because of course an advancement in language processing is an advancement in language processing -- that's tautological! every new technology is an advancement in its domain; what's claimed to be special about transformers is that they are allegedly disruptive OUTSIDE of NLP. "Which fields have been transformed?" means ASIDE FROM language processing.
other than disrupting users by forcing "AI" features they don't want on them... what examples of transformers being revolutionary exist outside of NLP?
Claude Code? lol
If you have something relevant to say, you can summarize for the class & include links to your receipts.
Some directly, because LLMs and highly capable general purpose classifiers that might be enough for your use case are just out there, and some because of downstream effects, like GPU-compute being far more common, hardware optimized for tasks like matrix multiplication and mature well-maintained libraries with automatic differentiation capabilities. Plus the emergence of things that mix both classical ML and transformers, like training networks to approximate intermolecular potentials faster than the ab-initio calculation, allowing for accelerating molecular dynamics simulations.
Reading the newspaper is such a lovely experience these days. But hey, the AI researchers are really excited so who really cares if stuff like this happens if we can declare that "therapy is transformed!"
It sure is. Could it have been that attention was all that kid needed?
I had a friend who did PhD research in NLP, and I had a problem extracting some structured data from unstructured text; he told me to just ask ChatGPT to do it for me.
Basically, ChatGPT is almost always better at language-based tasks than most of the specialized techniques the subfields developed over decades for those specific problems.
That's a pretty effing huge deal, even if it falls short of the AGI 2027 hype.
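For what it's worth, here's a minimal sketch of that kind of extraction, assuming the openai Python client; the model name, field names, and example text are just placeholders:

```python
# Rough sketch: pull structured fields out of free text with an LLM.
# Assumes the `openai` Python client; model and schema are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_record(text: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Extract the person's name, organization, and date "
                        "from the text. Reply with a JSON object using the "
                        "keys: name, organization, date."},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},  # ask for valid JSON output
    )
    return json.loads(resp.choices[0].message.content)

print(extract_record("Dr. Ada Byron joined Acme Labs on 12 March 2021."))
```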
Therefore, the correct attitude to take regarding LLMs is to create ways for them to receive useful feedback on their outputs. When using a coding agent, have the agent work against tests. Scaffold constraints and feedback around it. AlphaZero, for example, had abundant environmental feedback and achieved amazing (superhuman) results. Other Alpha models (for math, coding, etc.) that operated within validation loops reached olympic levels in specific types of problem-solving. The limitation of LLMs is actually a limitation of their incomplete coupling with the external world.
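A toy version of "work against tests", where generate_patch is a hypothetical stand-in for whatever model or agent you actually use:

```python
# Toy feedback loop: let the model iterate against a test suite.
import subprocess

def generate_patch(task: str, feedback: str) -> None:
    """Hypothetical stand-in: call your LLM/agent here and apply its edits."""
    raise NotImplementedError("plug in the model or agent of your choice")

def run_tests() -> tuple[bool, str]:
    """Run pytest and return (passed, combined output)."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def improve_until_green(task: str, max_iters: int = 5) -> bool:
    feedback = ""
    for _ in range(max_iters):
        generate_patch(task, feedback)   # the agent edits the code
        passed, output = run_tests()     # the environment supplies the feedback
        if passed:
            return True
        feedback = output                # failures become the next prompt
    return False
```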
In fact, you don't even need a superintelligent agent to make progress; copying and competition are sufficient. Evolution shows it can create all life, including us and our culture and technology, without a very smart learning algorithm. Instead, what it has is plenty of feedback. Intelligence is not in the brain or the LLM; it is in the ecosystem, the society of agents, and the world. Intelligence is the result of having to pay the cost of our execution to continue to exist, a strategy to balance the cost of life.
What I mean by feedback is exploration: executing novel actions, or actions in novel environment configurations, and observing the outcomes. And adjusting, and iterating. So the feedback becomes part of the model, and the model part of the action-feedback process. They co-create each other.
They didn't create those markets, but they're the markets for which LLMs enhance productivity and capability the best right now, because they're the ones that need the least supervision of input to and output from the LLMs, and they happen to be otherwise well-suited to the kind of work it is, besides.
> This isn't unique to models; even we, humans, when operating without feedback, generate mostly slop.
I don't understand the relevance of this.
> Curation is performed by the environment and the passage of time, which reveals consequences.
I'd say it's revealed by human judgement and eroded by chance, but either way, I still don't get the relevance.
> LLMs taken in isolation from their environment are just as sloppy as brains in a similar situation.
Sure? And clouds are often fluffy. Water is often wet. Relevance?
The rest of this is a description of how we can make LLMs work better, which amounts to more work than required to make LLMs pay off enormously for the purposes I called out, so... are we even in disagreement? I don't disagree that perhaps this will change, and explicitly bound my original claim ("so far") for that reason.
... are you actually demonstrating my point, on purpose, by responding with LLM slop?
You should hear HN talk about crypto. If the knife were invented today they'd have a field day calling it the most evil plaything of bandits, etc. Nothing about human nature, of course.
Edit: There it is! Like clockwork.
This also describes most modern software development
I just bought Robokiller. I have it set to contacts cuz the AIs were calling me all day.
Defenders are supposed to defend against attacks on AI, but here it misfired, so the conversation should be interesting.
That's because the defender is actually a skeptic of AI. But the first sentence sounded like a typical "nothing to see here" defense of AI.
Takes like this are utterly insane to me
Days that I’d normally feel overwhelmed from requests by management are just Claude Code and chill days now.
quite
The transformer innovation was to bring the cost of producing incorrect but plausible-looking content (slop), in any modality, down to near zero.
Not a positive thing for anyone other than spammers.
Are there any papers that compare predictive power against compute needed?
In many cases, I can't even see how many GPU hours, or what size cluster of which GPUs, the pretraining required. If I can't afford it, then it doesn't matter what it achieved. What I can afford is what I have to choose from.
Wish there were more hours in the day.
As somebody who was a biiiiig user of probabilistic graphical models, and felt kind of left behind in this brave new world of stacked nets, I would love for my prior knowledge and experience to become valuable for a broader set of problem domains. However, I don't see it yet. Hope you are right!
Source: I am a PhD student, this is kinda my wheelhouse
I agree. Causal inference and symbolic reasoning would be SUPER juicy nuts to crack, more so than what we got from transformers.
The softmax has issues regarding attention sinks [1]. The softmax also causes sharpness problems [2]. In general, this decision boundary being Euclidean dot products isn't actually optimal for everything; there are many classes of problem where you want polyhedral cones [3]. Positional embeddings are also janky af, and so is RoPE tbh; I think Cannon layers are a more promising alternative for horizontal alignment [4].
I still think there is plenty of room to improve these things. But a lot of focus right now is unfortunately being spent on benchmaxxing using flawed benchmarks that can be hacked with memorization. I think a really promising and underappreciated direction is synthetically coming up with ideas and tests that mathematically do not work well and proving that current architectures struggle with them. A great example of this is the ViTs-need-glasses paper [5], or belief state transformers with their star task [6]. The Google one about the limits of embedding dimensions is also great and shows how the dimension of the QK part is actually important to getting good retrieval [7].
[1] https://arxiv.org/abs/2309.17453
[2] https://arxiv.org/abs/2410.01104
[3] https://arxiv.org/abs/2505.17190
[4] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5240330
[5] https://arxiv.org/abs/2406.04267
No but seriously, just fix the fucking softmax. Add a dedicated "parking spot" like GPT-OSS does and eat the gradient flow tax on that, or replace softmax with any of the almost-softmax-but-not-really candidates. Plenty of options there.
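For anyone curious, a toy version of the "parking spot" idea: append one extra logit before the softmax so attention mass has somewhere to go when no key is relevant. This is just the general trick sketched out, not GPT-OSS's actual implementation:

```python
# Sketch of "softmax with a sink": add one extra logit so attention can
# dump probability mass on a null slot instead of being forced to spread
# it over real tokens. Illustrative only.
import torch
import torch.nn.functional as F

def attention_with_sink(q, k, v, sink_logit=0.0):
    # q: (n_q, d), k/v: (n_k, d)
    scores = q @ k.T / (q.shape[-1] ** 0.5)                # (n_q, n_k)
    sink = torch.full((scores.shape[0], 1), sink_logit)    # extra "parking" column
    weights = F.softmax(torch.cat([scores, sink], dim=-1), dim=-1)
    return weights[:, :-1] @ v                             # drop the sink's weight

q, k, v = torch.randn(4, 8), torch.randn(6, 8), torch.randn(6, 8)
print(attention_with_sink(q, k, v).shape)  # torch.Size([4, 8])
```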
The reason why we're "benchmaxxing" is that benchmarks are the metrics we have, and the only way by which we can sift through this gajillion of "revolutionary new architecture ideas" and get at the ones that show any promise at all. Of which there are very few, and fewer still that are worth their gains when you account for: there not being an unlimited amount of compute. Especially not when it comes to frontier training runs.
Memorization vs generalization is a well known idiot trap, and we are all stupid dumb fucks in the face of applied ML. Still, some benchmarks are harder to game than others (guess how we found that out), and there's power in that.
Literally every point-X release from every major player includes some benchmark graphs to show off.
Not familiar with this topic, but intrigued. Anywhere I can read more about it?
Having done my PhD in probabilistic programming... what?
In biology, PGMs were one of the first successful forms of "machine learning": given a large set of examples, train a graphical model's probabilities using EM, and then pass many more examples through the model for classification. The HMM for proteins is pretty straightforward, basically just a probabilistic extension of using dynamic programming to do string alignment.
My perspective- which is a massive simplification- is that sequence models are a form of graphical model, although the graphs tend to be fairly "linear" and the predictions generate sequences (lists) rather than trees or graphs.
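To make the "probabilistic dynamic programming" point concrete, here's a toy Viterbi decode over a two-state HMM (illustrative numbers only, nothing like a real trained profile HMM):

```python
# Toy Viterbi decode for a 2-state HMM over a DNA-ish alphabet.
# A real profile HMM has match/insert/delete states trained with EM,
# but the dynamic-programming recursion has the same shape.
import math

states = ["background", "motif"]
start = {"background": 0.9, "motif": 0.1}
trans = {"background": {"background": 0.9, "motif": 0.1},
         "motif":      {"background": 0.2, "motif": 0.8}}
emit = {"background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
        "motif":      {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05}}

def viterbi(seq):
    # dp[s]: best log-probability of any state path ending in state s
    dp = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []
    for ch in seq[1:]:
        prev, dp, ptr = dp, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            dp[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][ch])
            ptr[s] = best
        back.append(ptr)
    # trace the most likely state path backwards
    path = [max(states, key=lambda s: dp[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi("ATTGCGCGCATT"))
```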
So, this is really just a BS hype talk. This is just trying to get more funding and VCs.
/s
It doesn't mean that you'll get good results by abandoning prior art, either with LLMs or musicians. But it does signal a sort of personal stress and insecurity, for sure.
I wonder if he can simply sit back and bask in the glory of being one of the most important people during the infancy of AI. Someone needs to interview this guy, would love to see how he thinks.
There are other considerations that don't revolve around money, but I feel it's arrogant to assume success is the only motivation for musicians.
Alternatively, if anything it could be the exact opposite of what you’re describing. Maybe he sees an ecosystem based on hype that provides little value compared to the cost and wants to distance himself from it, like the Keurig inventor.
Arrogance would be if he explicitly chose to abandon it because he thought he was better.
Edit: there is a cult around transformers.
If you look at AI research papers, most of them are by people trying to earn a PhD so they can get a high-paying job. They demonstrate an ability to understand the current generation of AI and tweak it, they create content for their CVs.
There is actual research going on, but it's a tiny share of everything, and it does not look impressive because it's not a product or a demo, but an experiment.
I would also just fundamentally disagree with the assertion that a new architecture will be the solution. We need better methods to extract more value from the data that already exists. Ilya Sutskever talked about this recently. You shouldn’t need the whole internet to get to a decent baseline. And that new method may or may not use a transformer, I don’t think that is the problem.
If you check the DeepSeek-OCR paper, it shows that text-based tokenization may be suboptimal. Also all of the MoE stuff, reasoning, and RLHF. The 2017 paper is pretty primitive compared to what we have now.
Like, humans think about things and learn, which may require something different from "feed the internet in to pre-train your transformer."
Models need to move beyond the domain of parsing existing information into existing ideas.
And the Decepticons.
We will spend more time in the space until we see bigger roadblocks.
I really wish energy consumption were a big enough roadblock to force them to keep researching.
Or it's possible China just builds out power capacity faster because they actually build new things.
I get the impulse to do something new, to be radically different and stand out, especially when everyone is obsessing over it, but we are going to be stuck with transformers for a while.
There’s a reason so much engineering effort has gone into speculative execution, pipelining, multicore design etc - parallelism is universally good. Even when “computers” were human calculators, work was divided into independent chunks that could be done simultaneously. The efficiency comes from the math itself, not from the hardware it happens to run on.
RNNs are not parallelizable by nature. Each step depends on the output of the previous one. Transformers removed that sequential bottleneck.
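A shape-level sketch of that difference (illustrative, untrained weights): the RNN loop has to walk the sequence one step at a time, while self-attention over the same sequence is a single batched matrix product.

```python
# Why transformers parallelize over time and vanilla RNNs don't.
import torch

T, d = 128, 64
x = torch.randn(T, d)            # a sequence of T token embeddings

# RNN: h_t depends on h_{t-1}, so the T steps are inherently sequential.
W_h, W_x = torch.randn(d, d), torch.randn(d, d)
h = torch.zeros(d)
for t in range(T):               # cannot be batched across time
    h = torch.tanh(x[t] @ W_x + h @ W_h)

# Self-attention: every position attends to every other in one shot,
# so the whole T x T interaction is a single (parallelizable) matmul.
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v
attn = torch.softmax(q @ k.T / d**0.5, dim=-1) @ v   # (T, d), no time loop
```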
Unless you are pushing back on my comment "all kinds" - if so, I meant "all kinds" in the way someone might say "there are all kinds of animals in the forest", it just means "lots of types".
I am not surprised that everyone is trying to make faster horses instead of combustion engines…
isn't this what [etched](https://www.etched.com/) is doing?