>Pretty much. I think the language network is very similar in many ways to early LLMs, which learn the regularities of language and how words relate to each other. It’s not so hard to imagine, right?
Yet this completely glosses over the role of rhythm in parsing language. LLMs aren’t rhythmic at all, are they? Maybe each token production is a cycle, though… hmm…
Do you have any evidence for this?
I am a former linguistics student (got my master's) and, after years away from academia, interested in the current state of affairs. So: "quite separated in our heads". Evidence for? Against?
There are various kinds of aphasia, often linked to specific brain areas (Wernicke's and Broca's are well-known). And M/EEG and fMRI research suggests similar distinctions. It is difficult to reconcile with the idea that there is only one language system.
And you will also have noticed that your skills in perception and production differ. You can read/listen better than write/speak. Timing, ambiguity and errors in perception and production differ.
And more logically: the tasks are very different. In perception, you have to perceive the structure and meaning from a highly ambiguous, but ordered input of sound triggering auditory nerves, while during production, meaning is given (in non-linear order), and you have to find a way to fit it in a linear, grammatical order with matching words, which then have to be translated to muscle movements.
However, I also find it unlikely that the networks are totally separate, and I wonder if there is any evidence of areas that encode the "core/abstract" linguistic de/serialization (multidimensional and messy internal semantic information ←→ linear morphophonological information) both ways, or at least a mechanism that manages to use gained input network competence to "train" or "manage" output network competence.
Why? Because even though, as you say, there is a differing performance in perception and production, there is also plenty of evidence of gaining linguistic competence from input, and then managing to convert that to performance in output.
Is it though? If rhythm or tone changes meaning, then just add symbols for rhythm and tone to LLM input and train it. You'll get not just words out that differ based on those additional symbols wrapping words, but you'll also get the rhythm and tone symbols in the output.
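A minimal sketch of what "symbols wrapping words" could look like before tokenization. The marker names (`<stress>`, `<rise>`, `<fall>`) and the function are made up for illustration; no existing tokenizer or dataset is assumed to use them.

```python
def mark_prosody(words, stressed_indices=(), final_contour=None):
    """Wrap stressed words in marker tokens and append an optional contour token.

    The marker vocabulary is hypothetical; a real system would have to pick
    its own symbols and annotate training data consistently.
    """
    tokens = []
    for i, word in enumerate(words):
        if i in stressed_indices:
            tokens += ["<stress>", word, "</stress>"]
        else:
            tokens.append(word)
    if final_contour is not None:
        tokens.append(f"<{final_contour}>")
    return tokens


# Same words, different prosody, therefore different token sequences.
statement = mark_prosody(["you", "did", "that"], stressed_indices={0}, final_contour="fall")
question = mark_prosody(["you", "did", "that"], stressed_indices={2}, final_contour="rise")
```

Since the markers are ordinary tokens, a model trained on such sequences could in principle also emit them, which is the point being made above.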
If you're talking about speech cadence/rhythm, then we also parse written language which doesn't have that. And we're quite capable of parsing a monotone robotic voice speaking with a monotonous mechanical rhythm too.
It almost seems like we got inspiration from our brain to build neural networks!
I find that proposition totally implausible. Some people certainly report only thinking in words & having a continuous inner monologue, but I'm not one of them. I think, then I describe my thoughts in words if I'm speaking or writing or thinking about speaking or writing.
LLMs might be trained via words, but as a backend transformers are not just for words.
They're for high dimensional structured sequences. To make an analogy, transformers are not working on:
Vector<Word>
but Vector<ContextualizedEmbedding>
where words just happen to be a handy training set we use. And we, too, might not think in words, but I bet that we do think using multi-dimensional sequences/vectors.
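To make the Vector<Word> vs Vector<ContextualizedEmbedding> contrast concrete, here is a toy sketch. It assumes invented 2-d word vectors and a single dot-product self-attention step with no learned weights; real transformers learn projections and stack many layers, so treat this only as an illustration of the idea.

```python
import math

# Hypothetical 2-d static embeddings, invented for illustration only.
STATIC = {
    "river": [1.0, 0.0],
    "money": [0.0, 1.0],
    "bank": [0.5, 0.5],
}


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextualize(words):
    """One self-attention pass: each output is a weighted mix of all inputs."""
    vecs = [STATIC[w] for w in words]
    out = []
    for q in vecs:
        # Attention weights from dot-product similarity to every position.
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in vecs])
        mixed = [sum(w * v[d] for w, v in zip(weights, vecs)) for d in range(len(q))]
        out.append(mixed)
    return out


# The static vector for "bank" is identical in both inputs;
# the contextualized vector is not.
bank_near_river = contextualize(["river", "bank"])[1]
bank_near_money = contextualize(["money", "bank"])[1]
```

In this toy setup "bank" is pulled toward "river" in one context and toward "money" in the other, so the unit being processed is no longer the word itself.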
I also learned to think in hmm "concepts", and then apply a language of my choice to express them. It's a fun skill to have :) Obviously works of Chomsky are great, especially exploring if language evolves mind or is the other way around, does mind evolve language? [let's skip his rather controversial political views lately].
I was blown away on holiday in Croatia. It was so unexpectedly, relatively easily understandable after Czechia, Austria, and Slovenia. I was all, "What just happened!? Shouldn't this be something more like Italian?"
It took only a month for me to be able to communicate in Ukrainian with my ESL students, you're totally right about Cyrillic. And I too think in concepts but switch my brain to express them externally via language, whatever that language may be at the moment. I am terrible at translating OTOH, so unnatural!
But it has its limits. I got to a point after German and Norwegian where I thought I harbored a super-power. Then I went to school in Hungary ;) I also had an ESL student from Lithuania, yep, incomprehensible.
> The brain’s general object-recognition machinery is at the same level of abstractness as the language network. It’s not so different from some higher-level visual areas such as the inferotemporal cortex storing bits of object shapes, or the fusiform face area storing a basic face template.
In other words, it sounds like the brain may start with the same basic methods of pattern matching for many different contexts, but then different areas of the brain specialize in looking for patterns in specific contexts such as vision or language.
This seems to align with the research of Jenny Saffran, for example, who has studied how babies recognize language, arguing that this is largely statistical pattern matching.
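The statistic usually discussed in this line of work is the transitional probability between adjacent syllables: high inside a word, lower across word boundaries. A small sketch, assuming a made-up stream built from three invented three-syllable "words" (not the actual stimuli from any study):

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(next | current) from adjacent syllable pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Invented "words" and an invented ordering, for illustration only.
words = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
order = "ABCACBABCBAC"
stream = [syllable for w in order for syllable in words[w]]

tp = transitional_probabilities(stream)
within_word = tp[("bi", "da")]    # "da" always follows "bi"
across_words = tp[("ku", "pa")]   # "ku" is followed by different words
```

A learner tracking only these numbers could guess word boundaries at the dips, without knowing any meanings.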
I knew two brothers that would mix words from different languages while speaking to each other because they shared the same set of languages and presumably used the best words to express their thoughts.
Your daughter probably knows other people generally speak and understand one language at a time and just conforms because it's most effective.
I'm not sure if or at what age it might be good to start mixing languages with others who can.
Speaking to babies is incredibly important for linguistics, but probably for all types of complex brain function. I don't think there is an upper bound on how many words we should expose children to.
That's downstream of the sponge phase. So much so, that initially we only absorb and don't talk yet.
It's an interesting hint at the deeper evolutionary origins of language in the ability to plan complex actions, providing a neural basis for the observation that language and action planning have this common structure of an overall goal that can be decomposed into a structure of subgoals, which we see formalized in computer programs too.
This is an older reference (1991) where I first heard about it. There are more recent studies reinforcing various aspects of it, but I didn't find one that was as comprehensive.
In any case, there's a key disanalogy:
> Unlike a large language model, the human language network doesn’t string words into plausible-sounding patterns with nobody home; instead, it acts as a translator between external perceptions (such as speech, writing and sign language) and representations of meaning encoded in other parts of the brain (including episodic memory and social cognition, which LLMs don’t possess).
Level 1: Nearly autonomic — pattern-matched language that acts directly on the nervous system. Evidence: how insults land before you "process" them, how fluent speakers produce speech faster than conscious deliberation allows, and the entire body of work on hypnotic suggestion, which relies on language bypassing conscious evaluation entirely.
Level 2: The conscious formulation you describe — the translator between perception and meaning.
LLMs might be decent models of Level 1 but have nothing corresponding to Level 2. Fedorenko's "glorified parser" could be the Level 1 system.
I don't think so. Fast speakers and hypnotized people are still clearly conscious and "at home" inside, vastly more "human" than any LLM. Deliberation and evaluation imply thinking before you speak but do not imply that you can't otherwise think while you speak.
Doing mathematical proofs might be an extreme example of that: a mathematician has (I am told) an intuition--a thought--but has to work it out rigorously. Once they've done that, the intuition becomes much clearer. I guess.
"We human beings are living systems that exist in language. This means that although we exist as human beings in language and although our cognitive domains (domains of adequate actions) as such take place in the domain of languaging, our languaging takes place through our operation as living systems. Accordingly, in what follows I shall consider what takes place in language[,] as language arises as a biological phenomenon from the operation of living systems in recurrent interactions with conservation of organization and adaptation through their co-ontogenic structural drift, and thus show language as a consequence of the same mechanism that explains the phenomena of cognition:"
French has obligatory subject-verb agreement, gender marking on articles/adjectives, and rich verbal morphology. English has largely shed these. If you trained identical neural networks on French vs English corpora, holding everything else constant, you might expect French models to hit certain capability thresholds earlier — not because of anything about the network, but because the language itself carries more redundant structural information per token.
This would support Fedorenko's view that the language network is revealing structure already present in language, rather than constructing it. The "LLM in your head" isn't doing the thinking — it's a lookup/decode system optimized for whatever linguistic code you learned.
(Disclosure: I'm running this exact experiment. Preregistration: https://osf.io/sj48b)
But also, maybe the difficulty of parsing recruits other/executive function and is beneficial in other ways?
The per phoneme density/efficiency of English is supposed to be quite high as an emergent trade language.
Perhaps speaking a certain language would promote slower, more intentional parsing, or humility through syntax uncertainty. Maybe not. All I know is that from a global network resilience perspective it's good that dumb memes have difficulty propagating across cultures/languages.
Your intuition about "slower more intentional parsing" connects to something I'm exploring: we may parse language at two levels simultaneously; a fast, nearly autonomic level (think: how insults land before you consciously process them) and a slower deliberate level. Whether those levels interact differently across languages is an open question.
Second: multiple levels of language processing have been identified, although it's not at all clear how well separated they are. The higher levels (semantics, pragmatics) are by necessity lagging behind the lower (phonetics, syntax). The higher levels also seem more "deliberate."
I don't think so. It's the medicalization or pathologization of dyslexia that's probably more of a thing in English. In the same way, many issues get medicalized, and whole cottage industries and jobs grow around them.
This was used to build the first modern language translation systems, testing them going from English->French->English. And in reverse.
You could do something similar here, understanding that your language is quite stilted legalese.
Edit: there might be other countries with similar rules in place that you could source test data from as well.
Now I will. Thanks.
But no English so you might not be interested.
You're also going to use an artificial neural network to make claims about the human brain? That distance is too large to bridge with a few assumptions.
BTW, nobody believes our language faculties are doing the thinking. There are however, obviously, connections to thought: not only the concepts/meaning, but possibly sharing neural structures, such as the feedback mechanism that allows us to monitor ourselves.
I have a slightly better proposal: if you want to see the effect of gender, genderize English or neutralize French, and compare both versions of the same language. Careful with tokenization, though.
Your proposal is interesting though. Synthetic manipulation of morphology within a single language. Have you seen this done? The challenge I'd anticipate is that "genderized English" wouldn't have natural text to train on, so you'd need to generate it somehow, which introduces its own artifacts. But comparing French vs artificially gender-neutralized French might be feasible with existing parallel corpora. Worth thinking about as a follow-up.
On the neural network → brain distance: agreed it's a leap. The claim isn't that transformers are brains, but that if both are extracting structure from language, they might reveal something about what structure is there to extract. Fedorenko's own comparison to "early LLMs" suggests she thinks the analogy has some merit.
But you have no grounds to ascribe it to the posited difference. Finding no effect might yield more information, but that's hard: given the amount of noise, you're bound to find a great many effects.
> Have you seen this done?
Not in LLMs, but there have been experiments with regularizing languages, and getting people to learn them in Second Language Acquisition (L2) studies. But what I've seen is inconclusive and sometimes outright contradictory.
I think people have also looked via information theory at this. Probably using Markov models.
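For a rough idea of what such a measure looks like, here is a sketch of conditional entropy under a first-order (bigram) Markov estimate. The two token sequences are toy data made up for illustration, not results from any study.

```python
import math
from collections import Counter


def bigram_conditional_entropy(tokens):
    """H(next | current) in bits, estimated from adjacent token pairs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    firsts = Counter(tokens[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (a, _b), n in pairs.items():
        p_pair = n / total        # joint probability of the pair
        p_cond = n / firsts[a]    # probability of the second given the first
        h -= p_pair * math.log2(p_cond)
    return h


# Toy sequences: one fully predictable, one less so.
predictable = "a b a b a b a b".split()
mixed = "a b a a b b a b".split()

h_predictable = bigram_conditional_entropy(predictable)
h_mixed = bigram_conditional_entropy(mixed)
```

Lower values mean the next token is easier to guess from the current one. Comparing languages this way is very sensitive to how the text is split into tokens, which is the tokenization caveat raised earlier in the thread.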
> Fedorenko's own comparison to "early LLMs" suggests she thinks the analogy has some merit.
I don't think she can seriously entertain that thought. We simply know practically nothing about language processes in the brain. What we know about the hardware is very different from LLMs, early or not.
Just to give an indication of how much we don't know: the Stroop effect (https://en.wikipedia.org/wiki/Stroop_effect) is almost 100 years old. We have no idea what causes it. There's no working model of word recognition. There are only vague suggestions about the origin of the delay. We have no clue how the visual signals for the color and the letters are separated, where they join again, and how that's related to linguistic knowledge. And that's after almost 100 years of a great deal of research. If you go to Google Scholar and type "Stroop task", you'll get 197,000 (!) hits. That's nearly 200k articles etc. resulting in no knowledge whatsoever about a very simple, artificial task.
The L2 regularization and information theory pointers are helpful; they will go on my reading list. If you have favorites, I'll start there.
On the "we know nothing" point: I'm sympathetic. The Stroop example is exactly why I'm skeptical of strong claims in either direction. 197k papers and no mechanism suggests language processing has properties we don't yet have frameworks to describe. That's not mysticism. It's just acknowledging the gap between phenomenon and explanation.
One classic finding in linguistics is that languages with lots of morphology tend to have freer word order. Latin has lots of morphology and you can move the verb or subject anywhere in the sentence and it's still grammatical. In a language like English syntax and word order and word choice take on the same role as morphology.
Inflected languages may indeed have more information encoded in each token. But the relative position of the tokens to each other also encodes information. And inflected languages appear to do this less.
Languages with richer morphology may also have smaller vocabularies. To be fair, this is a contested conjecture too. (It depends a lot on how you define a morpheme.) But the theory is that languages like Ojibwe or Sanskrit, with rich derivational morphologies and grammatical inflections, simply don't need a dozen words for different types of snow, or to describe thinking. A single morpheme with an almost infinite number of inflected forms can carry all the shades of meaning, where different morphemes might be used to make the same distinctions in a less inflected language.
> Languages with richer morphology may also have smaller vocabularies. To be fair, this is a contested conjecture too.
I agree with the criticism of this to an extent. A lot of it has seemed to me like it relies on thinking of English as a sort of normal, baseline language when it is actually very odd. It has so many vowels, and its syllables also aren't open, so it has all of these weird little distinguishing consonant clusters at the end of syllables. And when you compare it to a language conjugated with a bunch of suffixes, those suffixes gradually both make the words very long, and add a bunch of sounds that can't be duplicated very often at the end of roots without causing confusion.
All of that together means that there's a lot more bandwidth for more words. English, even though it has a lot more words than other languages, doesn't have more precise words. Most of them are vague duplications, including duplicating most of Norman French just to have special, fancy versions of words that already existed. The strong emphasis on position in the grammar and the vast number of vowels also allows it to easily borrow words from other languages without a compelling reason.
I think all of that is enough to explain why English is such an outlier on vocabulary size, and I think you see similar in other languages that share a subset of these features.
One difference I'm betting on: morphological agreement is redundant (same information marked multiple times), while word order encodes information once. Redundancy aids error correction and may lower pattern extraction thresholds. But I'm genuinely uncertain whether that outweighs the structural information carried by strict word order.
Do you have intuitions on which would be "easier" for a statistical learner? Or pointers to relevant literature? The vocabulary size / morpheme count tradeoff is also something I hadn't fully considered as a confound.
You might be interested to look into the Leiden Theory of Language[1][2]. It's been my absolutely favourite fringe theory of mind since I stumbled across the rough premise in 2018, and went looking for other angles on it.
[1] https://www.kortlandt.nl/publications/art067e.pdf
[2] https://en.wikipedia.org/wiki/Symbiosism
> Language is a mutualist symbiont and enters into a mutually beneficial relationship with its hominid host. Humans propagate language, whilst language furnishes the conceptual universe that guides and shapes the thinking of the hominid host. Language enhances the Darwinian fitness of the human species. Yet individual grammatical and lexical meanings and configurations of memes mediated by language may be either beneficial or deleterious to the biological host.
EDIT: almost forgot the best link!
Language as Organism: A Brief Introduction to the Leiden Theory of Language Evolution https://www.isw.unibe.ch/e41142/e41180/e523709/e546679/2004f...
The mule analogy is going to stick with me. LLMs have inherited the statistical structure of the symbiont without the host: pattern without grounding. Whether that makes them useful instruments for studying the symbiont itself, or just misleading simulacra, is exactly what I'm trying to work out.
Going to dig into Kortlandt tonight.
> LLMs have inherited the statistical structure of the symbiont without the host: pattern without grounding.
I like this. I think it's not too far a leap to suggest something like "soul" without "body" -- a spirit in the truest sense. I think there's real value in the things we've believed ourselves to be made of through deep time, though without evidence or proper provenance. I suspect we've always been grappling to find language for the unnameable things.
Some of my own [somewhat outdated] reflections on language from the time I came across it, in case you're interested :) https://nodescription.net/notes/#2019-07-13
As for gender marking on adjectives--or nouns--it does almost no semantic work in French, except where you're talking about professional titles (doctor, professor...) that can be performed by men or by women.
If you want a heavily inflected language, you should look at something like Turkish, Finnish, Swahili, Quechua, Nahuatl, Inuit... Even Spanish (spoken or written) has more verbal inflection than spoken French.
Genomes have statistical regularities (motifs, codon patterns, regulatory grammar). Language has statistical regularities (morphology, syntax, collocations). Both are sequences with latent structure. Similar architectures trained on either will repeat those structures.
That's consistent with my "instrumentation" view: the transformer is revealing structure that exists in the domain, whether that domain is English, French, or DNA. The architecture is the microscope; the structure was already there.
Language seems to be taking advantage of this pre-existing predictive architecture, and would have again learnt by predicting sensory inputs (heard language), which as we have seen is enough to induce ability to generate it too.
With every technological breakthrough we always posit that the brain has to work like the newly discovered thing. At various times brains were hydraulic, mechanical, electrical, like a computer, like a network. Now, of course, the brain has to be like an LLM.
I do think that a transformer, a somewhat generic hierarchical/parallel predictive architecture, learning from prediction failure, has to be at least somewhat similar to how we learn language, as opposed to a specialized Chomskyan "language organ".
The main difference is perhaps that the LLM is only predicting based on the preceding sequence, while our brain is driving language generation by a combination of sequence prediction and the thoughts being expressed. You can think of the thoughts being a bias to the language generation process, a bit like language being a bias to a diffusion based image generator.
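The "thoughts as a bias" picture can be sketched as an additive bias on next-word scores before sampling. All numbers and words below are made up; this illustrates the analogy and is not a claim about how brains or any particular model implement it.

```python
import math


def softmax(logits):
    m = max(logits.values())
    exps = {w: math.exp(v - m) for w, v in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Hypothetical next-word scores from sequence prediction alone.
sequence_logits = {"coffee": 2.0, "tea": 1.8, "water": 0.5}

# Hypothetical additive bias standing in for "the thought being expressed".
thought_bias = {"tea": 1.5}


def biased_distribution(logits, bias):
    """Add the bias to the matching logits, then normalize."""
    return softmax({w: v + bias.get(w, 0.0) for w, v in logits.items()})


unbiased = softmax(sequence_logits)
biased = biased_distribution(sequence_logits, thought_bias)

top_unbiased = max(unbiased, key=unbiased.get)
top_biased = max(biased, key=biased.get)
```

With these toy values the sequence model alone prefers "coffee", while the biased distribution prefers "tea": the preceding words constrain what is sayable, and the bias picks what is meant.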
What would be cool would be if we could do some "mechanistic interpretability" work on the brain's language generation circuits, and perhaps discover something similar to induction heads.
Indeed, and I wasn't even saying it's wrong, it may be pretty close.
> What would be cool would be if we could do some "mechanistic interpretability" work on the brain's language generation circuits, and perhaps discover something similar to induction heads.
Yeah, I wouldn't be surprised. And maybe the more we find out about the brain, it could lead to some new insights about how to improve AI. So we'd sort of converge from both sides.
Given that the only similarity between the two is just the "network" structure, I'd say that point is pretty weak. The name "artificial neural network" is just a historical artifact and an abstraction totally disconnected from the real thing.
I had a sad day in college when I thought I'd build my own ANN using C++.
First thing I did was create a "Neuron" class, to mimic the idea of a human neuron.
Second thing I did was realize that ANNs are actually just Wiener filters with a sigmoid on top. The base unit is not a "neuron".
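For anyone who hasn't built one: stripped of the naming, the base unit is a weighted sum of inputs pushed through a squashing function. A minimal sketch in Python (rather than the C++ of the original attempt), with arbitrary example weights:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def unit(inputs, weights, bias):
    """Weighted sum of the inputs, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def layer(inputs, weight_rows, biases):
    """A layer is the same unit repeated with different weights."""
    return [unit(inputs, row, b) for row, b in zip(weight_rows, biases)]


# Arbitrary example values, chosen only to show the computation.
out = layer([1.0, -1.0], [[0.5, 0.5], [2.0, -2.0]], [0.0, 0.0])
```

No per-unit class is needed; the whole layer is a matrix-vector product followed by an elementwise function.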
There's also a progression in your sequence. There were rudimentary mechanical calculating devices, then electrical devices begat electrical computers, and LLMs are a particular program running on a computer. So in a way the analogies are becoming more refined as we develop systems more and more capable of mimicking human capabilities.
Not just in full language, mind, but consider the last time you heard a song in a major key? Do you even know what that means? Because many of us do not.
Same goes for listening to people discuss things like sports. I'm inclined to think many people effectively run a simulation in their mind of a game as they listen to it broadcast. This almost certainly isn't inherent to the language, it is part of the learning of it, though. Think looking over lists of the moves in a chess game. Then go from that to laying out the pieces as they are after that list. Or calling what the next move can be.
Can this be a completely separate set of "circuitry" in our brains that first parses the language and then builds the simulation? I suppose. Seems more likely there is something that is active between the two that can effectively get merged in advanced practitioners.
New Vistas to study Bhartrhari: Cognitive NLP (Natural Language Processing) - https://arxiv.org/abs/1810.04440
moralIsYouLie•2mo ago
I can't do this anymore.
Al-Khwarizmi•1mo ago
Of course this doesn't mean one shouldn't question what she says (that would be an obvious authority fallacy), but I do think it's fair to say that if you want to question it, the argument should be more elaborate than "this sounds like she has no idea of the topic".
Timwi•1mo ago
jimbokun•1mo ago
Also:
> it gives me the (probably flawed) impression that her research isn't the part of her life that's supposed to be important or impressive.
I don't see this at all in the article. There's just some human interest content to make her research more approachable.
mcswell•1mo ago