The ear does not do a Fourier transform

https://www.dissonances.blog/p/the-ear-does-not-do-a-fourier-transform

234•izhak•4h ago

Comments

p0w3n3d•4h ago

Tbh I used to think that it does. For example, when playing higher notes, it's harder to hear the out-of-tune frequencies than on the lower notes.

fallingfrog•3h ago

I haven't noticed that effect, to be honest. Actually I think its the really low bass frequencies that are harder to tune- especially if you remove the harmonics and just leave the fundamental.

Are you perhaps experiencing some high frequency hearing loss?

jacquesm•3h ago

It's even more complex than that. The low notes are hard to tune because the fundamentals are very close to each other and you need to have super good hearing to match the beats, fortunately they sound for a long time so that helps. Missing fundamentals are a funny thing too, you might not be 'hearing' what you think you hear at all! The high notes are hard to tune because they sound very briefly (definitely on a piano) and even the slightest movement of the pin will change the pitch considerably.

In the middle range (say, A2 through A6) neither of these issues apply, so it is - by far - the easiest to tune.

TheOtherHobbes•3h ago

See also, psychoacoustics. The ear doesn't just do frequency decomposition. It's not clear if it even does frequency decomposition. What actually happens is lot of perceptual modelling and relative amplitude masking which makes it possible to do real-time source separation.

Which is why we can hear individual instruments in a mix.

And this ability to separate sources can be trained. Just as pitch perception can be trained, with varying results from increased acuity up to full perfect pitch.

A component near the bottom of all that is range-based perception of consonance and dissonance, based on the relationships between beat frequencies and fundamentals.

Instead of a vanilla Fourier transform, frequencies are divided into multiple critical bands (q.v.) with different properties and effects.

What's interesting is that the critical bands seem to be dynamic, so they can be tuned to some extent depending on what's being heard.

Most audio theory has a vanilla EE take on all of this, with concepts like SNR, dynamic range, and frequency resolution.

But the experience of audio is hugely more complex. The brain-ear system is an intelligent system which actively classifies, models, and predicts sounds, speech, and music as they're being heard, at various perceptual levels, all in real time.

jacquesm•3h ago

Yes, indeed, to think about the ear as the thing that hears is already a huge error. The ear is - at best - a faulty transducer with its own unique way of turning air pressure variations into nerve impulses and what the brain does with those impulses is as much a part of hearing as the mechanics of the ear, just like a computer keyboard does not interpret your keystrokes, it just turns them into electrical signals.

fallingfrog•38m ago

Welll. On guitar you cant really use the "matching the beats" or the thing where you play the 4th on the string below and make them sound in unison, because if you do that all the way up the neck your guitar will be tuned to Just intonation instead of equal interval intonation and certain chords will sound very bad. A series of perfect 4ths and a perfect 3rd does not add up to an octave. Its better to reference everything to the low e string and just kind of know where the pitches are supposed to land.

That's a side note, the rest of what you wrote was very informative!

philip-b•28m ago

No, it's vice versa. If two wind instruments play unison slightly out of tune from each other, it will be very noticeable. If the bass is slightly out of tune or mistakenly plays a different note a semitone up or down, it's easy to not notice it.

bloppe•3h ago

Man, I've been spreading disinformation for years.

rolph•3h ago

the closest i have been, was acoustic phase discrimination by owls.

there appears to be no software for this, its all hardware, the signal format flips as it travels through the anatomy.

nakulgarg22•3h ago

This might be interesting for you - https://nakulg.com/assets/papers/owlet_mobisys2021_nakul.pdf

Owls use asymmetric skull structure which helps them in spatial perception of sound.

rolph•3h ago

that was the start of it. the offset otic openings result in differential arrival times of the acoustic peaks, thus phase differential.

neurosynaptically, there is no phase, there is frequency shift corresponding to presynaptic intensity, and there is spatio-temporal integration of these signals. temporal integration is where "phase" matters

its all a mix of "digital" all or nothing "gates" and analog frequency shift propagation of the "gate" output.

its all made nebulous by the adaptive, and hysteretic nature of the elements in neural "circuitry"

lukeinator42•2h ago

also, the common ancestor of mammals and birds did not have a tympanic ear, so sound localization evolved differently in the avian vs. mammalian hearing systems. A good review is here: https://journals.physiology.org/doi/pdf/10.1152/physrev.0002.... How the brain calculates interaural time delays is actually an interesting problem as the time delays are so short, that it is less time than a neuron has to fire an action potential.

saltcured•3h ago

This is one of those pedant vs cocktail chatterer distinctions. It's an interesting dive and gives a nice, triggering, headline.

But, to the vast majority who don't really know or care about the math, "Fourier Transform" is, at best, a totem for the entire concept space of "frequency domain", "spectral decomposition", etc.

They are not making fine distinctions of tradeoffs among different methods. I'm not sure I'd even call it disinformation to tell this hand-wavy story and pique someone's interest in a topic they otherwise never thought about...

rolph•3h ago

FT is frequency domain representation.

neural signaling by action potential, is also a representation of intensity by frequency.

the cochlea is where you can begin to talk about bio-FT phenomenon.

however the format "changes" along the signal path, whenever a synapse occurs.

xeonmc•3h ago

Nit: It’s an unfortunate confusion of naming conventions, but Fourier Transform in the strictest sense implies an infinite “sampling” period, while the finite “sample” period counterpart would correspond to Fourier Series even though we colloquially refer to them interchangeably.

(I had put “sampling” in quotes as they’re actually “integration period” in this context of continuous time integration, though it would be less immediately evocative of the concept people are colloquially familiar with. If we actually further impose a constraint of finite temporal resolution so that it is honest-to-god “sampling” then it becomes Discrete Fourier Transform, of which the Fast Fourier Transform is one implementation of.)

It is this strict definition that the article title is rebuking, but it’s not quite what the colloquial usage loosely evokes in most people’s minds when we usually say Fourier Transform as an analysis tool.

So this article should have been comparing to Fourier Series analysis rather than Fourier Transform in the pedantic sense, albeit that’ll be a bit less provocative.

Regardless, it doesn’t at all take away from the salient points of this excellent article which are really interesting reframing of the concepts: what the ear does mechanistically is applying a temporal “weigting function” (filter) so it’s somewhere between Fourier series and Fourier transform. This article hits the nail on the head on presenting the sliding scale of conjugate domain trade offs (think: Heisenberg)

meowkit•3h ago

I was a bit peeved by the title, but I think its a fair use of clickbait as the article has a lot of little details about acoustics in humans that I was unfamiliar with (i.e. a link to a primer on the the transduction implementation of cochlear cilia)

But yeah there is a strict vs colloquial collision here.

BrenBarn•2h ago

Yeah, it's sort of like saying the ear doesn't do "a" Fourier transform, it does a bunch of Fourier transforms on samples of data, with a varying tradeoff between temporal and frequency resolution. But most people would still say that's doing a Fourier transform.

As the article briefly mentions, it's a tempting hypothesis that there is a relationship between the acoustic properties of human speech and the physical/neural structure of the auditory system. It's hard to get clear evidence on this but a lot of people have a hunch that there was some coevolution involved, with the ear's filter functions favoring the frequency ranges used by speech sounds.

foobarian•2h ago

This is something you quickly learn when you read the theory in the textbook, get excited, and sit down to write some code and figure out that you'll have to pick a finite buffer size. :-)

tryauuum•3h ago

man I need to finally learn what a Fourier transform is

TobTobXX•3h ago

3Blue1Brown has a really good explanation here: https://www.youtube.com/watch?v=spUNpyF58BY

It gave me a much better intuition than my math course.

jama211•3h ago

Hahaha, I was working on learning these in second year uni… which was also exactly when I switched from an electrical engineering focussed degree to a software one!

Perhaps finally I should learn too…

garbageman•3h ago

It's an absolutely brilliant bit of maths that breaks a complex waveform into the individual components. Kind of like taking an orchestral song and then working out each individual instrument's contribution. Learning about this left me honestly aghast and in shock that it's not only possible but that someone (Joseph Fourier) figured it out and then shared it with the world.

This video does a great job explaining what it is and how it works to the layman. 3blue1brown - https://www.youtube.com/watch?v=spUNpyF58BY

adzm•3h ago

the very simplest way to describe it: it is what turns a waveform (amplitude x time) to a spectrogram like on a stereo (amplitude x frequency)

Chabsff•3h ago

And phase. People always forget about the phase as if it was purely imaginary.

JKCalhoun•2h ago

Ha ha, as I understand it, phase is imaginary in a Fourier transform. Complex numbers are used and the imaginary portion does indeed represent phase.

I have been told that reversing the process — creating a time-based waveform — will not resemble (visually) the original due to this phase loss in the round-tripping. But then our brain never paid phase any mind so it will sound the same to our ears. (Yay, MP3!)

Chabsff•2h ago

I'm glad someone picked up on my dumb joke :), I was getting worried.

That being said, round-tripping works just fine, axiomatically so, until you go out of your way to discard the imaginary component.

DonHopkins•11m ago

Even more reflectively imaginative than the Fourier Transform is the Cepstrum!

https://en.wikipedia.org/wiki/Cepstrum

It’s literally a "backwards spectrum", and the authors in 1963 were having such fun they reversed the words too: quefrency => frequency, saphe => phase, alanysis => analysis, liftering => filtering

The cepstrum is the "spectrum of a log spectrum," where taking the log turns multiplicative spectral features into additive ones, the foundation of cepstral alanysis, and later, the physiologically tuned Mel-frequency cepstrum used in audio compression and speech recognition.

As Tukey might say: once you start doing cepstral alanysis, there’s no turning back, except inversely.

Skeptics said he was just going through a backwards phase, but it turned out to work! ;)

https://news.ycombinator.com/item?id=24386845

DonHopkins on Sept 5, 2020 | parent | context | favorite | on: Mathematicians should stop naming things after eac...

I love how they named the inverse spectrum the cepstrum, which uses quefrency, saphe, alanysis, and liftering, instead of frequency, phase, analysis and filtering. It should not be confused with the earlier concept of the kepstrum, of course! ;)

https://en.wikipedia.org/wiki/Cepstrum

>References to the Bogert paper, in a bibliography, are often edited incorrectly. The terms "quefrency", "alanysis", "cepstrum" and "saphe" were invented by the authors by rearranging some letters in frequency, analysis, spectrum and phase. The new invented terms are defined by analogies to the older terms.

>Thus: The name cepstrum was derived by reversing the first four letters of "spectrum". Operations on cepstra are labelled quefrency analysis (aka quefrency alanysis[1]), liftering, or cepstral analysis. It may be pronounced in the two ways given, the second having the advantage of avoiding confusion with "kepstrum", which also exists (see below). [...]

>The kepstrum, which stands for "Kolmogorov-equation power-series time response", is similar to the cepstrum and has the same relation to it as expected value has to statistical average, i.e. cepstrum is the empirically measured quantity, while kepstrum is the theoretical quantity. It was in use before the cepstrum.[12][13]

https://news.ycombinator.com/item?id=43341806

DonHopkins 7 months ago | parent | context | favorite | on: What makes code hard to read: Visual patterns of c...

Speaking of filters and clear ergonomic abstractions, if you like programming languages with keyword pairs like if/fi, for/rof, while/elihw, goto/otog, you will LOVE the cabkwards covabulary of cepstral quefrency alanysis, invented in 1963 by B. P. Bogert, M. J. Healy, and J. W. Tukey:

cepstrum: inverse spectrum

lifter: inverse filter

saphe: inverse phase

quefrency alanysis: inverse frequency analysis

gisnal orpcessing: inverse signal processing

https://en.wikipedia.org/wiki/Cepstrum

https://news.ycombinator.com/item?id=44062022

DonHopkins 5 months ago | parent | context | favorite | on: The scientific “unit” we call the decibel

At least the Mel-frequency cepstrum is honest about being a perceptual scale anchored to human hearing, rather than posing as a universally-applicable physical unit.

https://en.wikipedia.org/wiki/Mel-frequency_cepstrum

>Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly-spaced frequency bands used in the normal spectrum. This frequency warping can allow for better representation of sound, for example, in audio compression that might potentially reduce the transmission bandwidth and the storage requirements of audio signals.

https://en.wikipedia.org/wiki/Psychoacoustics

>Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branch of science studying the psychological responses associated with sound including noise, speech, and music. Psychoacoustics is an interdisciplinary field including psychology, acoustics, electronic engineering, physics, biology, physiology, and computer science.

xeonmc•1h ago

Actually, by the Kramers-Kronig relation you can infer the imaginary part just from the real parts, if given that your time signal is causal. So the phase isn’t actually lost in any way at all, if you assume causality.

Also, pedantic nit: phase would be the imaginary exponent of the spectrum rather than the imaginary part directly, i.e, you take the logarithm of the complex amplitude to get log-magnitude (real) plus phase (imag)

CGMthrowaway•3h ago

It's a Copy>Paste Special>Transpose on a waveform, converting Rows/Columns that are time/amplitude (with wavelength embedded) into Rows/Columns that are frequency/amplitude (for a snapshot in time).

People love to go on about how brilliant it is and they're probably right but that's how I understand it.

TheOtherHobbes•3h ago

Pretty much, but phase is also included. Which matters for some things.

bobmcnamara•3h ago

But mostly not for ears it turns out!

Phase matters for some wideband signals, but most folks struggle to tell apart audio from hilbert-90-degree-shifted-audio

xeonmc•3h ago

Phase is required if it is to be a reversible transform. Otherwise would just be a Functional.

anigbrowl•2h ago

Read this (which is free): The Scientist's and Engineer's Guide to Digital Signal Processing* https://www.dspguide.com
It's very comprehensive, but it's also very well written and walks you through the mechanics of Fourier transforms in a way that makes them intuitive.

dsego•13m ago

humble plug https://dsego.github.io/demystifying-fourier/

edbaskerville•3h ago

To summarize: the ear does not do a Fourier transform, but it does do a time-localized frequency-domain transform akin to wavelets (specifically, intermediate between wavelet and Gabor transforms). It does this because the sounds processed by the ear are often localized in time.

The article also describes a theory that human speech evolved to occupy an unoccupied space in frequency vs. envelope duration space. It makes no explicit connection between that fact and the type of transform the ear does—but one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.

A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.

AreYouElite•3h ago

Do you believe it might be possible that the frequency band of human speech is not determined by such factors at all but more of a function of height? kids have higher voices adults have deeper voices. Similar to stringed instruments: viola high pitched and bass low pitched.

I'm no expert in these matters just speculating...

fwip•2h ago

It's not height, but vocal cord length and thickness. Longer vocal cords (induced by testosterone during puberty) vibrate more slowly, with a lower frequency/pitch.

matthewdgreen•3h ago

If you take this thought process even farther, specific words and phonemes should occupy specific slices of the tradeoff space. Across all languages and cultures, an immediate warning that a tiger is about to jump on you should sit in a different place than a mother comforting a baby (which, of course, it does.) Maybe that even filters down to ordinary conversational speech.

xeonmc•3h ago

Analogy: when you knock on doors, how do you decide what rhythm and duration to use, so that it won’t be mistaken as accidentally hitting the door?

toast0•3h ago

Shave and a haircut is the only option in my knocking decision tree.

cnity•2h ago

Thanks for giving your two bits on the matter.

a-dub•3h ago

> At high frequencies, frequency resolution is sacrificed for temporal resolution, and vice versa at low frequencies.

this is the time-frequency uncertainty principle. intuitively it can be understood by thinking about wavelength. the more stretched out the waveform is in time, the more of it you need to see in order to have a good representation of its frequency, but the more of it you see, the less precise you can be about where exactly it is.

> but it does do a time-localized frequency-domain transform akin to wavelets

maybe easier to conceive of first as an arbitrarily defined filter bank based on physiological results rather than trying to jump directly to some neatly defined set of orthogonal basis functions. additionally, orthogonal basis functions cannot, by definition, capture things like masking effects.

> A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.

(4) size of the animal.

notably: some smaller creatures have supersonic vocalization and sensory capability, sometimes this is hypothesized to complement visual perception for avoiding predators, it also could just have a lot to do with the fact that, well, they have tiny articulators and tiny vocalizations!

Terr_•2h ago

> it also could just have a lot to do with the fact that, well, they have tiny articulators and tiny vocalizations!

Now I'm imagining some alien shrew with vocal-cords (or syrinx, or whatever) that runs the entire length of its body, just so that it can emit lower-frequency noises for some reason.

bragr•1h ago

Well without the humorous size difference, this is basically what whales and elephants do for long distance communication.

Y_Y•44m ago

Sounds like an antenna, if you'll accept electromagnetic noise then there are some fish that could pass for your shrew, e.g. https://en.wikipedia.org/wiki/Gymnotus

SoftTalker•3h ago

Ears evolved long before speech did. Probably in step with vocalizations however.

Sharlin•3h ago

Not sure about that; I'd guess that vibration-sensing organs first evolved to sense disturbances (in water, on seafloor, later on dry ground and in air) caused by movement, whether of a predator, prey, or a potential mate. Intentional vocalizations for signalling purposes then evolved to utilize the existing modality.

FarmerPotato•2h ago

Is that an human understanding or is it just an AI that read the text and ignored the pictures?

Why do we need a summary in a post that adds nothing new to the conversation?

pests•2h ago

Are you saying your parent post was an AI summary? There is original speculation at the end and it didn’t come off that way to me.

dsp_person•2h ago

Even if it is doing a wavelet transform, I still see that as made of Fourier transforms. Not sure if there's a good way to describe this.

We can make a short-time fourier transform or a wavelet transform in the same way either by:

- filterbank approach integrating signals in time

- take fourier transform of time slices, integrating in frequency

The same machinery just with different filters.

psunavy03•2h ago

Well from an evolutionary perspective, this would be unsurprising, considering any other forms of language would have been ill-fitted for purpose and died out. This is really just a flavor of the anthropic principle.

lgas•1h ago

> It does this because the sounds processed by the ear are often localized in time.

What would it mean for a sound to not be localized in time?

littlestymaar•1h ago

A continuous sinusoidal sound, I guess?

hansvm•1h ago

It would look like a Fourier transform ;)

Zooming in to cartoonish levels might drive the point home a bit. Suppose you have sound waves

  |---------|---------|---------|

What is the frequency exactly 1/3 the way between the first two wave peaks? It's a nonsensical question. The frequency relates to the time delta between peaks, and looking locally at a sufficiently small region of time gives no information about that phenomenon.

Let's zoom out a bit. What's the frequency over a longer period of time, capturing a few peaks?

Well...if you know there is only one frequency then you can do some math to figure it out, but as soon as you might be describing a mix of frequencies you suddenly, again, potentially don't have enough information.

That lack of information manifests in a few ways. The exact math (Shannon's theorems?) suggests some things, but the language involved mismatches with human perception sufficiently that people get burned trying to apply it too directly. E.g., a bass beat with a bit of clock skew is very different from a bass beat as far as a careless decomposition is concerned, but it's likely not observable by a human listener.

Not being localized in time means* you look at longer horizons, considering more and more of those interactions. Instead of the beat of a 4/4 song meaning that the frequency changes at discrete intervals, it means that there's a larger, over-arching pattern capturing "the frequency distribution" of the entire song.

*Truly time-nonlocalized sound is of course impossible, so I'm giving some reasonable interpretation.

jancsika•50m ago

> It's a nonsensical question.

Are you talking about a discrete signal or a continuous signal?

xeonmc•1h ago

Means that it is a broad spectrum signal.

Imagine the dissonant sound of hitting a trashcan.

Now imagine the sound of pressing down all 88 keys on a piano simultaneously.

Do they sound similar in your head?

The localization is located at where the phase of all frequency components are aligned coherently construct into a pulse, while further down in time their phases are misaligned and cancel each other out.

patrickthebold•1h ago

I think I might be missing something basic, but if you actually wanted to do a Fourier transform on the sound hitting your ear, wouldn't you need to wait your entire lifetime to compute it? It seems pretty clear that's not what is happening, since you can actually hear things as they happen.

xeonmc•1h ago

You’ll also need to have existed and started listening before the beginning of time, forever and ever. Amen.

cherryteastain•1h ago

Not really, just as we can create spectrograms [1] for a real time audio feed without having to wait for the end of the recording by binning the signal into timewise chunks.

[1] https://en.wikipedia.org/wiki/Spectrogram

IshKebab•54m ago

Those use the Short-Time Fourier Transform, which is very much like what the ear does.

https://en.wikipedia.org/wiki/Short-time_Fourier_transform

bonoboTP•1h ago

Yes, for the vanilla Fourier transform you have to integrate from negative to positive infinity. But more practically you can put put a temporally finite-support window function on it, so you only analyze a part of it. Whenever you see a 2d spectrogram image in audio editing software, where the audio engineer can suppress a certain range of frequencies in a certain time period they use something like this.

It's called the short-time Fourier transform (STFT).

https://en.wikipedia.org/wiki/Short-time_Fourier_transform

IshKebab•55m ago

Yes exactly. This is a classic "no cats and dogs don't actually rain from the sky" article.

Nobody who knows literally anything about signal processing thought the ear was doing a Fourier transform. Is it doing some like a STFT? Obviously yes and this article doesn't go against that.

km3r•40m ago

> one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.

I wonder if these could be used to better master movies and television audio such that the dialogue is easier to hear.

kiicia•37m ago

You are expecting too much, we still have no technology to do that, unless it’s about clarity of advertisement jingles /s

crazygringo•16m ago

Yeah, this article feels like it's very much setting up a ridiculous strawman.

Nobody who knows anything about signal processing has ever suggested that the ear performs a Fourier transform across infinite time.

But the ear does perform something very much akin to the FFT (fast Fourier transform), turning discrete samples into intensities at frequencies -- which is, of course, what any reasonable person means when they say the ear does a Fourier transform.

This article suggests it's accomplished by something between wavelet and Gabor. Which, yes, is not exactly a Fourier transform -- but it's producing something that is about 95-99% the same in the end.

And again, nobody would ever suggest the ear was performing the exact math that the FFT does, down to the last decimal point. But these filters still work essentially the same way as the FFT in terms of how they respond to a given frequency, it's really just how they're windowed.

So if anyone just wants a simple explanation, I would say yes the ear does a Fourier transform. A discrete one with windowing.

kazinator•3h ago

> A Fourier transform has no explicit temporal precision, and resembles something closer to the waveforms on the right; this is not what the filters in the cochlea look like.

Perhaps the ear does someting more vaguely analogous to a discrete Fourier transforms on samples of data, which is what we do in a lot of signal processing.

In signal processing, we take windowed samples, and do discrete transforms on these. These do give us some temporal precision.

There is a trade off there between frequency and temporal precision, analgous to the Pauli exclusion principle in quantum mechanics. The better we know a frequency, the less precisely we know the timing. Only an infinite, periodic signal has a single precise frequency (or precise set of harmonics) which are infinitely narrow blips in the frequency domain.

The continuous Fourier transform deals with periodic signals only. We transform an entire function like sin(x) over the entire domain. If that domain is interpreted as time, we are including all of eternity, so to speak from negative infinite time to positive.

xeonmc•3h ago

> analgous to the Pauli exclusion principle

Did you mean the Heisenberg Uncertainty Principle instead? Or is there actually some connection of Pauli Exlusion Principle to conjugate transforms that I was’t aware of?

kvakkefly•29m ago

They are not connected afaik.

HarHarVeryFunny•2h ago

> There is a trade off there between frequency and temporal precision

Sure, and the FFT isn't inherently biased towards one vs the other. If you take an FFT over a long time window (narrowband spectrogram) then you get good frequency resolution at the cost of time resolution, and vice versa for a short time window (wideband spectrogram).

For speech recognition ideally you'd want to use both since they are detecting different things. TFA is saying that this is in fact what our cochlea filter bank is doing, using different types of filter at different frequency ranges - better frequency resolution at lower frequencies where the formants are (carrying articulatory information), and better time resolution at the high frequencies generated by fricatives where frequency doesn't matter but accurate onset detection is useful for detecting plosives.

energy123•2h ago

STFT?

adornKey•3h ago

This subject has bothered me for a long time. My question to guys into acoustics was always: If the cochlea performs some kind of Fourier transform, what are the chances, that it uses sinus waves as a base for the vector-space? - if it did anything like that it could just as good use any slightly different wave-forms as a base for transformation. Stiffness and non-linearity will for sure take care that any ideal rubber model in physics will in reality be different from the perfect sinus.

FarmerPotato•2h ago

I find it beautiful to see the term "sinus wave."

empiricus•33m ago

well, cochlea is working withing the realm of biological and physical possibilities. basically it is a triangle through which waves are propagating, and sensors along the edge. smth smth this is similar to a filter bank of gabor filters that respond to rising freq along the triangle edge. ergo you can say fourier, but it only means sensors responding to different freq becasue of their location.

adornKey•15m ago

Yeah, but not only the frequency is important - the wave-form is very relevant. For example if your wave-form is a triangle, listerners will tell you that it is very noisy compared to a simple sinus. If you use sinus as a base of your vector space triangles really look like a noisy mix. My question is, if the basic elements are really sinus, or if the basic Eigen-Waves of the cochlea are other Wave-Forms (e.g. slightly wider or narrower than sinus, ...). If physics in the ear isn't linear, maybe sinus isn't the purest wave-form for a listener.

Most people in Physics only know sinus and maybe sometimes rectangles as a base for transformations, but mathematically you could use a lot of other things - maybe very similar to sinus, but different.

gowld•3h ago

Why is there no box diagram for cochlea "between wavelet and Gabor" ?

anticensor•13m ago

Would look still too much like wavelet.

shermantanktop•3h ago

The thesis about human speech occupying less crowded spectrum is well aligned with a book called "The Great Animal Orchestra" (https://www.amazon.com/Great-Animal-Orchestra-Finding-Origin...).

That author details how the "dawn chorus" is composed of a vast number of species making noise, but who are able to pick out mating calls and other signals due to evolving their vocalizations into unique sonic niches.

It's quite interesting but also a bit depressing as he documents the decline in intensity of this phenomenon with habitat destruction etc.

HarHarVeryFunny•2h ago

Birds have also evolved to choose when to vocalize to best be heard - doing so earlier in urban areas where later there will be more traffic noise, and later in some forest environments to avoid being drowned out by the early rising noisy insects.

kulahan•1h ago

Probably worth mentioning that as evolutions that allow them to compete well in nature die out, ones that allow them to compete well in cities takes their place. Evolution is always a series of tradeoffs.

Maybe we don't have sonic variation, but temporal instead.

bitwize•10m ago

Nature uh, finds a way.

brcmthrowaway•2h ago

OT: Does anyone here believe in Intelligent Design?

xeonmc•1h ago

As low-level physical mechanistic processes? Absolutely not.

As higher-order, statistically transparent abstract nudges of providence existing outside the confines of causality? Metaphysically interesting but philosophically futile.

superb-owl•1h ago

The title seems a little click-baity and basically wrong. Gabor transforms, wavelet transforms, etc are all generalizations of the fourier transform, which give you a spectrum analysis at each point in time

The content is generally good but I'd argue that the ear is indeed doing very Fourier-y things.

debo_•1h ago

Fourear transform

antognini•1h ago

If you want to get really deep into this, Richard Lyon has spent decades developing the CARFAC model of human hearing: Cascade of Asymmetric Resonators with Fast-Acting Compression. As far as I know it's the most accurate digital model of human hearing.

He has a PDF of his book about human hearing on his website: https://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018_smaller...

javier_e06•1h ago

This is fascinating.

I know of vocoders in the military hardware that encode voices to resemble something more simple for compression (a low-tone male voice), smaller packets that take less bandwidth. This evolution of the ear to must also have evolved with our vocal chords and mouth to occupy available frequencies for transmission and reception for optimal communication.

The parallels with waveforms don't end there. Waveforms are also optimized for different terrains (urban, jungle).

Are languages organic waveforms optimized to ethnicity and terrain?

Cool article indeed.

rolph•39m ago

supplemental:

Neuroanatomy, Auditory Pathway

https://www.ncbi.nlm.nih.gov/books/NBK532311/

Cochlear nerve and central auditory pathways

https://www.britannica.com/science/ear/Cochlear-nerve-and-ce...

Molecular Aspects of the Development and Function of Auditory Neurons

https://pmc.ncbi.nlm.nih.gov/articles/PMC7796308/

fennec-posix•32m ago

"It appears that human speech occupies a distinct time-frequency space. Some speculate that speech evolved to fill a time-frequency space that wasn’t yet occupied by other existing sounds."

I found this quite interesting, as I have noticed that I can detect voices in high-noise environments. E.g. HF Radio where noise is almost a constant if you don't use a digital mode.

amelius•14m ago

What does the continuous tingling of a hair cell sound like to the subject?

Affinity Studio now free

The ear does not do a Fourier transform

Apple reports fourth quarter results

Jujutsu at Google [video]

Springs and bounces in native CSS

Minecraft HDL, an HDL for Redstone

TruthWave – A platform for corporate whistleblowers

987654321 / 123456789

Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously

Show HN: I made a heatmap diff viewer for code reviews

NPM flooded with malicious packages downloaded more than 86k times

Free software scares normal people

Lenses in Julia

Show HN: Run a GitHub Actions step in a gVisor sandbox

Learn Multiplatform Z80 Assembly Programming with Vampires

Independently verifying Go's reproducible builds

Show HN: Meals You Love – AI-powered meal planning and grocery shopping

Israel demanded Google and Amazon use secret 'wink' to sidestep legal orders

Zig's New Async I/O

Show HN: In a single HTML file, an app to encourage my children to invest

ZOZO's Contact Solver for physics-based simulations

Taking money off the table

I have released a 69.0MB version of Windows 7 x86

Some people can't see mental images

US declines to join more than 70 countries in signing UN cybercrime treaty

Estimating the perceived 'claustrophobia' of New York City's streets (2024)

Frozen DuckLakes for Multi-User, Serverless Data Access

Tweakcc

Qt Creator 18 Released

Replacing EBS and Rethinking Postgres Storage from First Principles

Affinity Studio now free

The ear does not do a Fourier transform

Apple reports fourth quarter results

Jujutsu at Google [video]

Springs and bounces in native CSS

Minecraft HDL, an HDL for Redstone

TruthWave – A platform for corporate whistleblowers

987654321 / 123456789

Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously

Show HN: I made a heatmap diff viewer for code reviews

NPM flooded with malicious packages downloaded more than 86k times

Free software scares normal people

Lenses in Julia

Show HN: Run a GitHub Actions step in a gVisor sandbox

Learn Multiplatform Z80 Assembly Programming with Vampires

Independently verifying Go's reproducible builds

Show HN: Meals You Love – AI-powered meal planning and grocery shopping

Israel demanded Google and Amazon use secret 'wink' to sidestep legal orders

Zig's New Async I/O

Show HN: In a single HTML file, an app to encourage my children to invest

ZOZO's Contact Solver for physics-based simulations

Taking money off the table

I have released a 69.0MB version of Windows 7 x86

Some people can't see mental images

US declines to join more than 70 countries in signing UN cybercrime treaty

Estimating the perceived 'claustrophobia' of New York City's streets (2024)

Frozen DuckLakes for Multi-User, Serverless Data Access

Tweakcc

Qt Creator 18 Released

Replacing EBS and Rethinking Postgres Storage from First Principles

The ear does not do a Fourier transform

Comments