How problematic is resampling audio from 44.1 to 48 kHz?

51•brewmarche•4w ago

Comments

ZeroConcerns•3w ago

> it's probably worth avoiding the resampling of 44.1 to 48 kHz

Ehhm, yeah, duh? You don't resample unless there is a clear need, and even then you don't upsample and only downsample, and you tell anyone that tries to convince you otherwise to go away and find the original (analog) source, so you can do a proper transfer.

mort96•3w ago

If only it was that simple T_T

I'm working on a game. My game stores audio files as 44.1kHz .ogg files. If my game is the only thing playing audio, then great, the system sound mixer can configure the DAC to work in 44.1kHz mode.

But if other software is trying to play 48kHz sound files at the same time? Either my game has to resample from 44.1kHz to 48kHz before sending it to the system, or the system sound mixer needs to resample it to 48kHz, or the system sound mixer needs to resample the other software from 48kHz to 44.1kHz.

Unless I'm missing something?

mort96•3w ago

And actually, why do we have both 48kHz and 44.1kHz anyway? If all "consumer grade high quality audio" was in 44.1kHz (or 48kHz) we probably could've avoided resampling in almost all circumstances other than professional audio contexts (or for already low quality audio like 8kHz files). What benefit do we get out of having both 44.1 and 48 that outweighs all the resampling it causes?

zamadatix•3w ago

As far as I understood, both rates ultimately come from trying to map to video standards of the time. 44.1 kHz mapped great to reusing analog tape of the time, 48 kHz mapped better to digital clocking and integer multiples of video standards while also having a slightly wider margin on oversampling the high frequency.

44.1 kHz never really went away because CDs continued using it, allowing them to take any existing 44.1 kHz content as well as to fit slightly more audio per disc.

At the end of the day, the resampling between the two doesn't really matter and is more of a minor inconvenience than anything. There are also lots of other sampling rates which were in use for other things too.

CharlesW•3w ago

> And actually, why do we have both 48kHz and 44.1kHz anyway?

Those two examples emerged independently, like rail standards or any number of other standards one can cite. That's really just the top of the rabbit-hole, since there are 8-20 "standard" audio sample rates, depending how how you count.

This isn't really a drawback, and it does provide flexibility when making tradeoffs for low bitrates (e.g. 8 kHz narrowband voice is fine for most use cases) and for other authoring/editing vs. distribution choices.

Joeboy•3w ago

> This isn't really a drawback

But, that's only true because people freely resample between them all the time and nobody knows or cares about it.

pixelpoet•3w ago

The nice thing about standards is, there are so many from which to choose! :)

sneak•3w ago

CDs used 44.1, DAT and DVDs used 48. That’s it.

Most stuff on the internet ripped from CD is 44.1. 48 is getting more common. We’re like smack in the middle of the 75 year transition period to 48kHz.

For new projects, I use 48, because my mics are 32bit (float!)/48kHz.

brudgers•3w ago

Perhaps the benefit we get is access to existing recordings?

44.1khz exists because it was the lowest technically practical speed and was an optimization for processing speed and storage space.

48khz exists because it syncs with video easily — I’ve also heard it allows for more tolerance in the anti-aliasing filter.

cormorant•3w ago

> 48khz exists because it syncs with video easily

I guess meaning 24fps video? Because 44100 is already a multiple of 25, 30, 50, and 60.

adgjlsfhk1•3w ago

48k also supports 24, 48, 120, and 240 which are all nice to haves.

brudgers•3w ago

24fps is where the money is because 24fps is the standard for film.

PunchyHamster•3w ago

technically we could use 40kHz and just upsample, the extra frequency over 40kHz is basically leeway to make analog part possible/cheap, but it is not technically needed in the signal

the first CD player didn't had compute power to upsample perfectly but modern devices certainly do.

mort96•3w ago

AFAIU, 40kHz exactly wouldn't really work, if your goal is to represent 0Hz-20kHz: in order to avoid aliasing, you need a low pass filter to remove all frequency content above half your sample rate, and no filter is infinitely hard (and you generally want to give the filter a decent range of frequencies to work with). If you want to start your low pass filter at 20kHz, you want it to end (i.e reach practically -∞dB) at a few kHz above 20kHz. If you used a sample rate of exactly 40kHz, you would need your low pass filter to reach -∞dB at 20kHz, meaning it'd have to start somewhere in the audible region.

Though this is just my understanding. Maybe I'm wrong.

rerdavies•3w ago

> Why do we have both 48kHz and 44.1kHz anyway

Because of greed.

Early audio manufacturers (SONY notably) used 48kHz for profession-grade audio equipment, that would be used in studios or TV stations, and degraded 44.1khz audio for consumer devices. Typically you would pay an order of magnitude more for the 48kHz version of the hardware.

48khz is better for creating and mixing audio. You cannot practically mix audio at 44.1khz without doing very slight damage to audible high frequencies. But enough to make a difference. If you were creating for consumer devices, you would mix at 48Khz, and then downsample to 44.1khz during final mastering, since conversion from 48kHz to 44.1kHz can be done theoretically (and practically) perfectly. (Opinions of the OP notwithstanding).

I think it's safe to say that the 44.1kHz sampling rate was maliciously selected specifically because it is just low enough that perfect playback is still possible, but perfect mixing is practically not possible. And obviously maliciously chosen to be a rate with no convenient greatest common denominator with 48Khz, which would have allowed easy and cheap perfect realtime resampling. Had Sony chose 44.0kHz, it would be trivially easy to do sample rate conversion to 48Khz in realtime even with primitive hardware available in the late 1970s. That extra .1kHz is transparently obvious malice and greed in plain sight.

Presumably SONY would sell you the software or hardware to perform perfect non-realtime conversion of audio from 48khz to 44.1khz for a few tens of thousands of dollars. Not remotely subtle how greedy all of this was.

There has been no serious reason to use 44.1kHz instead of 48kHz for about 50 years, at least from a technology point of view. (And no real reason to EVER use 44.1khz instead of 48kHz other than GREED).

brendyn•3w ago

Are you able to share evidence for this?

rerdavies•3w ago

What would you consider evidence? Emails between standards committee members agreeing to collude in order to screw pro-audio customers?

The evidence is: why on earth would anyone on a standards committee choose 44.1kHz, instead of 44.0kHz? The answer: 44.1kHz was transparently obviously chosen to make it impossible to perform on-the-fly rate conversions.

The mathematics of polyphase rate converters was perfectly well understood at the time these standards were created.

brewmarche•3w ago

Someone else wrote that it was chosen to best match PAL and NTSC. IIRC there is also a Technology Connections video about those early PCM adaptor devices that would record to VHS tape.

<https://en.wikipedia.org/w/index.php?title=44,100_Hz&oldid=1...>

Take it with a grain of salt, I’m not really knowledgeable about this.

E: also note the section about prime number squares below

rerdavies•2w ago

4800kHz and 44100kHz devices appeared at roughly the same time. Sony's first 44100kHz device was shipped in 1979. Phillips wanted to use 44.0kHz.

If you can do 44.1khz on an NSTC recording device, you can do 44.0khz too. Neither NTSC digital format uses the fully available space in the horizontal blanking intervals on an NTSC VHS device, so using less really isn't a problem.

Why is 44Khz better? There's a very easy way to do excellent sample rate conversions from 44.0Khz to 48Khz, you upsample the audio by 12 (by inserting 11 zeros between each sample), apply a 22Khz low-pass filter, and then decimate by 11 (by keeping only every 11th sample. To go in the other direction, upsample by 11, filter, and decimate by 12. Plausibly implementable on 1979 tech. And trivially implementable on modern tech.

To perform the same conversion from 44.1kHz to 48kHz, you would have to upsample by 160, filter at at a sample rate of 160x44.1kHz, and then decimate by 147. Or upsample by 147, filter, and decimate by 160. Impossible with ancient tech, and challenging even on modern tech. (I would imagine modern solutions would use polyphase filters instead, with tables sizes that would be impractical on 1979 VLSI). Polyphase filter tables for 44.0kHz/48.0kHz conversion are massively smaller too.

As for the prime factors... factors of 7 (twice) of 44100 really aren't useful for anything. More useful would be factors of two (five times), which would in increase the greatest common divisor from 300 to 4,000!

ianburrell•3w ago

The Wikipedia page explains it as coming from PCM adaptors that put digital audio on video tapes. The constraints of recording on videotape led to 44.1kHz being best option. It sounds like there wasn't enough capacity for 48kHz.

Then Sony used the frequency on CDs.

zamadatix•3w ago

That's a clear need IMO, but it'd be slightly better if the game could have 48 kHz audo files and downsampled them to 44.1 kHz playback than the other way around (better to downsample than upsample).

adgjlsfhk1•3w ago

They're both fine (as long as the source is band limited to 20khz which it should be anyway).

adrian_b•3w ago

The analog source is never perfectly limited to 20 kHz because very steep filters are expensive and they may also degrade the signal in other ways, because their transient response is not completely constrained by their amplitude-frequency characteristic.

This is especially true for older recordings, because for most newer recordings the analog filters are much less steep, but this is compensated by using a much higher sampling frequency than needed for the audio bandwidth, followed by digital filters, where it is much easier to obtain a steep characteristic without distorting the signal.

Therefore, normally it is much safer to upsample a 44.1 kHz signal to 48 kHz, than to downsample 48 kHz to 44.1 kHz, because in the latter case the source signal may have components above 22 kHz that have not been filtered enough before sampling (because the higher sampling frequency had allowed the use of cheaper filters) and which will become aliased to audible frequencies after downsampling.

Fortunately, you almost always want to upsample 44.1 kHz to 48 kHz, not the reverse, and this should always be safe, even when you do not know how the original analog signal had been processed.

PunchyHamster•3w ago

yeah but you can record it in 96kHz, then resample it perfectly to 44.1 (hell, even just 40) in digital domain, then resample it back to 48kHz before sending to DAC

adrian_b•3w ago

True.

If you have such a source sampled at a frequency high enough above the audio range, then through a combination of digital filtering and resampling you can obtain pretty much any desired output sampling frequency.

adgjlsfhk1•3w ago

the point is that when down sampling from 48 to 44.1 you can for "free" do the filtering since the down sampling is being done digitally with an fft anyway

sneak•3w ago

44.1kHz sampling is sufficient to perfectly describe all analog waves with no frequency component above 22050Hz, which is substantially above human hearing. You can then upsample this band limited signal (0-22050Hz) to any sampling rate you wish, perfectly, because the 44.1kHz sampling is lossless with respect to the analog waveform. (The 16 bits per sample is not, though for the purposes of human hearing it is sufficient for 99% of use cases.)

https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampli...

adrian_b•3w ago

22050 Hz is an ideal unreachable limit, like the speed of light for velocities.

You cannot make filters that would stop everything above 22050 Hz and pass everything below. You can barely make very expensive analog filters that pass everything below 20 kHz while stopping everything above 22 kHz.

Many early CD recordings used cheaper filters with a pass-band smaller than 20 kHz.

For 48 kHz it is much easier to make filters that pass 20 kHz and whose output falls gradually until 24 kHz, but it is still not easy.

Modern audio equipment circumvents this problem by sampling at much higher frequencies, e.g. at least 96 kHz or 192 kHz, which allows much cheaper analog filters that pass 20 kHz but which do not attenuate well enough the higher frequencies, then using digital filters to remove everything above 20 kHz that has passed through the analog filters, and then downsampling to 48 kHz.

The original CD sampling frequency of 44.1 kHz was very tight, despite the high cost of the required filters, because at that time, making 16-bit ADCs and DACs for a higher sampling frequency was even more difficult and expensive. Today, making a 24-bit ADC sampling at 192 kHz is much simpler and cheaper than making an audio anti-aliasing filter for 44.1 kHz.

throwaway290•3w ago

You mean average human hearing?

zipy124•3w ago

you're not missing something. You can re-sample them safely as stated by the author. They simply state you should check the re-sampler as:

> Although this conversion can be done in such a way as to produce no audible errors, it's hard to be sure it actually is.

That is, you should verify the re-sampler you are using or implement yourself in order to be sure it is done correctly, and that with todays hardware it is easily possible.

Veliladon•3w ago

Getting pristine resampling is insanely expensive and not worth it.

If you have a mixer at 48KHz you'll get minor quantization noise but if it's compressed already it's not going to do any more damage than compression already has.

ZeroConcerns•3w ago

DAC frequency and the audio format requirements for whatever you supply to your platform audio API have literally nothing to do with each other.

My reply was from an audio mastering perspective.

Joeboy•3w ago

> Unless I'm missing something?

I suppose the option you're missing is you could try to get pristine captures of your samples at every possible sample rate you need / want to support on the host system.

zelphirkalt•3w ago

Is this not the job of the operating system or its supporting parts, to deal with audio from various sources? It should not be necessary to inspect the state of the OS your game is running on, to know what kind of audio you can playback. In fact, that could even be considered spying on things you shouldn't. Maybe the OS or its sound system does not abstract that from you and I am wrong about the state of OS in reality, but this seems to me like a pretty big oversight, if true. If I extrapolate from your use-case, then that would mean any application performing any playback of sound, needs to inspect whether something else is running on the system. That seems like a pretty big overreach.

As an example, lets say I change frequency in Audacity and press the play button. Does Audacity now go and inspect, whether anything else on my system is making any sound?

Joeboy•3w ago

> Is this not the job of the operating system or its supporting parts, to deal with audio from various sources

I think that's the point? In practice the OS (or its supporting parts) resample audio all the time. It's "under the hood" but the only way to actually avoid it would be to limit all audio files and playback systems to a single rate.

zelphirkalt•3w ago

I don't understand then, why they need to deal with that when making a game, unless they are not satisfied with the way that the OS resamples under the hood.

Joeboy•3w ago

My reading is not that they're saying it's something they necessarily have deal with themselves, but that it's something they can't practically avoid.

rerdavies•3w ago

But they CAN practically avoid it. lol. Just let the system do it for them.

Joeboy•3w ago

I suppose, if you interpret "avoid" as "not care about".

zelphirkalt•3w ago

I interpret them to mean "avoid doing it oneself" not "avoid it happening entirely".

Joeboy•3w ago

If you read the comments with the other interpretation I think the conversation will make more sense.

mort96•3w ago

If my audio files are 44.1kHz, and the user plays 48kHz audio at the same time, how do I practically avoid my audio being resampled?

zelphirkalt•3w ago

You cannot avoid it either way then, I guess. Either you let the system do it for you, or you take matters into your own hands. But why do you feel it necessary to take matters into your own hands? I think that's the actual question that begs answering. Are you unsatisfied with how the system does the resampling? Does it result in a worse quality than your own implementation of resampling? Or is there another reason?

mort96•3w ago

I don't feel it necessary to take matters into my own hands. If you read my original message again:

    > Either my game has to resample from 44.1kHz to 48kHz
    > before sending it to the system, or the system
    > sound mixer needs to resample it to 48kHz, or the
    > system sound mixer needs to resample the other software
    > from 48kHz to 44.1kHz

I expressed no preference with regard to those 3. I was outlining the theoretically possible options, to illustrate that there is no way to avoid resampling.

zelphirkalt•3w ago

I got a different impression, because you also wrote:

> If only it was that simple T_T

Which to me sounded like _for you_ it's not simple because reasons, which led me to believe, that you _do_ want to take it into your own hands, making it not simple, ergo not being able to let the OS do it, for reasons. Now I understand what you mean, thanks!

PunchyHamster•3w ago

It is and it is done but you might not have control over process.

In PulseAudio you can choose resample method you want to use for the whole mixing daemon but I don't think that's option in windows/macos

rerdavies•3w ago

Depends on platform. But yes.

It is also the job of the operating system or its supporting parts to allow applications to configure audio devices to specific sample rates if that's what the application needs.

It's fine to just take whatever you get if you are a game app, and either allow the OS to resample, or do the resampling yourself on the fly.

Not so fine if you are authoring audio, where the audio device rate ABSOLUTELY has to match the rate of content that's being created. It is NOT acceptable to have the OS doing resampling when that's the case.

Audacity allows you to force the sample rate of the input and output devices on both Windows and Linux. Much easier on Windows; utterly chaotic and bug-filled and miserable and unpredictable on Linux (although up-to-date versions of Pipewire can almost mostly sometimes do the right thing, usually).

mort96•3w ago

The OS "deals with it" by resampling when necessary.

adzm•3w ago

You are right; the system sound mixer should handle all resampling unless you explicitly take exclusive control of the audio device. On Windows at least, this means everything generally gets resampled to 48khz. If you are trying to get the lowest latency possible, this can be an obstacle... on the order of single digit milliseconds.

bob1029•3w ago

If 44.1kHz is otherwise sufficient but you have a downstream workflow that is incompatible, there are arguments for doing this. It can be done with no loss in quality.

From an information theory perspective, this is like putting a smaller pipe right through the middle of a bigger one. The channel capacity is the only variable that is changing and we are increasing it.

anonymars•3w ago

This isn't the whole picture because they aren't multiples of each other, so there will be interpolation/jitter

For example if you watch a 24fps film on a 60fps screen, in contrast to a 120fps screen

mort96•3w ago

That's not how audio works. PCM data at some sample rate with infinite precision samples perfectly encodes all the audio data of all frequencies up to half the sample rate. Resampling is a theoretically lossless operation when the frequency content of the audio fits within half the sample rate of both the source and the destination sample rates (which will always be true when resampling to a higher sample rate, FWIW).

The issues are that 1) resampling has a performance and latency cost, 2) better resampling has a higher performance and latency cost

zipy124•3w ago

That seems a rather shallow - and probably incorrect - reading of the source. This is an efficiency and trust trade off as noted:

> given sufficient computing resources, we can resample 44.1 kHz to 48 kHz perfectly. No loss, no inaccuracies.

and then further

> Your smartphone probably can resample 44.1 kHz to 48 kHz in such a way that the errors are undetectable even in theory, because they are smaller than the noise floor. Proper audio equipment can certainly do so.

That is you don't need the original source to do a proper transfer. The author is simply noting

> Although this conversion can be done in such a way as to produce no audible errors, it's hard to be sure it actually is.

That is that re-sampling is not a bad idea in this case because it's going to have any sort of error if done properly, it's just that the Author notes you cannot trust any random given re-sampler to do so.

Therefore if you do need to resample, you can do so without the analog source, as long as you have a re-sampler you can trust, or do it yourself.

sgerenser•3w ago

Speaking of a resampler you trust, I’ve had good experience with libsamplerate (http://www.mega-nerd.com/SRC/), which as of 2016 is BSD licensed.

brudgers•3w ago

A very common clear need is incorporating 44.1khz audio sourcesinto video. 48khz is 48khz because 48khz divided by 24fps, 25fps, or 30fps is an integer (and 44.1khz is not).

Also, for decades upsampling on ingest and downsampling on egress has been standard practice for DSP because it reduces audible artifacts from truncation and other rounding techniques.

Finally, most recorded sound does not have an original analog source because of the access digital recording has created…youtube for example.

AshamedCaptain•3w ago

Lots of Live/Audigy era Creative sound cards would resample everything to 48kHz, with probably one of the worst quality resamplers available, to the chagrin of all bitperfect fanatics... still probably one of their best selling sound cards.

I.e. no one cares.

Tanoc•3w ago

I had a Soundblaster Live! Gold card back in the day, and I would route my record player or stereo through it so I could use a visualizer on my computer. You could hear the digital noise that was introduced on the highhats. And the source for the sound was a late '70s era Realistic system where everything was analogue. I never knew it was because of the soundcard. I'd always just chalked it up to either Windows XP or VLC doing something.

AshamedCaptain•3w ago

Unlikely to be this issue since this is about resampling of (e.g. from a CD's native) 44.1kHz PCM to 48kHz, and has nothing at all to do when recording since you'd most likely record at 48kHz and play at 48kHz (no HW resampling involved).

everfrustrated•3w ago

Changing the sample rate of audio only affects the frequency range. All audio signal is _perfectly_ represented in a digital form.

I am ashamed to admit this took me a long time to properly understand. For further reading I'd recommend:

https://people.xiph.org/~xiphmont/demo/neil-young.html https://www.youtube.com/watch?v=cIQ9IXSUzuM

pxndxx•3w ago

Sampling does not lose information below the Nyquist limit, but quantization does introduce errors that can't be fixed. And resampling at a different rate might introduce extra errors, like when you recompress a JPEG.

brudgers•3w ago

In the audio world, quantization is usually discussed in terms of bit-depth rather than sample rate.

mort96•3w ago

Yeah, they know and their comment reflects that knowledge. They're saying that if we had infinite bit depth, we could arbitrarily resample anything to anything as long as the sample rate is above the Nyquist frequency; however we don't have an infinite bit depth, we have a finite bit depth (i.e the samples are quantized), which limits the dynamic range (i.e introduces noise). This noise can compound when resampling.

adgjlsfhk1•3w ago

The key point is that even with finite bit depth (as long as you dither properly), the effect of finite bit depth is easily controlled noise of program chosen spectrum. i.e. as long as your sampling isn't doing anything really dumb, the noise introduced by sampling is well below noise floor.

pxndx•3w ago

Yeah, but that noise keeps compounding every time you resample.

adgjlsfhk1•3w ago

Sure, but you will resample ~5 times max.

mort96•3w ago

I've been trying to figure out whether this is true or not, and I've only been able to find that it doesn't compound: the noise from the original quantization and the noise from the second (post-resample) quantisation aren't independent, both arise from rounding the data to the same 16-bit grid. Somehow, this seems to mean the noise doesn't compound.

I wish I understood this better and at least knew whether it's true or false. I have to do more reading on it.

dietr1ch•3w ago

I see I lose data on the [18kHz..) range, but at the same time as a male I'm not supposed to hear that past in my early 30s, sprinkle concerts on top and make it more like 16kHz :/

At least I don't have tinnitus.

Here's my test,

    ```fish
    set -l sample ~/Music/your_sample_song.flac  # NOTE: Maybe clip a 30s sample beforehand
    set -l borked /tmp/borked.flac # WARN: Will get overwritten (but more likely won't exist yet)

    cp -f $sample $borked

    for i in (seq 10)
        echo "$i: Resampling to 44.1kHz..."
        ffmpeg -i $borked -ar 44100 -y $borked.tmp.flac 2>/dev/null
        mv $borked.tmp.flac $borked

        echo "$i: Resampling to 48kHz..."
        ffmpeg -i /tmp/borked.flac -ar 48000 -y $borked.tmp.flac 2>/dev/null
        mv $borked.tmp.flac $borked
    end

    echo "Playing original $sample"
    ffplay -nodisp -autoexit $sample 2>/dev/null
    echo "Playing borked file $borked"
    ffplay -nodisp -autoexit $borked 2>/dev/null

    echo "Diffing..."
    set -l spec_config 's=2048x1024:start=0:stop=22000:scale=log:legend=1'
    ffmpeg -i $sample -lavfi showspectrumpic=$spec_config /tmp/sample.png -y 2>/dev/null
    ffmpeg -i $borked -lavfi showspectrumpic=$spec_config /tmp/borked.png -y 2>/dev/null

    echo "Spectrograms,"
    ls -l /tmp/*.spec.png
    ```

amluto•3w ago

There’s one thing that bothers me about this. Sure, PCM sampling is a lossless representation of the low frequency portions of a continuous signal. But it is not a latency-free representation. To recover a continuous signal covering the low frequencies (up to 20kHz) from PCM pulses at a sampling frequency f_s (f_s >= 40kHz), you turn each pulse into the appropriate kernel (sinc works and is ideal in a sense, but you probably want to low-pass filter the result as well), and that gives you the decoded signal. But it’s not causal! To recover the signal at time t, you need some pulses from times beyond t. If you’re using the sinc kernel, you need quite a lot of lookahead, because sinc decays very slowly and you don’t want to cut it off until it’s decayed enough.

So if you want to take a continuous (analog) signal, digitize it, then convert back to analog, you are fundamentally adding latency. And if you want to do DSP operations on a digital signal, you also generally add some latency. And the higher the sampling rate, the lower the latency you can achieve, because you can use more compact approximations of sinc that are still good enough below 20kHz.

None of this matters, at least in principle, for audio streaming over the Internet or for a stored library — there is a ton of latency, and up to a few ms extra is irrelevant as long as it’s managed correctly when at synchronizing different devices. But for live sound, or for a potentially long chain of DSP effects, I can easily imagine this making a difference, especially at 44.1ksps.

I don’t work in audio or DSP, and I haven’t extensively experimented. And I haven’t run the numbers. But I suspect that a couple passes of DSP effects or digitization at 44.1ksps may become audible to ordinary humans in terms of added latency if there are multiple different speakers with different effects or if A/V sync is carelessly involved.

Sesse__•3w ago

This is all true, but it is also true for most _other_ filters and effects, too; you always get some added delay. You generally don't have a lot of conversions in your chain, and they are more on the order of 16 samples and such, so the extra delay from chunking/buffering (you never really process sample-by-sample from the sound card, the overhead would be immense) tends to be more significant.

wildzzz•3w ago

Wouldn't each sample be just an amplitude(say, 16bit), not a since function? You can't recover frequency data without a significant number of pulses but that's what the low pass filter is for. Digital audio is cool but PCM is just a collection of analog samples. There's no reason why it couldnt be an energy signal.

amluto•3w ago

This is the sampling theorem. You start with a continuous band-limited signal (e.g. sound pressure [0], low-pass filtered such that there is essentially no content above 20kHz [1]). You then sample it by measuring and recording the pressure, f_s times per second (e.g. 48 kHz). The result is called PCM (Pulse Code Modulation).

Now you could play it back wrong by emitting a sharp pulse f_s times per second with the indicated level. This will have a lot of frequency content above 20kHz and, in fact, above f_s/2. It will sounds all kinds of nasty. In fact, it’s what you get by multiplying the time-domain signal by a pulse train, which is equivalent to convolving the frequency-domain signal with some sort of comb, and the result is not pretty.

Or you do what the sampling theorem says and emit a sinc-shaped pulse for each sample, and you get exactly the original signal. Except that sinc pulses are infinitely long in both directions.

[0] Energy is proportional to pressure squared. You’re sampling pressure, not energy.

[1] This is necessary to prevent aliasing. If you feed this algorithm a signal at f_s/2 + 5kHz, it would come back out at f_s - 5kHz, which may be audible.

brewmarche•3w ago

> Now you could play it back wrong by emitting a sharp pulse f_s times per second with the indicated level. This will have a lot of frequency content above 20kHz and, in fact, above f_s/2. It will sounds all kinds of nasty.

Wouldn’t the additional frequencies be inaudible with the original frequencies still present? Why would that sound nasty?

amluto•3w ago

Because the rest of the system is not necessarily designed to tolerate high frequency content gracefully. Any nonlinearities can easily cause that high frequency junk to turn back into audible junk.

This is like the issues xiphmont talks about with trying to reproduce sound above 20kHz, but worse, as this would be (trying to) play back high energy signals that weren’t even present in the original recording.

brewmarche•3w ago

That would mean that higher sampling rates (which add more inaudible frequencies) could cause similar problems. OK xiphmont actually mentions that, sorry, I had only watched the video when I replied.

amluto•3w ago

If I were designing a live audio workflow from scratch, my intuition would be to sample at a somewhat high frequency (at least 48kHz but maybe 96kHz), do the math to figure out the actual latency / data rate tradeoff, but to also filter the data as needed to minimize high frequency content (again, being careful with latency and fidelity tradeoffs).

But I have never done this and don't have any plans to do so, so I'll let other people worry about it. But maybe some day I'll carry out my evil plot to write an alternative to brutefir that gets good asymptotic complexity without adding latency. :)

klaff•3w ago

The xiphmont link is pretty good. Reminded me of the nearly-useless (and growing more so every day) fact that incandescent bulbs not only make some noise, but the noise increases when the bulb is near end of life. I know this from working in an anechoic chamber lit by bare bulbs hanging by cords in the chamber. We would do calibration checks at the start of the day, and sometimes a recording of a silent chamber would be louder than normal and then we'd go in and shut the door and try to figure out which bulb was the loud one.

4gotunameagain•3w ago

Does the noise increase when the bulb is near end of life, or does lifespan decrease dramatically when the noise is increased ?

I imagine the noise increases when one of the supports fail, and the filament starts oscillating leading to mechanical stress and failure

(not that it makes a difference, just thinking out loud)

klaff•3w ago

I've assumed or guessed that the noisy bulb had weak spots in the coiled coil filament that changed how they mechanically respond to the temperature cycling at 120 or 100 thermal cycles per second, thereby making them noisier at end of life. I don't know if I read that or am just extrapolating from how burned out bulbs look. Halogen lamps extend the life of incandescent lamps by allowing evaporated tungsten to redeposit on the filament, which says something about the general failure mechanism.

brewmarche•3w ago

This is a nice video. But I’m wondering: do we even need to get back the original signal from the samples? The zero-order hold output actually contains the same audible frequencies doesn’t it? If we only want to listen to it, the stepped wave would be enough then

lucyjojo•3w ago

in theory yes, in practice no.

the article explains why.

tldr: formula for regenerating signal at time t uses an infinite amount of samples in the past and future.

hdjxbsksbsh•3w ago

> All audio signal is _perfectly_ represented in a digital form.

That is not true... A 22kHz signal only has 2 data points for a sinusoidal waveform. Those 2 points could be anywhere I.e you could read 0 both times the waveform is sampled.... See Nyquist theorem.

From memory changing the sample rate can cause other issues with sample aliasing sue to the algorithms used...

temp8830•3w ago

Well, yeah, because that's the Nyquist limit at that sample rate. The max frequency needs to be below that (22.999kHz would not be undersampled). Of course, all of these frequencies are in the ultrasound range anyway.

Reducing the sample rate could cause aliasing. Oversampling shouldn't.

klaff•3w ago

22 kHz is not a hard/fast limit. There are people who can hear a bit above 20 kHz. I learned this when working with motor drives (at 23 kHz) but also documented in this wiki reference. https://en.wikipedia.org/wiki/Hearing_range#cite_note-10

comprev•3w ago

Thanks for the links very interesting to read.

I buy loads of DJ music on Bandcamp and "downsample" (I think the term is) to 16bit if they only offer 24bit for smaller size and wider compatability.

spacechild1•3w ago

That's not downsampling, that's reducing the bit depth. The bit depth determines the resolution of a single sample, i.e. the number of possible values it can represent.

112233•3w ago

> All audio signal is _perfectly_ represented in a digital form

What? No. All bandwidth limited signal is. Which means periodic. Causal signals like audio can be approximated, with tradeoffs. Such as pre-ringing (look at sinc(x), used to reconstruct sampled signal — how much energy is in the limb preceding the x=0.)

Is the approximation achieved by filtering the 44.1kHz DAC good enough? Yes, yes it is. But the math is way more involved (i.e. beyond me) than simply "Niquist".

This popular myth that limited frequencies we can hear and limited frequencies in Fourier transform sense is the same thing is quite irritating.

cozzyd•3w ago

Naively I would upsample by 4-8 by 0 stuffing and low pass filtering and then interpolating. That can't be that bad, can it?

MontagFTB•3w ago

> We do [cubic curve fitting] all the time in image processing, and it works very well. It would probably work well for audio as well, although it's not used -- not in the same form, anyway -- in these applications.

Is there a reason the solution that "works very well" for images isn't/can't be applied to audio?

amlib•3w ago

AFAIK it introduces harmonic distortion

pixelpoet•3w ago

I'd love to know more about this, do you perhaps have any refs? Thanks

amlib•3w ago

Not an expert in this field, just a scrub, so I can't really give you much.

There is this website that has painstakingly compares many resampling algorithms from all sorts of software:

https://src.infinitewave.ca

Try it's mirror if you can't access it: https://megapro17.github.io/src/index.html

The only one that says it is a cubic interpolation is the "Renoise 2.8.0 (cubic)" one, the spectrogram isn't very promising with all sorts of noise, intermodulation and aliasing issues. And, by switching to the 1khz tone spectrum view you can see some harmonics creeping up.

When I used to mess with trackers I would sometimes chose different interpolations and bicubic definitely still colored the sound, with sometimes enjoyable results. Obviously you don't want that as a general resampler...

butterknife•3w ago

Just to note that this site hasn't been updated for a while.

Much better, more modern and with automated upload analysis site would be [1] although it is designed for finding the highest fidelity resampler rather than AB comparisons.

[1] https://src.hydrogenaudio.org

adgjlsfhk1•3w ago

The short answer is that our eyes and ears use very different processing mechanisms. Our eyes sense using rods and cones where the distribution of them reflects a spatial distribution of the image. Our ears instead work by performing an analogue forier transform and hearing the frequencies. If you take an image and add lots of very high frequency noise, the result will be almost indistinguishable, but if you do the same for audio it will sound like a complete mess.

bobbylarrybobby•3w ago

Where does dither fit into this picture?

mistrial9•3w ago

dither is not compression or re-sampling. Dither is a means to change data so that it appears similar to original data to a human AFAIK

source- wrote dithering code for digital images

pixelpoet•3w ago

Dithering is about trading spatial / time resolution for reduced average quantisation error.

mistrial9•3w ago

The word dither -- evidently, the term dithering originally referred to a trembling or erratic movement, from the Middle English didderen, meaning "to tremble."

further, the halftone technique developed in the 1880s by Georg Meisenbach — breaking images into dots to simulate shades of gray — was called autotype, not dithering. The term dithering was later adopted in digital imaging and computing, particularly in the 1960s, when engineers applied the concept of adding noise to reduce color banding.

Sesse__•3w ago

You generally want to dither when (before) you quantize, unless you have so much headroom that it doesn't matter. E.g., if you're converting 44.1/16 to 48/16 (which involves quantizing each sample from an intermediate higher-precision result), you probably want to dither, but if you're converting 44.1/24 to 48/24, you probably won't need to care since you don't really care about whether your effective result is 24 or 24.5–25 bits.

ErroneousBosh•3w ago

That's more of a bit depth rather than bit rate thing. I was surprised to find that going from 16 to 8 bits by simply truncating gave really obvious artifacts on certain sounds (a sampled 808 kick for example had a distinct BZZZEEEOOOOWWWW sound, quite prominent), and even really simple triangular noise dithering made it go away. It did mean the playback was more noisy but it was less obvious.

As an aside, G.711 codecs use a kind of log scale with only four bits of signal but small signal values use much smaller bits.

pixelpoet•3w ago

I'm not sure I understand the "just generate it" perspective. If you want to generate a much higher sampling rate signal that has a common multiple of your input and output sampling rate, "just generating it" is going to involve some kind of interpolation, no? Because you're trying to make data that isn't there.

If you want to change the number of slices of pizza, you can't simply just make 160x more pizza out of thin air.

Personally I'd just do a cubic resample if absolutely required (ideally you don't resample ofc); it's fast and straightforward.

Edit: serves me right for posting, I gotta get off this site.

superjan•3w ago

Maybe the following helps: if you have a an analog signal where there are no frequencies above 22.05 khz, it is in principle possible to sample it at 44.1 khz and then perfectly reconstruct the original signal from those samples. You could also represent the same analog signal using 48 khz samples. The key to resampling is not finding a nice looking interpolation, but rather one that corresponds to the original analog signal.

HelloUsername•3w ago

(2022)

amlib•3w ago

I wonder if this problem could be "solved" by having some kind of "dual mode" DACs that can accept two streams of audio at different sample rates, likely 44.1khz and 48khz, which are converted to analog in parallel and then mixed back together at the analog output.

Then at the operating system level rather than mixing everything to a single audio stream at a single sample rate you group each stream that is at or a multiple of either 44.1khz or 48khz and then finally sends both streams to this "dual dac", thus eliminating the need to resample any 44.1khz or 48khz stream, or even vastly simplifying the resample of any sample rate that is a multiple of this.

PunchyHamster•3w ago

> I wonder if this problem could be "solved" by having some kind of "dual mode" DACs that can accept two streams of audio at different sample rates, likely 44.1khz and 48khz, which are converted to analog in parallel and then mixed back together at the analog output.

You'd just resample both at 192kHz and run it into 192kHz DAC. The "headroom" means you don't need to use the very CPU intensive "perfect" resample.

adzm•3w ago

As a real world example, on Windows, unless you take exclusive access of the audio output device, everything is already resampled to 48khz in the mixer. Well, technically it gets resampled to the default configured device sample rate, but I haven't seen anything other than 48khz in at least a decade if ever. Practically this is a non-issue, though I could understand wanting bit-perfect reproduction of a 44.1 khz source.

klaff•3w ago

I'm kinda shocked that there's no discussion of sinc interpolation and adapting it's theoretical need for infinite signals to some finite kernel length.

For a sampled signal, if you know the sampling satisfied Nyquist (i.e., there was no frequency content above fs/2) then the original signal can be reproduced exactly at any point in time using sinc interpolation. Unfortunately that theoretically requires an infinite length sample, but the kernel can be bounded based on accuracy requirements or other limiting factors (such as the noise which was mentioned). Other interpolation techniques should be viewed as approximations to sinc.

Sinc interpolation is available on most oscilloscopes and is useful when the sample rate is sufficient but not greatly higher than the signal of interest.

oakwhiz•3w ago

Sinc interpolation is somewhat resource-intensive / latency inducing which is why it's not used as frequently in realtime audio.

klaff•3w ago

Agree, but it's like knowing a limit imposed by the 2nd law of thermodynamics. You can't do better and if you get close enough you are done.

observationist•3w ago

If you're taking something from 44.1 to 48, only 91.875% of the data is real, so 8.125% of the resulting upsampled data is invented. Some of it will correlate with the original, real sound. If you use upsampling functions tuned to features of the audio - style, whether it's music, voice, bird recordings, NYC traffic, known auditorium, etc, you can probably bring the accuracy up by several percent. If the original data already has the optimizations, it'll be closer to 92%.

If it's really good AI upsampling, you might get qualitatively "better" sounding audio than the original but still technically deviates from the original baseline by ~8%. Conversely, there'll be technically "correct" upsampling results with higher overall alignment with the original that can sound awful. There's still a lot to audio processing that's more art than science.

fulafel•3w ago

> In reality, the amount of precision that can actually be "heard" by the human ear probably lies between 18 and 21 bits; we don't actually know, because it's impossible to test.

This sounds contradictory - what would be the precision that can be heard in a test then?

functionmouse•3w ago

This article cleared up so many long-standing questions for me. THANK YOU for sharing!!!

somat•3w ago

> once we have reduced errors to below the noise floor they are inaudible by definition.

Makes me think of GPS where the signal is below the noise floor. Which still blows my mind, real RF black magic.

nivea3066•3w ago

For those looking to delve into this topic more, the term of art is ASRC: Asynchronous Sample Rate Conversion.

How Does AI Distribute the Pie? Large Language Models and the Ultimatum Game

Resistance Infrastructure

Fire-juggling unicyclist caught performing on crossing

Restoring a lost 1981 Unix roguelike (protoHack) and preserving Hack 1.0.3

GPS and Time Dilation – Special and General Relativity

Show HN: Witnessd – Prove human authorship via hardware-bound jitter seals

Show HN: I built a clawdbot that texts like your crush

Scientists reverse Alzheimer's in mice and restore memory (2025)

Compiling Prolog to Forth [pdf]

Show HN: Cymatica – an experimental, meditative audiovisual app

GitBlack: Tracing America's Foundation

Horizon-LM: A RAM-Centric Architecture for LLM Training

We just ordered shawarma and fries from Cursor [video]

Correctio

Trying to make an Automated Ecologist: A first pass through the Biotime dataset

Watch Ukraine's Minigun-Firing, Drone-Hunting Turboprop in Action

Free Trial: AI Interviewer

FDA intends to take action against non-FDA-approved GLP-1 drugs

Supernote e-ink devices for writing like paper

We are QA Engineers now

Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified

Adversarial Reasoning: Multiagent World Models for Closing the Simulation Gap

Show HN: Poddley.com – Follow people, not podcasts

Layoffs Surge 118% in January – The Highest Since 2009

Papyrus 114: Homer's Iliad

DicePit – Real-time multiplayer Knucklebones in the browser

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

Show HN: AI Agent Tool That Keeps You in the Loop

Why Every R Package Wrapping External Tools Needs a Sitrep() Function

Achieving Ultra-Fast AI Chat Widgets

How Does AI Distribute the Pie? Large Language Models and the Ultimatum Game

Resistance Infrastructure

Fire-juggling unicyclist caught performing on crossing

Restoring a lost 1981 Unix roguelike (protoHack) and preserving Hack 1.0.3

GPS and Time Dilation – Special and General Relativity

Show HN: Witnessd – Prove human authorship via hardware-bound jitter seals

Show HN: I built a clawdbot that texts like your crush

Scientists reverse Alzheimer's in mice and restore memory (2025)

Compiling Prolog to Forth [pdf]

Show HN: Cymatica – an experimental, meditative audiovisual app

GitBlack: Tracing America's Foundation

Horizon-LM: A RAM-Centric Architecture for LLM Training

We just ordered shawarma and fries from Cursor [video]

Correctio

Trying to make an Automated Ecologist: A first pass through the Biotime dataset

Watch Ukraine's Minigun-Firing, Drone-Hunting Turboprop in Action

Free Trial: AI Interviewer

FDA intends to take action against non-FDA-approved GLP-1 drugs

Supernote e-ink devices for writing like paper

We are QA Engineers now

Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified

Adversarial Reasoning: Multiagent World Models for Closing the Simulation Gap

Show HN: Poddley.com – Follow people, not podcasts

Layoffs Surge 118% in January – The Highest Since 2009

Papyrus 114: Homer's Iliad

DicePit – Real-time multiplayer Knucklebones in the browser

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

Show HN: AI Agent Tool That Keeps You in the Loop

Why Every R Package Wrapping External Tools Needs a Sitrep() Function

Achieving Ultra-Fast AI Chat Widgets

How problematic is resampling audio from 44.1 to 48 kHz?

Comments