One of the most hotly—and perhaps unnecessarily—debated topics in the world of audio is the one that surrounds digital sample rates.
It seems an unlikely topic for polarization, but for more than 10 years, the same tired arguments have been batted about by each side with almost unrelenting intensity.
At the fringes, advocates of either side have often dug deeper trenches of faith for themselves. But as much as that’s the case, there’s also a growing consensus among designers and users who have a firm understanding of digital audio.
Namely, that there are perfectly good reasons for sticking with the current professional and consumer standards of 44.1 and 48 kHz for recording and playback – and some valid arguments for moving up to slightly higher sample rates, such as 60, 88.2 or even as high as 96 kHz. What seems to have less informed support is the push to ultra-high sample rates like 192kHz.
We’ll explore the arguments on both sides of the major questions around sample rates and try to find out where each faction has got it right – and where they may be missing some crucial information.
How We Got Here
The mid-20th century was a heady time at Bell Laboratories. Just before its closing, it employed upward of 25,000 people, dedicated entirely to research and development.
Their innovations were enormous ones, and they lie at the root of the very device you are reading this on: The transistor, the laser, semi-conduction, the solar cell, television, C++ programming, the fax machine, and by the 1960s, the goddamn video phone.
For the sake of contrast, Google, one of our greatest innovators of today, employs roughly 50,000 people across all of its departments, and its greatest offerings have been, well… a slightly improved version of the fax machine and the videophone.
In their heyday, researchers at Bell Labs earned 7 Nobel Prizes in total, and in 1960, the IEEE gave their “Medal of Honor” to Harry Nyquist, who had researched there for almost 40 years.
Back in the 1920s, the Yale graduate had worked on an early version of the fax machine. By 1947, he had made his most lasting contribution: a mathematical proof that showed any sound wave could be perfectly re-created so long as it was limited in bandwidth and sampled at a rate more than twice its highest frequency.
In this case, practice sprang from theory. Nyquist’s Theorem set the groundwork for what would become digital audio. He had provided a mathematical proof that predicts a real law of the natural world. Much like with analog audio recording, the proof for digital audio existed on paper long before it became a reality.
Of course, it can sometimes take practice a while to catch up with theory.
It wasn’t until 1983 that a popular and practical digital audio format was even introduced to the consumer market. But from its inception, the 16-bit/44.1kHz standard promised greater audio fidelity than vinyl or even magnetic tape. This is an established fact by any criteria that we can measure: frequency response, distortion, signal-to-noise, even practical dynamic range.
Of course, some of us still prefer the sound of older technologies, but when we do, it is not for the sake of transparency. Even the best older analog formats sound less like what we feed into them than a properly designed 16/44.1 converter. This can be confirmed by both measurement and unbiased listening.
But even though 16/44.1 was a theoretically sound format from the start, it took decades for it to reach the level of quality it has attained today – just as it had taken decades for Nyquist’s Theorem to lead to the creation of a viable consumer format in the first place.
Now in 2013, the 16/44.1 converter of a Mac laptop can have better specs and real sound quality than most professional converters from a generation ago, not to mention a cassette deck or a consumer turntable. There’s always room for improvement, but the question now is where and how much?
Improvements at 44.1: Fixing the Clock
There have been a few major improvements to basic converter technology over the years. They have come largely when subjective listeners and objective designers have shared common goals and a common purpose.
At first, digital converters lacked sufficiently accurate clocking, which could introduce significant “jitter”: time-based anomalies which show up in the signal as high-frequency distortion.
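To get a feel for why clock accuracy matters more as frequency rises, here is a rough sketch of the worst-case error caused by taking a sample at slightly the wrong moment. It uses the first-order estimate that a sine wave's steepest slope is 2πfA, so a timing error of Δt shifts the sampled value by at most about 2πfAΔt. The 1 ns figure and the function name `jitter_error_dbfs` are illustrative assumptions, not measurements of any real converter.

```python
import math

def jitter_error_dbfs(freq_hz, jitter_s, amplitude=1.0):
    """Approximate worst-case sampling error, in dB relative to full scale,
    for a sine wave sampled jitter_s seconds away from the ideal instant.
    Assumes amplitude 1.0 represents full scale."""
    # A sine A*sin(2*pi*f*t) changes fastest at its zero crossings,
    # where its slope is 2*pi*f*A per second.
    max_slope = 2 * math.pi * freq_hz * amplitude
    worst_error = max_slope * jitter_s
    return 20 * math.log10(worst_error)

# Hypothetical 1 ns timing error at three test frequencies.
for f in (1_000, 10_000, 20_000):
    print(f"{f:>6} Hz, 1 ns timing error: {jitter_error_dbfs(f, 1e-9):.1f} dBFS")
```

The same timing error hurts more as the signal frequency climbs, which fits with jitter showing up mostly as high-frequency distortion.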
Upgrading the clocks on digital converters became a huge point of focus for some time. There was even a moment when external clock upgrades could provide significant benefits in many systems.
But that was then and this is now. Technology always advances, and today external clocking is far more likely to increase distortion and decrease accuracy when compared to a converter’s internal clock. In fact, the best you can hope for in buying a master clock for your studio is that it won’t degrade the accuracy of your converters as you use it to keep them all on the same page.
There are, however, occasions when switching to an external clock can add time-based distortion and inaccuracies to a signal that some listeners may find pleasing. That’s a subjective choice, and anyone who prefers the sound of a less accurate external clock to a more accurate internal one is welcome to that preference.
This is a theme that we find will pop up again and again as we explore the issue of transparency, digital audio, sampling rates, and sound perception in general: Sometimes we do hear real, identifiable differences between rates and formats, even when those differences do not reveal greater accuracy or objectively “superior” sound.
Improvements at 44.1: Fixing the Filters
Clocking wasn’t the only essential improvement that could be made at the 44.1kHz sample rate.
The earliest digital converters lacked well-designed anti-aliasing filters, which are used to remove inaudible super-sonic frequencies and keep them from mucking up the signals that we can hear.
Anti-aliasing filters are a basic necessity that was predicted by the Nyquist Theorem decades ago. Go without them and you are dealing with a signal that is not bandwidth limited, which Nyquist clearly shows cannot be rendered properly. Start them too low and you lose a little bit of the extreme high-end of your frequency response. Make them too steep and you introduce ringing artifacts into the audible spectrum.
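A small sketch can show what goes wrong when a signal is not bandwidth limited. Assuming a 44.1kHz sample rate and no filtering at all, a 30kHz tone (above the 22.05kHz limit) produces exactly the same samples as a 14.1kHz tone, so once sampled the two cannot be told apart. The tone frequencies and the helper name `sample_tone` are chosen purely for illustration.

```python
import math

SAMPLE_RATE = 44_100  # Hz

def sample_tone(freq_hz, num_samples, sample_rate=SAMPLE_RATE):
    """Sample a unit-amplitude cosine with no anti-aliasing filter in front."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]

ultrasonic = sample_tone(30_000, 64)  # above the 22.05 kHz Nyquist limit
audible = sample_tone(14_100, 64)     # 44,100 - 30,000 = 14,100 Hz

# The two sample sets agree to within floating-point rounding.
largest_gap = max(abs(a - b) for a, b in zip(ultrasonic, audible))
print(f"largest difference between the two sample sets: {largest_gap:.2e}")
```

This is the job of the anti-aliasing filter: it has to remove the 30kHz content before sampling, because afterward it is indistinguishable from a perfectly audible 14.1kHz tone.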
It’s a series of tradeoffs, but even at 44.1, we can deal with this challenge. Designers can oversample signals at the input stage of a converter and improve the response of filters at that point. When this is done properly, it’s been proven again and again that even 44.1kHz can be completely transparent in all sorts of unbiased listening tests.
But that doesn’t mean that all converter companies keep up with what’s possible. Sometimes different sampling rates can and do sound significantly different within the same converter. But this is usually because of design flaws – purposeful or accidental – at one sampling rate or another. More on that in a minute.
When More is Better: Making The Filters Even Better
With all that said, there are a few places where higher sample rates can be a definite benefit.
In theory, rates around 44.1kHz or 48kHz should be near-perfect for recording and playing back music. Unless the Nyquist Theorem is ever disproved, it stands that any increase in sample rates cannot increase fidelity within the audible spectrum. At all. Extra data points yield no improvement.
In practice, tradeoffs necessitated by anti-aliasing might cause you to lose a few dB of top-end between 17kHz and 20kHz – the very upper reaches of the audible spectrum. Few adults over the age of 35 or so can even hear these frequencies, and there is currently no evidence to suggest that even younger people are influenced by frequencies above the audible range.
When properly designed, a slightly higher sample rate may allow us to smooth out our super-high frequency filters and keep them from introducing audible rolloff or ringing which may be perceived by younger listeners (if they’re paying any attention.)
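As a rough way to picture that extra breathing room, the sketch below computes how much space a filter has between a nominal 20kHz passband edge and the Nyquist frequency (half the sample rate) at several rates. It is a deliberately simplified model: it assumes the filter must finish its work by Nyquist and ignores the oversampling tricks real converters use. The function name `transition_band` is an illustrative choice.

```python
import math

PASSBAND_EDGE = 20_000  # Hz, nominal top of the audible range

def transition_band(sample_rate, passband_edge=PASSBAND_EDGE):
    """Return (width in Hz, width in octaves) of the gap between the
    passband edge and the Nyquist frequency for a given sample rate."""
    nyquist = sample_rate / 2
    return nyquist - passband_edge, math.log2(nyquist / passband_edge)

for rate in (44_100, 48_000, 60_000, 88_200, 96_000):
    width_hz, width_oct = transition_band(rate)
    print(f"{rate:>6} Hz: {width_hz:>7.0f} Hz of transition room "
          f"({width_oct:.2f} octaves)")
```

Under these assumptions, 44.1kHz leaves only about 2kHz for the filter to work in, while 60kHz leaves about 10kHz. That is why a modest increase can allow a much gentler filter.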
But be careful of designers who go for super-sonic sampling rates and set their filters too high. If you include too much super-sonic information in the signal it becomes likely that you will introduce super-high frequency “intermodulation distortion” on playback.
It turns out that in many cases, we can hear the sound of higher sample rates not because they are more transparent, but because they are less so. They can actually introduce unintended distortion in the audible spectrum, and this is something that can be heard in listening tests. More on that later.
When More is Better: Oversampling for DSP
When you go beyond the mere recording and playback of sound and into the world of digital signal processing, it becomes clear that higher sampling rates actually can help. But the solution might be a different one than you’d expect.
When it comes to some non-linear audio processors like a super-fast compressor, a saturator, a super-high-frequency EQ, or a vintage synthesizer emulation, oversampling can be a major benefit. This in and of itself might seem like a great excuse to immediately jump up to 88.2 kHz or higher.
But not so fast: most plugin designers, knowing this full well, have written oversampling into their code. Even in a 44.1kHz session, plugins that benefit from oversampling automatically increase their internal sampling rate. To gain the full benefits of this, it’s important to note that the audio doesn’t have to be recorded at this higher sample rate, it’s just the processing that must happen at the higher rate.
So unless you are using plugins that have taken shortcuts and neglected to include oversampling in their code, then converting an entire audio session to a higher rate would make your mix take up more processing power without adding any sonic benefit.
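Here is a back-of-the-envelope sketch of why internal oversampling helps a non-linear processor. A saturator adds harmonics at multiples of the input frequency. Any harmonic above half the processing rate folds back down into the audible range. The 15kHz test tone, the 4x oversampling factor and the function names are assumptions for illustration only; real plugins differ in how they oversample and filter.

```python
def folded_frequency(freq_hz, sample_rate):
    """Frequency at which a tone lands when sampled with no filtering."""
    remainder = freq_hz % sample_rate
    return min(remainder, sample_rate - remainder)

def harmonic_report(fundamental_hz, sample_rate, max_harmonic=5):
    """List (harmonic number, true frequency, landed frequency) tuples."""
    return [(k, k * fundamental_hz,
             folded_frequency(k * fundamental_hz, sample_rate))
            for k in range(2, max_harmonic + 1)]

SESSION_RATE = 44_100
OVERSAMPLED_RATE = 4 * SESSION_RATE  # hypothetical 4x internal oversampling

for rate in (SESSION_RATE, OVERSAMPLED_RATE):
    print(f"processing at {rate} Hz:")
    for k, true_hz, landed_hz in harmonic_report(15_000, rate):
        print(f"  harmonic {k}: {true_hz} Hz lands at {landed_hz} Hz")
```

At 44.1kHz, the third harmonic of the 15kHz tone (45kHz) folds down to 900 Hz, well inside the audible range. At the oversampled rate it stays at 45kHz, where the plugin can filter it out before returning to the session rate.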
But don’t take my word for this – Try it yourself. Up-sample an entire mix and then try a null test with your original file.
In my experience, the only things that will fail to null are A) processors that have a random time element – like modulation effects – and cannot null, B) plugins that have different delay amounts and will not null until you compensate for the delay, and C) processors that neglect to include oversampling when they should.
Very few of the latter still exist. And thankfully so, because oversampling has led to huge improvements in the quality of digital processing. Finally, after decades of people trying, there are actually some software compressors that I like. A lot.
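The null test mentioned above comes down to subtracting one render from the other and checking what is left. A minimal sketch follows. It assumes both versions have already been brought to the same sample rate, time-aligned, and loaded as lists of floating-point samples where 1.0 is full scale. The function name `null_test_residual_dbfs` is hypothetical, and loading real audio files is left out.

```python
import math

def null_test_residual_dbfs(original, processed):
    """Peak level of (original - processed) in dBFS.
    Returns -inf when the two signals cancel perfectly."""
    if len(original) != len(processed):
        raise ValueError("signals must be the same length and time-aligned")
    peak = max((abs(a - b) for a, b in zip(original, processed)), default=0.0)
    return -math.inf if peak == 0 else 20 * math.log10(peak)

# Stand-in signals: a 1 kHz tone, an identical copy, and a slightly altered copy.
tone = [math.sin(2 * math.pi * 1_000 * n / 44_100) for n in range(441)]
identical = list(tone)
altered = [s + 0.001 for s in tone]

print(f"identical copy: {null_test_residual_dbfs(tone, identical):.1f} dBFS")
print(f"altered copy:   {null_test_residual_dbfs(tone, altered):.1f} dBFS")
```

A residual far below the noise floor of your system counts as a null for practical purposes. A loud residual means something in the chain really does behave differently.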
When More is Better: Converter Design
Dan Lavry is one of the most respected designers of audio converters in the world, and a die-hard opponent of ultra-high sample rates like 192 kHz.
But even he would be among the first to admit that some slight increase in sampling rates can make designing great-sounding converters easier and less expensive. In an influential and now-famous white paper, he writes:
“The notion that more is better may appeal to one’s common sense. Presented with analogies such as more pixels for better video, or faster clock to speed computers, one may be misled to believe that faster sampling will yield better resolution and detail. The analogies are wrong. The great value offered by Nyquist’s theorem is the realization that we have ALL the information with 100% of the detail, and no distortions, without the burden of “extra fast” sampling.
“Nyquist pointed out that the sampling rate needs only to exceed twice the signal bandwidth. What is the audio bandwidth? Research shows that musical instruments may produce energy above 20 KHz, but there is little sound energy at above 40KHz. Most microphones do not pick up sound at much over 20KHz. Human hearing rarely exceeds 20KHz, and certainly does not reach 40KHz.
“The above suggests that [even] 88.2 or 96KHz would be overkill. In fact all the objections regarding audio sampling at 44.1KHz … are long gone by increasing sampling to about 60KHz.”
To him, the issue is not about whether 44.1kHz is the last stop. It’s clear that it rests on the cusp of the point of diminishing returns, and that by the time you’ve reached 60 kHz you’ve exhausted all the theoretical benefits you could ever add. The real benefits to be had are the ones that come from improving implementation, not from ever-increasing sample rates.
With that said, if you think it’s better to overshoot your mark and waste some power and stability rather than undershoot it and potentially leave some theoretical audio quality on the table, then switching from 44.1kHz to 88.2kHz seems like a valid argument.
Properly designed, 88.2kHz shouldn’t be a huge improvement, but it can make good design easier and less expensive, and it shouldn’t hurt either. But beyond that, things start to get a little sketchy.
When More Isn’t Better: “Because My Converter Sounds Better at a Higher Rate”
If we’ve tapped out our theoretical benefits by the time we get to a sampling rate of 60kHz, then why do some people insist they can hear an improvement when they record at a higher rate? Are they making it up?
Absolutely not – There are definitely some converters that sound significantly better at a higher sampling rate than at a lower one, even in a blind test. But strictly speaking, the problem isn’t with the lower sampling rate – it’s with the converter.
When a designer like Lavry focuses on making a converter sound great at 44.1, it’s easy for that converter to sound equally great at a higher rate as well. Unfortunately, it doesn’t work the same the other way around.
Lavry’s work demonstrates that a slightly higher rate can give a little more wiggle room as far as design and implementation is concerned, and he should know: At traditional rates like 44.1 and 48kHz, his converters can outperform many professional-grade competitors set at significantly higher sampling rates.
A key point to understand is that just because a converter sounds better than itself when it’s switched to a higher rate doesn’t mean that it will sound better than a different converter at a lower rate!
A better converter that takes fewer design shortcuts might easily outperform the higher sample-rate converter, regardless of what sampling rate is chosen. For people in their 30s, 40s and beyond, what differences are left in the form of high frequency roll-off should be impossible to distinguish. What might be easy to distinguish, however, are flaws in the design of the 44.1 converter or distortions introduced by a poorly-filtered higher rate.
So, if you are ever using a converter and find it sounds dramatically better at a higher rate, don’t get excited about the sample rate. Get suspicious of the design shortcuts instead! Why isn’t the 44.1kHz on that converter up to snuff? How does this converter compare to the best-designed converters when they are set to a lower rate? Is it still better, or does the advantage disappear?
However, you could also look at this the other way: If it’s cheaper and easier to make good-sounding converters at higher rates than at lower ones, doesn’t that justify switching your sessions over to a higher rate? I mean, if switching to a higher sample rate allows you to get the same results with a cheaper converter then why not?
That’s certainly an option, and a valid argument.
At that point, it becomes a good idea to crunch the numbers to find out whether the lower cost of a cheaply made, higher-rate converter offsets the increased cost of other resources such as processing power, disk speed and internet bandwidth.
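One part of that number-crunching is easy to sketch: the raw storage cost of uncompressed audio, which scales directly with sample rate. The example assumes 24-bit stereo PCM, a common working format but an assumption on my part. It ignores file headers, compression and CPU cost entirely.

```python
def bytes_per_minute(sample_rate, bit_depth=24, channels=2):
    """Size of one minute of uncompressed PCM audio, in bytes."""
    bytes_per_sample = bit_depth // 8
    return sample_rate * bytes_per_sample * channels * 60

baseline = bytes_per_minute(44_100)
for rate in (44_100, 48_000, 88_200, 96_000, 192_000):
    size = bytes_per_minute(rate)
    print(f"{rate:>7} Hz: {size / 1_000_000:6.1f} MB per minute of stereo audio "
          f"({size / baseline:.2f}x the 44.1 kHz figure)")
```

Multiply those figures by track count and session length and the disk and bandwidth side of the tradeoff becomes concrete.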
To people who study these things, it’s become clear that doubling and quadrupling sample rates instead of improving converters is a lousy economic tradeoff for consumers, as well as for the environment and the larger economy.
But in your own studio? That call is yours to make.
Just remember: when comparing sample rates, don’t just compare two different rates inside one converter. Compare rates across converters if you want any sense of how they perform relative to a high-quality baseline.
Another valuable thing to remember is that when a converter does sound surprisingly different at different sampling rates, those distinctions are usually still audible once you down convert to 44.1kHz. Ironically, this only goes to show that any benefit in the higher rate exists in the sonic range that 44.1kHz can capture!
This is further proof that you don’t need super-sonic sampling rates to gain those very same “benefits”, and yet more evidence to suggest the most important part of the equation is not “rate” but “implementation”.
Where More Isn’t Better: Rates Above 96kHz
There are almost as many opinions about sample rates as there are engineers. And that’s okay. I’ve mixed and mastered great sounding recordings made at 44.1, 48, 88.2 and 96, and I’d never tell anyone what rate they should use. In all the projects I’ve worked on, there has been no meaningful correlation between sample rate and sound quality. So many other factors matter more.
My own preference is to work at 44.1, especially on projects that will move from studio to studio and even into band members’ homes, which is so common these days. You never know what kind of computer power and disk speed you’ll be faced with. And any difference I hear on properly designed converters tends to be less significant than, say, a half dB of top-end EQ. In another 10 or 15 years, I might not hear these differences at all. Even today, there are older engineers who can hear fewer of these high frequencies than I can. But that doesn’t mean much: listening almost always trumps hearing. And good listening comes with experience.
In any case, by increasing rates from 44.1kHz to anywhere up to about 96kHz, you might not get incredible increases in sound quality — but with all other things being equal, it shouldn’t hurt.
Once you go past that, however, you introduce the possibility of not just using too much power, but of introducing unintended distortion.
Yes, there is a point where you can have too high a data rate. Some would argue that point is closer to 96kHz, but almost any computer scientist or circuit designer today will tell you that you’ve definitely reached that point by 192 kHz.
192kHz digital music files offer no benefits. They’re not quite neutral either; practical fidelity is slightly worse. The ultrasonics are a liability during playback.
This runs counter to many initial intuitions regarding super-sonic sampling rates – my own included. But the evidence is there. Since analog circuits are almost never linear at super-high frequencies, they can and will introduce a special type of distortion called intermodulation distortion.
This means that two super-sonic frequencies that cannot be heard, say 22 kHz and 32 kHz, can create an intermodulation distortion down in the audible range, in this case at the “difference frequency” of 10kHz. This is a real danger whenever super-sonic frequencies are not filtered out.
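The 22 kHz and 32 kHz example can be checked numerically. The sketch below builds those two tones at a 192kHz sample rate and passes them through a deliberately crude stand-in for an imperfect analog stage, y = x + k·x². It then measures how much energy appears at 10kHz. The nonlinearity, the value of k and the tone levels are all hypothetical, chosen only to make the effect visible. Real circuits misbehave in more complicated ways.

```python
import cmath
import math

SAMPLE_RATE = 192_000  # Hz, high enough to carry both ultrasonic tones
NUM_SAMPLES = 1_920    # gives analysis bins exactly 100 Hz apart

def two_tone(f1, f2, amplitude=0.5):
    """Sum of two sine waves, each at the given amplitude."""
    return [amplitude * (math.sin(2 * math.pi * f1 * n / SAMPLE_RATE)
                         + math.sin(2 * math.pi * f2 * n / SAMPLE_RATE))
            for n in range(NUM_SAMPLES)]

def slightly_nonlinear(samples, k=0.1):
    """Hypothetical imperfect stage: adds a small squared term."""
    return [x + k * x * x for x in samples]

def amplitude_at(samples, freq_hz):
    """Amplitude of a single frequency component (one-bin DFT)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
              for n, x in enumerate(samples))
    return 2 * abs(acc) / len(samples)

clean = two_tone(22_000, 32_000)
distorted = slightly_nonlinear(clean)

print(f"10 kHz content, linear path:     {amplitude_at(clean, 10_000):.6f}")
print(f"10 kHz content, non-linear path: {amplitude_at(distorted, 10_000):.6f}")
```

The linear path contains nothing at 10kHz, while the non-linear path produces a clear 10kHz component even though neither input tone is anywhere near that frequency.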
(Note that this is not quite the same as the idea of a “phantom fundamental”, an acoustic phenomenon where a series of overtones can “trick” the brain into hearing the missing fundamental. For that to happen, it’s necessary for you to actually hear those overtones. But for a circuit to introduce intermodulation distortion, hearing the overtones is not necessary.)
There is also the separate problem of distortion caused by the decreased sampling accuracy of a rate that is too fast. Lavry agrees:
“There are reports of better sound with higher sampling rates. No doubt, the folks that like the “sound of a 192KHz” converter hear something. Clearly it has nothing to do with more bandwidth: the instruments make next to no 96KHz sound, the microphones don’t respond to it, the speakers don’t produce it, and the ear can not hear it.
“Moreover, we hear some reports about “some of that special quality captured by that 192KHz is retained when down sampling to 44.1KHz.” Such reports neglect the fact that a 44.1KHz sampled material can not contain above 22.05KHz of audio…
“The danger here is that people who hear something they like may associate better sound with faster sampling, wider bandwidth, and higher accuracy. This indirectly implies that lower rates are inferior.
“Whatever one hears on a 192KHz system can be introduced into a 96KHz system, and much of it into lower sampling rates. That includes any distortions associated with 192KHz gear, much of which is due to insufficient time to achieve the level of accuracy of slower sampling.
“There is an inescapable tradeoff between faster sampling on one hand and a loss of accuracy, increased data size and much additional processing requirement on the other hand…
“Sampling audio signals at 192KHz is about 3 times faster than the optimal rate. It compromises the accuracy, which ends up as audio distortions.”
Optimal Sampling Rates
OK – so if there can be some theoretical benefit to a slightly higher sampling rate, and some point where a sampling rate can be too high to be accurate, is there an in-between point? An “optimal” rate that gives us the best of both worlds?
To a degree, what’s “optimal” is a matter of opinion. But Lavry offers an educated one:
“There are many who subscribe to the false notion that operating above the optimal sample rate can improve the audio. The truth is that there is an optimal sample rate, and that operating above that optimal sample rate compromises the accuracy of audio. To some, this may seem counterintuitive, but is completely proven; whereas most supporters of higher than optimum sample rates offer only subjective results in support.
“In my paper “Sampling Theory” I already pointed out that increased speed (bandwidth) reduces accuracy. No one advocates sampling at 10 KHz because that would exclude audio signals above 5 KHz. Clearly, no one knowledgeable in the subject would advocate audio conversion at 100MHz, either. It would bring about poor audio performance due to high distortions, noise and more…
“I use these extreme examples to show that sampling can be done too slowly, and it can also be done too fast. There is an OPTIMAL sample rate; fast enough to accommodate everything we hear (the audible range). But exceeding this optimal sample rate will only reduce audio accuracy.”
Based on the evidence that has passed the scrutiny of the scientific community, it would appear that whatever the optimal rate is, it must be somewhere quite a ways below 192kHz.
That’s not to say that 192kHz can’t sound different than a lower sample rate like 96kHz. Only that those differences are easily shown to be the results of intermodulation distortion or errors connected to too high of a data speed.
Lavry suggests that the optimal rate is somewhere around 50 or 60kHz, and he’s not alone. The makers of the very first commercial digital recorders from 3M, Mitsubishi and Soundstream calculated this as well, and used rates of around 50kHz until commercial pressures forced the standardization of a slightly lower rate.
Today, many users are skeptical of the idea of working at non-standard rates like 48, because for a long time the math to convert to the consumer music rate was lousy. That’s becoming less and less the case today, and perhaps some day it won’t be a factor at all.
But unlike the arguments around sensible sampling rates, 192 kHz appears to be one of those areas which – much like the reality of evolution – isn’t actually a matter of debate among the people who study the issue. Lavry, like the vast majority of computer and perceptual scientists, circuit designers and engineers, is more than a little dismissive of ultra-high sampling rates:
“[My] motivation,” he writes, “is to help dispel the widespread misconceptions regarding sampling of audio at a rate of 192KHz. This misconception, propagated by industry salesmen, is built on false premises, contrary to the fundamental theories that made digital communication and processing possible.”
But he also offers some hopeful words:
“It took a few years to have it turned around, but many of those that ‘jumped on the 192KHz band wagon baloney’ are coming around to saying that 60-70KHz is optimal. Well, there is no such standard, but 88.2-96KHz is not that far from the optimum. It is slightly faster then I would like, but still acceptable to me.”
Although we may have begun to come to some consensus on this one issue, no debate ever really ends. At best, debates just flag in the face of overwhelming evidence.
Even Lavry would admit there could possibly be sound differences at these super-high rates. He argues, with overwhelming evidence to back it up, that based on what we know about physics, physiology and electrical circuits, it seems that any differences would have to come from unintended distortions and inaccuracies.
Those who don’t agree that 192kHz sampling rates are inadvisable at best are certainly entitled to that view, and to the goal of working to prove it with real solid evidence and not just anecdotes. But until that time, it seems fair to say that the burden of proof now rests on those who believe that 192kHz offers more than it costs – and not the other way around.
In the meantime, keep doing whatever you do and at whatever sample rate you’re doing it at! If we’ve proved anything, I hope it’s that the raw numbers just aren’t that big of a deal.
There are so many more important decisions to make: Whether that dB of EQ is hurting or helping, whether the bridge of the song comes in too early, whether we should move the mic or try another one, or whether we should have chicken or tuna for lunch.
Justin Colletti is a Brooklyn-based audio engineer, college professor, and journalist. He records and mixes all over NYC, masters at JLM, teaches at CUNY, is a regular contributor to SonicScoop, and edits the music magazine Trust Me, I’m A Scientist.