Have you ever wondered if you should be recording in 44.1 kHz or 48 kHz?
If not, that’s great too! Sample rate might not be the most interesting topic in the world, but it’s MUCH more important than you think.
The answer to that question depends on the situation though.
So, how do you decide which sample rate to record in? That’s what we’ll be discussing today!
In an ideal world, we’d be recording at the highest possible sample rate at all times. The thing is… at higher sample rates, audio files take up more storage space. So, it’s best to reserve 48 kHz for audio that will be synced TO PICTURE. In other words, dialogue, SFX and music that’ll be used by the audio post-production team. 44.1 kHz is still used for music distribution though. If you want to find out why, you’ll just have to keep reading…
- Introduction to Sample Rates (44.1kHz, 48kHz, etc…)
- When To Record and Bounce in 44.1 kHz
- When To Record and Bounce in 48 kHz
- Increasing your sample rate decreases your audio latency
- Summary: Should I Record in 44.1 kHz or 48 kHz?
Introduction to Sample Rates (44.1kHz, 48kHz, etc…)
Before getting started, I think it’s important for us to understand the significance of sample rate. The best way to visualize sample rate is to compare it to frame rate.
For audio, we measure the number of samples recorded per second in Hz (Hertz).
For video, we measure the number of frames recorded per second in FPS (frames per second).
In other words…
The higher the sample/frame rate is, the more information is stored in each second of audio/video.
However, while you’d be able to distinguish 24 FPS from 30 FPS and 60 FPS with ease, it’s practically impossible to perceive the difference between 44.1 kHz, 48 kHz and higher (unless you’re not human).
That’s because most human beings can’t hear anything above 20 kHz, and a sample rate of 44.1 kHz already captures frequencies up to 22.05 kHz (half the sample rate, known as the Nyquist limit).
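If you want to sanity-check that, here’s a tiny Python sketch of the arithmetic (nothing here is specific to any audio software, it’s just math):

```python
# The highest frequency a given sample rate can capture is half the
# sample rate, known as the Nyquist limit.
for sample_rate in (44_100, 48_000, 96_000, 192_000):
    nyquist = sample_rate / 2
    print(f"{sample_rate} Hz -> captures up to {nyquist / 1000:.2f} kHz")

# 44100 Hz -> captures up to 22.05 kHz
# 48000 Hz -> captures up to 24.00 kHz
# 96000 Hz -> captures up to 48.00 kHz
# 192000 Hz -> captures up to 96.00 kHz
```

Both standard rates already clear the ~20 kHz ceiling of human hearing, which is exactly why we can’t hear a difference between them.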
If we were to slow down the audio, though, it would progressively sound like the high frequencies were tapering off.
That being said, the slower you go, the more high-frequencies you lose.
The ONLY way to counter that would be to record at higher sample rates.
That’s why most professional SFX libraries are recorded at sample rates as high as 192 kHz.
It’s the only way to go “slow-mo” without losing quality!
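Here’s a rough sketch of the math behind that; the 4x slowdown factor is just an example:

```python
# Slowing audio down by a factor N divides every frequency by N, so the
# highest surviving frequency is (sample_rate / 2) / N.
def highest_frequency_after_slowdown(sample_rate_hz: int, factor: float) -> float:
    return sample_rate_hz / 2 / factor

for rate in (48_000, 96_000, 192_000):
    top = highest_frequency_after_slowdown(rate, 4)
    print(f"{rate} Hz slowed 4x -> highs reach {top / 1000:.0f} kHz")

# 48000 Hz slowed 4x -> highs reach 6 kHz   (audibly dull)
# 96000 Hz slowed 4x -> highs reach 12 kHz
# 192000 Hz slowed 4x -> highs reach 24 kHz (still full-bandwidth)
```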
BUT, that only applies to the individual audio files used to create a master. The mix itself is rendered at 44.1 kHz or 48 kHz 99% of the time.
So, you can record in whatever sample rate you want! Just remember that…
- Files recorded at higher sample rates are larger in size
- Recording at higher sample rates reduces audio latency, but increases CPU load
That’s out of the way, but we still need to know which sample rate to bounce our masters in!
When should we record and export in 44.1 kHz then?
When To Record and Bounce in 44.1 kHz
One of the main reasons you may want to hold back on increasing your sample rate is that you’ll most likely need to downsample your music. Streaming services like Spotify, Apple Music and more all stream your music at 44.1 kHz.
That may change in the future but for now, it’s not practical to go higher than 44.1 kHz.
That being said, downsampling would defeat the purpose of recording at 48 kHz (or higher), and the sample rate conversion itself can introduce subtle artifacts into your mix.
If you’re interested in finding out more, look up “sample rate conversion” and “aliasing”. Dithering, a related process, comes into play when you reduce bit depth rather than sample rate.
So unless you’re planning to slow down your audio recordings, there’s no advantage to recording them at sample rates higher than 44.1 kHz.
If the primary purpose of your music is streaming/playback, record at 44.1 kHz.
Even if you ever had to deliver a 48 kHz master, you could still upsample a mix made from tracks that were recorded at 44.1 kHz.
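If you’re curious what that conversion looks like in practice, here’s a minimal sketch using SciPy (assuming your audio is already loaded as a NumPy array; the function name is just for illustration):

```python
import numpy as np
from scipy.signal import resample_poly

def upsample_44k1_to_48k(audio: np.ndarray) -> np.ndarray:
    # 48000 / 44100 reduces to 160 / 147, so the conversion can be done
    # with exact integer up/down factors (polyphase resampling).
    return resample_poly(audio, up=160, down=147)

one_second = np.zeros(44_100)                 # one second of 44.1 kHz audio
print(len(upsample_44k1_to_48k(one_second)))  # 48000 samples = one second at 48 kHz
```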
If you think your music will get synced to picture though, maybe you’re better off getting used to 48 kHz!
That’s what we’re talking about in the next section…
When To Record and Bounce in 48 kHz
You’ll have to decide for yourself whether you record music in 44.1 kHz or 48 kHz.
I’ve been recording in 48 kHz for a while now because I make music for picture. If music streaming was my priority, I’d most likely be recording in 44.1 kHz.
However, when I’m recording production sound on film sets, I always set my recorder to 48 kHz.
It’s common practice to record dialogue at 48 kHz.
It’s also common practice to record anything that’ll be synced to video at 48 kHz. The only exception would be sound effects for SFX libraries (since those might be slowed down).
That’s because 48 kHz is the standard audio sample rate for video.
It’s not entirely clear WHY that became the standard… but one of the reasons is divisibility: 48,000 divides evenly by the common video frame rates, so every frame of picture lines up with a whole number of audio samples.
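You can check that divisibility yourself. Note how 48 kHz gives a whole number of samples per frame at every common frame rate, while 44.1 kHz breaks at 24 FPS (film):

```python
# Samples of audio per frame of video, for the two common sample rates.
for rate in (48_000, 44_100):
    for fps in (24, 25, 30):
        print(f"{rate} Hz at {fps} FPS -> {rate / fps} samples per frame")

# 48000 Hz at 24 FPS -> 2000.0
# 48000 Hz at 25 FPS -> 1920.0
# 48000 Hz at 30 FPS -> 1600.0
# 44100 Hz at 24 FPS -> 1837.5   <- fractional, so frames and samples drift
# 44100 Hz at 25 FPS -> 1764.0
# 44100 Hz at 30 FPS -> 1470.0
```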
I’ve also heard that “48 kHz gives more headroom”, but we’ve already established that it really wouldn’t make an audible difference. My advice… just do it, don’t question it!
I know why you’d want to record in 48 kHz instead of 44.1 kHz though!
Increasing your sample rate decreases your audio latency
One of the discoveries that surprised me the most is that increasing your sample rate actually decreases your audio latency. However, the catch is that it increases the load on your CPU.
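The math behind it is simple: your interface processes audio in fixed-size buffers, and the same buffer spans less time at a higher sample rate. A quick sketch (the 256-sample buffer is just an example setting):

```python
# Latency contributed by one audio buffer, in milliseconds.
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    return 1000 * buffer_samples / sample_rate_hz

for rate in (44_100, 48_000, 96_000):
    print(f"256-sample buffer at {rate} Hz -> {buffer_latency_ms(256, rate):.2f} ms")

# 256-sample buffer at 44100 Hz -> 5.80 ms
# 256-sample buffer at 48000 Hz -> 5.33 ms
# 256-sample buffer at 96000 Hz -> 2.67 ms
```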
For most of us though, going up to 48 kHz isn’t that much of a stretch.
I think that little move is completely worth it to slightly reduce your audio latency.
In other words, if your CPU can handle it and you’re planning to get your music placed in TV/Film, you may want to consider making the switch to 48 kHz. You’ll definitely want to make sure that your plugins are compatible, but most of them should be!
Regardless of the application, staying at 44.1 kHz means slightly more latency at the same buffer size, so if your CPU has performance to spare, you might as well put it to use.
Think about it, and see what works best for you!
Summary: Should I Record in 44.1 kHz or 48 kHz?
When in doubt, I’d record at a sample rate of 48 kHz.
However, 44.1 kHz is still the industry standard if you’re doing anything that involves music.
48 kHz is the standard for picture (TV, film, etc…).
Remember though, you can have audio tracks recorded at different sample rates in the same project. It’s the sample rate that you BOUNCE your masters in that’s most important.
Conversion artifacts (and dithering, when you’re reducing bit depth) are definitely a thing, but they’re not worth overthinking.
99% of the time I deliver my masters in 48 kHz, so that’s why I chose to record at 48 kHz.
I’ll even set my recorder to 192 kHz when I record SFX that I know I’ll be slowing down.
So, what sample rate are you going to record in… 44.1 kHz or 48 kHz?
Let us know in the comments and feel free to ask me questions too! Thanks for reading!
Sources
https://www.izotope.com/en/learn/digital-audio-basics-sample-rate-and-bit-depth.html
https://www.recordingrevolution.com/what-sample-rate-should-you-record-and-mix-at/
7 thoughts on “Should I Record in 44.1 kHz or 48 kHz? | Let’s Find the Optimal Sample Rate”
> It’s the same when it comes to video because we can’t really perceive anything higher than 60 FPS.
Hey, I know this article is about the audio side of things, but I just want to get this off my chest real quick.
We can see well above 60 FPS. We’ve had 240 Hz monitors for almost a decade, and 360 Hz hit the market in 2019. Even up to 360 Hz, people still perceive a difference, which is why 480 Hz monitors are being developed right now.
In my mind, humans won’t be able to perceive a difference anymore if the frame rate of the video output is so high that your brain will create its own motion blur.
Because motion blur is a byproduct of our brain not being able to keep up with the visual information, which in real life won’t happen because of refresh rates, since life doesn’t have FPS obviously, but because of motion that’s simply too fast.
So at some point in the future it won’t be about the Hz of the monitor; instead it will be about the speed of the content you watch. And once we’re there, we can ask ourselves how many Hz on a monitor we can actually perceive.
Hey Tobias,
By all means, I’m really grateful that you brought this up!
I ended up correcting that information, but your comment definitely adds some great value to this page! I don’t specialize in the video side of things, so I’ll leave that to you ;D
Very inspiring stuff, can’t wait to see what the future brings!
Thanks for your comment, until next time.
– Stefan
> life doesn’t have FPS obviously
Life isn’t a game or video indeed, but our eyes (and eventually our brain) do have FPS in a sense, yes.
Looking at our whole perception as a system, no matter whether the constraint is our retina’s sensitivity, the nerves conveying the signal, or our brain processing it, it has been pretty clear for decades that the rate is about 25-30 FPS. Hence the frame rate of TV and other video media content.
Now, how does that relate to the refresh rates of displays? Well, first of all, just as in audio, the sampling rate always has to be at least double the maximum frequency of the signal that will be transferred. In other words, sample any sound or motion at 60 FPS to actually catch 30. That also goes the other way round: if you want to play out a 25 or 30 FPS video, you will need a 50/60 Hz display.
That is also the case for audio. If you have a sound card that records at 48 kHz, then no matter your microphone’s physical properties or the transfer capabilities of the electrical circuits, the digital signal you get carries waves at 24 kHz max. That’s it.
Look up Shannon’s sampling theorem for the physics behind that.
> In my mind, humans won’t be able to perceive a difference anymore if the frame rate of the video output is so high that your brain will create its own motion blur.
That sounds a bit esoteric.
In principle, you can look at the brain just as any capture device, and the sampling theorem applies.
> Even up to 360 Hz, people still perceive a difference, which is why 480 Hz monitors are being developed right now.
Yes they do, and that is another signal processing effect, called oversampling. What happens here is that a higher sampling frequency is used to potentially compensate for distortion caused by limited signal depth.
What is depth? It is the resolution of the actual values your signal can take. And while we discuss sampling rate a lot, this part is often forgotten. With audio this is typically 16-bit integer, for example, and for video we use 8 bits per color channel, while HDR btw means using 12 bits instead. And here too, there is a lot of discussion whether the additional color space resolution this allows for is even within our mind’s granularity.
So now, when you have 3D rendering, since it isn’t a “real” capture, colors and lines are just not as smooth. 3D engines push out as many frames per second as possible, because that allows display backends to smooth out that distortion somewhat, integrating those high-frequency components that fall outside their actual sampling rate into subtle differences on the value side.
However, that technically wouldn’t change any signal that doesn’t already have distortion in it. The distortion comes from clipping in both the value and frequency dimensions (i.e. colors and resolution), and such clips are much more likely in “non-real” signals. Those clips cause so-called high-frequency components. They can also be seen as “noise”, btw.
What does that mean? For that, you need to take a look at another effect, called aliasing. See https://en.wikipedia.org/wiki/Aliasing
Aliasing means that frequencies in a signal that are higher than half the capture or presentation device’s sampling frequency will still appear in it, but as lower frequencies.
Think of it like a clock: once you pass 12, you get back to 1. It is exactly that. If you are at 13 but the maximum is 12, you land back on 1. The same will happen if you have a game rendering at 100 FPS and displaying at 50.
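To put that folding into code, here’s a rough sketch (the helper is made up; it just applies the wrap-around):

```python
# Frequencies above the Nyquist limit (half the sampling rate) fold back
# into the band, like hours wrapping around a clock face.
def aliased_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    nyquist = sample_rate_hz / 2
    return abs((freq_hz + nyquist) % sample_rate_hz - nyquist)

print(aliased_frequency(30_000, 48_000))  # 18000.0: a 30 kHz tone sampled
                                          # at 48 kHz masquerades as 18 kHz
```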
Now, why are games rendered at 100 FPS and more if our eyes/nerves/brains only get ~30 FPS max? Well, ask the game engine developers 🙂 I think there is a bit of carelessness involved, mixed with the type of esotericism I also see with audio engineers swearing they hear a difference when recording their sound at 32-bit float and a 192 kHz rate 😀
As far as I understand (noting that this is where I leave my comfort and expertise zone as an engineer, so I’m just giving an intuitive guess), there are 2 reasons.
One is, when you render 3D you just render as much and as fast as possible. Trying to aim for a specific sample rate would probably bring a much worse or unusable result; it would also mean having to hit exact timing, which the hardware constraints typically won’t allow. That is even impossible for TV video decoding; the presentation achieves proper fluency only thanks to some buffering at each end of decoding/displaying. But with 3D games you don’t have such buffers; things need to be displayed in real time. So those engines just push out a flood of frames as fast as they can, and then it’s up to the presentation backend to do something with that oversampled signal.
The second reason is why I was explaining oversampling. Graphics cards and displays nowadays know how to take advantage of a 400 FPS “incoherent” signal that has distortion in it and irregular sample timing, and make it look like a smooth experience to your good ol’ 30 FPS brain/eyes.
If that helps with understanding, btw: oversampling at playback is as old as your 90s audio CD players. Look at the “1-bit DAC” written on all of them. Audio CDs are supposed to recover a 44.1 kHz signal with 16-bit-ish depth, but the player internally converts it to a much higher rate with 1-bit (!) depth. The idea is that the higher sample rate with tiny bit depth is much more resilient to errors and noise (scratches, dirt on the media), and thus the recovery will be more reliable. Mathematically, the oversampled 1-bit “deltas” get “integrated” in the converter, summing up to the actual waveform, which is then recovered at the initially targeted sample rate and depth.
We can absolutely see framerates faster than 60 FPS. Ever owned a 120 FPS monitor?
Hey Evan,
I’ve never owned a 120 FPS monitor. You’re absolutely right though!
I did some research after reading your comment and have corrected the information. Thanks for bringing that up!
All the best.
– Stefan
Your comment about recording at 20 kHz is incorrect. It takes 2 sample points to create a wave, so 20 kHz can only capture audio up to 10 kHz; anything above that frequency would be incorrectly rendered, so all your audio would have to be brick-wall filtered at 10 kHz. Throw a 10 kHz low-pass filter with a 120 dB-per-octave slope on your master bus and you’ll get an idea of how limiting recording at 20 kHz would be. Yikes.
Hey Joe,
You’re absolutely right, thank you for the correction. I read an article on iZotope’s website that explains the concept pretty well.
Ironically, the “recording at 20kHz” statement actually comes from what I was taught in school. It just proves that we really should question everything, ESPECIALLY what we’re taught in school.
Anyway, I really appreciate the correction. I’ll be updating the article shortly.
Thanks for stopping by, all the best!
– Stefan