Lily Katz / Android Authority
When the MP3 player took off in the late 1990s, the format itself came into the public consciousness in a way not many others have — with perhaps the Word document as an exception. But what exactly is an audio format, and why should you care?
This guide covers some of the most popular formats used by audio streaming services today and explains their differences.
What is an audio file format?
A digital audio file is how recorded content is stored on a computer, media player, smartphone, or other device. Digital audio, at its most basic level, is a series of numbers that a device can use to mimic sound waves. There are several ways to achieve this and then compress (or not) the resulting data. We know that by sampling a sound wave during analog-to-digital conversion at 44.1 kHz with 16 bits per sample, we can faithfully reproduce the audible part of the captured signal later on. This is thanks to what mathematicians call the Nyquist-Shannon sampling theorem: a 44.1 kHz sample rate can capture frequencies up to 22.05 kHz, just above the upper limit of human hearing. We can use higher bit depths and sample rates, but whether anyone can hear a difference, even through the best headphones, is debatable at best.
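To make the idea of "a series of numbers" concrete, here is a minimal sketch of sampling and 16-bit quantization. The function names and the 1 kHz test tone are assumptions chosen for illustration; a real analog-to-digital converter does this in hardware.

```python
import math

def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def sample_sine(freq_hz, sample_rate_hz, num_samples, bit_depth=16):
    """Sample a full-scale sine wave and round each sample to a signed integer."""
    max_amplitude = 2 ** (bit_depth - 1) - 1  # 32767 for 16-bit audio
    return [
        round(max_amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        for n in range(num_samples)
    ]

# Hypothetical input: the first eight samples of a 1 kHz tone at 44.1 kHz.
samples = sample_sine(freq_hz=1000, sample_rate_hz=44100, num_samples=8)
print(nyquist_frequency(44100))  # 22050.0
print(samples)
```

Each printed integer is one stored sample, and a stereo recording stores two such streams.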
If we just store that data as it is (known as pulse code modulation or PCM), the file takes up a lot of space. Therefore, both lossy and lossless forms of audio compression have been developed. Lossy audio throws out audio frequencies that our ears can’t hear, while lossless retains them all. Lossy audio formats can also use other tricks to compress audio even further, which we’ll discuss later.
Because most people nowadays access their music through streaming services, lossy compressed file formats are the primary way content is distributed. That’s fine if you listen casually, but some people demand the utmost in quality. As a result, more and more high-quality and even lossless streaming options are now available. But there’s no getting around the fact that lossy formats take up less space and consume less mobile data, as the chart below makes clear.
Stereo file sizes (16-bit 44.1 kHz) | WAV | AIFF | FLAC (typical) | MP3 (320 kbps) | MP3 (192 kbps) |
---|---|---|---|---|---|
1 minute | 10.6 MB | 10.6 MB | 6.4 MB | 2.4 MB | 1.4 MB |
4 minutes | 41.6 MB | 41.6 MB | 24.9 MB | 9.6 MB | 5.6 MB |
1 hour | 635 MB | 635 MB | 381 MB | 144 MB | 84 MB |
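The size of a lossy file follows approximately from its bitrate multiplied by its duration. The sketch below assumes a constant bitrate, ignores metadata and headers, and uses 1 MB = 1,000,000 bytes; the function name is hypothetical.

```python
def compressed_size_mb(bitrate_kbps, duration_seconds):
    """Approximate size of a constant-bitrate file in MB (1 MB = 1,000,000 bytes)."""
    bits = bitrate_kbps * 1000 * duration_seconds
    return bits / 8 / 1_000_000

# Assumed example durations at 320 kbps.
for label, seconds in [("1 minute", 60), ("1 hour", 3600)]:
    print(label, compressed_size_mb(320, seconds), "MB at 320 kbps")
```

Because the result depends only on bitrate and length, two very different songs of the same duration end up roughly the same size at a fixed bitrate.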
MP3
The MP3 audio file format once reigned supreme when it came to downloading music. In fact, the format has become so synonymous with mobile music that “MP3 player” is now a generic term for an audio playback device. Today it is less prominent for several reasons, but it lingers. If we understand MP3 files, we can also understand other formats more easily, so we’ll start here.
An MP3 file is a lossy audio file, meaning it throws out data that our ears can’t hear. Almost every human has a hearing range of roughly 20 Hz to 20 kHz. The upper limit actually decreases with age, but in general that’s the range within which any sound you’ll ever hear lies. Because frequencies outside this range are redundant, MP3 ignores them.
To save even more space, MP3 encoders use further tricks. They apply noise-shaping algorithms based on psychoacoustic effects of the human ear and brain to remove parts of the music we shouldn’t be able to hear. For example, the brain cannot distinguish between two frequencies that are right next to each other, and the adult human ear has difficulty identifying the direction of high-frequency sounds and starts to lose sensitivity above 16 kHz. Loud sounds can also mask quiet ones. All of this information can be removed with little to no noticeable difference to the listener.
Basically, MP3 files remove frequencies we can’t hear, along with frequencies we could hear on their own but not once they’re combined with everything else in a particular song.
An MP3 encoder splits a track into frames of 576 samples, and Fast Fourier Transforms (FFT) are used to obtain frequency data from these frames. The frequency data is then analyzed to see whether there are opportunities to apply the compression rules based on human hearing, as described above. If so, those parts are quantized (rounded to coarser values) to lower the bitrate, which saves space. The information needed to decode each frame, such as its bitrate and sample rate, is stored in a 32-bit header.
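As a toy illustration of turning a frame into frequency data and discarding components judged inaudible, consider the sketch below. It is not how a real MP3 encoder works: it uses a naive discrete Fourier transform instead of an optimized FFT, and a fixed loudness threshold (an assumption) stands in for a real psychoacoustic model. All names are hypothetical.

```python
import cmath
import math

FRAME_SIZE = 576  # frame length mentioned in the text

def dft_magnitudes(frame):
    """Naive discrete Fourier transform; returns magnitudes for the first half of the bins."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]

def keep_audible_bins(magnitudes, threshold_ratio=0.01):
    """Toy stand-in for a psychoacoustic model: zero bins far quieter than the loudest one."""
    peak = max(magnitudes)
    return [m if m >= peak * threshold_ratio else 0.0 for m in magnitudes]

# Hypothetical frame: a loud tone in bin 24 plus a very quiet tone in bin 25.
frame = [
    math.sin(2 * math.pi * 24 * t / FRAME_SIZE)
    + 0.001 * math.sin(2 * math.pi * 25 * t / FRAME_SIZE)
    for t in range(FRAME_SIZE)
]
kept = keep_audible_bins(dft_magnitudes(frame))
print([k for k, m in enumerate(kept) if m > 0])  # [24]
```

The quiet neighbor in bin 25 is dropped while the loud tone survives, which is the rough intuition behind masking-based compression.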
The bitrate determines the maximum allowed size of each frame. The more aggressive the compression, the more likely the algorithm is to remove something that is audible. This type of filtering and cutting is also imperfect, and the quantization can leave behind artifacts that some people can hear. The lossy psychoacoustic compression is then followed by lossless Huffman coding, similar to the compression used in a .zip file, to save more space.
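Huffman coding gives frequent values short codes and rare values long ones. The sketch below builds a textbook Huffman code for a made-up list of quantized values; it does not use the predefined code tables a real MP3 encoder relies on, and the function name and input data are assumptions.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count_a, _, depths_a = heapq.heappop(heap)
        count_b, _, depths_b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**depths_a, **depths_b}.items()}
        heapq.heappush(heap, (count_a + count_b, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical quantized values: mostly zeros, as is typical after coarse rounding.
quantized = [0] * 12 + [1] * 5 + [-1] * 4 + [3] * 1
lengths = huffman_code_lengths(quantized)
total_bits = sum(lengths[s] for s in quantized)
print(total_bits)  # 37, versus 44 bits with a fixed 2-bit code per value
```

No information is lost in this step: the decoder can recover every quantized value exactly from the variable-length codes.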
If that sounds too complicated, the bottom line is that MP3 files remove frequencies that we can’t hear and that we could theoretically hear individually, but not in any given song due to auditory masking. This can lead to quite small file sizes. However, if it’s done too aggressively or at too low a bitrate, the quality can suffer. As a result, MP3 is not as popular for streaming anymore.
AAC, M4A and OGG Vorbis audio formats
Audio compression can take many forms and other formats have been developed. These use slightly different algorithms and techniques to do the job, so we can’t compare them based on bitrate alone.
OGG Vorbis is an open-source alternative to MP3. It still uses FFT and similar methods to analyze and quantize maskable frequency information, but with a different algorithm. Vorbis also takes the noise floor into account to improve performance at low bitrates. Spotify uses this format at up to 320 kbps.
There is also AAC, which is used by Apple Music, Tidal, Pandora, and YouTube Music. It is an evolution of the MPEG (MP3) format and allows for higher sample rates of up to 96 kHz. In addition, the frame length can dynamically switch between 1024/960 and 128/120 samples for better resolution when needed. To boot, it performs better than MP3 at smaller file sizes.
Another file type that you may come across is the M4A file. These files are encoded in the AAC format and then stored in an MPEG-4 container, hence the .m4a file extension. Apple created this type in response to MP3. While it’s not as universally supported, it’s not rare by any means.
For these reasons, you can’t directly compare bitrates and claim that the file with the higher bitrate will sound better when comparing, say, AAC and MP3. Lower-bitrate AAC and M4A files can still sound good while taking up less space.
That makes formats such as OGG Vorbis and AAC attractive for streaming services. They can deliver higher quality sound while using less of your mobile data.
FLAC
If you don’t want to ditch frequencies, but still want a file that is smaller than raw data, FLAC comes into play. FLAC does not delete any part of a recording, which is why it is called lossless. Apple’s version of a lossless codec is called ALAC. Both codecs work much like a .zip file. If you’ve ever zipped a collection of files and then extracted them, you’ll understand the basic idea. Nothing is deleted: the encoder just looks for ways to consolidate repeating patterns in the data, and the decoder reconstructs the original on playback.
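The .zip analogy can be tried directly. The sketch below uses Python's `zlib` module, which is a general-purpose compressor and not the FLAC algorithm, on some made-up, highly repetitive 16-bit PCM data, then checks that every byte comes back.

```python
import struct
import zlib

# Hypothetical PCM data: a short 16-bit pattern repeated many times.
pattern = [0, 1200, 2400, 1200, 0, -1200, -2400, -1200]
pcm_bytes = struct.pack("<" + "h" * len(pattern), *pattern) * 1000

compressed = zlib.compress(pcm_bytes, 9)
restored = zlib.decompress(compressed)

print(len(pcm_bytes), "bytes before,", len(compressed), "bytes after")
print(restored == pcm_bytes)  # True: every original byte comes back
```

Real music is far less repetitive than this test pattern, so the savings are much smaller in practice, but the round trip is always exact.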
However, FLAC files will never be as small as MP3 or AAC files. But as bandwidth becomes cheaper and more accessible, more and more streaming services offer the option to stream in FLAC. These are often “HD”, “Ultra HD”, or “HiFi” subscriptions. Amazon Music Unlimited, Tidal HiFi and HiFi Plus, Deezer Premium, and Qobuz all offer FLAC streaming.
Keep in mind that FLAC files are larger than lossy formats and can eat up a lot of your data. If you store them on a device, they also take up storage space quite quickly.
WAV and AIFF audio formats
Audio recordings can simply be stored as pure PCM on a device, which is essentially what WAV (on Windows) and AIFF (on Mac) are. They represent some of the earliest forms of digital music storage. No compression or anything else has been applied to these files. In fact, you can work out their file size quite easily with the following equation:
PCM size (in bytes) = sample rate × (bits per sample / 8) × time in seconds × number of channels
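The equation above translates directly into a few lines of code. This sketch assumes 1 MB = 1,000,000 bytes and ignores the small file header that WAV and AIFF add; the function name is hypothetical.

```python
def pcm_size_bytes(sample_rate_hz, bits_per_sample, seconds, channels):
    """Uncompressed PCM size in bytes, following the equation above."""
    return sample_rate_hz * bits_per_sample * seconds * channels // 8

# CD-quality stereo: 44.1 kHz, 16 bits per sample, two channels.
one_minute = pcm_size_bytes(44100, 16, 60, 2)
one_hour = pcm_size_bytes(44100, 16, 3600, 2)
print(one_minute / 1_000_000)  # 10.584
print(one_hour / 1_000_000)    # 635.04
```

One minute of CD-quality stereo comes to about 10.6 MB, and an hour to about 635 MB.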
As a result, these formats can lead to incredibly large file sizes. That means they’re pretty rare for streaming and downloading, though services like HDtracks offer them anyway. What these files are really useful for is audio mixing and editing. Because no conversion, compression, or other processing has taken place, it is easy and fast to edit, save, and re-edit tracks as needed.