Pulse Code Modulation (PCM) is a technique for representing audio digitally: the signal is sampled at regular intervals and each sample is stored as a number with a fixed number of bits. It is the usual way computers store uncompressed audio. In this post I shall briefly explain the format of an uncompressed PCM audio file (a WAV file), which is laid out as follows:
| Offset (hex) | Bytes | Data |
|---|---|---|
| 0 | 4 | “RIFF” |
| 4 | 4 | size of the RIFF chunk: the actual file size minus 8 bytes |
| 8 | 4 | “WAVE” |
| C | 4 | “fmt ” (note the trailing space) |
| 10 | 4 | size of format chunk (16 for PCM) |
| 14 | 2 | wf.wFormatTag (1 = PCM) |
| 16 | 2 | wf.nChannels |
| 18 | 4 | wf.nSamplesPerSec |
| 1C | 4 | wf.nAvgBytesPerSec |
| 20 | 2 | wf.nBlockAlign |
| 22 | 2 | wf.wBitsPerSample |
| 24 | 4 | “data” |
| 28 | 4 | size of waveform data |
| 2C | size of waveform data | waveform data |
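To make the table concrete, here is a minimal C sketch that fills in the 44-byte header for a plain PCM file. The names `build_wav_header`, `put_u16le` and `put_u32le` are hypothetical helpers, not part of any library, and the sketch assumes a simple file containing only the “fmt ” and “data” chunks, with whole-byte samples. All multi-byte fields in a WAV file are little-endian, so the values are written byte by byte instead of copying a struct; that keeps the result independent of the host's byte order and struct padding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write v to p[0..1] in little-endian byte order. */
static void put_u16le(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)(v >> 8);
}

/* Write v to p[0..3] in little-endian byte order. */
static void put_u32le(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical helper: fill hdr with the 44-byte PCM header from the table.
   data_size is the number of bytes of waveform data that will follow. */
static void build_wav_header(uint8_t hdr[44], uint16_t channels,
                             uint32_t sample_rate, uint16_t bits_per_sample,
                             uint32_t data_size) {
    assert(bits_per_sample % 8 == 0);  /* whole-byte samples assumed */
    uint16_t block_align = (uint16_t)(channels * bits_per_sample / 8);
    uint32_t avg_bytes_per_sec = sample_rate * block_align;

    memcpy(hdr + 0x00, "RIFF", 4);
    put_u32le(hdr + 0x04, 36u + data_size); /* file size minus 8; assumes no pad byte */
    memcpy(hdr + 0x08, "WAVE", 4);
    memcpy(hdr + 0x0C, "fmt ", 4);          /* four characters, trailing space */
    put_u32le(hdr + 0x10, 16);              /* PCM format chunk is 16 bytes */
    put_u16le(hdr + 0x14, 1);               /* wFormatTag: 1 = PCM */
    put_u16le(hdr + 0x16, channels);
    put_u32le(hdr + 0x18, sample_rate);
    put_u32le(hdr + 0x1C, avg_bytes_per_sec);
    put_u16le(hdr + 0x20, block_align);
    put_u16le(hdr + 0x22, bits_per_sample);
    memcpy(hdr + 0x24, "data", 4);
    put_u32le(hdr + 0x28, data_size);
}
```

For example, `build_wav_header(hdr, 2, 44100, 16, n)` produces the header for `n` bytes of 16-bit stereo audio at 44.1 kHz; the samples then follow at offset 0x2C. If `n` were odd, RIFF would also require a pad byte after the data, which this sketch does not write.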
This format is based on RIFF (Resource Interchange File Format). RIFF is a tagged file format: the file is made up of chunks, each preceded by a four-character ASCII name (the tag) and a 32-bit chunk size. So in a RIFF file, you may encounter several tags.
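Because every chunk starts with a four-character tag and a 32-bit little-endian size, a reader can skip from one chunk to the next without understanding its contents. The sketch below checks the “RIFF”/“WAVE” header and then walks the sub-chunks looking for a given tag such as “fmt ” or “data”. The function `find_wave_chunk` is a hypothetical name, not a library API; the sketch assumes the whole file is already in memory, follows the RIFF convention of skipping the pad byte after an odd-sized chunk, and does not cross-check the RIFF size field against the buffer length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from p[0..3]. */
static uint32_t get_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: locate the sub-chunk tagged `id` (4 characters) in a
   RIFF/WAVE file held in buf[0..len). Returns a pointer to the chunk's data
   and stores its size in *size_out, or returns NULL if the buffer is not a
   RIFF/WAVE file, the chunk is missing, or a chunk is truncated. */
static const uint8_t *find_wave_chunk(const uint8_t *buf, size_t len,
                                      const char *id, uint32_t *size_out) {
    assert(buf != NULL && id != NULL && size_out != NULL);

    /* 12-byte header: "RIFF", size of the rest of the file, "WAVE". */
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return NULL;

    size_t pos = 12;
    while (pos + 8 <= len) {               /* room for a tag and a size */
        uint32_t size = get_u32le(buf + pos + 4);
        if (size > len - pos - 8)
            return NULL;                   /* chunk runs past the buffer */
        if (memcmp(buf + pos, id, 4) == 0) {
            *size_out = size;
            return buf + pos + 8;
        }
        /* Skip tag, size, data, and the pad byte that follows odd sizes. */
        pos += 8 + (size_t)size + (size & 1u);
    }
    return NULL;
}
```

Calling `find_wave_chunk(buf, len, "data", &size)` on a loaded file returns a pointer to the first sample and the number of bytes of waveform data. Searching by tag instead of assuming the fixed offsets in the table also copes with files that carry extra chunks, such as a “LIST” chunk, before the data.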
The first item in a waveform audio file is the string “RIFF”, which identifies it as a RIFF file. The next field is the 32-bit chunk size, which is the size of the rest of the file after this field; i.e. the actual file size is this value plus 8 (4 bytes for “RIFF” and 4 for the size field itself). The chunk data starts with the string “WAVE”, a tag stating that the rest of the file is a WAVE chunk. Next is the string “fmt ” (with a trailing space, since every tag is exactly four characters), another tag, which marks the chunk holding the format of the audio data. Next comes the size of the format information. The format information contains the first 16 bytes of the WAVEFORMATEX structure, which defines the format of waveform audio data:
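```c
/* WORD is a 16-bit and DWORD a 32-bit unsigned integer (Windows SDK types). */
typedef struct {
    WORD  wFormatTag;      /* format type; 1 (WAVE_FORMAT_PCM) for PCM */
    WORD  nChannels;       /* number of channels: 1 = mono, 2 = stereo */
    DWORD nSamplesPerSec;  /* sample rate in Hz, e.g. 44100 */
    DWORD nAvgBytesPerSec; /* for PCM: nSamplesPerSec * nBlockAlign */
    WORD  nBlockAlign;     /* for PCM: nChannels * wBitsPerSample / 8 */
    WORD  wBitsPerSample;  /* bits per sample, e.g. 8 or 16 */
    WORD  cbSize;          /* size of extra format data; ignored for PCM */
} WAVEFORMATEX;
```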
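So the format information contains every member of the WAVEFORMATEX structure except cbSize: 2 + 2 + 4 + 4 + 2 + 2 = 16 bytes, which is the value stored in the format chunk size field. After the format chunk comes the “data” tag, the size of the waveform data in bytes, and then the audio samples themselves, starting at offset 0x2C.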
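The sketch below decodes those 16 bytes into a portable struct and checks the two relationships that hold for PCM: nBlockAlign = nChannels × wBitsPerSample / 8 and nAvgBytesPerSec = nSamplesPerSec × nBlockAlign. The `PcmFormat` struct and `parse_pcm_format` function are hypothetical names; fixed-width `uint16_t`/`uint32_t` stand in for WORD/DWORD so the code compiles outside Windows, and the fields are read byte by byte because they are stored little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of the first 16 bytes of WAVEFORMATEX. */
typedef struct {
    uint16_t wFormatTag;
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
} PcmFormat;

static uint16_t get_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a format chunk payload p[0..size). Returns 0 for consistent PCM,
   -1 if too short, -2 if not PCM, -3 if the derived fields disagree. */
static int parse_pcm_format(const uint8_t *p, size_t size, PcmFormat *f) {
    assert(p != NULL && f != NULL);
    if (size < 16)
        return -1;
    f->wFormatTag      = get_u16le(p + 0);
    f->nChannels       = get_u16le(p + 2);
    f->nSamplesPerSec  = get_u32le(p + 4);
    f->nAvgBytesPerSec = get_u32le(p + 8);
    f->nBlockAlign     = get_u16le(p + 12);
    f->wBitsPerSample  = get_u16le(p + 14);

    if (f->wFormatTag != 1)            /* 1 = WAVE_FORMAT_PCM */
        return -2;
    if (f->nBlockAlign != f->nChannels * f->wBitsPerSample / 8)
        return -3;
    if (f->nAvgBytesPerSec != f->nSamplesPerSec * f->nBlockAlign)
        return -3;
    return 0;
}
```

For 16-bit stereo at 44100 Hz, for instance, nBlockAlign is 2 × 16 / 8 = 4 bytes and nAvgBytesPerSec is 44100 × 4 = 176400.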
I found a nice tool called RiffPad for inspecting waveform audio files. You can load an audio file and view the structure described above in its editor. Below are two screenshots of a file in RiffPad: the first shows the WAVEFORMATEX structure and the second shows the actual audio data section.