Multimedia Compression Methods
In their uncompressed form, multimedia data require a very large amount of storage space. Multimedia compression techniques generally belong to the group of lossy methods: they do not restore the original data exactly, but exploit the limitations of human perception so that the discarded information goes unnoticed by our eyes and ears.
Example: One second of stereo recording with a microphone at CD quality
The sampling rate is 44.1 kHz, meaning 44,100 samples are taken every second. Each sample is stored as a 16-bit value, and there are 2 samples for each point in time, one for the left channel and one for the right:
$$ 44100 \times 16 \text{ bits} \times 2 \approx 1.4 \text{ Megabits} $$
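The same arithmetic as a short Python sketch. The constant and function names are illustrative only, and the per-minute figure simply extends the calculation above, taking 1 MB as 10^6 bytes.

```python
# Uncompressed (PCM) data rate of CD-quality stereo audio
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16      # each sample is a 16-bit value
CHANNELS = 2              # stereo: left and right

def pcm_bits_per_second(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Return the number of bits needed to store one second of raw PCM audio."""
    return sample_rate * bits_per_sample * channels

bits = pcm_bits_per_second(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(bits)                                        # 1411200
print(f"{bits / 1e6:.2f} Mbit/s")                  # 1.41 Mbit/s
print(f"{bits * 60 / 8 / 1e6:.1f} MB per minute")  # 10.6 MB per minute
```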
And this is just one second!
Example: A single Full HD (1080p) frame at 1920×1080 resolution with 32 bits per pixel (theoretical size)
$$ 1920 \times 1080 \times 32 \text{ bits} = 66\,355\,200 \text{ bits} = 8\,294\,400 \text{ bytes} \approx 8 \text{ Megabytes} $$
Keep in mind that this is just a single frame, and video is played back at dozens of frames per second (commonly 24 to 60, and even more on high-refresh-rate displays).
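A similar sketch for the video example. The 32 bits per pixel come from the formula above; the 60 fps frame rate is an assumption chosen for illustration, and 1 MB is again taken as 10^6 bytes.

```python
# Theoretical raw size of one Full HD frame and of one second of raw video
WIDTH, HEIGHT = 1920, 1080
BITS_PER_PIXEL = 32   # as in the formula above (e.g. 8 bits for each of R, G, B, A)
FPS = 60              # assumed frame rate, for illustration only

def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Return the uncompressed size of a single frame in bytes."""
    return width * height * bits_per_pixel // 8

one_frame = frame_bytes(WIDTH, HEIGHT, BITS_PER_PIXEL)
print(one_frame)                                                  # 8294400
print(f"{one_frame / 1e6:.1f} MB per frame")                      # 8.3 MB per frame
print(f"{one_frame * FPS / 1e6:.0f} MB per second at {FPS} fps")  # 498 MB per second at 60 fps
```

At roughly half a gigabyte per second, even a short raw clip would fill a typical disk quickly, which is why video is practically never stored uncompressed.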
Principle of Lossy Compression: the bitstream must be transformed so that the information that matters to the human eye and ear is preserved, while the rest can be stored coarsely or discarded. Experience shows that our eyes are less sensitive to image components with high spatial frequencies: a dense pattern is harder to perceive in detail than a sparse one. In moving images (video), our eyes are likewise more sensitive to low frequencies.
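The two-dimensional Fourier transform converts the image's pixel values into frequency components, which makes it possible to separate the low-frequency content from the high-frequency details.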
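To make the idea concrete, the sketch below applies a naive two-dimensional discrete Fourier transform (DFT) to two tiny 8×8 grayscale images and measures how much of their energy (ignoring the average brightness, i.e. the DC term) lies in the lowest frequencies. The helper names and the `max_freq` cutoff are hypothetical, chosen for this illustration. Real image codecs such as JPEG use the closely related discrete cosine transform on 8×8 blocks, computed with a fast algorithm rather than this quadruple loop.

```python
import cmath
import math

def dft2(image):
    """Naive 2D discrete Fourier transform of a rectangular list of lists.

    Runs in O((rows*cols)^2): fine for tiny illustrative images, far too slow for real ones.
    """
    rows, cols = len(image), len(image[0])
    spectrum = [[0j] * cols for _ in range(rows)]
    for u in range(rows):          # vertical frequency index
        for v in range(cols):      # horizontal frequency index
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            spectrum[u][v] = total
    return spectrum

def low_frequency_share(image, max_freq=1):
    """Fraction of the non-DC spectral energy at frequencies of at most max_freq.

    Index k on an axis of length n stands for frequency min(k, n - k), because
    the upper half of the DFT output mirrors the negative frequencies.
    """
    spectrum = dft2(image)
    rows, cols = len(spectrum), len(spectrum[0])
    low = total = 0.0
    for u in range(rows):
        for v in range(cols):
            if u == 0 and v == 0:
                continue  # skip the DC term (average brightness)
            energy = abs(spectrum[u][v]) ** 2
            total += energy
            if min(u, rows - u) <= max_freq and min(v, cols - v) <= max_freq:
                low += energy
    return low / total if total > 0 else 1.0

N = 8
# Smooth image: brightness varies slowly, one cosine period across the width.
smooth = [[128 + 100 * math.cos(2 * math.pi * x / N) for x in range(N)] for _ in range(N)]
# Dense pattern: a 1-pixel checkerboard, the highest frequency an 8x8 grid can hold.
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(N)] for y in range(N)]

print(f"smooth:       {low_frequency_share(smooth):.2f}")   # 1.00
print(f"checkerboard: {low_frequency_share(checker):.2f}")  # 0.00
```

The smooth image concentrates essentially all of its energy in the lowest frequencies, while the checkerboard puts all of it into the highest one. Since typical image content is mostly smooth, a lossy encoder can keep the few strong low-frequency coefficients accurately and quantize the high-frequency ones coarsely.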
Note: To picture what the Fourier transform does, consider a mono microphone recording, which stores amplitude (volume) values at successive points in time. From these values the transform can determine, for example, that a C-sharp (C#) was played in one interval and a C in another. It is like reconstructing the musical score by ear: if we encoded only the notes instead of every sample, we would achieve very high compression!
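The note-detection idea can be sketched in the same spirit. Instead of computing a full spectrum, the code below evaluates the Fourier coefficient of a signal only at a handful of candidate note frequencies and picks the strongest one. It assumes an 8 kHz sample rate and synthetic pure sine tones in place of a real microphone recording; `tone`, `magnitude_at` and `detect_note` are hypothetical helpers written for this example.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, assumed for this sketch
# Equal-temperament pitches (Hz) of a few notes around middle C (A4 = 440 Hz).
NOTES = {"C4": 261.63, "C#4": 277.18, "D4": 293.66, "D#4": 311.13, "E4": 329.63}

def tone(freq_hz, duration_s):
    """Synthesize a pure sine tone, standing in for a mono microphone recording."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]

def magnitude_at(samples, freq_hz):
    """Magnitude of the Fourier coefficient of `samples` at a single frequency."""
    total = 0j
    for t, s in enumerate(samples):
        total += s * cmath.exp(-2j * math.pi * freq_hz * t / SAMPLE_RATE)
    return abs(total)

def detect_note(samples):
    """Return the candidate note whose frequency carries the most energy."""
    return max(NOTES, key=lambda name: magnitude_at(samples, NOTES[name]))

# A "recording": 0.25 s of C#4 followed by 0.25 s of C4.
recording = tone(NOTES["C#4"], 0.25) + tone(NOTES["C4"], 0.25)
half = len(recording) // 2
print(detect_note(recording[:half]), detect_note(recording[half:]))  # C#4 C4
```

Storing the two note names instead of 4,000 samples is the kind of saving the analogy describes. Real audio codecs do not transcribe notes, but they likewise spend their bits mainly on the frequency components the ear actually perceives.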