tanszek:oktatas:techcomm:multimedia_compression [2024/10/07 08:35] – created knehez | tanszek:oktatas:techcomm:multimedia_compression [2024/11/19 11:10] (current) – knehez | ||
---|---|---|---|
Line 1: | Line 1: | ||
===== Multimedia Compression Methods ===== | ===== Multimedia Compression Methods ===== | ||
+ | Multimedia files like audio, video, and images are often very large in their uncompressed form. Compression is used to reduce the amount of data required to store or transmit this information, | ||
+ | - **Lossless Compression**: | ||
+ | - **Lossy Compression**: | ||
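The difference between the two families can be illustrated with a minimal sketch. It assumes hypothetical 16-bit sample values, uses `zlib` (DEFLATE) as one example of a lossless method, and uses plain bit truncation as a stand-in for a lossy step; real audio codecs are far more sophisticated.

```python
import zlib

# Hypothetical data: a short, repetitive run of signed 16-bit samples.
samples = [0, 1000, 2000, 3000, 2000, 1000, 0, -1000] * 100

# Lossless: compress and decompress, then check the bytes are identical.
raw = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
packed = zlib.compress(raw)
restored = zlib.decompress(packed)
lossless_ok = restored == raw

# Lossy (illustrative only): keep the top 8 bits of every sample.
# The stored data is halved, but the discarded bits cannot be recovered.
quantized = [s >> 8 for s in samples]
approx = [q << 8 for q in quantized]
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print(f"lossless: {len(raw)} -> {len(packed)} bytes, exact = {lossless_ok}")
print(f"lossy: 16 -> 8 bits per sample, max error = {max_error}")
```

The lossless round trip reproduces every byte, while the truncated samples only approximate the originals. Lossy methods accept this kind of error in exchange for smaller data.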
- | In their uncompressed form, multimedia data typically require a very large amount of storage space. | + | ==== Lossy Compression Techniques in Multimedia |
+ | Lossy compression techniques | ||
+ | - **Human Perception Optimization**: | ||
+ | - For images, the human eye is **less sensitive** to subtle changes in **high spatial frequency** areas (fine details), which allows image compression | ||
+ | - For audio, humans are more sensitive to **lower frequencies** and less sensitive to **higher frequencies**, | ||
+ | |||
+ | ==== Audio Sampling Example ==== | ||
+ | When audio is recorded (e.g., with a microphone), | ||
+ | - **CD Quality Audio**: Has a sampling rate of **44.1 kHz** (44,100 samples per second), representing each sample as a **16-bit** value. In stereo audio, two channels (left and right) are recorded, which means twice the amount of data is needed. | ||
+ | - Example Calculation: | ||
+ | - **1 second of stereo CD quality audio**: | ||
+ | \[ | ||
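+ | 44100 \text{ samples/s} \times 16 \text{ bits/sample} \times 2 \text{ channels} = 1{,}411{,}200 \text{ bits} \approx 1.4 \text{ Megabits} | ||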
+ | \] | ||
+ | This is a significant amount of data for just one second of sound, demonstrating why compression is essential. | ||
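The arithmetic above can be checked with a short sketch. The constants come from the CD quality figures in this section; the function name `pcm_bits` is only an illustrative choice.

```python
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo

def pcm_bits(seconds: int) -> int:
    """Uncompressed size in bits of `seconds` of audio at the settings above."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS * seconds

bits_per_second = pcm_bits(1)
bytes_per_minute = pcm_bits(60) // 8

print(f"1 second: {bits_per_second} bits ({bits_per_second / 1e6:.2f} Megabits)")
print(f"1 minute: {bytes_per_minute} bytes ({bytes_per_minute / 1e6:.1f} Megabytes)")
```

One second comes to about 1.4 Megabits, so a single minute of uncompressed stereo audio already needs more than 10 Megabytes.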
- | ==== Example: One second of stereo recording with a microphone at CD quality | + | ==== Video Frame Compression |
+ | Video consists of a sequence of frames displayed rapidly to create the illusion of motion. | ||
+ | - **Full HD Video**: A Full HD (1080p) frame has a resolution of **1920x1080** pixels, and with **24 bits** per pixel (which allows for " | ||
+ | \[ | ||
+ | 1920 \times 1080 \times 24 \text{ bits} \approx 6 \text{ Megabytes} | ||
+ | \] | ||
+ | - A typical video displays **30 frames per second** (fps), so uncompressed playback would need about **187 MB** of data per second (30 frames × ≈ 6.2 MB each). This is why video compression is vital to make streaming and storage practical. | ||
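As a rough check of the frame-size arithmetic, the sketch below computes the uncompressed size of one frame and of one second of video. It assumes 24 bits per pixel and 30 frames per second, as in this section; other color depths or frame rates change the result proportionally.

```python
WIDTH, HEIGHT = 1920, 1080   # Full HD resolution in pixels
BITS_PER_PIXEL = 24
FRAMES_PER_SECOND = 30

frame_bits = WIDTH * HEIGHT * BITS_PER_PIXEL
frame_bytes = frame_bits // 8
bytes_per_second = frame_bytes * FRAMES_PER_SECOND

print(f"one frame:  {frame_bytes} bytes ({frame_bytes / 1e6:.1f} MB)")
print(f"one second: {bytes_per_second} bytes ({bytes_per_second / 1e6:.1f} MB)")
```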
- | The sampling rate is 44.1 kHz, meaning 44.100 samples are taken in one second. Assuming that, each sample | + | ==== Two-Dimensional Fourier Transform ==== |
+ | The **Fourier Transform** | ||
+ | - In multimedia compression, the **2D Fourier Transform** is used for images. It converts image pixels into frequency components, which can then be analyzed so that the parts less significant to human perception are discarded. | ||
+ | - **Example**: | ||
+ | |||
+ | - In practice, this means that areas of an image with **high-frequency details** (e.g., fine patterns) can be simplified or removed during compression without significantly impacting | ||
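The idea of discarding high-frequency components can be sketched on a tiny synthetic image. The example below is an illustration under stated assumptions, not how a production codec works: it uses a naive 2D discrete Fourier transform written with the standard library, a hypothetical 8x8 test image made of a smooth wave plus a faint checkerboard, and an arbitrary cutoff of 2. Practical image formats such as JPEG use the related discrete cosine transform on small blocks.

```python
import cmath
import math

def dft2(img):
    """Naive 2D discrete Fourier transform of a list-of-lists image."""
    rows, cols = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / rows + v * x / cols))
                 for y in range(rows) for x in range(cols))
             for v in range(cols)]
            for u in range(rows)]

def idft2(spec):
    """Inverse 2D DFT; returns the real part of each reconstructed pixel."""
    rows, cols = len(spec), len(spec[0])
    return [[sum(spec[u][v] * cmath.exp(2j * cmath.pi * (u * y / rows + v * x / cols))
                 for u in range(rows) for v in range(cols)).real / (rows * cols)
             for x in range(cols)]
            for y in range(rows)]

def low_pass(spec, cutoff):
    """Zero every coefficient whose frequency index exceeds `cutoff`."""
    rows, cols = len(spec), len(spec[0])
    kept = 0
    out = []
    for u in range(rows):
        row = []
        for v in range(cols):
            # Indices above rows/2 (or cols/2) represent negative frequencies,
            # so the distance from zero is min(u, rows - u).
            if min(u, rows - u) <= cutoff and min(v, cols - v) <= cutoff:
                row.append(spec[u][v])
                kept += 1
            else:
                row.append(0j)
        out.append(row)
    return out, kept

SIZE = 8
# Smooth horizontal wave (low frequency) plus a faint checkerboard (highest frequency).
image = [[128 + 50 * math.cos(2 * math.pi * x / SIZE) + 5 * (-1) ** (x + y)
          for x in range(SIZE)] for y in range(SIZE)]

spectrum = dft2(image)
filtered, kept = low_pass(spectrum, cutoff=2)
restored = idft2(filtered)

max_error = max(abs(image[y][x] - restored[y][x])
                for y in range(SIZE) for x in range(SIZE))
print(f"kept {kept} of {SIZE * SIZE} coefficients, max pixel error = {max_error:.3f}")
```

Only the faint checkerboard is lost, so every pixel differs from the original by the checkerboard's amplitude while the smooth structure survives with far fewer coefficients.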
- | $$ 44100 \times 16 \text{ bits} \times 2 \approx 1.4 \text{ Megabits} $$ | + | ==== Fourier Transform Analogy with Music ==== |
+ | A useful analogy compares the **Fourier Transform** to recognizing musical notes in an audio recording: | ||
+ | - Imagine you have a **mono recording** of music with different notes being played over time. The **Fourier Transform** is like figuring out which notes are being played (e.g., **C#**, **C**) during different time intervals. | ||
+ | - This is similar to trying to write the **musical score** (sheet music) just by listening to the audio. By keeping only the most important notes (the **note heads**), you would end up with a much more **compressed** version of the original sound while retaining the most vital information. | ||
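The analogy can be made concrete with a small sketch. It assumes a synthetic mono recording made of two pure tones, with frequencies rounded to 520 Hz and 550 Hz so that they fall exactly on the analysis bins (the real notes C5 and C#5 are close to 523.25 Hz and 554.37 Hz). The sample rate, interval length, and function names are illustrative choices.

```python
import cmath
import math

SAMPLE_RATE = 4000   # samples per second (assumed for this sketch)
INTERVAL = 400       # samples per analysis interval, giving 10 Hz resolution

def tone(freq_hz, n=INTERVAL):
    """Pure sine tone of n samples at the given frequency."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]

def dominant_frequency(signal):
    """Return the frequency (Hz) of the strongest DFT bin below Nyquist."""
    n = len(signal)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * SAMPLE_RATE / n

# "Recording": one interval of the lower note followed by one of the higher note.
recording = tone(520) + tone(550)

# Analyze each time interval separately, like transcribing a score by ear.
detected = [dominant_frequency(recording[i:i + INTERVAL])
            for i in range(0, len(recording), INTERVAL)]

NOTE_NAMES = {520.0: "C", 550.0: "C#"}   # rounded frequencies, see the note above
score = [NOTE_NAMES.get(f, "?") for f in detected]
print(f"{len(recording)} samples reduced to the score {score}")
```

Storing two note names instead of 800 samples is the extreme form of compression the analogy describes. A real recording contains many overlapping frequencies, so an actual codec keeps far more than one value per interval.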
- | And this is just one second! | + | ==== Human Sensory Limitations and Compression ==== |
+ | - **Vision**: The human eye is more sensitive to **low spatial frequencies** (smooth gradients) and less sensitive to **high spatial frequencies** (fine details or noise). This property is used in image and video compression to drop unnecessary detail in complex patterns, which most viewers won’t notice. | ||
+ | - **Hearing**: | ||
- | ==== Example: Full HD frame (1080p): 1920x1080 resolution (theoretical size) ==== | + | ==== Summary |
- | + | Multimedia compression methods, particularly lossy ones, take advantage of **human sensory limitations** to reduce data size without noticeable loss in quality. Techniques such as **Fourier Transform** allow multimedia compression algorithms | |
- | $$ 1920 \times 1080 \times 32 \text{ bits} \approx 8 \text{ Megabytes} $$ | + | |
- | + | ||
- | Keep in mind that this is just a single frame, and modern multimedia systems typically display 60-100 frames per second. | + | |
- | + | ||
- | **Principle of Lossy Compression**: The bitstreams need to be transformed so that the information important to the human eye and ear is preserved. Experience shows that our eyes are less sensitive to image components with higher spatial frequencies, meaning we find it harder to perceive a dense pattern than a less dense one. In the case of moving images (videos), our eyes are also more sensitive to low frequencies. | + | |
- | + | ||
- | The two-dimensional Fourier transform helps convert the image pixels into frequency values. | + | |
- | + | ||
- | Note: The Fourier transform can be imagined similarly to a mono microphone recording, where the recorded volume values are represented at specific points in time. The transformation can determine, for example, that a C-sharp (C#) note was played in a given interval while a C was played in another interval. It's like trying to reconstruct the musical score by ear. If we only encoded the note heads, we would achieve very high compression! | + |
tanszek/oktatas/techcomm/multimedia_compression.1728290115.txt.gz · Last modified: 2024/10/07 08:35 by knehez