Differences

This shows you the differences between two versions of the page.

--- tanszek:oktatas:techcomm:multimedia_compression [2024/10/07 08:35] – created knehez
+++ tanszek:oktatas:techcomm:multimedia_compression [2024/11/19 11:10] (current) – knehez
@@ Line 1: / Line 1: @@
 ===== Multimedia Compression Methods =====
+Multimedia files like audio, video, and images are often very large in their uncompressed form. Compression is used to reduce the amount of data required to store or transmit this information, making storage more efficient and reducing bandwidth requirements for transmission. There are two main types of compression methods:
+   - **Lossless Compression**: This preserves all the original data perfectly, meaning that the decompressed file is identical to the original. Examples include **PNG** for images and **FLAC** for audio.
+   - **Lossy Compression**: This reduces the file size by removing data that is less important or less noticeable to human perception, sacrificing some quality in the process. **JPEG** for images and **MP3** for audio are popular lossy formats.
-In their uncompressed form, multimedia data typically require a very large amount of storage space. Multimedia compression techniques generally belong to the group of //lossy methods//, meaning they do not restore the original data but rather use tricks so that the human senses do not notice the difference.
+==== Lossy Compression Techniques in Multimedia ====
+Lossy compression techniques are often used for **audio, video, and images** to remove data that is not perceptually significant to humans.
+   - **Human Perception Optimization**: Lossy compression exploits limitations in the human eye and ear:
+     - For images, the human eye is **less sensitive** to subtle changes in **high spatial frequency** areas (fine details), which allows image compression methods like **JPEG** to reduce the size by dropping some fine-grained details.
+     - For audio, the human ear is most sensitive in the mid-frequency range (roughly 2 to 5 kHz) and less sensitive at very low and very high frequencies, and loud sounds mask quieter ones at nearby frequencies. This allows audio codecs like **MP3** to selectively discard less audible components.
+==== Audio Sampling Example ====
+When audio is recorded (e.g., with a microphone), it needs to be **digitized** for storage. The **sampling rate** is the number of times the audio signal is measured per second.
+   - **CD Quality Audio**: Has a sampling rate of **44.1 kHz** (44,100 samples per second), representing each sample as a **16-bit** value. In stereo audio, two channels (left and right) are recorded, which means twice the amount of data is needed.
+   - Example Calculation:
+     - **1 second of stereo CD quality audio**:
+       \[
+44100 \text{ samples/second} \times 16 \text{ bits/sample} \times 2 \text{ channels} \approx 1.4 \text{ Megabits}
+       \]
+This is a significant amount of data for just one second of sound, demonstrating why compression is essential.
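
The calculation above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic; the constant and function names are chosen here for illustration and do not come from the page.

```python
# Uncompressed data rate of CD-quality stereo audio (values from the text above).
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo: left and right


def audio_bits(seconds: float) -> int:
    """Return the number of bits needed to store `seconds` of raw audio."""
    return int(SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS * seconds)


one_second = audio_bits(1)
print(one_second)             # 1411200 bits, about 1.4 Megabits
print(one_second / 8 / 1000)  # 176.4 kilobytes per second
```

Calling `audio_bits(60)` gives the raw size of one minute of audio, which makes it easy to see how quickly uncompressed recordings grow.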
-==== Example: One second of stereo recording with a microphone at CD quality ====
+==== Video Frame Compression Example ====
+Video consists of a sequence of frames displayed rapidly to create the illusion of motion.
+   - **Full HD Video**: A Full HD (1080p) frame has a resolution of **1920x1080** pixels, and with **24 bits** per pixel (8 bits each for red, green, and blue, i.e. "true color" without an alpha channel), one frame takes up:
+     \[
+1920 \times 1080 \times 24 \text{ bits} \approx 6 \text{ Megabytes}
+     \]
+   - A typical video displays **30 frames per second** (fps), meaning that roughly **187 MB** of data would be required per second without compression. This is why video compression is vital to make streaming and storage practical.
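
The per-frame and per-second sizes follow directly from the resolution, the bit depth, and the frame rate. The sketch below recomputes them; the function names are illustrative, and the sizes are reported in bytes, with 1 MB taken as 1,000,000 bytes.

```python
# Raw (uncompressed) video size from resolution, bit depth, and frame rate.
WIDTH, HEIGHT = 1920, 1080   # Full HD
BITS_PER_PIXEL = 24          # 8 bits each for red, green, and blue
FPS = 30


def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def video_bytes_per_second(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Size of one second of uncompressed video in bytes."""
    return frame_bytes(width, height, bits_per_pixel) * fps


print(frame_bytes(WIDTH, HEIGHT, BITS_PER_PIXEL))                   # 6220800 bytes, about 6.2 MB
print(video_bytes_per_second(WIDTH, HEIGHT, BITS_PER_PIXEL, FPS))   # 186624000 bytes, about 187 MB
```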
-The sampling rate is 44.1 kHz, meaning 44.100 samples are taken in one second. Assuming that, each sample is stored as a 16-bit value, and there are 2 samples for each time point, one for the left channel and one for the right channel:
+==== Two-Dimensional Fourier Transform ====
+The **Fourier Transform** is a mathematical operation transforming a signal from the **time domain** (or spatial domain, for images) to the **frequency domain**.
+   - In multimedia compression, **2D Fourier Transform** is used for images. It converts image pixels into frequency components. These frequency components can then be analyzed, and parts less significant to human perception can be discarded.
+   - **Example**: Imagine an audio waveform where the amplitude is plotted over time. The Fourier Transform decomposes this waveform into its frequency components, revealing which notes (frequencies) are being played and how loud they are. The same idea applies to images, where the frequencies are spatial rather than temporal, allowing more effective compression.
+   - In practice, this means that areas of an image with **high-frequency details** (e.g., fine patterns) can be simplified or removed during compression without significantly impacting the perceived quality. Standards like **JPEG** exploit this, using the closely related discrete cosine transform (DCT), to achieve significant size reduction.
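
To make the idea concrete, the following sketch transforms a small image block, discards its high-frequency coefficients, and transforms it back. It is a simplified illustration under stated assumptions, not how any real codec works: the 8x8 block size, the hard cutoff, the test image, and all function names are chosen here for the example. Real standards such as JPEG use the DCT and quantize coefficients rather than simply zeroing them.

```python
import cmath
import math

N = 8  # block size; 8x8 is an assumption chosen to keep the sketch small


def dft2(block):
    """Naive 2D discrete Fourier transform; result is indexed [v][u]."""
    n = len(block)
    return [[sum(block[y][x] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
                 for y in range(n) for x in range(n))
             for u in range(n)]
            for v in range(n)]


def idft2(coeffs):
    """Inverse of dft2; returns the real part as pixel values indexed [y][x]."""
    n = len(coeffs)
    return [[(sum(coeffs[v][u] * cmath.exp(2j * math.pi * (u * x + v * y) / n)
                  for v in range(n) for u in range(n)) / (n * n)).real
             for x in range(n)]
            for y in range(n)]


def keep_low_frequencies(coeffs, cutoff):
    """Zero every coefficient whose horizontal or vertical frequency exceeds cutoff."""
    n = len(coeffs)

    def freq(k):
        return min(k, n - k)  # indices above n/2 represent negative frequencies

    return [[c if freq(u) <= cutoff and freq(v) <= cutoff else 0
             for u, c in enumerate(row)]
            for v, row in enumerate(coeffs)]


# Smooth horizontal gradient (low frequency) plus a fine checkerboard (high frequency).
smooth = [[128 + 50 * math.cos(2 * math.pi * x / N) for x in range(N)] for y in range(N)]
image = [[smooth[y][x] + 10 * (-1) ** (x + y) for x in range(N)] for y in range(N)]

filtered = idft2(keep_low_frequencies(dft2(image), cutoff=1))
max_diff = max(abs(filtered[y][x] - smooth[y][x]) for y in range(N) for x in range(N))
print(max_diff < 1e-6)  # True: only the fine checkerboard was removed
```

With `cutoff=1` only 9 of the 64 coefficients survive, yet the smooth part of the block is recovered; the fine checkerboard pattern is the detail that gets dropped.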
-$$ 44100 \times 16 \text{ bits} \times 2 \approx 1.4 \text{ Megabits} $$
+==== Fourier Transform Analogy with Music ====
+A helpful analogy compares the **Fourier Transform** to recognizing musical notes in an audio recording:
+   - Imagine you have a **mono recording** of music with different notes being played over time. The **Fourier Transform** is like figuring out which notes are being played (e.g., **C#**, **C**) during different time intervals.
+   - This is similar to trying to write the **musical score** (sheet music) just by listening to the audio. By focusing only on the most important notes (the **note heads**), you would end up with a much more **compressed** version of the original sound while retaining most vital information.
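
The analogy can be tried out on a synthetic recording. The sketch below generates two pure tones one after the other and uses a Fourier sum, evaluated only at the frequencies of a few candidate notes, to decide which note sounds in each interval. The sample rate, the interval length, the note table (equal-tempered fourth-octave frequencies), and the function names are all assumptions made for this example.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)
NOTES = {"C": 261.63, "C#": 277.18, "D": 293.66}  # fourth-octave frequencies in Hz


def tone(freq_hz, seconds):
    """Generate a pure sine tone as a list of samples."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


def strength(samples, freq_hz):
    """Magnitude of the Fourier sum of `samples` evaluated at one frequency."""
    return abs(sum(s * cmath.exp(-2j * math.pi * freq_hz * t / SAMPLE_RATE)
                   for t, s in enumerate(samples)))


def detect_note(samples):
    """Return the candidate note whose frequency is strongest in `samples`."""
    return max(NOTES, key=lambda name: strength(samples, NOTES[name]))


# A mono "recording": C# in the first interval, C in the second.
recording = tone(NOTES["C#"], 0.2) + tone(NOTES["C"], 0.2)
half = len(recording) // 2
score = [detect_note(recording[:half]), detect_note(recording[half:])]
print(score)  # ['C#', 'C']
```

Here 3200 samples are summarized by two note names, which is the kind of extreme compression the "note heads" analogy describes. Real music contains overtones and overlapping notes, so an actual codec cannot reduce the signal this far.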
-And this is just one second!
+==== Human Sensory Limitations and Compression ====
+   - **Vision**: The human eye is more sensitive to **low spatial frequencies** (smooth gradients) and less sensitive to **high spatial frequencies** (fine details or noise). This property is used in image and video compression to drop unnecessary detail in complex patterns, which most viewers won’t notice.
+   - **Hearing**: The human ear is more sensitive to certain frequency ranges. **MP3 compression** uses this by discarding audio data in ranges we typically cannot hear well.
-==== Example: Full HD frame (1080p): 1920x1080 resolution (theoretical size) ====
+==== Summary ====
+Multimedia compression methods, particularly lossy ones, take advantage of **human sensory limitations** to reduce data size with little noticeable loss in quality. Techniques such as the **Fourier Transform** allow multimedia compression algorithms to identify and remove less perceptible data, thereby achieving substantial compression ratios. These methods make it possible to store and transmit multimedia content effectively, without overwhelming storage capacity or requiring impractical bandwidth.
-$$ 1920 \times 1080 \times 32 \text{ bits} \approx 8 \text{ Megabytes} $$
-Keep in mind that this is just a single frame, and modern multimedia systems typically display 60-100 frames per second.
-**Principle of Lossy Compression**: The bitstreams need to be transformed so that the information important to the human eye and ear is preserved. Experience shows that our eyes are less sensitive to image components with higher spatial frequencies, meaning we find it harder to perceive a dense pattern than a less dense one. In the case of moving images (videos), our eyes are also more sensitive to low frequencies.
-The two-dimensional Fourier transform helps convert the image pixels into frequency values.
-Note: The Fourier transform can be imagined similarly to a mono microphone recording, where the recorded volume values are represented at specific points in time. The transformation can determine, for example, that a C-sharp (C#) note was played in a given interval while a C was played in another interval. It's like trying to reconstruct the musical score by ear. If we only encoded the note heads, we would achieve very high compression!