JPEG (Joint Photographic Experts Group) encoding is used to compress photographic images and is particularly effective for images with smooth variations in tone and color. The key idea is to reduce redundancy and irrelevancy based on human visual perception.
JPEG compression involves multiple steps:
1. Color Space Conversion
2. Block Splitting
3. Discrete Cosine Transform (DCT)
4. Quantization
5. Zigzag Scanning and Run-Length Encoding
6. Entropy Coding (Huffman Encoding)
Below is a description of each step, followed by an example.
—
JPEG typically converts the original image from the RGB color space to YCbCr color space. This conversion separates the image into luminance (Y) and chrominance (Cb and Cr) components.
Humans are more sensitive to luminance details than chrominance, which allows the chrominance channels to be more heavily compressed.
The image is divided into 8×8 blocks of pixels, and each block is processed independently. Working with smaller blocks allows for better compression while keeping the computations manageable.
For example, if we have a 16×16 pixel image, it is split into four 8×8 blocks.
Each 8×8 block undergoes a Discrete Cosine Transform (DCT). The DCT converts the spatial information (pixel values) into frequency information. (same as the Fourier transformation in the case of audio.)
The idea is that most of the information about an image is concentrated in the low frequencies, and the high frequencies can often be discarded with minimal visual impact.
Consider an 8×8 block of pixel values:
34 36 40 48 51 50 49 43 35 37 42 46 50 51 50 45 36 39 45 49 52 54 52 48 37 41 47 52 54 54 54 49 39 43 50 54 57 56 55 51 40 45 52 56 58 57 56 52 42 46 52 58 60 60 58 54 43 47 53 59 62 61 59 55
After applying the DCT, you might get values such as:
416 33 -19 9 -4 2 -2 0 -35 30 -8 -6 3 -2 0 0 12 -9 -2 -1 1 1 0 0 4 -3 -1 0 0 0 0 0 -3 2 0 0 0 0 0 0 1 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Notice how most of the values are close to zero, especially in the lower right section.
The DCT coefficients are divided by a quantization matrix to reduce the precision of the high-frequency components. This step is key to achieving compression, but it also introduces loss.
For example, a simple quantization matrix might look like:
16 11 10 16 24 40 51 61 12 12 14 19 26 58 60 55 14 13 16 24 40 57 69 56 14 17 22 29 51 87 80 62 18 22 37 56 68 109 103 77 24 35 55 64 81 104 113 92 49 64 78 87 103 121 120 101 72 92 95 98 112 100 103 99
The DCT coefficient values are divided by the corresponding quantization values and rounded to the nearest integer:
Quantized Value = Round(DCT Value / Quantization Value)
The result of quantization for our DCT block might look like:
26 3 -2 1 0 0 0 0 -3 2 -1 0 0 0 0 0 1 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Most of the high-frequency components become zero after quantization, leading to better compression.
The quantized values are then zigzag scanned to convert the 8×8 matrix into a 1D array. The zigzag pattern ensures that low-frequency components (which tend to have non-zero values) are grouped together, and high-frequency components (which are often zeros) are at the end.
For example, the scanned sequence might look like:
26 3 -3 2 -2 1 1 -1 0 0 0 0 0 0 0 ...
These values are then run-length encoded to represent long sequences of zeros efficiently.
Finally, the sequence of values is entropy coded using Huffman encoding to further reduce the size of the data. Huffman encoding replaces frequently occurring values with shorter codes and less frequent values with longer codes.
—
Consider the original 8×8 block:
34 36 40 48 51 50 49 43 ...
After applying DCT, quantization, zigzag scanning, and Huffman encoding, the resulting compressed data is much smaller in size than the original. During decoding, the reverse steps are applied to reconstruct the image. Although some details are lost due to quantization, the result is visually close to the original.
—
JPEG compression is a balance between image quality and file size. By discarding perceptually irrelevant details, JPEG can achieve impressive compression ratios while maintaining acceptable visual quality, making it one of the most popular image formats.