Table of Contents
Run-Length Encoding (RLE)
Run-Length Encoding (RLE) is one of the simplest lossless data compression techniques. It works by replacing sequences of repeated elements (called “runs”) with a single value and a repetition count.
Run-Length Encoding cannot be attributed to a single inventor; rather, it is based on one of the earliest and simplest general principles of data compression, developed and first used in the 1950s for computer image processing.
Basic Idea
When a sequence contains the same value multiple times in a row, instead of storing it multiple times, we store only the value and the number of times it repeats.
For example:
AAAAABBBCCDAA → 5A3B2C1D2A
Here:
- 'A' repeats 5 times → 5A
- 'B' repeats 3 times → 3B
- 'C' repeats 2 times → 2C
- 'D' repeats 1 time → 1D
- 'A' repeats again 2 times → 2A
The original message had 12 characters, but the encoded version has only 10, so there is some compression (especially for longer runs).
Binary Example
RLE is especially effective for binary data, such as black-and-white images.
Example (0 = white pixel, 1 = black pixel):
Original: 000000001111100000011111111000000000 Encoded : 8 0s, 5 1s, 6 0s, 8 1s, 7 0s → (8,0)(5,1)(6,0)(8,1)(7,0)
This binary form is what early image formats (like PCX, TGA) and FAX machines used.
Decoding
To reconstruct (decode) the original message, we simply repeat each symbol as many times as its count indicates.
For example:
Encoded: 12W1B12W3B24W1B14W Decoded: WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWWW
Advantages
- Very simple to implement
- Works well for data containing long runs of identical values (e.g., simple images, icons, masks)
- Low computational cost (fast encoding/decoding)
Disadvantages
- Inefficient when the data changes frequently (e.g., photographs, text files)
- May increase file size if no long runs exist (e.g,. `ABCD` → `1A1B1C1D`, which is longer)
- Usually combined with other compression algorithms in practical applications
Applications
- Image compression (TGA, PCX, BMP with RLE)
- FAX transmission (CCITT Group 3/4 standard)
- Simple bitmap fonts
- Storing masks and binary patterns
- Used as a sub-step in complex formats (e.g., TIFF, PDF internal compression)
A C program that implements this algorithm:
#include <stdio.h>
#include <string.h>
void runLengthEncoding(const char *input) {
int length = strlen(input);
for (int i = 0; i < length; i++) {
// Count occurrences of the current character
int count = 1;
while (i < length - 1 && input[i] == input[i + 1]) {
count++;
i++;
}
// Print the count followed by the character
printf("%d%c", count, input[i]);
}
printf("\n"); // New line after encoding
}
int main() {
// Example string to encode
char input[] = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWWW";
printf("Original String: %s\n", input);
printf("Run-Length Encoded: ");
runLengthEncoding(input);
return 0;
}
