Multipurpose Internet Mail Extensions (MIME) is the standard that defines the format of internet email messages. Email is transmitted to recipients via SMTP (Simple Mail Transfer Protocol), which was designed to carry only 7-bit ASCII characters. To carry binary data such as attachments, MIME defines several encoding methods, one of the most common being Base-64 encoding.
Below is an example of an email with an attachment, showing the source data that email programs interpret:
    Content-type: multipart/mixed; boundary="frontier"
    MIME-version: 1.0

    --frontier
    Content-type: text/plain

    This is the body of the message.
    --frontier
    Content-type: application/octet-stream
    Content-transfer-encoding: base64

    gajwO4+n2Fy4FV3V7zD9awd7uG8/TITP/vIocxXnnf/5mjgQjcipBUL1b3uyLwAVtBLOP4nV
    AhSzlZnyLAF8na0n7g6OSeej7EjlF/aglS6ghfju
    FgRr+OX==
    --frontier--
The text between the "Content-transfer-encoding: base64" header and the closing "--frontier--" boundary is the Base-64 encoded attachment data.
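A message like the one above can be produced with Python's standard `email` package. The following is a minimal sketch, not the way any particular mail program works; the helper name `build_message`, the file name `data.bin`, and the sample payload are assumptions made up for illustration.

```python
from email import message_from_string
from email.message import EmailMessage


def build_message(payload: bytes) -> EmailMessage:
    """Build a text message with one binary attachment (hypothetical helper)."""
    msg = EmailMessage()
    msg.set_content("This is the body of the message.")
    # Binary attachments are Base-64 encoded by default.
    msg.add_attachment(
        payload,
        maintype="application",
        subtype="octet-stream",
        filename="data.bin",  # illustrative name only
    )
    return msg


payload = bytes(range(256))  # arbitrary sample data covering every byte value
raw = build_message(payload).as_string()

# Parse the raw text again, as a receiving mail program would.
parsed = message_from_string(raw)
attachment = parsed.get_payload()[1]

print(parsed.get_content_type())
print(attachment["Content-Transfer-Encoding"])
print(attachment.get_payload(decode=True) == payload)
```

Printing `raw` shows the boundary lines, the per-part headers, and the Base-64 text of the attachment. The boundary string is generated by the library, so it will not be "frontier".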
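Base-64 encoding (or more generally, Base-64 data representation) is based on a set of 64 symbols. It is essentially like writing the data in a base-64 numeral system. Because 2^6 = 64, each symbol carries exactly 6 bits, so the encoding works by grouping the data into 6-bit chunks.
The Base-64 encoding scheme uses the following symbol set: the uppercase letters A-Z (values 0 to 25), the lowercase letters a-z (values 26 to 51), the digits 0-9 (values 52 to 61), and the characters + (value 62) and / (value 63).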
This gives us 64 unique symbols to represent binary data. Since each symbol carries 6 bits, every 3 bytes (24 bits) of input data are converted into 4 characters of Base-64 output.
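The symbol set and the 3-to-4 ratio can be checked with a few lines of Python. This is only a sketch: it assumes the standard alphabet (A-Z, a-z, 0-9, + and /), and the name `ALPHABET` is introduced here for illustration.

```python
import base64
import string

# Standard Base-64 alphabet: index in this string = 6-bit value of the symbol.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

symbol_count = len(set(ALPHABET))
print(symbol_count)  # 64

# Three input bytes become four output characters.
encoded = base64.b64encode(b"abc").decode("ascii")
print(encoded, len(encoded))  # YWJj 4
```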
Let's take a simple binary string and encode it in Base-64:
Input:
001100110011
We split it into two 6-bit groups:
001100, 110011
These groups correspond to the decimal values 12 and 51. In the Base-64 table, 12 maps to M and 51 maps to z.
Thus, the Base-64 encoding of `001100110011` is `Mz`.
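The lookup can be reproduced in code. The sketch below assumes the standard alphabet and a bit string whose length is a multiple of 6; the function name `encode_bit_string` is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_bit_string(bits: str) -> str:
    """Map each 6-bit group of a '0'/'1' string to its Base-64 symbol."""
    if len(bits) % 6 != 0:
        raise ValueError("bit string length must be a multiple of 6")
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # int(group, 2) gives the decimal value (0..63) used as the table index.
    return "".join(ALPHABET[int(group, 2)] for group in groups)


print(encode_bit_string("001100110011"))  # Mz
```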
Base-64 encoding operates on blocks of 3 bytes (24 bits). If the length of the input in bytes isn't a multiple of 3, the last block is filled up with zero bytes, and each output character that consists only of these fill bits is written as the '=' symbol. Here are two examples:
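Encoding the byte `00000001`
1. Extend it to 3 bytes by appending two zero bytes: `00000001 00000000 00000000`.
2. Split this into 6-bit groups: `000000 010000 000000 000000`.
3. The first two groups have the values 0 and 16, which map to A and Q. The last two groups contain only fill bits and are written as '='. The result is `AQ==`.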
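Encoding the bytes `00000010 00000001`
1. Extend to 3 bytes by appending one zero byte: `00000010 00000001 00000000`.
2. Split into 6-bit groups: `000000 100000 000100 000000`.
3. The first three groups have the values 0, 32 and 4, which map to A, g and E. The last group contains only fill bits and is written as '='. The result is `AgE=`.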
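Both examples can be reproduced with a small hand-written encoder. This is an illustrative sketch rather than a replacement for a library routine; the function name `encode` is an assumption, and the results are compared against Python's `base64.b64encode`.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode(data: bytes) -> str:
    """Encode bytes as Base-64, handling one 3-byte block at a time."""
    out = []
    for i in range(0, len(data), 3):
        block = data[i:i + 3]
        missing = 3 - len(block)  # 0, 1 or 2 bytes short of a full block
        # Fill the block up to 24 bits with zero bytes.
        n = int.from_bytes(block + b"\x00" * missing, "big")
        # Cut the 24 bits into four 6-bit groups and look each one up.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Groups made up only of fill bits are written as '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


print(encode(b"\x01"))      # AQ==
print(encode(b"\x02\x01"))  # AgE=
print(encode(b"\x02\x01") == base64.b64encode(b"\x02\x01").decode("ascii"))  # True
```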
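Decoding Base-64 involves reversing the process, converting each Base-64 character back into its 6-bit binary equivalent. The binary groups are then combined to form the original data. The number of '=' symbols at the end of the encoded data tells the decoder how many bytes the last block really holds. With no '=', the last four characters decode to 3 bytes. With one '=', the last three characters carry 18 bits, of which the first 16 form 2 bytes. With two '=', the last two characters carry 12 bits, of which the first 8 form 1 byte. The remaining bits are fill bits and are discarded.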
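The reverse direction can be sketched the same way. The code below assumes well-formed input: standard alphabet, length a multiple of 4, '=' only at the very end, and no line breaks. The names `INDEX` and `decode` are made up for this example, and characters outside the alphabet simply raise a `KeyError`.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# Reverse lookup table: symbol -> 6-bit value.
INDEX = {symbol: value for value, symbol in enumerate(ALPHABET)}


def decode(text: str) -> bytes:
    """Decode well-formed Base-64 text, four characters at a time."""
    if len(text) % 4 != 0:
        raise ValueError("length of Base-64 text must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        pad = len(quad) - len(quad.rstrip("="))  # number of trailing '='
        n = 0
        for ch in quad:
            # '=' contributes six zero bits that are dropped again below.
            n = (n << 6) | (0 if ch == "=" else INDEX[ch])
        block = n.to_bytes(3, "big")
        # Keep 3, 2 or 1 bytes depending on the amount of padding.
        out += block[:3 - pad]
    return bytes(out)


print(decode("AQ=="))  # b'\x01'
print(decode("AgE="))  # b'\x02\x01'
```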
Base-64 encoding is used in various places in everyday internet use.
For instance, in the email example above, the binary content of the attachment is encoded in Base-64 so that it can be transmitted over SMTP, which only supports 7-bit ASCII characters.