Character Encoding

Creating a unified code for characters is a significant achievement in technical communication. In 1963 the American Standards Association (ASA, the predecessor of today's ANSI) standardized a 7-bit code known as ASCII (American Standard Code for Information Interchange). This code includes the 26 letters of the English alphabet in both uppercase and lowercase, the digits, punctuation marks, and so-called control characters. Control characters, such as carriage return, line feed, and tab, manage text layout and control devices such as printers and terminals.

ASCII: The First Standard

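ASCII was designed for English and includes 128 characters (codes 0 to 127). Codes 0 to 31 and 127 are control characters. Codes 32 to 126 are printable characters: the space, letters, digits, and punctuation.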
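
The split between control and printable codes can be checked in a few lines. This is a minimal Python sketch. `is_ascii_control` is a hypothetical helper written for illustration. It is cross-checked against Python's built-in `str.isprintable()`.

```python
def is_ascii_control(code: int) -> bool:
    """ASCII control codes are 0-31 plus 127 (DEL); every other code up to 127 is printable."""
    return code < 32 or code == 127

control = [c for c in range(128) if is_ascii_control(c)]
printable = [c for c in range(128) if not is_ascii_control(c)]
print(len(control), len(printable))  # 33 95

# Cross-check against Python's own notion of a printable character.
assert all(chr(c).isprintable() != is_ascii_control(c) for c in range(128))

# Every ASCII code fits in 7 bits.
assert max(c.bit_length() for c in range(128)) == 7
```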

However, this 7-bit code could not represent the accented letters used in many languages, nor mathematical or graphical symbols. As a result, the character set had to be extended.
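
The limitation shows up as soon as text containing accented letters is encoded as ASCII. The sketch below uses Python's built-in `ascii` codec. `fits_in_ascii` is a hypothetical helper; in Python 3.7 and later, `str.isascii()` performs the same check.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text can be encoded in 7-bit ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

print(fits_in_ascii("Hello"))      # True
print(fits_in_ascii("árvíztűrő"))  # False: á, í, ű and ő are outside ASCII
print(fits_in_ascii("x ≤ π"))      # False: mathematical symbols are outside ASCII

# A lossy fallback: characters ASCII cannot hold are replaced with "?".
print("tűz".encode("ascii", errors="replace"))  # b't?z'
```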

ANSI and Extended Encoding

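To address the need for more characters, 8-bit extensions were developed. Using the eighth bit doubles the code space to 256 characters. The first 128 match ASCII, and the upper 128 hold language-specific letters and symbols. These extensions are often loosely called "ANSI" encodings, a name that comes from the Windows code pages (such as Windows-1252). ANSI (the American National Standards Institute) never actually standardized them. One widely used extension under MS-DOS is IBM Code Page 852 (Latin-2). It includes the 18 accented letters of the Hungarian alphabet: á, é, í, ó, ö, ő, ú, ü, ű and their uppercase forms.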
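
In an 8-bit code page, each Hungarian accented letter occupies exactly one byte. This is a sketch assuming Python's built-in `cp852` codec. The constant name `HUNGARIAN_ACCENTED` is illustrative only.

```python
# The 18 accented letters of Hungarian (9 lowercase, 9 uppercase).
HUNGARIAN_ACCENTED = "áéíóöőúüű" + "ÁÉÍÓÖŐÚÜŰ"

encoded = HUNGARIAN_ACCENTED.encode("cp852")
assert len(encoded) == len(HUNGARIAN_ACCENTED) == 18  # one byte per letter

# Every letter lands in the upper half of the code page (byte value >= 128).
for ch, byte in zip(HUNGARIAN_ACCENTED, encoded):
    print(f"{ch} -> 0x{byte:02X}")

# Decoding with the same code page restores the original text.
assert encoded.decode("cp852") == HUNGARIAN_ACCENTED
```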

ISO Standards

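To further standardize character sets internationally, the ISO 8859 series was introduced. Each part is an 8-bit character set whose first 128 positions match ASCII. For example, ISO 8859-1 (Latin-1) covers Western European languages, while ISO 8859-2 (Latin-2) covers Central and Eastern European languages, including Hungarian.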
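
Because the parts of ISO 8859 reuse the same upper 128 positions, the same byte can mean different letters depending on which part is assumed. The sketch below uses Python's built-in `iso8859_1` and `iso8859_2` codecs.

```python
# The same two bytes, read under two parts of ISO 8859.
raw = bytes([0xF5, 0xFB])
latin1 = raw.decode("iso8859_1")  # Western European: o with tilde, u with circumflex
latin2 = raw.decode("iso8859_2")  # Central European: Hungarian ő and ű
print(latin1, latin2)             # õû őű

# Both parts agree with ASCII in the lower 128 positions.
lower = bytes(range(128))
assert lower.decode("iso8859_1") == lower.decode("iso8859_2") == lower.decode("ascii")

# Latin-1 has no slot for ő, so encoding it fails.
try:
    "\u0151".encode("iso8859_1")
except UnicodeEncodeError:
    print("ő is not in ISO 8859-1")
```

This is why text decoded with the wrong code page produces plausible-looking but incorrect letters rather than an error.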

Unicode and UCS: Universal Character Sets

As digital communication expanded, a more comprehensive character set was required to accommodate all languages and symbols. The ISO/IEC 10646 standard defines the Universal Character Set (UCS). It is designed so that text in an existing character set can be converted to UCS and back without losing information. UCS covers the characters of the world's writing systems, both modern and historic, as well as a wide range of mathematical and technical symbols.
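
The round-trip property can be tested directly for a legacy 8-bit character set. The test decodes all 256 byte values into Unicode code points, re-encodes them, and compares the results. `round_trips` is a hypothetical helper. The sketch assumes Python's built-in codecs and only covers the legacy → Unicode → legacy direction.

```python
def round_trips(codec: str) -> bool:
    """Decode every byte value 0-255 to Unicode, re-encode it, and report whether nothing was lost."""
    original = bytes(range(256))
    text = original.decode(codec)          # legacy bytes -> Unicode code points
    return text.encode(codec) == original  # Unicode code points -> legacy bytes

for codec in ("iso8859_1", "iso8859_2", "cp852"):
    print(codec, round_trips(codec))
```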

Unicode

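The Unicode standard is developed in step with UCS, and the two assign the same code points to the same characters. Each Unicode character is identified by both a number (its code point) and a name. For example, U+0041 is “LATIN CAPITAL LETTER A.” Unicode's code space ranges from U+0000 to U+10FFFF. That gives 1,114,112 code points, arranged in 17 planes of 65,536 each. The first plane, U+0000 to U+FFFF, is called the Basic Multilingual Plane (BMP).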
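
Python's standard `unicodedata` module exposes the official character names. That makes it easy to print characters in the U+XXXX notation with their names and planes. `describe` and `plane` are hypothetical helpers written for this sketch.

```python
import sys
import unicodedata

def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME', the notation the Unicode standard uses."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def plane(code_point: int) -> int:
    """Each plane holds 0x10000 (65,536) code points, so the plane number is code_point >> 16."""
    return code_point >> 16

for ch in ("A", "ű", "\u03a9"):  # \u03a9 is GREEK CAPITAL LETTER OMEGA
    print(describe(ch), "plane", plane(ord(ch)))

# The whole code space: U+0000 to U+10FFFF, i.e. 17 planes.
print(f"U+{sys.maxunicode:04X}", plane(sys.maxunicode) + 1)  # U+10FFFF 17
```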

Differences Between ISO 10646 and Unicode

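While ISO 10646 defines the code table of characters, Unicode adds further functionality. It provides character properties (such as general category and case mappings). It also specifies algorithms for normalization, collation, line breaking, and bidirectional text.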
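
Two things Unicode specifies beyond the bare code table are character properties and normalization. Python's standard `unicodedata` module implements both, as this short sketch shows.

```python
import unicodedata

# Character properties: general category and case mapping.
print(unicodedata.category("ű"))  # Ll (letter, lowercase)
print("ű".upper())                # Ű

# Normalization: "u" + COMBINING DOUBLE ACUTE ACCENT (U+030B) composes to one code point, U+0171.
decomposed = "u\u030b"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1

# OHM SIGN (U+2126) is canonically equivalent to GREEK CAPITAL LETTER OMEGA (U+03A9).
print(unicodedata.normalize("NFC", "\u2126") == "\u03a9")  # True
```

Normalization matters when comparing strings: two texts that look identical may use different code point sequences until both are normalized.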

Practical Example

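In practice, suppose you are encoding a text that contains the characters A, ű, and Ω. Their code points are U+0041, U+0171, and U+03A9. ASCII can represent only A, and ISO 8859-2 can represent A and ű but not Ω, whereas Unicode covers all three. In the UTF-8 encoding, A takes one byte (41), while ű (C5 B1) and Ω (CE A9) take two bytes each.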
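
The sketch below encodes the text A, ű, Ω with Python's built-in codecs. It prints each character's code point and UTF-8 bytes, then tries the whole text in ASCII, ISO 8859-2, and UTF-8. It assumes Python 3.8 or later for `bytes.hex` with a separator.

```python
text = "A\u0171\u03a9"  # A, ű (U+0171), Ω (U+03A9, GREEK CAPITAL LETTER OMEGA)

# Code point and UTF-8 bytes for each character.
for ch in text:
    utf8 = ch.encode("utf-8")
    print(f"{ch}  U+{ord(ch):04X}  UTF-8: {utf8.hex(' ').upper()}")

# Try the whole text in a 7-bit, an 8-bit, and a Unicode encoding.
for codec in ("ascii", "iso8859_2", "utf-8"):
    try:
        print(codec, text.encode(codec))
    except UnicodeEncodeError as err:
        # err.start is the index of the first character the codec cannot represent.
        print(codec, "cannot encode", repr(text[err.start]))
```

ASCII stops at ű, ISO 8859-2 stops at Ω, and only UTF-8 encodes all three characters.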

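Examples of Encoding Systems

Unicode defines several encoding forms that map code points to bytes. UTF-8 uses one to four bytes per character and is backward compatible with ASCII. UTF-16 uses two or four bytes, storing characters above U+FFFF as surrogate pairs. UTF-32 uses a fixed four bytes per character.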
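
UTF-8 stores each code point in one to four bytes, using the leading bits of each byte to mark its role. To show how that works, here is a minimal hand-written encoder. `utf8_encode` is a hypothetical helper for illustration only; real code should simply call `str.encode("utf-8")`. The sketch checks it against Python's built-in codec and compares byte counts with UTF-16 and UTF-32.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (1 to 4 bytes)."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:  # 0xxxxxxx: identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx followed by three 10xxxxxx continuation bytes
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

# Compare with Python's built-in codec and the other Unicode encoding forms.
for ch in "A\u0171\u03a9\u20ac\U0001F600":  # A, ű, Ω, €, and an emoji outside the BMP
    ours = utf8_encode(ord(ch))
    assert ours == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: UTF-8 {len(ours)} B, "
          f"UTF-16 {len(ch.encode('utf-16-le'))} B, "
          f"UTF-32 {len(ch.encode('utf-32-le'))} B")
```

The `-le` codecs are used so that Python does not prepend a byte order mark, which keeps the byte counts per character.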

With Unicode's rise, character encoding has become highly flexible, supporting diverse languages, symbols, and historical texts. This makes Unicode the most widely adopted standard for modern software systems.