Creating a unified code for characters was a significant achievement in technical communication. The American Standards Association, the predecessor of today's American National Standards Institute (ANSI), first standardized a 7-bit code known as ASCII (American Standard Code for Information Interchange). This code includes the 26 letters of the English alphabet in both uppercase and lowercase, the digits, punctuation marks, and so-called control characters. Control characters were used to manage text formatting and to control devices and specific applications.
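ASCII was designed for English and defines 128 characters: codes 0 to 31 and code 127 are control characters, while codes 32 to 126 are printable characters (the space, digits, letters, and punctuation marks).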
However, this 7-bit code cannot represent the accented letters used in many languages, most mathematical symbols, or special graphical symbols. As a result, the character set had to be extended.
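The layout of the 7-bit code and its limit can be sketched in a few lines of Python. This is only an illustration: the helper name `ascii_kind` is made up for this example and is not part of any standard library.

```python
def ascii_kind(code: int) -> str:
    """Classify a 7-bit ASCII code as 'control' or 'printable'."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes are limited to 0..127")
    # Codes 0-31 and 127 (DEL) are control characters; 32-126 are printable.
    return "control" if code < 32 or code == 127 else "printable"


control_count = sum(ascii_kind(c) == "control" for c in range(128))
printable_count = 128 - control_count

# An accented letter has no ASCII code, so strict encoding fails.
try:
    "\u0171".encode("ascii")  # U+0171 is the Hungarian letter ű
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(control_count, printable_count, fits_in_ascii)
```

Running the sketch should report 33 control codes, 95 printable codes, and `False` for the accented letter.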
To address the need for more characters, 8-bit extensions were developed. They are often loosely called "ANSI" encodings, after the American National Standards Institute. An 8-bit code provides 256 characters, which leaves room for additional language-specific characters. One widely used extension, particularly for text editing, is IBM Code Page 852, which includes the 18 accented characters used in Hungarian (nine accented vowels, each in lowercase and uppercase).
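As a rough check of the Code Page 852 claim, the following Python sketch encodes the 18 accented Hungarian letters with Python's built-in `cp852` codec. The constant name `HUNGARIAN_ACCENTED` is an assumption made for this example.

```python
# The nine accented Hungarian vowels in lowercase and uppercase (18 letters).
HUNGARIAN_ACCENTED = "áéíóöőúüűÁÉÍÓÖŐÚÜŰ"

# Strict encoding raises UnicodeEncodeError if any letter is missing from
# the code page, so reaching the next line means all 18 are present.
encoded = HUNGARIAN_ACCENTED.encode("cp852")

# Every letter takes exactly one byte, and that byte lies above the
# 7-bit ASCII range (0x80 or higher).
one_byte_each = len(encoded) == len(HUNGARIAN_ACCENTED)
all_in_upper_half = all(b >= 0x80 for b in encoded)

for letter, byte in zip(HUNGARIAN_ACCENTED, encoded):
    print(f"{letter} -> 0x{byte:02X}")
```

Because the letters occupy the upper half of the table, plain ASCII text keeps the same byte values under this code page.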
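To further standardize character sets internationally, the ISO 8859 series was introduced. Each part keeps ASCII in the lower half of the table and uses the upper half for the characters of a group of languages: for example, ISO 8859-1 (Latin-1) covers Western European languages, and ISO 8859-2 (Latin-2) covers Central and Eastern European languages, including Hungarian.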
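The parts of ISO 8859 agree on the ASCII range but differ in what the byte values above 127 mean. A small Python sketch, assuming Python's standard `iso-8859-1` and `iso-8859-2` codecs, makes the difference visible:

```python
# Two byte values from the upper half of an 8-bit table.
raw = bytes([0xF5, 0xFB])

# The same bytes decode to different letters depending on the part used.
as_latin1 = raw.decode("iso-8859-1")  # Western European reading
as_latin2 = raw.decode("iso-8859-2")  # Central European reading

# Bytes in the ASCII range mean the same thing in both parts.
ascii_bytes = b"ASCII"
same_lower_half = (
    ascii_bytes.decode("iso-8859-1") == ascii_bytes.decode("iso-8859-2")
)

print(as_latin1, as_latin2, same_lower_half)
```

Under ISO 8859-1 the two bytes read as õ and û, while under ISO 8859-2 they read as the Hungarian letters ő and ű. This is why a receiver has to know which part a text was written in.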
As digital communication expanded, a more comprehensive character set was required to accommodate all languages and symbols. The ISO/IEC 10646 standard defines a Universal Character Set (UCS), designed so that no information is lost when a character from an existing character set is converted to UCS and back. UCS aims to include the characters of all known languages, both living and extinct, as well as known mathematical and scientific symbols.
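The lossless round trip can be illustrated with Python, whose `str` type holds Unicode code points and can act as the universal intermediate form. This is a sketch only: the helper `transcode` is a hypothetical name, and the sample text is a common Hungarian test phrase chosen because it uses many accented letters.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between two legacy encodings via Unicode text."""
    return data.decode(source).encode(target)


sample = "árvíztűrő tükörfúrógép"

# Start from Code Page 852 bytes, convert to ISO 8859-2, then back again.
original = sample.encode("cp852")
as_latin2 = transcode(original, "cp852", "iso-8859-2")
restored = transcode(as_latin2, "iso-8859-2", "cp852")

print(original != as_latin2)  # the byte values differ between the two codes
print(restored == original)   # yet nothing is lost on the way back
```

The intermediate bytes differ because the two code pages place the accented letters at different positions, but passing through Unicode preserves every character.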
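The Unicode standard is widely used alongside UCS, and the two share the same code table. Each Unicode character is identified by both a number (its code point) and a name. For example, U+0041 stands for “Latin capital letter A.” Unicode code points range from U+0000 to U+10FFFF, which gives 1,114,112 possible code points.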
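The pairing of number and name, and the upper end of the code range, can be inspected with Python's standard `unicodedata` and `sys` modules. The helper `describe` below is a made-up name for this sketch.

```python
import sys
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


label = describe("A")

# The highest code point a Python string can hold is U+10FFFF.
highest = sys.maxunicode

# Anything beyond that limit is rejected.
try:
    chr(0x110000)
    beyond_range_ok = True
except ValueError:
    beyond_range_ok = False

print(label, hex(highest), beyond_range_ok)
```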
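While ISO/IEC 10646 defines the code table of characters, Unicode adds further functionality by providing character properties (such as general category and case mappings), normalization forms, and algorithms for tasks such as bidirectional text layout and collation.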
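As one illustration of what the Unicode standard defines beyond the code table, the sketch below queries a character property and applies two normalization forms through Python's standard `unicodedata` module. It covers only a small part of that functionality.

```python
import unicodedata

# A character property: the general category "Lu" means "Letter, uppercase".
category_of_a = unicodedata.category("A")

# Normalization: NFD splits U+0171 (ű) into a base letter plus a
# combining mark, and NFC composes the pair back into one code point.
composed = "\u0171"
decomposed = unicodedata.normalize("NFD", composed)
recomposed = unicodedata.normalize("NFC", decomposed)

print(category_of_a)
print([f"U+{ord(c):04X}" for c in decomposed])
print(recomposed == composed)
```

Normalization matters when comparing text, because the composed and decomposed forms look identical on screen but are different sequences of code points.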
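In practice, a text that contains the characters A, ű, and Ω is represented by the code points U+0041, U+0171, and U+03A9. How many bytes each character occupies depends on the encoding form chosen, such as UTF-8 or UTF-16.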
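For these three characters, a short Python sketch can list the code point of each one together with its bytes in two common Unicode encoding forms. The choice of UTF-8 and UTF-16 (big-endian) is an assumption made for illustration, and `hex_bytes` is a hypothetical helper name.

```python
def hex_bytes(ch: str, encoding: str) -> str:
    """Return the encoded bytes of one character as spaced hex pairs."""
    return " ".join(f"{b:02X}" for b in ch.encode(encoding))


# A, ű (U+0171), and Ω (U+03A9, Greek capital omega), written with escapes
# so the example does not depend on the source file's own encoding.
text = "A\u0171\u03a9"

rows = [
    (f"U+{ord(ch):04X}", hex_bytes(ch, "utf-8"), hex_bytes(ch, "utf-16-be"))
    for ch in text
]

for code_point, utf8, utf16 in rows:
    print(f"{code_point}  UTF-8: {utf8:<6} UTF-16BE: {utf16}")
```

In UTF-8 the ASCII letter keeps its single-byte value, while the other two characters need two bytes each. In UTF-16 all three take two bytes.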
With Unicode's rise, character encoding has become highly flexible, supporting diverse languages, symbols, and historical texts. This has made Unicode the most widely adopted character encoding standard in modern software systems.