Differences

This shows you the differences between two versions of the page.

--- tanszek:oktatas:techcomm:utf-8_encoding [2024/11/19 10:58] – [Structure of UTF-8] knehez
+++ tanszek:oktatas:techcomm:utf-8_encoding [2024/11/19 10:59] (current) – [UTF-16 Encoding] knehez
@@ Line 48: / Line 48: @@
 The Unicode code point for 'ó' is **U+00F3**, which is decimal 243 or hexadecimal 0x00F3. Since this is more than 7 bits, it fits the second rule for UTF-8 encoding (2-byte encoding).
-- Unicode binary: 00000000 00000000 11110011
+- Unicode binary: ''00000000 00000000 11110011''
-- UTF-8 binary: 11000011 10110011 (C3 B3 in hexadecimal).
+- UTF-8 binary: ''11000011 10110011'' (C3 B3 in hexadecimal).
 Thus, 'ó' in UTF-8 is encoded as two bytes: **C3 B3**.
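The bit manipulation behind this result can be sketched in a few lines of Python. This is only an illustration of the 2-byte rule described above, not a full encoder; the function name `encode_two_byte_utf8` is made up for this example, and Python's built-in `str.encode` is used as a cross-check.

```python
def encode_two_byte_utf8(code_point: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point is outside the 2-byte UTF-8 range")
    # First byte: marker bits 110 followed by the upper 5 bits of the code point.
    first = 0b11000000 | (code_point >> 6)
    # Second byte: marker bits 10 followed by the lower 6 bits of the code point.
    second = 0b10000000 | (code_point & 0b00111111)
    return bytes([first, second])


encoded = encode_two_byte_utf8(0x00F3)            # U+00F3 is 'ó'
print(encoded.hex(" ").upper())                   # C3 B3
print(encoded == "\u00f3".encode("utf-8"))        # True
```

The comparison on the last line checks the hand-built bytes against the encoder that ships with Python, so the sketch and the standard library agree for this character.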
-==== UTF-16 Encoding ====
+===== UTF-16 Encoding =====
 **UTF-16** is another encoding standard that uses either 2 or 4 bytes to represent characters. Unlike UTF-8, UTF-16 uses **a minimum of 2 bytes** for each character, which simplifies the encoding of characters but can be less space-efficient for texts containing many ASCII characters.
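To make the space trade-off concrete, the following Python sketch compares the encoded size of a few characters in both encodings. The sample characters and the helper name `encoded_sizes` are chosen for this illustration only; `utf-16-le` is used so that no byte order mark is added to the count.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "A": "A",                        # U+0041, plain ASCII
    "o-acute": "\u00f3",             # U+00F3
    "euro sign": "\u20ac",           # U+20AC
    "grinning face": "\U0001F600",   # U+1F600, outside the 16-bit range
}

for name, char in samples.items():
    utf8_len, utf16_len = encoded_sizes(char)
    print(f"{name}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

ASCII text takes twice as much space in UTF-16, while a character such as the euro sign is shorter in UTF-16 (2 bytes) than in UTF-8 (3 bytes). Characters above U+FFFF need 4 bytes in both encodings.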
@@ Line 67: / Line 67: @@
   * **FF FE**: Little-endian (least significant byte first)
-For example, in Windows text files or Microsoft Office documents, you may encounter this BOM at the beginning, especially when opening files in text editors like Notepad.
+For example, you may encounter this BOM at the beginning of Windows text files or Microsoft Office documents, especially when opening files in text editors like Notepad++.
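A minimal Python sketch of how a program might check for a UTF-16 BOM at the start of some bytes is shown below. The function name `detect_utf16_bom` is hypothetical; the BOM constants come from Python's standard `codecs` module, and the sample data is built by hand rather than read from a real file.

```python
import codecs
from typing import Optional


def detect_utf16_bom(data: bytes) -> Optional[str]:
    """Return the byte order signalled by a leading UTF-16 BOM, or None."""
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "big-endian"
    return None


# Hand-built sample: a little-endian BOM followed by the text "hi".
sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
print(sample.hex(" ").upper())      # FF FE 68 00 69 00
print(detect_utf16_bom(sample))     # little-endian
```

Note that this check only looks at the first two bytes. The UTF-32 little-endian BOM also begins with FF FE, so a more thorough detector would test for that longer marker first.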
-=== Conclusion ===
+==== Conclusion ====
-UTF-8 has become the dominant encoding standard because it is backward-compatible with ASCII, space-efficient for texts that are predominantly ASCII, and can represent any character in the Unicode standard. Meanwhile, UTF-16 is commonly used in environments like Windows, where it handles 2-byte characters efficiently.
+**UTF-8** has become the dominant encoding standard because it is backwards-compatible with ASCII, space-efficient for predominantly ASCII texts, and can represent any character in the Unicode standard. Meanwhile, UTF-16 is commonly used in environments like Windows, where it handles 2-byte characters efficiently.
 In summary: