tanszek:oktatas:techcomm:utf-8_encoding
Differences
| tanszek:oktatas:techcomm:utf-8_encoding [2024/11/19 10:58] – [Structure of UTF-8] knehez | tanszek:oktatas:techcomm:utf-8_encoding [2024/11/19 10:59] (current) – [UTF-16 Encoding] knehez | ||
|---|---|---|---|
| Line 54: | Line 54: | ||
| Thus, ' | Thus, ' | ||
| - | ==== UTF-16 Encoding ==== | + | ===== UTF-16 Encoding |
| **UTF-16** is another Unicode encoding that uses either 2 or 4 bytes to represent each character. Unlike UTF-8, UTF-16 uses **a minimum of 2 bytes** for each character: code points in the Basic Multilingual Plane fit in a single 16-bit unit, and all other code points take a pair of units (4 bytes). This keeps most characters at a fixed width but is less space-efficient than UTF-8 for texts containing many ASCII characters. | **UTF-16** is another Unicode encoding that uses either 2 or 4 bytes to represent each character. Unlike UTF-8, UTF-16 uses **a minimum of 2 bytes** for each character: code points in the Basic Multilingual Plane fit in a single 16-bit unit, and all other code points take a pair of units (4 bytes). This keeps most characters at a fixed width but is less space-efficient than UTF-8 for texts containing many ASCII characters. | ||
| Line 67: | Line 67: | ||
| * **FF FE**: Little-endian (least significant byte first) | * **FF FE**: Little-endian (least significant byte first) | ||
| - | For example, in Windows text files or Microsoft Office documents, you may encounter this BOM at the beginning, especially when opening files in text editors like Notepad. | + | For example, |
| - | === Conclusion === | + | ==== Conclusion |
| - | UTF-8 has become the dominant encoding standard because it is backward-compatible with ASCII, space-efficient for texts that are predominantly ASCII, and can represent any character in the Unicode standard. Meanwhile, UTF-16 is commonly used in environments like Windows, | + | **UTF-8** has become the dominant encoding standard because it is backwards-compatible with ASCII, space-efficient for predominantly ASCII texts, and can represent any character in the Unicode standard. Meanwhile, UTF-16 is commonly used in environments like Windows, |
| In summary: | In summary: | ||
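
The size and byte-order points in the revised text can be checked with a small Python sketch. It is illustrative only and not part of the wiki page: the helper names `encoded_sizes` and `detect_utf16_bom` are hypothetical, and the check assumes the data is already known to be UTF-16 (a UTF-32 little-endian BOM also begins with FF FE).

```python
import codecs
from typing import Dict, Optional


def encoded_sizes(text: str) -> Dict[str, int]:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" fixes the byte order, so Python adds no BOM to the output.
        "utf-16": len(text.encode("utf-16-le")),
    }


def detect_utf16_bom(data: bytes) -> Optional[str]:
    """Report the byte order signalled by a leading UTF-16 BOM, if any.

    Assumption: the caller already expects UTF-16 input.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    return None


if __name__ == "__main__":
    for sample in ("A", "€", "😀"):
        print(repr(sample), encoded_sizes(sample))
    # Prepend the little-endian BOM by hand to show what a file might start with.
    payload = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
    print(payload.hex(" "), detect_utf16_bom(payload))
```

An ASCII letter takes 1 byte in UTF-8 but 2 in UTF-16, while a character outside the Basic Multilingual Plane takes 4 bytes in both, which is the trade-off the text describes.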
tanszek/oktatas/techcomm/utf-8_encoding.1732013912.txt.gz · Last modified: 2024/11/19 10:58 by knehez
