UTF-1
Obsolete multibyte encoding for Unicode
| Field | Value |
|---|---|
| name | UTF-1 |
| mime | ISO-10646-UTF-1 |
| lang | International |
| status | Obscure, of mainly historical interest. |
| classification | Unicode Transformation Format, extended ASCII, variable-width encoding |
| encodes | ISO/IEC 10646 (Unicode) |
| extends | US-ASCII |
| next | UTF-8 |
UTF-1 is an obsolete method of transforming ISO/IEC 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes searching for substrings and error recovery difficult. It reuses the ASCII printing characters for multi-byte encodings, making it unsuited for some uses (for instance Unix filenames cannot contain the byte value used for forward slash). UTF-1 is also slow to encode or decode due to its use of division and multiplication by a number which is not a power of 2. Due to these issues, it did not gain acceptance and was quickly replaced by UTF-8.
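The reuse of ASCII printing bytes can be illustrated with a small Python sketch. The two UTF-1 byte sequences are copied from the comparison table in this article; the helper name `contains_ascii_byte` is made up for the illustration.

```python
# UTF-1 byte sequences copied from this article's comparison table.
UTF1_SAMPLES = {
    0x0100: bytes([0xA1, 0x21]),        # second byte equals ASCII "!"
    0xD7FF: bytes([0xF7, 0x2F, 0xC3]),  # second byte equals ASCII "/"
}


def contains_ascii_byte(encoded: bytes, ch: str) -> bool:
    """Return True if the single ASCII character ch occurs as a raw byte."""
    return ch.encode("ascii") in encoded


# A naive byte search "finds" characters that are not in the text at all.
slash_in_utf1 = contains_ascii_byte(UTF1_SAMPLES[0xD7FF], "/")
bang_in_utf1 = contains_ascii_byte(UTF1_SAMPLES[0x0100], "!")

# UTF-8 never uses bytes below 0x80 inside a multi-byte sequence.
slash_in_utf8 = contains_ascii_byte(chr(0xD7FF).encode("utf-8"), "/")

print(slash_in_utf1, bang_in_utf1, slash_in_utf8)  # True True False
```

A program that splits paths on the byte 0x2F would therefore cut the UTF-1 encoding of U+D7FF in half, while the UTF-8 encoding of the same code point is unaffected.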
Design
Similar to UTF-8, UTF-1 is a variable-width encoding that is backwards-compatible with ASCII. Every Unicode code point is represented by either a single byte, or a sequence of two, three, or five bytes. All ASCII code points are a single byte (the code points U+0080 through U+009F are also single bytes).
UTF-1 does not use the C0 and C1 control codes or the space character in multi-byte encodings: a byte in the range 0x00–0x20 or 0x7F–0x9F always stands for the corresponding code point. This design, with 66 protected characters, was intended to be compatible with ISO/IEC 2022.
UTF-1 uses "modulo 190" arithmetic (256 − 66 = 190). For comparison, UTF-8 protects all 128 ASCII characters and needs one bit for this, and a second bit to make it self-synchronizing, resulting in "modulo 64" arithmetic (8 − 2 = 6; 2^6 = 64). BOCU-1 protects only the minimal set required for MIME-compatibility (0x00, 0x07–0x0F, 0x1A–0x1B, and 0x20), resulting in "modulo 243" arithmetic (256 − 13 = 243).
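The arithmetic bases quoted above can be checked with a short Python sketch. It assumes only the protected byte ranges and lead-byte ranges stated in this article; the variable names are illustrative.

```python
# Bytes UTF-1 never uses inside a multi-byte sequence:
# C0 controls plus space (0x00-0x20) and DEL plus C1 controls (0x7F-0x9F).
protected = set(range(0x00, 0x21)) | set(range(0x7F, 0xA0))
trail_bytes = [b for b in range(256) if b not in protected]

utf1_base = len(trail_bytes)   # 256 - 66 = 190
utf8_base = 2 ** (8 - 2)       # 6 payload bits per continuation byte

# BOCU-1's protected set as listed in the text (13 byte values).
bocu1_protected = {0x00, *range(0x07, 0x10), 0x1A, 0x1B, 0x20}
bocu1_base = 256 - len(bocu1_protected)

# Number of code points each multi-byte form can address:
# (count of lead bytes) * base ** (count of trailing bytes).
two_byte = (0xF5 - 0xA1 + 1) * utf1_base         # lead bytes A1-F5
three_byte = (0xFB - 0xF6 + 1) * utf1_base ** 2  # lead bytes F6-FB
five_byte = (0xFF - 0xFC + 1) * utf1_base ** 4   # lead bytes FC-FF

print(len(protected), utf1_base, utf8_base, bocu1_base)  # 66 190 64 243
```

The capacities line up with the range boundaries used by the encoding: 0x100 plus the two-byte capacity gives 0x4016, and 0x4016 plus the three-byte capacity gives 0x38E2E.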
| First code point | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 |
|---|---|---|---|---|---|---|
| U+0000 | U+009F | 00–9F | ||||
| U+00A0 | U+00FF | A0 | A0–FF | |||
| U+0100 | U+4015 | A1–F5 | 21–7E, A0–FF | |||
| U+4016 | U+38E2D | F6–FB | 21–7E, A0–FF | 21–7E, A0–FF | ||
| U+38E2E | U+7FFFFFFF | FC–FF | 21–7E, A0–FF | 21–7E, A0–FF | 21–7E, A0–FF | 21–7E, A0–FF |
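The byte layout in the table above can be turned into a small encoder. The following Python sketch is derived only from the ranges listed in the table, under the assumption that trailing bytes represent base-190 digits in ascending order (0x21–0x7E for digits 0–93, 0xA0–0xFF for digits 94–189); the function names are hypothetical, and this is not taken from any library.

```python
def _trail(digit: int) -> int:
    """Map a base-190 digit (0..189) to a trailing byte in 21-7E or A0-FF."""
    return digit + 0x21 if digit < 0x5E else digit + 0x42


def utf1_encode_code_point(cp: int) -> bytes:
    """Encode one code point (0..0x7FFFFFFF) as UTF-1 bytes."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError(f"code point out of range: {cp!r}")
    if cp < 0xA0:
        return bytes([cp])        # single byte, value unchanged
    if cp < 0x100:
        return bytes([0xA0, cp])  # A0 followed by the value itself
    if cp < 0x4016:
        lead, offset, n_trail = 0xA1, cp - 0x100, 1
    elif cp < 0x38E2E:
        lead, offset, n_trail = 0xF6, cp - 0x4016, 2
    else:
        lead, offset, n_trail = 0xFC, cp - 0x38E2E, 4
    # Peel off base-190 digits, least significant first.
    trailing = []
    for _ in range(n_trail):
        offset, digit = divmod(offset, 190)
        trailing.append(_trail(digit))
    # Whatever remains of the offset selects the lead byte within its range.
    return bytes([lead + offset] + trailing[::-1])


def utf1_encode(text: str) -> bytes:
    """Encode a Python string code point by code point."""
    return b"".join(utf1_encode_code_point(ord(ch)) for ch in text)


print(utf1_encode_code_point(0x07FF).hex(" "))  # aa 72
```

The repeated `divmod` by 190 is the division by a non-power-of-2 that makes UTF-1 comparatively costly; UTF-8 needs only shifts and masks for the same step.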
| code point | UTF-8 | UTF-1 |
|---|---|---|
| U+007F | 7F | 7F |
| U+0080 | C2 80 | 80 |
| U+009F | C2 9F | 9F |
| U+00A0 | C2 A0 | A0 A0 |
| U+00BF | C2 BF | A0 BF |
| U+00C0 | C3 80 | A0 C0 |
| U+00FF | C3 BF | A0 FF |
| U+0100 | C4 80 | A1 21 |
| U+015D | C5 9D | A1 7E |
| U+015E | C5 9E | A1 A0 |
| U+01BD | C6 BD | A1 FF |
| U+01BE | C6 BE | A2 21 |
| U+07FF | DF BF | AA 72 |
| U+0800 | E0 A0 80 | AA 73 |
| U+0FFF | E0 BF BF | B5 48 |
| U+1000 | E1 80 80 | B5 49 |
| U+4015 | E4 80 95 | F5 FF |
| U+4016 | E4 80 96 | F6 21 21 |
| U+D7FF | ED 9F BF | F7 2F C3 |
| U+E000 | EE 80 80 | F7 3A 79 |
| U+F8FF | EF A3 BF | F7 5C 3C |
| U+FDD0 | EF B7 90 | F7 62 BA |
| U+FDEF | EF B7 AF | F7 62 D9 |
| U+FEFF | EF BB BF | F7 64 4C |
| U+FFFD | EF BF BD | F7 65 AD |
| U+FFFE | EF BF BE | F7 65 AE |
| U+FFFF | EF BF BF | F7 65 AF |
| U+10000 | F0 90 80 80 | F7 65 B0 |
| U+38E2D | F0 B8 B8 AD | FB FF FF |
| U+38E2E | F0 B8 B8 AE | FC 21 21 21 21 |
| U+FFFFF | F3 BF BF BF | FC 21 37 B2 7A |
| U+100000 | F4 80 80 80 | FC 21 37 B2 7B |
| U+10FFFF | F4 8F BF BF | FC 21 39 6E 6C |
| U+7FFFFFFF | FD BF BF BF BF BF | FD BD 2B B9 40 |
Although modern Unicode ends at U+10FFFF, both UTF-1 and UTF-8 were designed to encode the complete 31 bits of the original Universal Character Set (UCS-4), and the last entry in this table shows this original final code point.
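As a rough check of the 31-bit upper bound, decoding can be sketched in Python by reversing the base-190 arithmetic. The sketch assumes the byte ranges given in the range table of this article; the function names are hypothetical and the error handling is minimal.

```python
from typing import List


def _digit(b: int) -> int:
    """Map a trailing byte (21-7E or A0-FF) back to its base-190 digit."""
    if 0x21 <= b <= 0x7E:
        return b - 0x21
    if 0xA0 <= b <= 0xFF:
        return b - 0x42
    raise ValueError(f"invalid UTF-1 trailing byte 0x{b:02X}")


def utf1_decode(data: bytes) -> List[int]:
    """Decode UTF-1 bytes into a list of code points."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0xA0:               # single byte, value unchanged
            code_points.append(lead)
            i += 1
            continue
        if lead == 0xA0:              # A0 followed by a value in A0-FF
            if i + 1 >= len(data) or data[i + 1] < 0xA0:
                raise ValueError("truncated or invalid sequence after 0xA0")
            code_points.append(data[i + 1])
            i += 2
            continue
        if lead <= 0xF5:
            first_lead, base, n_trail = 0xA1, 0x100, 1
        elif lead <= 0xFB:
            first_lead, base, n_trail = 0xF6, 0x4016, 2
        else:
            first_lead, base, n_trail = 0xFC, 0x38E2E, 4
        trail = data[i + 1 : i + 1 + n_trail]
        if len(trail) != n_trail:
            raise ValueError("truncated UTF-1 sequence")
        # Rebuild the offset from base-190 digits, most significant first.
        value = lead - first_lead
        for b in trail:
            value = value * 190 + _digit(b)
        cp = base + value
        if cp > 0x7FFFFFFF:
            raise ValueError("sequence exceeds the 31-bit UCS-4 range")
        code_points.append(cp)
        i += 1 + n_trail
    return code_points


print(hex(utf1_decode(bytes.fromhex("FD BD 2B B9 40"))[0]))  # 0x7fffffff
```

Because trailing bytes overlap with ASCII printing characters and with lead bytes, a decoder like this one must start at a known sequence boundary; it cannot resynchronize from an arbitrary position in the stream.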
This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.