CJK Compatibility Ideographs


Range: U+F900..U+FAFF
Script: Han
Characters assigned by version: 1.0.1: 302; 3.2: 59; 4.1: 106; 5.2: 3; 6.1: 2
Sources: KS X 1001, Big5, IBM 32, JIS X 0213, ARIB STD-B24, KPS 10721-2000
Note: Range was initially part of the Private Use Area in Unicode 1.0.0, and removed from it in Unicode 1.0.1.


CJK Compatibility Ideographs is a Unicode block created to contain mostly Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings. However, it also contains 12 unified ideographs sourced from Japanese character sets from IBM.
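The round-trip concern can be illustrated with a minimal Python sketch. It assumes the standard `unicodedata` module; the helper name `in_compat_block` is hypothetical and only checks the block range U+F900..U+FAFF. It shows that Unicode normalization replaces a compatibility ideograph with its unified counterpart, so text that must round-trip to a legacy encoding would need to avoid normalization.

```python
import unicodedata

# Block boundaries for CJK Compatibility Ideographs.
BLOCK_START, BLOCK_END = 0xF900, 0xFAFF


def in_compat_block(ch: str) -> bool:
    """Return True if the single character lies in U+F900..U+FAFF."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


compat = "\uF900"  # first code point of the block, sourced from KS X 1001
unified = unicodedata.normalize("NFC", compat)

# The compatibility code point is lost after normalization.
print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")  # U+F900 -> U+8C48
print(in_compat_block(compat), in_compat_block(unified))  # True False
```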

The block has dozens of ideographic variation sequences registered in the Unicode Ideographic Variation Database (IVD). These sequences specify the desired glyph variant for a given Unicode character.

Character sources

Sources for the original collection of CJK Compatibility Ideographs include:

  • South Korean KS X 1001 (U+F900–U+FA0B, 268 characters; see the KS X 1001 article for the explanation)
  • Taiwanese Big5 (U+FA0C–U+FA0D, 2 characters)
  • "IBM 32": 32 Japanese characters from IBM (U+FA0E–U+FA2D; see below)

In ensuing versions of the standard, more characters have been added to the block from:

  • South Korean KS X 1001 (U+FA2E–U+FA2F, 2 characters)
  • Japanese JIS X 0213 (U+FA30–U+FA6A, 59 characters)
  • Japanese ARIB STD-B24 (U+FA6B–U+FA6D, 3 characters)
  • North Korean KPS 10721-2000 (U+FA70–U+FAD9, 106 characters)
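As a quick arithmetic check, the ranges listed above can be tabulated and summed. This is only a sketch: the `RANGES` table and the helper names are illustrative, and the version labels are taken from the infobox and History sections of this article.

```python
# (source, first code point, last code point, Unicode version that added it)
RANGES = [
    ("KS X 1001", 0xF900, 0xFA0B, "1.0.1"),
    ("Big5", 0xFA0C, 0xFA0D, "1.0.1"),
    ("IBM 32", 0xFA0E, 0xFA2D, "1.0.1"),
    ("KS X 1001", 0xFA2E, 0xFA2F, "6.1"),
    ("JIS X 0213", 0xFA30, 0xFA6A, "3.2"),
    ("ARIB STD-B24", 0xFA6B, 0xFA6D, "5.2"),
    ("KPS 10721-2000", 0xFA70, 0xFAD9, "4.1"),
]


def range_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


def totals_by_version(ranges):
    """Sum the range sizes per Unicode version."""
    totals = {}
    for _source, first, last, version in ranges:
        totals[version] = totals.get(version, 0) + range_size(first, last)
    return totals


for source, first, last, version in RANGES:
    print(f"{source:15} U+{first:04X}..U+{last:04X} "
          f"{range_size(first, last):4d} (Unicode {version})")

print(totals_by_version(RANGES))
print("assigned in block:", sum(totals_by_version(RANGES).values()))
```

The per-version totals agree with the counts given in the infobox (302, 59, 106, 3 and 2), for 472 assigned code points overall.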

The "IBM 32" characters

IBM Japanese double-byte EBCDIC includes several kanji which do not exist in, or do not round-trip from, JIS X 0208. These were included as gaiji in extensions to Shift JIS and EUC-JP from IBM (e.g. code page 942), NEC, the Open Software Foundation, and Microsoft (e.g. Windows code page 932). However, they were not used as a source for the original Unified Repertoire and Ordering (URO). Instead, 32 of the IBM extension kanji, those which had not been included in the URO from other sources, were included in the CJK Compatibility Ideographs block in the range U+FA0E–U+FA2D.

Of these 32 characters:

  • 19 are unifiable with characters in the URO, and are therefore compatibility ideographs in the strict sense.
  • 12 are kokuji characters which are actually unified ideographs (with the Unified_Ideograph property, and which do not change upon normalisation). In spite of their inclusion in the CJK Compatibility Ideographs block and their algorithmically generated character names beginning with "CJK COMPATIBILITY IDEOGRAPH-", they are not duplicates of characters in the original CJK Unified Ideographs block in any respect; 11 of these 12 are completely non-duplicate, while U+FA23 was later unintentionally duplicated in CJK Unified Ideographs Extension B as U+27EAF. They are placed in this block because they do not have a URO encoding, yet IBM 32 is one of the encodings where duplicate encodings are of concern. All of them are rarely used or are variants of common kanji. They are U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27, U+FA28 and U+FA29.
  • Uniquely, U+FA20 is intended to be encoded as the kyūjitai form of a kokuji whose variant U+8612, straightforwardly the (extended) shinjitai form, received a separate encoding. The URO only encoded the shinjitai form, and uses its stroke count to place it in this position. It is furthermore one of the many variants of the jinmeiyō kanji for Kummerowia. U+FA20 was assigned a normalisation to U+8612, even though the 龜 and 亀 components, while both forms of radical 213, are not usually considered unifiable.
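The split described above can be checked against the Unicode Character Database. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the function name `split_ibm32` is hypothetical. It treats a code point with no canonical decomposition as a unified ideograph and one with a decomposition as a compatibility ideograph, which is an approximation of the Unified_Ideograph property rather than a direct lookup of it.

```python
import unicodedata

# The "IBM 32" range U+FA0E..U+FA2D (inclusive).
IBM32 = [chr(cp) for cp in range(0xFA0E, 0xFA2D + 1)]


def split_ibm32():
    """Partition the IBM 32 characters by whether they decompose.

    Returns (unified, compat): a list of characters that are unchanged
    by normalization, and a dict mapping each compatibility ideograph
    to the unified ideograph it normalizes to.
    """
    unified, compat = [], {}
    for ch in IBM32:
        decomp = unicodedata.decomposition(ch)  # '' or a hex code point
        if decomp:
            compat[ch] = chr(int(decomp, 16))
        else:
            unified.append(ch)
    return unified, compat


unified, compat = split_ibm32()
print("unified:", " ".join(f"U+{ord(c):04X}" for c in unified))
print("compat: ", len(compat), "characters")
print(f"U+FA20 -> U+{ord(compat[chr(0xFA20)]):04X}")
```

With this approximation, the compatibility group has 20 members: the 19 strictly unifiable characters plus U+FA20.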

History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Compatibility Ideographs block:

Version | Final code points | Count | WG2 ID | IRG ID
1.0.1 | U+F900..FA2D | 302 | N782, N2667, N3525, N3590, N4111, N4103 |
3.2 | U+FA30..FA6A | 59 | N1935, N2003, N2095, N2142, N2103, N2197, N2221, N2221R, N2273, N2295 | N710
4.1 | U+FA70..FAD9 | 106 | N2253, N2375, N2403, N2478, N2493, N2541, N2540, N2566, N2572, N2573, N2569, N2569R, N2776, N2924R, N3899, N4111, N4103 | N1062
5.2 | U+FA6B..FA6D | 3 | |
6.1 | U+FA2E..FA2F | 2 | N3747 |

This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.
