ref: 0d6153841d81c5a7b7c850f1ee63677ec493b227
dir: /lib/ucd/CompositionExclusions.txt/
# CompositionExclusions-15.0.0.txt
# Date: 2022-05-03, 18:50:00 GMT [KW, LI]
# © 2022 Unicode®, Inc.
# For terms of use, see https://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
#   For documentation, see https://www.unicode.org/reports/tr44/
#
# This file lists the characters for the Composition Exclusion Table
# defined in UAX #15, Unicode Normalization Forms.
#
# This file is a normative contributory data file in the
# Unicode Character Database.
#
# For more information, see
# https://www.unicode.org/reports/tr15/#Primary_Exclusion_List_Table
#
# For a full derivation of composition exclusions, see the derived property
# Full_Composition_Exclusion in DerivedNormalizationProps.txt
#

# ================================================
# (1) Script Specifics
#
# This list of characters cannot be derived from the UnicodeData.txt file.
#
# Included are the following subcategories:
#
# - Many precomposed characters using a nukta diacritic in the Devanagari,
#   Bangla/Bengali, Gurmukhi, or Odia/Oriya scripts.
# - Tibetan letters and subjoined letters with decompositions including
#   U+0FB7 TIBETAN SUBJOINED LETTER HA or U+0FB5 TIBETAN SUBJOINED LETTER SSA.
# - Two two-part Tibetan vowel signs involving top and bottom pieces.
# - A large collection of compatibility precomposed characters for Hebrew
#   involving dagesh and/or other combining marks.
#
# This list is unlikely to grow.
#
# ================================================

0958   # DEVANAGARI LETTER QA
0959   # DEVANAGARI LETTER KHHA
095A   # DEVANAGARI LETTER GHHA
095B   # DEVANAGARI LETTER ZA
095C   # DEVANAGARI LETTER DDDHA
095D   # DEVANAGARI LETTER RHA
095E   # DEVANAGARI LETTER FA
095F   # DEVANAGARI LETTER YYA
09DC   # BENGALI LETTER RRA
09DD   # BENGALI LETTER RHA
09DF   # BENGALI LETTER YYA
0A33   # GURMUKHI LETTER LLA
0A36   # GURMUKHI LETTER SHA
0A59   # GURMUKHI LETTER KHHA
0A5A   # GURMUKHI LETTER GHHA
0A5B   # GURMUKHI LETTER ZA
0A5E   # GURMUKHI LETTER FA
0B5C   # ORIYA LETTER RRA
0B5D   # ORIYA LETTER RHA
0F43   # TIBETAN LETTER GHA
0F4D   # TIBETAN LETTER DDHA
0F52   # TIBETAN LETTER DHA
0F57   # TIBETAN LETTER BHA
0F5C   # TIBETAN LETTER DZHA
0F69   # TIBETAN LETTER KSSA
0F76   # TIBETAN VOWEL SIGN VOCALIC R
0F78   # TIBETAN VOWEL SIGN VOCALIC L
0F93   # TIBETAN SUBJOINED LETTER GHA
0F9D   # TIBETAN SUBJOINED LETTER DDHA
0FA2   # TIBETAN SUBJOINED LETTER DHA
0FA7   # TIBETAN SUBJOINED LETTER BHA
0FAC   # TIBETAN SUBJOINED LETTER DZHA
0FB9   # TIBETAN SUBJOINED LETTER KSSA
FB1D   # HEBREW LETTER YOD WITH HIRIQ
FB1F   # HEBREW LIGATURE YIDDISH YOD YOD PATAH
FB2A   # HEBREW LETTER SHIN WITH SHIN DOT
FB2B   # HEBREW LETTER SHIN WITH SIN DOT
FB2C   # HEBREW LETTER SHIN WITH DAGESH AND SHIN DOT
FB2D   # HEBREW LETTER SHIN WITH DAGESH AND SIN DOT
FB2E   # HEBREW LETTER ALEF WITH PATAH
FB2F   # HEBREW LETTER ALEF WITH QAMATS
FB30   # HEBREW LETTER ALEF WITH MAPIQ
FB31   # HEBREW LETTER BET WITH DAGESH
FB32   # HEBREW LETTER GIMEL WITH DAGESH
FB33   # HEBREW LETTER DALET WITH DAGESH
FB34   # HEBREW LETTER HE WITH MAPIQ
FB35   # HEBREW LETTER VAV WITH DAGESH
FB36   # HEBREW LETTER ZAYIN WITH DAGESH
FB38   # HEBREW LETTER TET WITH DAGESH
FB39   # HEBREW LETTER YOD WITH DAGESH
FB3A   # HEBREW LETTER FINAL KAF WITH DAGESH
FB3B   # HEBREW LETTER KAF WITH DAGESH
FB3C   # HEBREW LETTER LAMED WITH DAGESH
FB3E   # HEBREW LETTER MEM WITH DAGESH
FB40   # HEBREW LETTER NUN WITH DAGESH
FB41   # HEBREW LETTER SAMEKH WITH DAGESH
FB43   # HEBREW LETTER FINAL PE WITH DAGESH
FB44   # HEBREW LETTER PE WITH DAGESH
FB46   # HEBREW LETTER TSADI WITH DAGESH
FB47   # HEBREW LETTER QOF WITH DAGESH
FB48   # HEBREW LETTER RESH WITH DAGESH
FB49   # HEBREW LETTER SHIN WITH DAGESH
FB4A   # HEBREW LETTER TAV WITH DAGESH
FB4B   # HEBREW LETTER VAV WITH HOLAM
FB4C   # HEBREW LETTER BET WITH RAFE
FB4D   # HEBREW LETTER KAF WITH RAFE
FB4E   # HEBREW LETTER PE WITH RAFE

# Total code points: 67

# ================================================
# (2) Post Composition Version precomposed characters
#
# These characters cannot be derived solely from the UnicodeData.txt file
# in this version of Unicode.
#
# Note that characters added to the standard after the
# Composition Version and which have canonical decomposition mappings
# are not automatically added to this list of Post Composition
# Version precomposed characters.
# ================================================

2ADC   # FORKING
1D15E  # MUSICAL SYMBOL HALF NOTE
1D15F  # MUSICAL SYMBOL QUARTER NOTE
1D160  # MUSICAL SYMBOL EIGHTH NOTE
1D161  # MUSICAL SYMBOL SIXTEENTH NOTE
1D162  # MUSICAL SYMBOL THIRTY-SECOND NOTE
1D163  # MUSICAL SYMBOL SIXTY-FOURTH NOTE
1D164  # MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
1D1BB  # MUSICAL SYMBOL MINIMA
1D1BC  # MUSICAL SYMBOL MINIMA BLACK
1D1BD  # MUSICAL SYMBOL SEMIMINIMA WHITE
1D1BE  # MUSICAL SYMBOL SEMIMINIMA BLACK
1D1BF  # MUSICAL SYMBOL FUSA WHITE
1D1C0  # MUSICAL SYMBOL FUSA BLACK

# Total code points: 14

# ================================================
# (3) Singleton Decompositions
#
# These characters can be derived from the UnicodeData.txt file
# by including all canonically decomposable characters whose
# canonical decomposition consists of a single character.
#
# These characters are simply quoted here for reference.
# See also Full_Composition_Exclusion in DerivedNormalizationProps.txt
# ================================================

# 0340..0341      [2] COMBINING GRAVE TONE MARK..COMBINING ACUTE TONE MARK
# 0343                COMBINING GREEK KORONIS
# 0374                GREEK NUMERAL SIGN
# 037E                GREEK QUESTION MARK
# 0387                GREEK ANO TELEIA
# 1F71                GREEK SMALL LETTER ALPHA WITH OXIA
# 1F73                GREEK SMALL LETTER EPSILON WITH OXIA
# 1F75                GREEK SMALL LETTER ETA WITH OXIA
# 1F77                GREEK SMALL LETTER IOTA WITH OXIA
# 1F79                GREEK SMALL LETTER OMICRON WITH OXIA
# 1F7B                GREEK SMALL LETTER UPSILON WITH OXIA
# 1F7D                GREEK SMALL LETTER OMEGA WITH OXIA
# 1FBB                GREEK CAPITAL LETTER ALPHA WITH OXIA
# 1FBE                GREEK PROSGEGRAMMENI
# 1FC9                GREEK CAPITAL LETTER EPSILON WITH OXIA
# 1FCB                GREEK CAPITAL LETTER ETA WITH OXIA
# 1FD3                GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
# 1FDB                GREEK CAPITAL LETTER IOTA WITH OXIA
# 1FE3                GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
# 1FEB                GREEK CAPITAL LETTER UPSILON WITH OXIA
# 1FEE..1FEF      [2] GREEK DIALYTIKA AND OXIA..GREEK VARIA
# 1FF9                GREEK CAPITAL LETTER OMICRON WITH OXIA
# 1FFB                GREEK CAPITAL LETTER OMEGA WITH OXIA
# 1FFD                GREEK OXIA
# 2000..2001      [2] EN QUAD..EM QUAD
# 2126                OHM SIGN
# 212A..212B      [2] KELVIN SIGN..ANGSTROM SIGN
# 2329                LEFT-POINTING ANGLE BRACKET
# 232A                RIGHT-POINTING ANGLE BRACKET
# F900..FA0D    [270] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA0D
# FA10                CJK COMPATIBILITY IDEOGRAPH-FA10
# FA12                CJK COMPATIBILITY IDEOGRAPH-FA12
# FA15..FA1E     [10] CJK COMPATIBILITY IDEOGRAPH-FA15..CJK COMPATIBILITY IDEOGRAPH-FA1E
# FA20                CJK COMPATIBILITY IDEOGRAPH-FA20
# FA22                CJK COMPATIBILITY IDEOGRAPH-FA22
# FA25..FA26      [2] CJK COMPATIBILITY IDEOGRAPH-FA25..CJK COMPATIBILITY IDEOGRAPH-FA26
# FA2A..FA6D     [68] CJK COMPATIBILITY IDEOGRAPH-FA2A..CJK COMPATIBILITY IDEOGRAPH-FA6D
# FA70..FAD9    [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
# 2F800..2FA1D  [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D

# Total code points: 1035

# ================================================
# (4) Non-Starter Decompositions
#
# These characters can be derived from the UnicodeData.txt file
# by including each expanding canonical decomposition
# (i.e., those which canonically decompose to a sequence
# of characters instead of a single character), such that:
#
#   A. The character is not a Starter.
#
#   OR (inclusive)
#
#   B. The character's canonical decomposition begins
#      with a character that is not a Starter.
#
# Note that a "Starter" is any character with a zero combining class.
#
# These characters are simply quoted here for reference.
# See also Full_Composition_Exclusion in DerivedNormalizationProps.txt
# ================================================

# 0344  COMBINING GREEK DIALYTIKA TONOS
# 0F73  TIBETAN VOWEL SIGN II
# 0F75  TIBETAN VOWEL SIGN UU
# 0F81  TIBETAN VOWEL SIGN REVERSED II

# Total code points: 4

# EOF
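
The derivation rules stated in sections (3) and (4) can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the Unicode Character Database. It assumes the standard-library `unicodedata` module, whose data version depends on the Python build and may differ from 15.0.0. The names `SAMPLE`, `parse_exclusions`, `canonical_decomposition`, `is_singleton`, and `is_non_starter_decomposition` are hypothetical helpers introduced for this example. The parser also accepts `XXXX..YYYY` ranges on data lines, which is an assumption: the data lines above list single code points only.

```python
import unicodedata

# A tiny excerpt in the same layout as the file above (hypothetical sample).
SAMPLE = """\
# (1) Script Specifics
0958   # DEVANAGARI LETTER QA
FB1D   # HEBREW LETTER YOD WITH HIRIQ
# (3) Singleton Decompositions (quoted for reference only)
# 212A..212B  [2] KELVIN SIGN..ANGSTROM SIGN
"""

def parse_exclusions(text):
    """Return the set of code points listed on non-comment data lines."""
    result = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank line or comment-only line
        if ".." in data:  # assumed range syntax, unused in sections (1)-(2)
            first, last = (int(part, 16) for part in data.split(".."))
        else:
            first = last = int(data, 16)
        result.update(range(first, last + 1))
    return result

def canonical_decomposition(cp):
    """Return the canonical decomposition mapping of cp as a list of ints."""
    mapping = unicodedata.decomposition(chr(cp))
    # Compatibility mappings carry a "<tag>" prefix and are not canonical.
    if not mapping or mapping.startswith("<"):
        return []
    return [int(part, 16) for part in mapping.split()]

def is_singleton(cp):
    """Section (3): the canonical decomposition is a single character."""
    return len(canonical_decomposition(cp)) == 1

def is_non_starter_decomposition(cp):
    """Section (4): expanding decomposition where rule A or rule B holds."""
    parts = canonical_decomposition(cp)
    if len(parts) < 2:
        return False  # not an expanding canonical decomposition
    not_starter = unicodedata.combining(chr(cp)) != 0               # rule A
    first_not_starter = unicodedata.combining(chr(parts[0])) != 0   # rule B
    return not_starter or first_not_starter
```

For example, `is_singleton(0x212B)` holds for ANGSTROM SIGN, and `is_non_starter_decomposition(0x0344)` holds for COMBINING GREEK DIALYTIKA TONOS. The effect of a section (1) entry is also visible through normalization: `unicodedata.normalize("NFC", "\u0958")` yields the decomposed sequence U+0915 U+093C rather than U+0958.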