Last update: 2005-01-17, version 1.4

This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
of the Linux Assigned Names And Numbers Authority (LANANA) project.
The current version can be found at:

http://www.lanana.org/docs/unicode/unicode.txt

------------------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts.  By downloading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This changes the semantics of the eight-bit character tables subtly.
The four character tables are now:

Map symbol      Map name                        Escape code (G0)

LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
IBMPC_MAP       IBM code page 437               ESC ( U
USER_MAP        User defined                    ESC ( K

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different from the IBM character set.  This
permits, for example, the use of block graphics even with a Latin-1
font loaded.
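
As an illustration of the table above, the following minimal Python
sketch builds the G0 selection sequence for each map.  The names
G0_FINAL and select_g0 are hypothetical helpers invented for this
example; only the escape codes themselves come from the table.

```python
ESC = "\x1b"

# Final character of the "ESC ( x" sequence that selects each map as G0,
# copied from the table above.
G0_FINAL = {
    "LAT1_MAP": "B",
    "GRAF_MAP": "0",
    "IBMPC_MAP": "U",
    "USER_MAP": "K",
}


def select_g0(map_symbol: str) -> str:
    """Return the escape sequence that selects map_symbol as G0.

    Hypothetical helper: it only builds the string and does not check
    whether the terminal it is written to honours the sequence.
    """
    return ESC + "(" + G0_FINAL[map_symbol]
```

Writing select_g0("GRAF_MAP") to a console that honours these codes,
followed by ordinary text and then select_g0("LAT1_MAP"), would be one
way to switch to the pseudographics map and back again.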

Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).

In accordance with the Unicode standard/ISO 10646 the range U+F000 to
U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
refers to this as a "Corporate Zone"; since this is inaccurate for
Linux, we call it the "Linux Zone").  U+F000 was picked as the starting
point since it lets the direct-mapping area start on a large power of
two (in case 1024- or 2048-character fonts ever become necessary).
This leaves U+E000 to U+EFFF as the End User Zone.
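
The split described above can be sketched in a few lines of Python.
The function name zone_of and the two range constants are assumptions
made for this example and are not kernel identifiers.

```python
# Ranges as described in the text (Python ranges exclude the end value).
END_USER_ZONE = range(0xE000, 0xF000)  # U+E000..U+EFFF
LINUX_ZONE = range(0xF000, 0xF900)     # U+F000..U+F8FF


def zone_of(code_point: int) -> str:
    """Classify a code point by the zones named in this document."""
    if code_point in END_USER_ZONE:
        return "End User Zone"
    if code_point in LINUX_ZONE:
        return "Linux Zone"
    return "outside both zones"


# U+F000 is a multiple of 2048, so even a 2048-character font would
# occupy an aligned block starting at the base of the Linux Zone.
assert 0xF000 % 2048 == 0
```

For example, zone_of(0xF8D0) reports "Linux Zone", while
zone_of(0xE000) reports "End User Zone".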

[v1.2]: The Unicode range from U+F000 up to U+F7FF has been
hard-coded to map directly to the loaded font, bypassing the
translation table.  The user-defined map now defaults to U+F000 to
U+F0FF, emulating the previous behaviour.  In practice, this range
might be shorter; for example, vgacon can only handle 256-character
(U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.
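
The arithmetic behind the direct-mapping range is simple enough to
sketch.  The following Python is an illustration only, not the kernel
implementation; direct_code_point, glyph_index and the font_size
parameter are hypothetical names chosen for this example.

```python
DIRECT_BASE = 0xF000   # first directly mapped code point
DIRECT_LAST = 0xF7FF   # last code point hard-coded for direct mapping


def direct_code_point(index: int, font_size: int = 256) -> int:
    """Return the code point that addresses font slot `index` directly."""
    if not 0 <= index < font_size:
        raise ValueError("glyph index outside the loaded font")
    code_point = DIRECT_BASE + index
    if code_point > DIRECT_LAST:
        raise ValueError("glyph index beyond the direct-mapping range")
    return code_point


def glyph_index(code_point: int, font_size: int = 256):
    """Return the font slot for a directly mapped code point, else None."""
    index = code_point - DIRECT_BASE
    if 0 <= index < font_size and code_point <= DIRECT_LAST:
        return index
    return None
```

With a 256-character font, glyph_index(0xF100) yields None, whereas
with font_size=512 the same code point addresses slot 256.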


Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map.  [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9

The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set.  I have
omitted the scan 5 line, since it is also used as a block-graphics
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.

[v1.3]: These characters have been officially added to Unicode 3.2.0;
they are added at U+23BA, U+23BB, U+23BC, U+23BD.  Linux now uses the
new values.
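
Text that still contains the obsolete Linux Zone values can be
converted mechanically.  The sketch below assumes the four new code
points correspond to scan lines 1, 3, 7 and 9 in the order listed;
LEGACY_TO_STANDARD and modernize are hypothetical names used only for
this illustration.

```python
# Obsolete Linux Zone value -> Unicode 3.2 value, paired in listed order.
LEGACY_TO_STANDARD = {
    0xF800: 0x23BA,  # scan 1
    0xF801: 0x23BB,  # scan 3
    0xF803: 0x23BC,  # scan 7
    0xF804: 0x23BD,  # scan 9
}


def modernize(text: str) -> str:
    """Replace the obsolete scan-line code points with the new ones.

    All other characters, including the unassigned U+F802, are left
    unchanged.
    """
    return text.translate(LEGACY_TO_STANDARD)
```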

[v1.2]: The following characters have been added to represent common
keyboard symbols that are unlikely to ever be added to Unicode proper
since they are horribly vendor-specific.  This, of course, is an
excellent example of horrible design.

U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE

Klingon language support
------------------------

In 1996, Linux was the first operating system in the world to add
support for the artificial language Klingon, created by Marc Okrand
for the "Star Trek" television series.  This encoding was later
adopted by the ConScript Unicode Registry and proposed (but ultimately
rejected) for inclusion in Unicode Plane 1.  Thus, it remains as a
Linux/CSUR private assignment in the Linux Zone.

This encoding has been endorsed by the Klingon Language Institute.
For more information, contact them at:

http://www.kli.org/

Since the characters at the beginning of the Linux Zone have been more
of the dingbats/symbols/forms type and this is a language, I have
located it at the end, on a 16-cell boundary in keeping with standard
Unicode practice.

NOTE: This range is now officially managed by the ConScript Unicode
Registry.  The normative reference is at:

http://www.evertype.com/standards/csur/klingon.html

Klingon has an alphabet of 26 characters, a positional numeric writing
system with 10 digits, and is written left-to-right, top-to-bottom.

Several glyph forms for the Klingon alphabet have been proposed.
However, since the set of symbols appears to be consistent throughout,
with only the actual shapes being different, in keeping with standard
Unicode practice these differences are considered font variants.

U+F8D0	KLINGON LETTER A
U+F8D1	KLINGON LETTER B
U+F8D2	KLINGON LETTER CH
U+F8D3	KLINGON LETTER D
U+F8D4	KLINGON LETTER E
U+F8D5	KLINGON LETTER GH
U+F8D6	KLINGON LETTER H
U+F8D7	KLINGON LETTER I
U+F8D8	KLINGON LETTER J
U+F8D9	KLINGON LETTER L
U+F8DA	KLINGON LETTER M
U+F8DB	KLINGON LETTER N
U+F8DC	KLINGON LETTER NG
U+F8DD	KLINGON LETTER O
U+F8DE	KLINGON LETTER P
U+F8DF	KLINGON LETTER Q
	- Written <q> in standard Okrand Latin transliteration
U+F8E0	KLINGON LETTER QH
	- Written <Q> in standard Okrand Latin transliteration
U+F8E1	KLINGON LETTER R
U+F8E2	KLINGON LETTER S
U+F8E3	KLINGON LETTER T
U+F8E4	KLINGON LETTER TLH
U+F8E5	KLINGON LETTER U
U+F8E6	KLINGON LETTER V
U+F8E7	KLINGON LETTER W
U+F8E8	KLINGON LETTER Y
U+F8E9	KLINGON LETTER GLOTTAL STOP

U+F8F0	KLINGON DIGIT ZERO
U+F8F1	KLINGON DIGIT ONE
U+F8F2	KLINGON DIGIT TWO
U+F8F3	KLINGON DIGIT THREE
U+F8F4	KLINGON DIGIT FOUR
U+F8F5	KLINGON DIGIT FIVE
U+F8F6	KLINGON DIGIT SIX
U+F8F7	KLINGON DIGIT SEVEN
U+F8F8	KLINGON DIGIT EIGHT
U+F8F9	KLINGON DIGIT NINE

U+F8FD	KLINGON COMMA
U+F8FE	KLINGON FULL STOP
U+F8FF	KLINGON SYMBOL FOR EMPIRE
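
To show how the assignments above might be used, here is a minimal
Python sketch that converts Okrand Latin transliteration and decimal
numbers into these code points.  Apart from <q> and <Q>, the Latin
spellings in KLINGON_LETTERS are assumed from the usual Okrand
conventions rather than taken from this file, the greedy
longest-match tokenizer is a simplification, and encode_klingon and
klingon_digits are hypothetical names.

```python
# Okrand Latin spellings in code point order, starting at U+F8D0.
# Assumed spellings; only <q> and <Q> are stated in the list above.
KLINGON_LETTERS = [
    "a", "b", "ch", "D", "e", "gh", "H", "I", "j", "l", "m", "n", "ng",
    "o", "p", "q", "Q", "r", "S", "t", "tlh", "u", "v", "w", "y", "'",
]
LETTER_BASE = 0xF8D0
DIGIT_BASE = 0xF8F0

TO_PUA = {latin: chr(LETTER_BASE + i)
          for i, latin in enumerate(KLINGON_LETTERS)}
# Try longer spellings first so "tlh" wins over "t" and "ng" over "n".
TOKENS = sorted(TO_PUA, key=len, reverse=True)


def encode_klingon(latin: str) -> str:
    """Map transliterated text to Linux Zone code points (greedy match)."""
    out = []
    i = 0
    while i < len(latin):
        for token in TOKENS:
            if latin.startswith(token, i):
                out.append(TO_PUA[token])
                i += len(token)
                break
        else:
            # Not a Klingon letter (space, punctuation): pass it through.
            out.append(latin[i])
            i += 1
    return "".join(out)


def klingon_digits(number: int) -> str:
    """Write a non-negative integer with the positional Klingon digits."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    return "".join(chr(DIGIT_BASE + int(d)) for d in str(number))
```

Greedy matching cannot tell the single letter <ng> from <n> followed
by <gh>, so a real converter would need a dictionary or an explicit
separator to resolve that case.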

Other Fictional and Artificial Scripts
--------------------------------------

Since the assignment of the Klingon Linux Unicode block, a registry of
fictional and artificial scripts has been established by John Cowan
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
The ConScript Unicode Registry is accessible at:

http://www.evertype.com/standards/csur/

The ranges used fall at the low end of the End User Zone and hence
cannot be normatively assigned, but it is recommended that people who
wish to encode fictional scripts use these codes, in the interest of
interoperability.  For Klingon, CSUR has adopted the Linux encoding.
The CSUR people are working to have Tengwar and Cirth added to Unicode
Plane 1; the addition of Klingon to Unicode Plane 1 has been rejected
and so the above encoding remains official.