eucTW(5)
NAME
eucTW - A character encoding system (codeset) for Traditional Chinese
DESCRIPTION
The Taiwanese EUC (Extended UNIX Code), or eucTW, codeset consists of
the following character sets:
     ASCII
     CNS 11643 (Plane 1 to Plane 16)
Taiwanese EUC uses a combination of single-byte data and 2-byte data to
represent ASCII characters, symbols, and ideographic characters.
Because CNS 11643 defines more character planes than a single 2-byte
code range can hold, Taiwanese EUC uses different leading codes to
designate different character planes.
ASCII characters are represented as single-byte 7-bit data in Taiwanese
EUC; that is, the most significant bit (MSB) of a byte that represents
an ASCII character is always off (0). For more information, refer to
ascii(5).
Although the standard Taiwanese EUC codeset includes all characters
defined by the CNS 11643-1992 standard, the operating system's eucTW
implementation currently supports the following:
     Characters defined in the first and second planes of CNS 11643
     The EDPC Recommended Character Set (refer to dechanyu(5) for more
     information)
     CNS 11643-1986 and DTSCS characters that have been remapped into
     the third and fourth character planes by the CNS 11643-1992
     standard
Characters that were added to CNS 11643-1986 by the CNS 11643-1992
standard are not supported.
The characters that are defined in plane 1 and plane 2 of CNS
11643-1992 and that are the same as those defined in CNS 11643-1986 are
as follows:
────────────────────────────────────────────────────────────────────────
Character Plane   Character Type                    Number of Characters
────────────────────────────────────────────────────────────────────────
1                 Special characters                651
                  Control characters                33
                  Frequently-used characters        5401
2                 Less frequently-used characters   7650
────────────────────────────────────────────────────────────────────────
The characters defined in plane 3 and plane 4 of CNS 11643-1992 are as
follows:
──────────────────────────────────────────────────────────────────────────────
Character Plane   Character Type                          Number of Characters
──────────────────────────────────────────────────────────────────────────────
3                 Rarely-used characters (EDPC Part I)    6148
4                 Used for residency system, ISO 2nd      7298
                  edition DIS 10646 Han characters
                  EDPC Part II characters                 171
──────────────────────────────────────────────────────────────────────────────
The characters that have been remapped into the third and fourth
character planes of CNS 11643-1992 as specified by the EDPC are as
follows:
─────────────────────────────────────────────────────────
EDPC Characters   Character Plane   Number of Characters
─────────────────────────────────────────────────────────
Part I            Plane 3           6148
Part II           Plane 4           171
─────────────────────────────────────────────────────────
Taiwanese EUC Encoding
Except for characters in the first plane of CNS 11643-1986, Taiwanese
EUC uses a 2-byte leading code, consisting of the 8-bit Single-Shift 2
(SS2) control character and an additional byte, to assign a character
to its character plane.
The position of a character on a plane is specified by two bytes. The
first byte determines the character's row number and the second byte
determines the character's column number. The MSB of both bytes is
always on (1).
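The row and column scheme can be illustrated with a short Python sketch.
It assumes the usual EUC convention, which this page does not state
explicitly, that rows and columns are numbered from 1 and are obtained
by subtracting A0 (hexadecimal) from each byte. The function name
row_col is hypothetical and is not part of any system interface.

```python
def row_col(code: bytes) -> tuple[int, int]:
    """Return the (row, column) position for a 2-byte Taiwanese EUC code."""
    if len(code) != 2:
        raise ValueError("expected exactly two bytes")
    first, second = code
    # Both bytes must lie in A1-FE, so the MSB of each byte is on.
    if not (0xA1 <= first <= 0xFE and 0xA1 <= second <= 0xFE):
        raise ValueError("bytes outside the A1-FE range")
    # Assumption: positions are numbered from 1, so A1 maps to 1
    # and FE maps to 94.
    return first - 0xA0, second - 0xA0
```

Under that assumption, the code A1A1 is row 1, column 1, and FEFE is
row 94, column 94.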
The following table shows the encoding of Taiwanese EUC characters:
───────────────────────────────────────────────────────
CNS 11643-1986 Code Plane   Leading Code   Code Range
───────────────────────────────────────────────────────
1                           [nil]          A1A1 - FEFE
2                           SS2 A2         A1A1 - FEFE
3                           SS2 A3         A1A1 - FEFE
4                           SS2 A4         A1A1 - FEFE
5                           SS2 A5         A1A1 - FEFE
6                           SS2 A6         A1A1 - FEFE
7                           SS2 A7         A1A1 - FEFE
8                           SS2 A8         A1A1 - FEFE
9                           SS2 A9         A1A1 - FEFE
10                          SS2 AA         A1A1 - FEFE
11                          SS2 AB         A1A1 - FEFE
12                          SS2 AC         A1A1 - FEFE
13                          SS2 AD         A1A1 - FEFE
14                          SS2 AE         A1A1 - FEFE
15                          SS2 AF         A1A1 - FEFE
16                          SS2 B0         A1A1 - FEFE
───────────────────────────────────────────────────────
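The table can be read as a simple encoding rule, sketched below in
Python. This is an illustration rather than the operating system's
implementation. It assumes that SS2 has its usual 8-bit value of 8E
(hexadecimal), which this page does not state, and it follows the table
strictly, so plane 1 never takes a leading code. The names
encode_position and split_characters are hypothetical, and plane 0 is
used here only as a marker for ASCII bytes.

```python
SS2 = 0x8E  # Assumption: usual 8-bit value of the Single-Shift 2 control


def _check_code(first: int, second: int) -> None:
    # Code range from the table: each byte must be in A1-FE.
    if not (0xA1 <= first <= 0xFE and 0xA1 <= second <= 0xFE):
        raise ValueError("code bytes must be in A1-FE")


def encode_position(plane: int, first: int, second: int) -> bytes:
    """Build the byte sequence for one character in planes 1 to 16."""
    if not 1 <= plane <= 16:
        raise ValueError("plane must be 1 to 16")
    _check_code(first, second)
    if plane == 1:
        return bytes([first, second])  # plane 1: no leading code
    # Planes 2-16: SS2 followed by A2..B0, then the 2-byte code.
    return bytes([SS2, 0xA0 + plane, first, second])


def split_characters(data: bytes) -> list[tuple[int, bytes]]:
    """Split eucTW bytes into (plane, code) pairs; plane 0 marks ASCII."""
    result = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # MSB off: single-byte ASCII.
            result.append((0, data[i:i + 1]))
            i += 1
        elif byte == SS2:
            if i + 4 > len(data):
                raise ValueError("truncated 4-byte sequence")
            plane = data[i + 1] - 0xA0
            if not 2 <= plane <= 16:
                raise ValueError("leading code not in the table")
            _check_code(data[i + 2], data[i + 3])
            result.append((plane, data[i + 2:i + 4]))
            i += 4
        else:
            if i + 2 > len(data):
                raise ValueError("truncated 2-byte sequence")
            _check_code(byte, data[i + 1])
            result.append((1, data[i:i + 2]))
            i += 2
    return result
```

For example, split_characters(b"A" + encode_position(2, 0xA1, 0xA1))
yields one ASCII entry followed by one plane 2 entry.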
Codeset Conversion
The following codeset converter pairs are available for converting
Traditional Chinese characters between eucTW and other encoding
formats. Refer to iconv_intro(5) for an introduction to codeset
conversion. For more information about the other codeset for which
eucTW is the input or output, see the reference page specified in the
list item.
big5_eucTW, eucTW_big5
     Converting from and to the Big-5 codeset: big5(5).
     Note that Big-5 encoding is equivalent to the Microsoft code-page
     format used on PCs for Traditional Chinese. You can therefore use
     this set of converters to convert Traditional Chinese text between
     the eucTW and PC code-page formats. For information about how the
     operating system supports PC code pages, see code_page(5).
dechanyu_eucTW, eucTW_dechanyu
     Converting from and to the DEC Hanyu codeset: dechanyu(5).
dechanzi_eucTW, eucTW_dechanzi
     Converting from and to the DEC Hanzi codeset: dechanzi(5).
sbig5_eucTW, eucTW_sbig5
     Converting from and to the Shift Big-5 codeset: sbig5(5).
telecode_eucTW, eucTW_telecode
     Converting from and to the Telecode codeset: telecode(5).
UTF-16_eucTW, eucTW_UTF-16
     Converting from and to UTF-16 format: Unicode(5).
UCS-4_eucTW, eucTW_UCS-4
     Converting from and to UCS-4 format: Unicode(5).
UTF-8_eucTW, eucTW_UTF-8
     Converting from and to UTF-8 format: Unicode(5).
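The converter names in the list above appear to follow a
source_target pattern. The Python sketch below builds a name for one
direction under that assumption; the helper converter_name and the
PARTNERS tuple are hypothetical, and the sketch only composes a name
without invoking any converter.

```python
# Codeset names taken from the converter list above.
PARTNERS = ("big5", "dechanyu", "dechanzi", "sbig5", "telecode",
            "UTF-16", "UCS-4", "UTF-8")


def converter_name(source: str, target: str) -> str:
    """Return the converter name for one direction, e.g. 'big5_eucTW'."""
    pair = {source, target}
    # Every listed converter has eucTW on exactly one side.
    if "eucTW" not in pair or len(pair) != 2:
        raise ValueError("exactly one side must be eucTW")
    other = target if source == "eucTW" else source
    if other not in PARTNERS:
        raise ValueError(f"no listed converter for {other!r}")
    # Assumption: names read as <source>_<target>.
    return f"{source}_{target}"
```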
Fonts for Taiwanese EUC
For both display devices and printers, the operating system supports
Taiwanese EUC through internal conversion to DEC Hanyu code and use of
DEC Hanyu fonts (see dechanyu(5)).
For general information on printing non-English text, refer to
i18n_printing(5).
SEE ALSO
Commands: locale(1)
Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanzi(5),
GBK(5), iconv_intro(5), i18n_intro(5), i18n_printing(5), l10n_intro(5),
sbig5(5), telecode(5), Unicode(5)