Utf8 man page on DragonFly

UTF8(5)			    BSD File Formats Manual		       UTF8(5)

NAME
     utf8 — UTF-8, a transformation format of ISO 10646

SYNOPSIS
     ENCODING "UTF-8"

DESCRIPTION
     The UTF-8 encoding represents UCS-4 characters as a sequence of octets,
     using between 1 and 6 octets for each character.  It is backwards
     compatible with ASCII, so 0x00-0x7f refer to the ASCII character set.
     The multibyte encoding of a non-ASCII character consists entirely of
     bytes whose high order bit is set.  The actual encoding is represented
     by the following table:

     [0x00000000 - 0x0000007f] [00000000.0bbbbbbb] -> 0bbbbbbb
     [0x00000080 - 0x000007ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
     [0x00000800 - 0x0000ffff] [bbbbbbbb.bbbbbbbb] ->
	     1110bbbb, 10bbbbbb, 10bbbbbb
     [0x00010000 - 0x001fffff] [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
	     11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
     [0x00200000 - 0x03ffffff] [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
	     111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
     [0x04000000 - 0x7fffffff] [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
	     1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
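
     The table can be read as an algorithm: pick the first row whose range
     contains the value, put the top bits in the lead octet, and emit the
     remaining bits six at a time in continuation octets.  The following
     Python sketch illustrates this under the RFC 2279 layout shown above;
     the function name utf8_encode_rfc2279 is hypothetical and is not part
     of any system library.

```python
def utf8_encode_rfc2279(cp: int) -> bytes:
    """Encode one value using the 1-6 octet table (RFC 2279 layout)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit range")
    if cp <= 0x7F:
        # First row of the table: ASCII is passed through unchanged.
        return bytes([cp])
    # (upper bound of the range, lead-octet prefix, continuation octets)
    rows = [
        (0x7FF, 0xC0, 1),
        (0xFFFF, 0xE0, 2),
        (0x1FFFFF, 0xF0, 3),
        (0x3FFFFFF, 0xF8, 4),
        (0x7FFFFFFF, 0xFC, 5),
    ]
    for limit, prefix, ncont in rows:
        if cp <= limit:
            # The lead octet carries the bits above the continuation octets.
            lead = prefix | (cp >> (6 * ncont))
            # Each continuation octet is 10bbbbbb, most significant first.
            tail = [0x80 | ((cp >> (6 * i)) & 0x3F)
                    for i in range(ncont - 1, -1, -1)]
            return bytes([lead] + tail)
    raise AssertionError("unreachable: range was checked above")
```

     Because the first matching row is used, the result is always the
     shortest form.  Python's built-in "utf-8" codec follows the later
     RFC 3629, which limits sequences to 4 octets, so the 5- and 6-octet
     rows here can only be exercised through a sketch like this one.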

     If more than a single representation of a value exists (for example,
     0x00; 0xC0 0x80; 0xE0 0x80 0x80), only the shortest representation is
     valid.  Longer ones are rejected as errors because they pose a
     potential security risk and destroy the 1:1 mapping between characters
     and octet sequences.
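
     One way to detect a non-shortest form is to decode the sequence and
     then compare the result with the smallest value that needs that many
     octets, taken from the table above.  The following Python sketch shows
     the idea for a single sequence; utf8_decode_one and MIN_VALUE are
     hypothetical names, and this is not the libc implementation.

```python
# Smallest value that legitimately needs a sequence of each length.
MIN_VALUE = {1: 0x00, 2: 0x80, 3: 0x800,
             4: 0x10000, 5: 0x200000, 6: 0x4000000}


def utf8_decode_one(data: bytes) -> int:
    """Decode exactly one 1-6 octet sequence, rejecting overlong forms."""
    if not data:
        raise ValueError("empty input")
    lead = data[0]
    # Classify the lead octet by its high-order bit pattern.
    if lead < 0x80:
        length, value = 1, lead
    elif lead & 0xE0 == 0xC0:
        length, value = 2, lead & 0x1F
    elif lead & 0xF0 == 0xE0:
        length, value = 3, lead & 0x0F
    elif lead & 0xF8 == 0xF0:
        length, value = 4, lead & 0x07
    elif lead & 0xFC == 0xF8:
        length, value = 5, lead & 0x03
    elif lead & 0xFE == 0xFC:
        length, value = 6, lead & 0x01
    else:
        raise ValueError("invalid lead octet")
    if len(data) != length:
        raise ValueError("wrong number of octets")
    for octet in data[1:]:
        # Every continuation octet must look like 10bbbbbb.
        if octet & 0xC0 != 0x80:
            raise ValueError("invalid continuation octet")
        value = (value << 6) | (octet & 0x3F)
    # A value that fits in a shorter row was encoded redundantly.
    if value < MIN_VALUE[length]:
        raise ValueError("overlong (non-shortest) form")
    return value
```

     With this check, 0x00 decodes to zero, while 0xC0 0x80 and
     0xE0 0x80 0x80 both raise an error even though their payload bits
     also spell zero.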

COMPATIBILITY
     The utf8 encoding supersedes the utf2(5) encoding.  The only differences
     between the two are that utf8 handles the full 31-bit character set of
     ISO 10646 whereas utf2(5) is limited to a 16-bit character set, and that
     utf2(5) accepts redundant, non-"shortest form" representations of
     characters.

SEE ALSO
     euc(5), utf2(5)

     F. Yergeau, UTF-8, a transformation format of ISO 10646, January 1998,
     RFC 2279.

STANDARDS
     The utf8 encoding is compatible with RFC 2279.

BSD			       October 10, 2002				   BSD