UTF(6)									UTF(6)

NAME
       UTF, Unicode, ASCII, rune - character set and format

DESCRIPTION
       The Inferno character set and representation are based on the Unicode
       Standard and on the ISO multibyte UTF-8 encoding (Universal Character
       Set Transformation Format, 8 bits wide).  The Unicode Standard
       represents its characters in 21 bits; UTF-8 represents such values in
       an 8-bit byte stream.  Throughout this manual, UTF-8 is shortened to
       UTF.

       Internally, programs store individual Unicode characters as 32-bit
       integers, of which only 21 bits are currently used.  Documentation
       often refers to them as `runes', following Plan 9.  However, any
       external manifestation of textual information, in files or at the
       interface between programs, uses the machine-independent, byte-stream
       encoding called UTF.

       UTF is designed so that the 7-bit ASCII set (values hexadecimal 00 to
       7F) appears only as itself in the encoding.  Characters with values
       above 7F appear as sequences of two or more bytes with values only
       from 80 to FF.
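
       The property above can be checked with a short sketch.  Python is
       used here purely for illustration and is not part of Inferno; the
       sample string and the name check_ascii_passthrough are arbitrary
       choices.  The sketch encodes each character with Python's built-in
       UTF-8 codec and confirms that ASCII characters encode as themselves
       while every byte of a non-ASCII character lies in 80 to FF.

```python
# Illustrative check only; the sample text is an arbitrary assumption.
def check_ascii_passthrough(text):
    """Return the UTF encoding of text after checking the byte ranges."""
    for ch in text:
        b = ch.encode("utf-8")
        if ord(ch) <= 0x7F:
            # ASCII characters encode as the single byte of the same value.
            assert b == bytes([ord(ch)])
        else:
            # Non-ASCII characters take two or more bytes, all in 80..FF.
            assert len(b) >= 2
            assert all(0x80 <= byte <= 0xFF for byte in b)
    return text.encode("utf-8")


encoded = check_ascii_passthrough("naïve €5")
print(encoded.hex(" "))
```

       Because no byte below 80 ever occurs inside a multibyte sequence,
       a program that searches a UTF byte stream for an ASCII delimiter
       such as a newline or slash cannot match part of another character.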

       The UTF encoding of the Unicode Standard is backward compatible with
       ASCII: programs presented only with ASCII work on Inferno even if not
       written to deal with UTF, as do programs that deal with uninterpreted
       byte streams.  However, programs that perform semantic processing on
       characters must convert from UTF to runes in order to work properly
       with non-ASCII input.  Normally, all necessary conversions are done by
       the Limbo compiler and execution environment when converting between
       array of byte and string, but sometimes more is needed, such as when a
       program receives UTF input one byte at a time; see sys-byte2char(2)
       for routines to handle such processing.
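
       The byte-at-a-time case can be illustrated outside Inferno with a
       minimal Python sketch.  It does not use or model the Limbo routines
       in sys-byte2char(2); it relies on Python's standard incremental
       UTF-8 decoder, and decode_stream is a hypothetical helper name.  The
       decoder buffers the bytes of an incomplete sequence until the rest
       arrives.

```python
import codecs


def decode_stream(chunks):
    """Yield decoded text from an iterable of byte chunks of any size."""
    # errors="replace" substitutes U+FFFD for malformed input.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush: a sequence still incomplete at end of input is malformed.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


data = "é€".encode("utf-8")
one_byte_at_a_time = (data[i:i + 1] for i in range(len(data)))
result = "".join(decode_stream(one_byte_at_a_time))
print(result == "é€")
```

       Decoding each byte independently would fail here, since no single
       byte of a multibyte sequence is a complete character on its own.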

       Letting numbers be binary, a rune x is converted to a multibyte UTF
       sequence as follows:

       01.   x in [000000.00000000.0bbbbbbb] → 0bbbbbbb
       10.   x in [000000.00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
       11.   x in [000000.bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
       100.  x in [0bbbbb.bbbbbbbb.bbbbbbbb] → 11110bbb, 10bbbbbb, 10bbbbbb,
             10bbbbbb
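
       The four conversions can be sketched in Python as follows.  This is
       an illustration, not Inferno's implementation: rune_to_utf is a
       hypothetical name, the four-byte case uses the standard UTF-8 lead
       byte 11110bbb, and the upper limit of hexadecimal 10FFFF is the
       Unicode code space limit, assumed here rather than taken from this
       page.  Surrogate values are not treated specially.

```python
def rune_to_utf(x):
    """Encode one rune (an integer code point) as a UTF byte sequence."""
    if x < 0 or x > 0x10FFFF:  # assumed limit: the Unicode code space
        raise ValueError("rune out of range")
    if x <= 0x7F:
        # Conversion 01: 7 bits, one byte 0bbbbbbb.
        return bytes([x])
    if x <= 0x7FF:
        # Conversion 10: 11 bits, 110bbbbb 10bbbbbb.
        return bytes([0xC0 | (x >> 6), 0x80 | (x & 0x3F)])
    if x <= 0xFFFF:
        # Conversion 11: 16 bits, 1110bbbb 10bbbbbb 10bbbbbb.
        return bytes([
            0xE0 | (x >> 12),
            0x80 | ((x >> 6) & 0x3F),
            0x80 | (x & 0x3F),
        ])
    # Conversion 100: 21 bits, 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb.
    return bytes([
        0xF0 | (x >> 18),
        0x80 | ((x >> 12) & 0x3F),
        0x80 | ((x >> 6) & 0x3F),
        0x80 | (x & 0x3F),
    ])


for rune in (0x41, 0xE9, 0x20AC, 0x1F600):
    print(hex(rune), rune_to_utf(rune).hex(" "))
```

       Testing the ranges from smallest to largest means each rune takes
       the first conversion that can hold it, which yields the shortest
       encoding.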

       Conversion 01 provides a one-byte sequence that spans the ASCII
       character set in a compatible way.  Conversions 10, 11 and 100
       represent higher-valued characters as sequences of two, three or four
       bytes with the high bit set.  Inferno does not support the 5 and 6
       byte sequences proposed by X-Open.  When there are multiple ways to
       encode a value, for example rune 0, the shortest encoding is used.

       In the inverse mapping, any sequence except those described above is
       incorrect and is converted to the rune hexadecimal FFFD.
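
       The inverse mapping might be sketched in Python as below.  This is
       a hedged illustration rather than Inferno's decoder: utf_to_runes
       and BAD are hypothetical names, and the recovery policy (skip one
       byte after a malformed or truncated sequence, consume the whole
       sequence when it is well formed but not the shortest encoding) is
       an assumption that this page does not specify.  Surrogate values
       are not treated specially.

```python
BAD = 0xFFFD  # rune substituted for any incorrect sequence


def utf_to_runes(data):
    """Decode UTF bytes to a list of runes, mapping bad input to BAD."""
    runes = []
    i = 0
    while i < len(data):
        b0 = data[i]
        # need: continuation bytes expected; low: smallest rune for which
        # a sequence of this length is the shortest encoding.
        if b0 < 0x80:
            need, x, low = 0, b0, 0
        elif 0xC0 <= b0 < 0xE0:
            need, x, low = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 < 0xF0:
            need, x, low = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 < 0xF8:
            need, x, low = 3, b0 & 0x07, 0x10000
        else:
            # Stray continuation byte or a 5/6-byte lead: not supported.
            runes.append(BAD)
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        if len(tail) < need or any((b & 0xC0) != 0x80 for b in tail):
            runes.append(BAD)  # truncated or malformed sequence
            i += 1
            continue
        for b in tail:
            x = (x << 6) | (b & 0x3F)
        # Reject encodings longer than necessary and out-of-range values.
        runes.append(x if low <= x <= 0x10FFFF else BAD)
        i += 1 + need
    return runes


print([hex(r) for r in utf_to_runes("aé€".encode("utf-8"))])
print([hex(r) for r in utf_to_runes(b"\xc0\x80")])  # overlong rune 0
```

       The check against low is what enforces the shortest-encoding rule:
       the two-byte sequence C0 80 carries the value 0, which fits in one
       byte, so it decodes to FFFD rather than to rune 0.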

FILES
       /lib/unicode
	      table of characters and descriptions, suitable for look(1).

SEE ALSO
       ascii(1), tcs(1), sys-byte2char(2), keyboard(6), The Unicode Standard.

									UTF(6)