iconv_intro man page on DigitalUNIX

iconv_intro(5)							iconv_intro(5)

NAME
       iconv_intro, iconv - Introduction to codeset conversion

DESCRIPTION
       Conversion of character encoding from one coded character set (codeset)
       to another is an operation that often has to be performed by the
       operating system and some applications. For example, the man command
       supports codeset conversion to allow one set of reference page files to
       meet the needs of locales that support the same language and territory
       but different codesets (see man(1)).

       The following commands and library interfaces give users and
       application developers direct access to codeset conversion operations:

       o  The iconv command converts characters in a data file from one
          codeset to another (see iconv(1)).

       o  The iconv(), iconv_open(), and iconv_close() functions convert a
          string of characters from one codeset to another (see iconv(3),
          iconv_open(3), and iconv_close(3)). The iconv command uses these
          interfaces to convert characters.
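
       As a rough illustration of driving the iconv command from a program,
       the following Python sketch pipes bytes through iconv using the
       standard -f (from-codeset) and -t (to-codeset) options. The helper name
       convert_with_iconv and the codeset names ISO-8859-1 and UTF-8 are
       assumptions made for this example; codeset names differ between
       systems, so substitute the names that your iconv accepts.

```python
import shutil
import subprocess


def convert_with_iconv(data: bytes, from_codeset: str, to_codeset: str) -> bytes:
    """Pipe data through the iconv command and return the converted bytes."""
    result = subprocess.run(
        ["iconv", "-f", from_codeset, "-t", to_codeset],
        input=data,               # iconv reads standard input when no file is named
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,               # raise CalledProcessError if iconv reports an error
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("iconv") is None:
        print("iconv command not found on this system")
    else:
        # b"caf\xe9" is "cafe" with an accented e, encoded in ISO 8859-1.
        # The codeset names below are assumptions; they vary by platform.
        print(convert_with_iconv(b"caf\xe9", "ISO-8859-1", "UTF-8"))
```

       Because check=True is set, a conversion that iconv rejects (for
       example, an invalid character in the input) surfaces as an exception
       rather than as silently truncated output.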

       There are two types of codeset converters: algorithmic and table.
       Algorithmic converters, which reside in the /usr/lib/nls/loc/iconv
       directory, are shared libraries with a predefined entry point for
       invocation by functions in the libiconv.so library. Algorithmic
       converters are needed for the conversion of multibyte codesets, in part
       because table converters cannot handle the required number of character
       values and also because some of these codesets require complex handling
       (see NOTES). Algorithmic converters are supplied as part of the
       operating system product; the internal interfaces that they require are
       not published for external use.

       Table  converters,  which  reside  in  the  /usr/lib/nls/loc/iconvTable
       directory, can be created by using the genxlt command (see  genxlt(1)).
       These converters can support single-byte codesets and up to 256 encoded
       character values.
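
       The 256-value limit follows from the table being indexed by a single
       input byte. The Python sketch below models that idea only; it is not
       the genxlt source or output format. The function names, the choice of
       Latin-1 and code page 437 as the two single-byte codesets, and the "?"
       fallback byte are all assumptions made for illustration.

```python
def build_table(from_codec: str, to_codec: str, fallback: bytes = b"?") -> bytes:
    """Build a 256-entry lookup table: index = input byte, value = output byte."""
    table = bytearray(256)
    for value in range(256):
        char = bytes([value]).decode(from_codec)
        try:
            converted = char.encode(to_codec)
        except UnicodeEncodeError:
            # No counterpart in the destination codeset (illustrative fallback).
            converted = fallback
        table[value] = converted[0]
    return bytes(table)


def convert(data: bytes, table: bytes) -> bytes:
    """Convert single-byte data by looking each byte up in the table."""
    return data.translate(table)


# Hypothetical converter between two single-byte codesets.
LATIN1_TO_CP437 = build_table("latin-1", "cp437")

if __name__ == "__main__":
    print(convert(b"caf\xe9", LATIN1_TO_CP437))
```

       A multibyte codeset cannot be handled this way because one table slot
       per input byte leaves no room for characters encoded in two or more
       bytes, which is why those codesets use algorithmic converters.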

       Names of codeset converters are in the following form:

       from-codeset_to-codeset

       For example, the following converter converts  values  from  Super  DEC
       Kanji to Japanese Extended UNIX Code:


       The codeset converters produce an invalid character error in response
       to characters that cannot be converted from the source codeset to the
       destination codeset. This error is always produced for character codes
       that are invalid in the source codeset. However, if the error results
       from characters that are valid in the source codeset but have no
       counterparts in the destination codeset, you can eliminate the error by
       defining the ICONV_DEFSTR environment variable to specify a substitute
       output string. See the ENVIRONMENT VARIABLES section for more
       information about using the ICONV_DEFSTR variable.

       It is possible to convert data directly between two codesets or by way
       of an intermediate codeset, such as UTF-16, UCS-4, or UTF-8. For
       conversion of Chinese characters, be aware that the results of
       converting a Traditional Chinese codeset directly to a Simplified
       Chinese codeset may not be the same as the results of converting
       Traditional Chinese first to UTF-16, UCS-4, or UTF-8 and then to
       Simplified Chinese.

ENVIRONMENT VARIABLES
       Some codeset converters require more complex algorithms than can be
       provided through tables. The following environment variables provide
       control over conversion behavior for different kinds of codeset
       converters:

       ICONV_ACTION
              Controls the behavior for the many-to-one value conversions for
              conversion of Traditional Chinese (except for Traditional
              Chinese encoded in Telecode) to Simplified Chinese. The valid
              settings for this environment variable are as follows:

              batch  Specifies that the preferred mapping value (the first one
                     in the one-to-many mapping list) is always taken. The
                     batch setting is the ICONV_ACTION default.

              conv_all
                     Specifies that all the possible values are printed to the
                     standard output, enclosed by braces ({ }), so that the
                     user can later manually edit the converted file and
                     select the one to use.

              conv_all_nosym
                     Specifies that all the possible values are printed to the
                     standard output except for punctuation symbols, for which
                     only the preferred mapping value is printed. As is true
                     for conv_all, the conv_all_nosym setting prints value
                     choices enclosed by braces so that the converted file can
                     later be edited.

       ICONV_BYTEORDER
              Sets byte ordering for UTF-16 or UCS-4 (UTF-32) converters only.
              Valid values are little-endian or big-endian.

              If ICONV_NOBOM is set to a non-null value, the default byte
              ordering is big-endian. If ICONV_NOBOM is not set, the default
              byte ordering is little-endian. Setting the ICONV_BYTEORDER and
              ICONV_NOBOM environment variables may be necessary when
              producing UTF-16 or UCS-4 output that will be processed by
              codeset converters on platforms other than Tru64 UNIX.

       ICONV_DEFSTR
       ICONV_DEFSTR_from-codeset_to-codeset
              Defines the default string to be substituted in output for valid
              input characters that cannot be converted from the source
              codeset to the destination codeset. The variable value can be an
              arbitrary string or a code number. If the value is a code number
              (for example, 10, 07, 0x10, or, for Unicode converters, U+1234),
              the corresponding character in the output codeset (to-codeset)
              is substituted.
              For a given type of codeset conversion, a matching
              ICONV_DEFSTR_from-codeset_to-codeset variable has precedence
              over the ICONV_DEFSTR variable without the
              from-codeset_to-codeset suffix. When defining the variable with
              the suffix, replace from-codeset_to-codeset with the name of the
              codeset converter to which the variable applies. The
              ICONV_DEFSTR variable (defined without the suffix) is used by a
              converter when no ICONV_DEFSTR_from-codeset_to-codeset variable
              has been defined specifically for the type of conversion being
              done.

              If these variables are not defined or are set to the null
              string, the characters that cannot be converted are skipped and
              have no representation in converted output.

              The following converter-specific restrictions apply to
              ICONV_DEFSTR* variables:

              o  ICONV_DEFSTR* environment variables do not work for
                 converters that convert between Japanese codesets or between
                 Korean codesets.

              o  For converters that handle UTF-16, UCS-4, or UTF-8 format,
                 the only valid variable value is a code number (such as
                 U+1234 or 0x10) or a string whose value is a single ASCII
                 character (such as ?). For these converters, any string value
                 other than a single ASCII character is ignored and any
                 characters that cannot be converted have no representation in
                 output.

              o  For converters that handle output in UTF-16, UCS-4, or UTF-8
                 format, characters that cannot be converted and for which no
                 valid ICONV_DEFSTR* value has been defined produce an error
                 condition that aborts the conversion process.

       ICONV_NOBOM
              Disables generation of the byte-order mark at the beginning of
              UTF-16 or UCS-4 (UTF-32) output. A valid setting is any value
              other than a null string. If ICONV_NOBOM is set, big-endian is
              established as the default byte ordering and BOM generation is
              disabled. If ICONV_NOBOM is not set, little-endian is
              established as the default byte ordering and BOM generation is
              enabled.

              Codeset converters that process UTF-16 or UCS-4 data on
              platforms other than Tru64 UNIX usually require the byte-order
              mark. The ICONV_NOBOM and ICONV_BYTEORDER environment variables
              provide you with the means to control the generation of a
              byte-order mark and byte ordering. Thus, you can establish
              codeset conversion that is appropriate to the requirements of
              other platforms or is compatible with output produced by codeset
              converters that were included in versions of Tru64 UNIX prior to
              Version 4.0D.

       ICONV_PHRCONV
              Activates phrase conversion for converters that convert from a
              Traditional Chinese codeset (except for Traditional Chinese
              encoded in Telecode) to a Simplified Chinese codeset or the
              reverse. When phrase conversion is activated, a whole phrase in
              Traditional Chinese is converted to a different phrase in
              Simplified Chinese or the reverse.

              If ICONV_PHRCONV is set to mark, the converted phrases are
              bracketed by [ and ] to highlight the conversion result for
              visual checking.

              The phrase conversion databases in the /usr/share/phrdb
              directory are normal text files with the same file names as
              those of the algorithmic converters in
              /usr/lib/nls/loc/iconv/*. These phrase conversion databases
              contain entries for phrase conversion pairs.
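
       The rules described above for ICONV_DEFSTR precedence, the
       ICONV_NOBOM and ICONV_BYTEORDER defaults, and the ICONV_ACTION settings
       can be modeled roughly as follows. This Python sketch is not the
       converters' implementation: the function names and the converter name
       codesetA_codesetB are hypothetical, a suffixed variable that is set to
       the null string is assumed to still take precedence, ICONV_BYTEORDER is
       assumed to override the default ordering without validation, and the
       exact layout of choices inside the braces is a guess.

```python
import os


def resolve_defstr(converter, env=os.environ):
    """Return the substitute string for a converter, or None to skip characters."""
    specific = env.get(f"ICONV_DEFSTR_{converter}")
    # A converter-specific variable has precedence over plain ICONV_DEFSTR.
    # Assumption: a suffixed variable set to the null string still wins.
    value = specific if specific is not None else env.get("ICONV_DEFSTR")
    # Undefined or null string: unconvertible characters are skipped.
    return value or None


def unicode_output_settings(env=os.environ):
    """Return (byte_order, write_bom) for UTF-16 or UCS-4 (UTF-32) output."""
    nobom = bool(env.get("ICONV_NOBOM"))  # any non-null value counts as set
    default_order = "big-endian" if nobom else "little-endian"
    # Assumption: an explicit ICONV_BYTEORDER overrides the default ordering.
    byte_order = env.get("ICONV_BYTEORDER") or default_order
    return byte_order, not nobom


def render_candidates(candidates, action="batch", is_punctuation=False):
    """Model the ICONV_ACTION settings for a list of possible output values."""
    preferred = candidates[0]  # first entry is the preferred mapping value
    if action == "batch" or len(candidates) == 1:
        return preferred
    if action == "conv_all_nosym" and is_punctuation:
        return preferred
    # conv_all, or conv_all_nosym for non-punctuation: show every choice.
    # Assumption: choices are written back to back inside the braces.
    return "{" + "".join(candidates) + "}"


if __name__ == "__main__":
    env = {"ICONV_DEFSTR": "?", "ICONV_DEFSTR_codesetA_codesetB": "#"}
    print(resolve_defstr("codesetA_codesetB", env))
    print(unicode_output_settings({"ICONV_NOBOM": "1"}))
    print(render_candidates(["A", "B"], action="conv_all"))
```

       Passing the environment as a plain dictionary keeps the functions easy
       to try with different settings without changing the real process
       environment.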

FILES
       /usr/lib/nls/loc/iconv
              Algorithmic converters

       /usr/lib/nls/loc/iconvTable
              Table converters

       /usr/share/phrdb
              Phrase conversion databases

SEE ALSO
       Commands: genxlt(1), iconv(1), phrase(1)

       Functions: iconv(3), iconv_close(3), iconv_open(3)

       Others: i18n_intro(5), l10n_intro(5)

