MBRTOWC(3)		  OpenBSD Programmer's Manual		    MBRTOWC(3)

NAME
     mbrtowc - converts a multibyte character to a wide character
     (restartable)

SYNOPSIS
     #include <wchar.h>

     size_t
     mbrtowc(wchar_t * restrict wc, const char * restrict s, size_t
     n, mbstate_t * restrict mbs);

DESCRIPTION
     The mbrtowc() function examines at most n bytes of the multibyte
     character byte string pointed to by s, converts those bytes to a wide
     character, and stores the wide character in the wchar_t object pointed to
     by wc if wc is not NULL and s points to a valid character.

     Conversion happens in accordance with the conversion state described by
     the mbstate_t object pointed to by mbs.  The mbstate_t object must be
     initialized to zero before the application's first call to mbrtowc().  If
     the previous call to mbrtowc() did not return (size_t)-1, the mbstate_t
     object can safely be reused without reinitialization.
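
     As a minimal sketch of the initialization rule above, the following
     helper zeroes a fresh mbstate_t object and converts the first character
     of a string.  The name first_char() is hypothetical and chosen only for
     illustration; passing strlen(s) + 1 as n is an assumption that lets the
     terminating NUL be seen, so that an empty string returns zero.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the first multibyte character of the
 * NUL-terminated string s and store it in *wc.  Returns whatever
 * mbrtowc() returns.
 */
size_t
first_char(const char *s, wchar_t *wc)
{
	mbstate_t mbs;

	/* Zero the state before the first call to mbrtowc(). */
	memset(&mbs, 0, sizeof(mbs));

	/* Include the terminating NUL in n so "" yields 0. */
	return mbrtowc(wc, s, strlen(s) + 1, &mbs);
}
```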

     The behaviour of mbrtowc() is affected by the LC_CTYPE category of the
     current locale.  If the locale is changed without reinitialization of the
     mbstate_t object pointed to by mbs, the behaviour of mbrtowc() is
     undefined.

     Unlike mbtowc(3), mbrtowc() will accept an incomplete byte sequence
     pointed to by s which does not form a complete character but is
     potentially part of a valid character.  In this case, mbrtowc() consumes
     all such bytes.  The conversion state saved in the mbstate_t object
     pointed to by mbs will be used to restart the suspended conversion during
     the next call to mbrtowc().
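
     The restart behaviour can be illustrated by feeding input one byte at a
     time, as might happen when reading from a pipe or socket.  The sketch
     below is not part of libc; count_bytewise() is a hypothetical name.  It
     relies only on the documented return values: (size_t)-2 means the byte
     was consumed and stored in the conversion state, and any other non-error
     value means a character was completed.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: feed len bytes of buf to mbrtowc() one byte at
 * a time and count the characters completed.  Returns (size_t)-1 if
 * an invalid byte sequence is found.
 */
size_t
count_bytewise(const char *buf, size_t len)
{
	mbstate_t mbs;
	wchar_t wc;
	size_t i, r, nchars = 0;

	memset(&mbs, 0, sizeof(mbs));
	for (i = 0; i < len; i++) {
		r = mbrtowc(&wc, buf + i, 1, &mbs);
		if (r == (size_t)-1)
			return (size_t)-1;	/* state is now undefined */
		if (r == (size_t)-2)
			continue;	/* byte consumed, character incomplete */
		nchars++;		/* 0 or 1: a character was completed */
	}
	return nchars;
}
```

     The same mbstate_t object is passed to every call, so a character split
     across several calls is still converted as a single character.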

     In state-dependent encodings, s may point to a special sequence of bytes
     called a ``shift sequence''.  Shift sequences switch between character
     code sets available within an encoding scheme.  One encoding scheme using
     shift sequences is ISO/IEC 2022-JP, which can switch e.g. from ASCII
     (which uses one byte per character) to JIS X 0208 (which uses two bytes
     per character).  Shift sequence bytes correspond to no individual wide
     character, so mbrtowc() treats them as if they were part of the
     subsequent multibyte character.  Therefore they do contribute to the
     number of bytes in the multibyte character.

     Special cases in interpretation of arguments are as follows:

     wc == NULL	   The conversion from a multibyte character to a wide
		   character is performed and the conversion state may be
		   affected, but the resulting wide character is discarded.

		   This can be used to find out how many bytes are contained
		   in the multibyte character pointed to by s.

     s == NULL	   mbrtowc() ignores wc and n, and behaves equivalently to

			 mbrtowc(NULL, "", 1, mbs);

		   which attempts to use the mbstate_t object pointed to by
		   mbs to start or continue conversion using the empty string
		   as input, and discards the conversion result.

		   If conversion succeeds, this call always returns zero.
		   Unlike mbtowc(3), the value returned does not indicate
		   whether the current encoding of the locale is state-
		   dependent, i.e. uses shift sequences.

     mbs == NULL   mbrtowc() uses its own internal state object to keep the
		   conversion state, instead of an mbstate_t object pointed to
		   by mbs.  This internal conversion state is initialized once
		   at program startup.  It is not safe to call mbrtowc() again
		   with a NULL mbs argument if mbrtowc() returned (size_t)-1
		   because at this point the internal conversion state is
		   undefined.

		   Calling any other functions in libc never changes the
		   internal conversion state object of mbrtowc().
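
     The wc == NULL case can be used to count characters without storing
     them.  The following is a sketch only; mb_strlen_chars() is a
     hypothetical name, and treating truncated input as an error is a design
     choice of this example rather than a requirement of mbrtowc().

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the multibyte characters in the
 * NUL-terminated string s.  Returns (size_t)-1 if s contains an
 * invalid or truncated character.
 */
size_t
mb_strlen_chars(const char *s)
{
	mbstate_t mbs;
	size_t left = strlen(s), r, nchars = 0;

	memset(&mbs, 0, sizeof(mbs));
	while (left > 0) {
		/* wc is NULL: the wide character is discarded. */
		r = mbrtowc(NULL, s, left, &mbs);
		if (r == (size_t)-1 || r == (size_t)-2)
			return (size_t)-1;
		if (r == 0)
			break;		/* defensive: NUL reached early */
		s += r;
		left -= r;
		nchars++;
	}
	return nchars;
}
```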

RETURN VALUES
     0		   The bytes pointed to by s form a terminating NUL character.
		   If wc is not NULL, a NUL wide character has been stored in
		   the wchar_t object pointed to by wc.

     positive	   s points to a valid character, and the value returned is
		   the number of bytes completing the character.  If wc is not
		   NULL, the corresponding wide character has been stored in
		   the wchar_t object pointed to by wc.

     (size_t)-1	   s points to an illegal byte sequence which does not form a
		   valid multibyte character in the current locale.  mbrtowc()
		   sets errno to EILSEQ.  The conversion state object pointed
		   to by mbs is left in an undefined state and must be
		   reinitialized before being used again.

		   Because applications using mbrtowc() are shielded from the
		   specifics of the multibyte character encoding scheme, it is
		   impossible to repair byte sequences containing encoding
		   errors.  Such byte sequences must be treated as invalid and
		   potentially malicious input.  Applications must stop
		   processing the byte string pointed to by s and either
		   discard any wide characters already converted, or cope with
		   truncated input.

     (size_t)-2	   s points to an incomplete byte sequence of length n which
		   has been consumed and contains part of a valid multibyte
		   character.  The character may be completed by calling
		   mbrtowc() again with s pointing to one or more subsequent
		   bytes of the multibyte character and mbs pointing to the
		   conversion state object used during conversion of the
		   incomplete byte sequence.
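
     The return values above suggest a conversion loop of the following
     shape.  This is a sketch under stated assumptions, not a definitive
     implementation: convert_string() is a hypothetical name, the whole input
     is assumed to be available at once, and an incomplete character at the
     end of the string is therefore treated as an error.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the NUL-terminated string src into
 * dst, which has room for dstlen wide characters including the
 * terminator.  Returns the number of wide characters stored, not
 * counting the terminator, or (size_t)-1 on invalid or truncated
 * input or if dst is too small.
 */
size_t
convert_string(wchar_t *dst, size_t dstlen, const char *src)
{
	mbstate_t mbs;
	size_t left = strlen(src) + 1, r, n = 0;

	memset(&mbs, 0, sizeof(mbs));
	while (n < dstlen) {
		r = mbrtowc(&dst[n], src, left, &mbs);
		if (r == 0)
			return n;		/* terminating NUL stored */
		if (r == (size_t)-1)
			return (size_t)-1;	/* invalid input; stop */
		if (r == (size_t)-2)
			return (size_t)-1;	/* input ends mid-character */
		src += r;			/* positive: bytes consumed */
		left -= r;
		n++;
	}
	return (size_t)-1;			/* dst too small */
}
```

     On failure the contents of dst should be regarded as truncated, in line
     with the advice given for (size_t)-1 above.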

ERRORS
     The mbrtowc() function may cause an error in the following cases:

     [EILSEQ]	   s points to an invalid or incomplete multibyte character.

     [EINVAL]	   mbs points to an invalid or uninitialized mbstate_t object.

SEE ALSO
     mbrlen(3), mbtowc(3), setlocale(3)

STANDARDS
     The mbrtowc() function conforms to ISO/IEC 9899/AMD1:1995 (``ISO C90,
     Amendment 1'').  The restrict qualifier was added in ISO/IEC 9899:1999
     (``ISO C99'').

CAVEATS
     mbrtowc() is not suitable for programs that care about internals of the
     character encoding scheme used by the byte string pointed to by s.

     It is possible that mbrtowc() fails because of locale configuration
     errors.  An ``invalid'' character sequence may simply be encoded in a
     different encoding than that of the current locale.

     The special cases for s == NULL and mbs == NULL do not make any sense.
     Instead of passing NULL for mbs, mbtowc(3) can be used.

     Earlier versions of this man page implied that calling mbrtowc() with a
     NULL s argument would always set mbs to the initial conversion state.
     But this is true only if the previous call to mbrtowc() using mbs did not
     return (size_t)-1 or (size_t)-2.  It is recommended to zero the mbstate_t
     object instead.
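
     Zeroing the object, as recommended above, might look like the following
     sketch.  reset_state() is a hypothetical name; mbsinit(3) can be used
     afterwards to confirm that the object describes the initial conversion
     state.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: return *mbs to the initial conversion state by
 * zeroing it, instead of calling mbrtowc() with a NULL s argument.
 */
void
reset_state(mbstate_t *mbs)
{
	memset(mbs, 0, sizeof(*mbs));
}
```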

OpenBSD 4.9		       December 5, 2010			   OpenBSD 4.9