OPTU8TO16(3)		   BSD Programmer's Manual		  OPTU8TO16(3)

NAME
     optu8to16, optu8to16vis - convert multibyte characters to wide
     characters, preserving raw octets

SYNOPSIS
     #include <wchar.h>

     size_t
     optu8to16(wchar_t * restrict pwc, const char * restrict s, size_t n,
	     mbstate_t * restrict ps);

     size_t
     optu8to16vis(wchar_t * restrict pwc, const char * restrict s, size_t n,
	     mbstate_t * restrict ps);
     /* deprecated */

DESCRIPTION
     The optu8to16() function converts the multibyte character pointed to by
     s to a wide character, similar to mbrtowc() in a UTF-8 locale. If s
     points to a valid character in the CESU-8 multibyte encoding and pwc is
     non-null, the wide character is stored in the wchar_t object pointed to
     by pwc. If s does not point to a valid character, the first octet is
     transliterated instead: either to its ISO_646.irv:1991 (ASCII) mapping
     into UCS-2 (U+0000 .. U+007F), or to the OPTU-16 raw octet range (U+EF80
     .. U+EFFF). The optu8to16vis() function behaves the same, except that
     raw octets are mapped into the normal Unicode range as if they had been
     encoded in the legacy 8-bit codepage. The conversion happens in
     accordance with the conversion state described in the mbstate_t object
     pointed to by ps; note that raw octet conversion is stateful. The
     function examines at most n bytes of the array beginning at s. If n is
     0, the function behaves as if end of input (not a null character) has
     been read and ignores s.
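
     The transliteration of an invalid octet can be pictured with the
     following minimal sketch. It assumes that octets 0x80 .. 0xFF map
     linearly onto U+EF80 .. U+EFFF, which is the natural reading of the
     range above but is not confirmed by this page; raw_octet_to_wide() and
     is_raw_octet() are hypothetical helper names, not part of libc.

```c
#include <stdint.h>

/*
 * Assumption (not taken from the MirOS sources): an octet that cannot be
 * decoded as part of a valid character lands either on its ASCII
 * codepoint or, offset by 0xEF00, in the raw octet range.
 */
uint16_t
raw_octet_to_wide(unsigned char octet)
{
	if (octet < 0x80)
		return (uint16_t)octet;			/* U+0000 .. U+007F */
	return (uint16_t)(0xEF00u | octet);		/* U+EF80 .. U+EFFF */
}

/* Nonzero if wc lies in the OPTU-16 raw octet range. */
int
is_raw_octet(uint16_t wc)
{
	return wc >= 0xEF80u && wc <= 0xEFFFu;
}
```

     Under that assumption, a stray 0xFF octet would surface as U+EFFF,
     which an encoder could later turn back into the original octet.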

     If s points to a valid character and the character corresponds to a null
     wide character, then the function resets the mbstate_t object pointed to
     by ps to the initial conversion state.

     These are the special cases:

     pwc == NULL    The conversion from a multibyte character to a wide
		    character takes place and the conversion state may be
		    affected, but the resulting wide character is discarded.

     s == NULL	    optu8to16() sets the conversion state object pointed to by
		    ps to an initial state and always returns 0. In this case,
		    optu8to16() ignores pwc but not n, and is equivalent to
		    the following call:

			  optu8to16(NULL, "", 1, ps);

     n == 0	    Read end of input (not a null character, but an epsilon in
		    the sense of automata theory) and ignore s. If the
		    conversion state still holds pending output, optu8to16()
		    emits it (up to two wide characters in total) and returns
		    0; otherwise it returns (size_t)-2. Application note: once
		    the end of input has been reached, call optu8to16() with
		    n == 0 until it returns (size_t)-2, and process the wide
		    characters emitted along the way. This ensures that no raw
		    octets in the OPTU-8 encoded source are lost.

     ps == NULL	    optu8to16() uses its own internal state object to keep the
		    conversion state, instead of ps mentioned in this manual
		    page.

		    Calling any other function in libc never changes the
		    internal state of optu8to16(), which is initialised at
		    programme startup time.
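
     The application note for n == 0 can be sketched as a decode loop that
     first feeds the whole buffer and then keeps signalling end of input
     until the converter reports (size_t)-2. The converter is passed as a
     function pointer so that the loop can be tried on systems without
     optu8to16(); decode_buffer() and conv_fn are hypothetical names, and the
     loop relies only on the return value convention described in this page.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Same parameter list as optu8to16() and mbrtowc(). */
typedef size_t (*conv_fn)(wchar_t *restrict, const char *restrict, size_t,
    mbstate_t *restrict);

/*
 * Decode len bytes at src into dst, which has room for cap wide
 * characters. Returns the number of wide characters stored, or
 * (size_t)-1 on a converter error or when dst is too small.
 */
size_t
decode_buffer(conv_fn conv, const char *src, size_t len, wchar_t *dst,
    size_t cap)
{
	mbstate_t st;
	size_t out = 0;

	memset(&st, 0, sizeof(st));	/* initial conversion state */
	for (;;) {
		wchar_t wc;
		size_t r = conv(&wc, src, len, &st);

		if (r == (size_t)-1)
			return (size_t)-1;
		if (r == (size_t)-2) {
			if (len == 0)
				break;	/* end of input, nothing pending */
			src += len;	/* all input absorbed into st */
			len = 0;	/* from now on: signal end of input */
			continue;
		}
		if (out == cap)
			return (size_t)-1;
		dst[out++] = wc;	/* r == 0: emitted from the state */
		src += r;
		len -= r;
	}
	return out;
}
```

     On MirOS the call would presumably read decode_buffer(optu8to16, buf,
     len, out, cap). Passing mbrtowc instead works for plain ASCII input
     without embedded null characters, since mbrtowc() returns 0 for a null
     character rather than for "no byte read".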

RETURN VALUES
     0 or positive  Number of bytes read from s. If 0, the state contained
		    enough information to emit a wide character; if positive,
		    the bytes form a valid multibyte character in the OPTU-8
		    encoding.

     (size_t)-2	    s points to a byte sequence that may be the beginning of a
		    valid multibyte character but is incomplete. All n bytes
		    of the input have been processed and stored in ps.

     (size_t)-1	    Generic error condition; should not happen in the current
		    implementation. errno is set to indicate the error.
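
     A caller might sort these return values into cases as in the sketch
     below. The enum and optu_classify() are hypothetical names introduced
     for illustration only; the two special values must be tested before
     treating the result as a byte count, because both are very large when
     read as a size_t.

```c
#include <stddef.h>

enum optu_status {
	OPTU_FROM_STATE,	/* 0: wide character emitted, no byte read */
	OPTU_CONSUMED,		/* > 0: that many bytes formed a character */
	OPTU_INCOMPLETE,	/* (size_t)-2: all n bytes stored in ps */
	OPTU_ERROR		/* (size_t)-1: errno describes the failure */
};

/* Map a return value of optu8to16() onto one of the cases above. */
enum optu_status
optu_classify(size_t r)
{
	if (r == (size_t)-1)
		return OPTU_ERROR;
	if (r == (size_t)-2)
		return OPTU_INCOMPLETE;
	if (r == 0)
		return OPTU_FROM_STATE;
	return OPTU_CONSUMED;
}
```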

ERRORS
     The optu8to16() function is designed to be as robust as possible and,
     in contrast to mbrtowc(), never fails with EILSEQ. While EINVAL to
     indicate an invalid or uninitialised mbstate_t object is theoretically
     possible, neither this nor other processing errors should ever happen
     with the current implementation.

SEE ALSO
     iswoctet(3), mbrtowc(3), optu16to8(3)

STANDARDS
     At present, MirOS is limited to the Unicode BMP (Basic Multilingual
     Plane), thus OPTU-8 is limited to the common subset of CESU-8 and UTF-8.

     The optu16to8() and optu8to16() functions are standardised by MirOS and
     have been designed to behave as closely to their ISO/IEC 9899:1999 ("ISO
     C99") equivalents wcrtomb() and mbrtowc() as possible, with the
     following intentional exceptions:

     If n is 0, s is ignored, even if it is NULL, not the other way round. The
     return value 0 does not indicate that a null character was processed;
     check the wide character stored through pwc for that. It indicates that
     no byte of the input has been read.

     The optu8to16vis() function assumes codepage 1252 and maps holes into
     distinguishable codepoints.

     All these extended functions declare macros with the same name that can
     be used to check for their presence.
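
     Such a presence check could look like the following sketch, which
     compiles on any system with <wchar.h>. HAVE_OPTU8TO16 and
     have_optu8to16() are hypothetical names chosen for this example.

```c
#include <wchar.h>

/* The function name doubles as a macro where the extension exists. */
#ifdef optu8to16
#define HAVE_OPTU8TO16 1
#else
#define HAVE_OPTU8TO16 0
#endif

/* Nonzero if this libc declared optu8to16() at compile time. */
int
have_optu8to16(void)
{
	return HAVE_OPTU8TO16;
}
```

     Portable code can use the same test to fall back to mbrtowc() where
     the extension is missing, at the cost of losing raw octet preservation.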

HISTORY
     The optu8to16() function first appeared in MirOS #11.

AUTHORS
     Thorsten Glaser <tg@mirbsd.de> wrote the entire internationalisation im-
     plementation in MirOS. He is also the steward for the OPTU encoding.

CAVEATS
     On a system whose wide character type is only 16 bits wide, as opposed to
     31 bits of ISO 10646, the OPTU encoder and decoder are permitted to not
     de- and recompose any surrogates encountered and pass them through as if
     they were regular wide characters with no special function. Since MirOS
     is such a system, the reference implementation does not care about UTF-16
     surrogates posing as OPTU-16 characters at all; a planes-aware Unicode
     application is required to handle surrogates by itself. For compatibility
     purposes, optu8to16 should always be assumed to not treat surrogates
     specially; applications must take care not to produce invalid surrogates
     unless they are limited to the BMP.
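
     A planes-aware application could recognise and combine surrogates in
     the 16-bit output along the lines of this sketch. It uses the standard
     UTF-16 surrogate ranges and arithmetic; the function names are
     hypothetical and nothing here is part of the OPTU reference
     implementation.

```c
#include <stdint.h>

/* High (leading) surrogates occupy U+D800 .. U+DBFF. */
int
is_high_surrogate(uint16_t wc)
{
	return wc >= 0xD800u && wc <= 0xDBFFu;
}

/* Low (trailing) surrogates occupy U+DC00 .. U+DFFF. */
int
is_low_surrogate(uint16_t wc)
{
	return wc >= 0xDC00u && wc <= 0xDFFFu;
}

/* Combine a valid high/low pair into a codepoint beyond the BMP. */
uint32_t
combine_surrogates(uint16_t hi, uint16_t lo)
{
	return 0x10000u + (((uint32_t)hi - 0xD800u) << 10) +
	    ((uint32_t)lo - 0xDC00u);
}
```

     The caller is expected to check both halves with the predicates before
     combining them; a lone or reversed surrogate is passed through by the
     converter unchanged and has to be handled by the application.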

MirOS				March 17, 2010				     1