String(3)	      User Contributed Perl Documentation	     String(3)

NAME
       Unicode::String - String of Unicode characters (UTF-16BE)

SYNOPSIS
	use Unicode::String qw(utf8 latin1 utf16be);

	$u = utf8("string");
	$u = latin1("string");
	$u = utf16be("\0s\0t\0r\0i\0n\0g");

	print $u->utf32be;   # 4 byte characters
	print $u->utf16le;   # 2 byte characters + surrogates
	print $u->utf8;	     # 1-4 byte characters

DESCRIPTION
       A "Unicode::String" object represents a sequence of Unicode characters.
       Methods are provided to convert between various external formats
       (encodings) and "Unicode::String" objects, and methods are provided for
       common string manipulations.

       The functions utf32be(), utf32le(), utf16be(), utf16le(), utf8(),
       utf7(), latin1(), uhex(), and uchr() can be imported from the
       "Unicode::String" module.  They work as constructors that initialize
       strings from data in the corresponding encoding.

       The "Unicode::String" objects overload various operators, which means
       that they in most cases can be treated like plain strings.

       Internally a "Unicode::String" object is represented by a string of 2
       byte numbers in network byte order (big-endian). This representation is
       not exposed through the API, but knowing it can help when predicting
       the efficiency of the provided methods.

       METHODS

       Class methods

       The following class methods are available:

       Unicode::String->stringify_as
       Unicode::String->stringify_as( $enc )
	   This method is used to specify which encoding will be used when
	   "Unicode::String" objects are implicitly converted to and from
	   plain strings.

	   If an argument is provided it sets the current encoding.  The
	   argument should be one of the following: "ucs4", "utf32",
	   "utf32be", "utf32le", "ucs2", "utf16", "utf16be", "utf16le",
	   "utf8", "utf7", "latin1" or "hex".  The default is "utf8".

	   The stringify_as() method returns a reference to the current
	   encoding function.
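
	   As a minimal sketch of the effect, assuming the module is
	   installed, the following stringifies the same object under two
	   settings and then restores the default.  The variable names are
	   illustrative only.

```perl
use strict;
use warnings;
use Unicode::String qw(latin1);

# "\xB5m" is the ISO-8859-1 byte sequence for the micro sign followed by "m".
my $u = latin1("\xB5m");

# With the default setting, implicit conversion produces UTF-8 bytes.
my $as_utf8 = "$u";

# Switch implicit conversion to the "hex" format, convert, then switch back.
Unicode::String->stringify_as("hex");
my $as_hex = "$u";
Unicode::String->stringify_as("utf8");

print "$as_hex\n";
```

	   The setting is global to the class, so code that changes it
	   should normally restore the previous value when it is done.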

       $us = Unicode::String->new
       $us = Unicode::String->new( $initial_value )
	   This is the object constructor.  Without argument, it creates an
	   empty "Unicode::String" object.  If an $initial_value argument is
	   given, it is decoded according to the specified stringify_as()
	   encoding, UTF-8 by default.

	   In general it is recommended to import and use one of the encoding
	   specific constructor functions instead of invoking this method.

       Encoding methods

       These methods get or set the value of the "Unicode::String" object by
       passing strings in the corresponding encoding.  If a new value is
       passed as argument it will set the value of the "Unicode::String", and
       the previous value is returned.  If no argument is passed then the
       current value is returned.

       To illustrate the encodings we show how the 2 character sample string
       "µm" (micrometer) is encoded in each one.

       $us->utf32be
       $us->utf32be( $newval )
	   The string passed should be in the UTF-32 encoding with bytes in
	   big endian order.  The sample "µm" is "\0\0\0\xB5\0\0\0m" in this
	   encoding.

	   Alternative names for this method are utf32() and ucs4().

       $us->utf32le
       $us->utf32le( $newval )
	   The string passed should be in the UTF-32 encoding with bytes in
	   little endian order.  The sample "µm" is "\xB5\0\0\0m\0\0\0" in
	   this encoding.

       $us->utf16be
       $us->utf16be( $newval )
	   The string passed should be in the UTF-16 encoding with bytes in
	   big endian order. The sample "µm" is "\0\xB5\0m" in this encoding.

	   Alternative names for this method are utf16() and ucs2().

	   If the string passed to utf16be() starts with the Unicode byte
	   order mark in little endian order, the result is as if utf16le()
	   was called instead.

       $us->utf16le
       $us->utf16le( $newval )
	   The string passed should be in the UTF-16 encoding with bytes in
	   little endian order.  The sample "µm" is "\xB5\0m\0" in this
	   encoding.  This is the encoding used by the Microsoft Windows API.

	   If the string passed to utf16le() starts with the Unicode byte
	   order mark in big endian order, the result is as if utf16be() was
	   called instead.

       $us->utf8
       $us->utf8( $newval )
	   The string passed should be in the UTF-8 encoding. The sample "µm"
	   is "\xC2\xB5m" in this encoding.

       $us->utf7
       $us->utf7( $newval )
	   The string passed should be in the UTF-7 encoding. The sample "µm"
	   is "+ALU-m" in this encoding.

	   The UTF-7 encoding only uses plain US-ASCII characters for the
	   encoding.  This makes it safe for transport through 8-bit stripping
	   protocols.  Characters outside the US-ASCII range are
	   base64-encoded and '+' is used as an escape character.  The UTF-7
	   encoding is described in RFC 1642.

	   If the (global) variable
	   $Unicode::String::UTF7_OPTIONAL_DIRECT_CHARS is TRUE, then a wider
	   range of characters are encoded as themselves.  It is TRUE by
	   default.  The characters affected by this are:

	      ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }

       $us->latin1
       $us->latin1( $newval )
	   The string passed should be in the ISO-8859-1 encoding. The sample
	   "µm" is "\xB5m" in this encoding.

	   Characters outside the "\x00" .. "\xFF" range are simply removed
	   from the return value of the latin1() method.  If you want more
	   control over the mapping from Unicode to ISO-8859-1, use the
	   "Unicode::Map8" class.  This is also the way to deal with other
	   8-bit character sets.

       $us->hex
       $us->hex( $newval )
	   The string passed should be plain ASCII where each Unicode
	   character is represented by the "U+XXXX" string and separated by a
	   single space character.  The "U+" prefix is optional when setting
	   the value.  The sample "µm" is "U+00b5 U+006d" in this encoding.
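
       The encoding methods above can be exercised together.  The following
       is a minimal sketch, assuming the module is installed: it builds the
       sample string once from its UTF-8 bytes and reads it back through
       each method, printing non-printable bytes as \xNN escapes.  It also
       shows latin1() dropping a character that has no ISO-8859-1 form.  The
       variable names are illustrative only.

```perl
use strict;
use warnings;
use Unicode::String qw(utf8 utf16be);

# Build the sample string from its UTF-8 bytes (micro sign, then "m").
my $u = utf8("\xC2\xB5m");

# Read the same value back through each encoding method.
my %bytes;
for my $enc (qw(utf32be utf32le utf16be utf16le utf8 utf7 latin1 hex)) {
    $bytes{$enc} = $u->$enc;
}

# Print each result with bytes outside printable ASCII escaped as \xNN.
for my $enc (sort keys %bytes) {
    (my $shown = $bytes{$enc}) =~ s/([^\x20-\x7E])/sprintf("\\x%02X", ord $1)/ge;
    printf "%-8s %s\n", $enc, $shown;
}

# "A", U+263A, "B" as UTF-16BE; U+263A cannot be represented in ISO-8859-1,
# so latin1() leaves it out of the returned string.
my $mixed  = utf16be("\x00\x41\x26\x3A\x00\x42");
my $narrow = $mixed->latin1;
print "$narrow\n";
```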

       String Operations

       The following methods are available:

       $us->as_string
	   Converts a "Unicode::String" to a plain string according to the
	   setting of stringify_as().  The default stringify_as() encoding is
	   "utf8".

       $us->as_num
	   Converts a "Unicode::String" to a number.  Currently only the
	   digits in the range 0x30 .. 0x39 are recognized.  The plan is to
	   eventually support all Unicode digit characters.

       $us->as_bool
	   Converts a "Unicode::String" to a boolean value.  Only the empty
	   string is FALSE.  A string consisting of only the character U+0030
	   is considered TRUE, even though Perl considers "0" to be FALSE.
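
	   A small sketch of the difference from plain Perl strings,
	   assuming the module is installed (the variable names are
	   illustrative only):

```perl
use strict;
use warnings;
use Unicode::String qw(utf8);

my $zero  = utf8("0");    # one character, U+0030
my $empty = utf8("");     # no characters

# Boolean tests on the objects go through as_bool().
my $object_truth = $zero  ? "true" : "false";
my $empty_truth  = $empty ? "true" : "false";

# For comparison, the plain Perl string "0" is false.
my $plain_truth  = "0"    ? "true" : "false";

# as_num() converts ASCII digits to a number.
my $number = utf8("42")->as_num;

print "object '0': $object_truth, plain '0': $plain_truth, empty: $empty_truth\n";
print "number: $number\n";
```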

       $us->repeat( $count )
	   Returns a new "Unicode::String" where the content of $us is
	   repeated $count times.  This operation is also overloaded as:

	     $us x $count

       $us->concat( $other_string )
	   Concatenates the string $us and the string $other_string.  If
	   $other_string is not a "Unicode::String" object, then it is first
	   passed to the Unicode::String->new constructor function.  This
	   operation is also overloaded as:

	     $us . $other_string

       $us->append( $other_string )
	   Appends the string $other_string to the value of $us.  If
	   $other_string is not a "Unicode::String" object, then it is first
	   passed to the Unicode::String->new constructor function.  This
	   operation is also overloaded as:

	     $us .= $other_string
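
	   The overloaded forms of repeat(), concat() and append() can be
	   sketched as follows, assuming the module is installed and the
	   default stringify_as() encoding of UTF-8 is in effect, so plain
	   strings are decoded as UTF-8.  The variable names are
	   illustrative only.

```perl
use strict;
use warnings;
use Unicode::String qw(utf8 latin1);

my $base = utf8("ab");

# repeat(): a new object holding the content three times.
my $tripled = $base x 3;

# concat(): a plain string on either side is first passed to new().
my $joined   = $base . "cd";
my $prefixed = "cd" . $base;

# append(): modifies the left-hand object.
my $grown = utf8("x");
$grown .= latin1("\xB5");

print join(" ", map { $_->utf8 } $tripled, $joined, $prefixed), "\n";
print $grown->hex, "\n";
```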

       $us->copy
	   Returns a copy of the current "Unicode::String" object.  This
	   operation is overloaded as the assignment operator.

       $us->length
	   Returns the length of the "Unicode::String" in 16-bit units.  A
	   surrogate pair still counts as 2.

       $us->byteswap
	   This method will swap the bytes in the internal representation of
	   the "Unicode::String" object.

	   Unicode reserves the character U+FEFF as a byte order mark.  This
	   works because the swapped character, U+FFFE, is guaranteed never
	   to be a valid character.  For strings that have the byte order
	   mark as the first character, we can guarantee to get the byte
	   order right with the following code:

	      $ustr->byteswap if $ustr->ord == 0xFFFE;
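
	   A self-contained sketch of that check, assuming the module is
	   installed.  The wrong-order value is constructed by hand with
	   pack() purely for illustration: the byte order mark appears as
	   U+FFFE and the letter "A" (U+0041) as U+4100.

```perl
use strict;
use warnings;
use Unicode::String;

# Simulate a string that was loaded with its bytes in the wrong order.
my $ustr = Unicode::String->new;
$ustr->pack(0xFFFE, 0x4100);

# The first character is the swapped byte order mark, so swap everything.
$ustr->byteswap if $ustr->ord == 0xFFFE;

# After the swap the code units are U+FEFF followed by U+0041.
my @codes = $ustr->unpack;
print join(" ", map { sprintf "U+%04X", $_ } @codes), "\n";
```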

       $us->unpack
	   Returns a list of integers each representing a UCS-2 character
	   code.

       $us->pack( @uchr )
	   Sets the value of $us as a sequence of UCS-2 characters with the
	   character codes given as parameters.

       $us->ord
	   Returns the character code of the first character in $us.  The
	   ord() method deals with surrogate pairs, which gives us a result
	   range of 0x0 .. 0x10FFFF.  If the $us string is empty, undef is
	   returned.

       $us->chr( $code )
	   Sets the value of $us to be a string containing the character
	   assigned code $code.  The argument $code must be an integer in the
	   range 0x0 .. 0x10FFFF.  If the code is greater than 0xFFFF then a
	   surrogate pair is created.
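
	   The interaction between chr(), ord(), length() and unpack() for
	   a character above 0xFFFF can be sketched as follows, assuming
	   the module is installed.  The code point 0x1F600 is an arbitrary
	   choice for the demonstration.

```perl
use strict;
use warnings;
use Unicode::String;

my $us = Unicode::String->new;
$us->chr(0x1F600);           # stored as a surrogate pair

my $code   = $us->ord;       # scalar context: the pair is combined again
my $units  = $us->length;    # counts 16-bit units, not characters
my @halves = $us->unpack;    # the two surrogate code units

printf "U+%X, length %d, units %s\n",
    $code, $units, join(" ", map { sprintf "%04X", $_ } @halves);
```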

       $us->name
	   In scalar context returns the official Unicode name of the first
	   character in $us.  In array context returns the names of all
	   characters in $us.  Also see Unicode::CharName.

       $us->substr( $offset )
       $us->substr( $offset, $length )
       $us->substr( $offset, $length, $subst )
	   Returns a sub-string of $us.  Works similarly to the builtin
	   substr() function.

       $us->index( $other )
       $us->index( $other, $pos )
	   Locates the position of $other within $us, possibly starting the
	   search at position $pos.

       $us->chop
	   Chops off the last character of $us and returns it (as a
	   "Unicode::String" object).
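
	   A short sketch combining index(), substr() and chop(), assuming
	   the module is installed and the default UTF-8 stringify_as()
	   setting, so that the plain string passed to index() is decoded
	   as UTF-8.  The sample text is illustrative only.

```perl
use strict;
use warnings;
use Unicode::String qw(utf8);

my $us = utf8("hello world");

# Locate a plain string, then extract five characters from that position.
my $pos  = $us->index("world");
my $word = $us->substr($pos, 5);

# chop() removes the last character from $us and returns it as an object.
my $last = $us->chop;

printf "pos=%d word=%s last=%s rest=%s\n",
    $pos, $word->utf8, $last->utf8, $us->utf8;
```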

FUNCTIONS
       The following functions are provided.  None of these are exported by
       default.

       byteswap2( $str, ... )
	   This function swaps the two bytes of every 2-byte unit in the
	   strings passed as arguments.  If this function is called in void
	   context, then it will modify its arguments in-place.  Otherwise,
	   the swapped strings are returned.

       byteswap4( $str, ... )
	   The byteswap4 function works similarly to byteswap2, but reverses
	   the byte order of every 4-byte unit.
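
	   Both calling styles can be sketched as follows, assuming the
	   module is installed.  The four-byte ASCII input is chosen only
	   to make the reordering easy to see.

```perl
use strict;
use warnings;
use Unicode::String qw(byteswap2 byteswap4);

# Non-void context: the swapped strings are returned, arguments untouched.
my ($pairs) = byteswap2("abcd");
my ($quads) = byteswap4("abcd");

# Void context: the argument itself is modified.
my $buffer = "abcd";
byteswap2($buffer);

print "$pairs $quads $buffer\n";
```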

       latin1( $str )
       utf7( $str )
       utf8( $str )
       utf16le( $str )
       utf16be( $str )
       utf32le( $str )
       utf32be( $str )
	   Constructor functions for the various Unicode encodings.  These
	   return new "Unicode::String" objects.  The provided argument should
	   be encoded correspondingly.

       uhex( $str )
	   Constructs a new "Unicode::String" object from a string of hex
	   values.  See hex() method above for description of the format.

       uchr( $num )
	   Constructs a new one character "Unicode::String" object from a
	   Unicode character code.  This works similarly to perl's builtin
	   chr() function.
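
	   A brief sketch of the two constructors, assuming the module is
	   installed (the variable names are illustrative only):

```perl
use strict;
use warnings;
use Unicode::String qw(uhex uchr);

# Build the two-character sample string from its hex notation.
my $from_hex = uhex("U+00b5 U+006d");

# Build a one-character string from a character code.
my $micro = uchr(0xB5);

print $from_hex->hex, "\n";
print $micro->hex, "\n";
```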

SEE ALSO
       Unicode::CharName, Unicode::Map8

       <http://www.unicode.org/>

       perlunicode

COPYRIGHT
       Copyright 1997-2000,2005 Gisle Aas.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.8.8			  2005-10-26			     String(3)