locale(4)

NAME
locale - Contains one or more categories that describe a locale

DESCRIPTION
A locale definition source file contains one or more categories that
describe a locale. You can convert a locale definition source file
into a locale by using the localedef command. Locales can be modified
only by editing a locale definition source file and then using the
localedef command again on the new source file.
Each locale source file section defines a category of locale data. A
source file cannot contain more than one section for the same category.
The following standard categories are supported:

LC_COLLATE     Defines character or string collation information
LC_CTYPE       Defines character classification, case conversion, and
               other character properties or attributes
LC_MESSAGES    Defines the format for affirmative and negative responses
LC_MONETARY    Defines rules and symbols for formatting monetary numeric
               information
LC_NUMERIC     Defines a list of rules and symbols for formatting
               nonmonetary numeric information
LC_TIME        Defines a list of rules and symbols for formatting time
               and date information

You can include optional declarations at the beginning of your locale
source file to override the default comment and escape characters used
in locale category definitions:

Escape character
    The escape character is used in decimal or hexadecimal constants
    when these are specified in the locale file. The default escape
    character is the backslash (\). To define another escape
    character, include a line with the following format:

    escape_char <char_symbol>

Comment character
    The comment character is the first character of any comment
    entries in the locale file. The default comment character is the
    number sign (#). To define another comment character, use the
    following format:

    comment_char <char_symbol>

In the preceding formats, <char_symbol> is the character's symbolic
name as defined in the charmap file used to build the locale's codeset.
One or more blank characters (spaces or tabs) must separate escape_char
or comment_char from <char_symbol>.
Each category source definition consists of the following:
• The category header (category_name)
• The associated keyword/value pairs that comprise the category body
• The category trailer (END category_name)
For example:

LC_CTYPE
<source for LC_CTYPE category>
END LC_CTYPE

The source for all of the categories is specified using keywords,
strings, character literals, and character symbols. Each keyword iden‐
tifies either a definition or a rule. The remainder of the statement
containing the keyword contains the operands to the keyword. Operands
are separated from the keyword by one or more blank characters (spaces
or tabs). A statement may be continued on the next line by placing a \
(backslash) as the last character before the newline character that
terminates the line. Lines containing the # (comment character) in the
first column are treated as comment lines.
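
The continuation and comment rules described above can be sketched as a small preprocessing step. The following Python fragment is illustrative only: the logical_statements function and its parameter names are hypothetical and are not part of the localedef command. It assumes that the comment character is recognized only in the first column of a line that does not continue a previous line.

```python
def logical_statements(source, comment_char="#", escape_char="\\"):
    """Join continued lines and drop comment lines (illustrative only)."""
    statements = []
    pending = ""  # text collected from lines that ended with escape_char
    for line in source.splitlines():
        # A comment line has the comment character in the first column.
        if not pending and line.startswith(comment_char):
            continue
        # An escape character just before the newline continues the statement.
        if line.endswith(escape_char):
            pending += line[:-len(escape_char)]
            continue
        statement = pending + line
        pending = ""
        if statement.strip():
            statements.append(statement)
    if pending.strip():
        statements.append(pending)
    return statements


sample = "# digits\ndigit <zero>;<one>;\\\n<two>\nEND LC_CTYPE\n"
print(logical_statements(sample))
```

Passing other values for comment_char or escape_char models a source file that begins with comment_char or escape_char declarations.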
A symbolic name begins with the < (left-angle bracket) character and
ends with the > (right-angle bracket) character. The characters
between the < and the > can be any characters from the Portable
Character Set, except for control and space characters. For example,
<A-diaeresis> could be a symbolic name for a character. Any symbolic name
referenced in the locale source file must be defined in the Portable
Character Set or in the character set description (charmap) file for
that locale.
A character literal is the character itself, or else a decimal, hexa‐
decimal, or octal constant. A decimal constant is of the following
form:
\dddd or \ddd
where d is a decimal digit.
A hexadecimal constant is of the following form:
\xxx
where x is a hexadecimal digit.
An octal constant is of the following form:
\ooo or \oo
where o is an octal digit.
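
The three constant forms can be recognized with a single pattern. The following Python sketch is an illustration under stated assumptions, not the parser used by localedef: it reads the forms above as the escape character followed by a literal d and two or three decimal digits, a literal x and two hexadecimal digits, or two or three octal digits. The constant_value function name is hypothetical.

```python
import re


def constant_value(text, escape_char="\\"):
    """Return the numeric value of a decimal, hexadecimal, or octal constant."""
    esc = re.escape(escape_char)
    pattern = re.compile(
        esc + r"(?:d(?P<dec>[0-9]{2,3})"   # decimal:     \ddd or \dddd
        r"|x(?P<hex>[0-9A-Fa-f]{2})"       # hexadecimal: \xxx
        r"|(?P<oct>[0-7]{2,3}))"           # octal:       \oo or \ooo
    )
    match = pattern.fullmatch(text)
    if match is None:
        raise ValueError("not a character constant: %r" % text)
    if match.group("dec") is not None:
        return int(match.group("dec"), 10)
    if match.group("hex") is not None:
        return int(match.group("hex"), 16)
    return int(match.group("oct"), 8)


# All three constants below denote the same value, 65.
print(constant_value(r"\d065"), constant_value(r"\x41"), constant_value(r"\101"))
```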
The explicit definition of each category in a locale definition source
file is not required. When a category is undefined in a locale defini‐
tion source file, the category value defaults to the value in the C
locale definition.
The LC_COLLATE Category
The LC_COLLATE category defines the relative order between collating
elements.
A collation element is the unit of comparison for collation. A colla‐
tion element may be a character or a sequence of characters. Every col‐
lation element in the locale has a set of weights, which determine if
the collation element collates before, equal to, or after the other
collation elements in the locale. Each collation element is assigned
collation weights by the localedef command when the locale definition
source file is compiled. These collation weights are then used by
application programs that compare strings.
Comparison of strings is performed by comparing the collation weights
of each character in the string until either a difference is found or
the strings are determined to be equal. This comparison may be per‐
formed several times if the locale defines multiple collation orders.
For example, in the French locale, the strings are compared using a
primary set of collation weights. If they are equal on the basis of
this comparison, they are compared again using a secondary set of
collation weights. The number of collation weights associated with a
collating element is equal to the number of collation orders defined
for the locale.
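
The repeated comparison can be pictured as one pass per collation order. In the following Python sketch, the WEIGHTS table and the compare function are hypothetical; real weights are assigned by localedef and are not visible in this form.

```python
# Hypothetical (primary, secondary) weights for a few collating elements.
WEIGHTS = {
    "a": (1, 1),
    "á": (1, 2),
    "b": (2, 1),
}


def compare(left, right, orders=2):
    """Return -1, 0, or 1 after comparing one collation order at a time."""
    for order in range(orders):
        left_weights = [WEIGHTS[ch][order] for ch in left]
        right_weights = [WEIGHTS[ch][order] for ch in right]
        if left_weights != right_weights:
            return -1 if left_weights < right_weights else 1
    return 0  # equal under every collation order


# "ab" and "áb" are equal on primary weights and differ only on the
# secondary weights, so the second pass decides the result.
print(compare("ab", "áb"))
```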
Every character defined in the charmap file (or every character in the
portable character set if no charmap file is specified) is itself a
collating element. Additional collating elements can be defined using
the collating-element statement. The syntax is as follows:
collating-element <character_symbol> from <string>
The LC_COLLATE category begins with the keyword LC_COLLATE and ends
with the keyword END LC_COLLATE.
The following keywords are recognized in the LC_COLLATE category:

copy
    The copy statement specifies the name of an existing locale to be
    used as the definition of this category. If you specify a copy
    statement, you can specify no other keywords in the category.

collating-element
    The collating-element statement is used to specify multicharacter
    collating elements.

    The character_symbol argument names a string of characters that
    is to be treated as a single collating element. The
    character_symbol argument cannot duplicate any symbolic name in
    the current charmap file or any other symbolic name defined in
    this collation definition. The string argument specifies a
    string of two or more characters that define the
    character_symbol argument. The following are examples of the
    syntax for the collating-element statement:

    collating-element <ch> from "<c><h>"
    collating-element <e-acute> from "<acute><e>"
    collating-element <11> from "<1><1>"

    A character_symbol argument defined by the collating-element
    statement is recognized only within the LC_COLLATE category.

collating-symbol
    The collating-symbol statement is used to specify collation
    symbols for use in collation sequence statements.

    The syntax for the collating-symbol statement is as follows:

    collating-symbol <collating_symbol>

    The collating_symbol argument cannot duplicate any symbolic name
    in the current charmap file or any other symbolic name defined
    in this collation definition. The following are examples of
    collating-symbol statements:

    collating-symbol <UPPER_CASE>
    collating-symbol <HIGH>

    A collating_symbol argument defined by the collating-symbol
    statement is recognized only within the LC_COLLATE category.

order_start
    The order_start statement is followed by one or more collation
    order statements, assigning collation weights to collating
    elements. This statement is mandatory.

    The syntax for the order_start statement is as follows:

    order_start <sort_rules>;<sort_rules>;...;<sort_rules>
    collation_order_statements
    order_end

    The sort_rules have the following syntax:

    keyword,keyword,...,keyword

    where keyword is the keyword forward, backward, or position.

    The sort_rules directives are optional. If present, they define
    the rules to apply during string comparison. The number of
    specified sort_rules directives defines the number of weights
    each collating element is assigned; that is, the directives
    define the number of collation orders in the locale. If no
    sort_rules directives are present, one forward directive is
    assumed and comparisons are made on a character basis rather
    than a string basis.

    If directives are present, the first sort_rules directive
    applies when comparing strings that use the primary weight, the
    second when comparing strings that use the secondary weight, and
    so on. Each set of sort_rules directives is separated by a ;
    (semicolon). A sort_rules directive consists of one or more
    comma-separated keywords. The following keywords are supported:

    forward     Specifies that collation weight comparisons proceed
                from the beginning of a string to the end of the
                string.
    backward    Specifies that collation weight comparisons proceed
                from the end of a string to the beginning of the
                string.
    position    Specifies that collation weight comparisons consider
                the relative position of nonignored elements in the
                string. That is, if strings compare as equal, the
                element with the shortest distance from the starting
                point of the comparison collates first.

    The forward and backward keywords are mutually exclusive. The
    following is an example of a sort_rules directive:

    order_start forward;backward

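
The effect of the forward and backward keywords can be seen by building one list of weights per collation order and reversing the lists for backward orders. The following Python sketch uses a hypothetical WEIGHTS table and a hypothetical sort_key function; it handles only a single forward or backward keyword per directive and leaves out the position keyword.

```python
# Hypothetical (primary, secondary) weights; accented letters share the
# primary weight of the base letter and differ in the secondary weight.
WEIGHTS = {
    "c": (1, 1),
    "e": (2, 1), "é": (2, 2),
    "o": (3, 1), "ô": (3, 2),
    "t": (4, 1),
}


def sort_key(word, sort_rules):
    """Build one weight list per collation order, honoring the direction."""
    key = []
    for order, rule in enumerate(sort_rules):
        weights = [WEIGHTS[ch][order] for ch in word]
        if rule == "backward":
            weights.reverse()  # compare from the end of the string
        key.append(weights)
    return key


words = ["coté", "côte"]
# forward;backward: the last accent difference is examined first.
print(sorted(words, key=lambda w: sort_key(w, ["forward", "backward"])))
# forward;forward: the first accent difference is examined first.
print(sorted(words, key=lambda w: sort_key(w, ["forward", "forward"])))
```

Both words have identical primary weights, so the secondary order decides the result, and the direction of that order changes which word collates first.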
The following syntax rules apply to collation order statements:
• Each collation order statement consists of a <character_symbol>
  specification, followed by white space and a set of collation orders.
• Characters in the character set can be explicitly specified in the
  collation orders or implicitly specified using the ellipsis symbol
  (...).
• A collation order statement that begins with the UNDEFINED special
  symbol specifies any characters that are in the character set and not
  explicitly or implicitly specified by other collation order
  statements.
The optional operands for each collation element are used to define the
primary, secondary, or subsequent weights for the collating element.
The special symbol IGNORE is used to indicate a collating element that
is to be ignored when strings are compared.
An ellipsis keyword appearing in place of a collating_element_list
indicates the weights are to be assigned, for the characters in the
identified range, in numerically increasing order from the weight for
the character symbol on the left-hand side of the preceding statement.
The use of the ellipsis keyword results in a locale that may collate
differently when compiled with different character set description
(charmap) source files. For this reason, the localedef command will
issue a warning when the ellipsis keyword is encountered.
The UNDEFINED special symbol includes all coded character set values
not specified explicitly or with an ellipsis symbol. These characters
are inserted in the character collation order at the point indicated by
the UNDEFINED special symbol in the order of their character code set
values. If no UNDEFINED special symbol exists and the collation order
does not specify all collation elements from the coded character set, a
warning is issued and all undefined characters are placed at the end of
the character collation order.
The following is an example of a collation order statement in the
LC_COLLATE locale definition source file category:

order_start forward;backward
UNDEFINED     IGNORE;IGNORE
<LOW>
<space>       <LOW>;<space>
...           <LOW>;...
<a>           <a>;<a>
<a-acute>     <a>;<a-acute>
<a-grave>     <a>;<a-grave>
<A>           <a>;<A>
<A-acute>     <a>;<A-acute>
<A-grave>     <a>;<A-grave>
<ch>          <ch>;<ch>
<Ch>          <ch>;<Ch>
<s>           <s>;<s>
<ss>          <s><s>;<s><s>
<eszet>       <s><s>;<eszet><eszet>
...           <HIGH>;...
<HIGH>
order_end

This example is interpreted as follows:
• The UNDEFINED special symbol indicates that all characters not
  specified in the definition (either explicitly or by the ellipsis
  symbol) are ignored for collation purposes.
• All collating elements between <space> and <a> have the same primary
  equivalence class and individual secondary weights based on their
  coded character set values.
• All versions of the letter a (uppercase and lowercase, and with or
  without diacriticals) belong to the same primary collation class.
• The <c><h> multicharacter collating element is represented by the
  <ch> collating symbol and belongs to the same primary equivalence
  class as the <C><h> multicharacter collating element.
• The <eszet> character is collated as an <s><s> string. That is, one
  <eszet> character is expanded to two characters before comparing.
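
The expansion of <eszet> and the shared primary class for all versions of the letter a can be modeled with a table that maps each character to a list of primary weights. The following Python sketch is illustrative only; the PRIMARY table and the primary_key function are hypothetical, and the weights are shown as letters for readability.

```python
# Hypothetical primary weights. "ß" expands to the weights of "s" "s",
# and every version of the letter a shares one primary weight.
PRIMARY = {
    "a": ["a"], "A": ["a"], "á": ["a"],
    "m": ["m"], "M": ["m"],
    "s": ["s"], "ß": ["s", "s"],
}


def primary_key(text):
    """Return the primary weights of text, expanding one-to-many elements."""
    return [weight for ch in text for weight in PRIMARY[ch]]


# Both strings have the same primary key; only a later collation order
# could tell them apart.
print(primary_key("Maß") == primary_key("mass"))
```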
The LC_CTYPE Category
The LC_CTYPE category of a locale definition source file defines char‐
acter classification, case conversion, and other character attributes.
This category begins with an LC_CTYPE category header and terminates
with an END LC_CTYPE category trailer.
All operands for LC_CTYPE category statements are defined as lists of
characters. Each list consists of one or more semicolon-separated
characters or symbolic character names. An ellipsis (...) can represent
a series of characters; for example, <a>;...;<z> represents the charac‐
ters in the range a through z.
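
The ellipsis notation can be expanded mechanically once the encoded values of the surrounding characters are known. The following Python sketch is illustrative only: the expand_character_list function is hypothetical, and the charmap dictionary stands in for values that would normally come from the charmap file.

```python
def expand_character_list(operand, charmap):
    """Expand a semicolon-separated list, filling in ... ranges."""
    items = operand.split(";")
    values = []
    for index, item in enumerate(items):
        if item == "...":
            # The ellipsis stands for every value between its neighbors.
            start = charmap[items[index - 1]]
            end = charmap[items[index + 1]]
            values.extend(range(start + 1, end))
        else:
            values.append(charmap[item])
    return values


# Hypothetical charmap fragment: symbolic name -> encoded value.
charmap = {"<a>": 0x61, "<z>": 0x7A}
letters = "".join(chr(v) for v in expand_character_list("<a>;...;<z>", charmap))
print(letters)
```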
There are multiple sets of property keywords that are recognized in the
LC_CTYPE category. One set contains property keywords and associated
rules defined for locales by the XSH standard. A keyword in this set
can be defined in locales based on any codeset, assuming that the asso‐
ciated property applies to characters in the language supported by the
locale. Another set of property keywords is defined by the Unicode
standard. Define these keywords only in locales using one of the Uni‐
code character encoding formats. Some national language standards also
define properties for characters. Japanese locales define quite a few
supplemental properties to conform with national standards.
The following two subsections describe the sets of keywords as defined
by XSH and Unicode. See Japanese(5) for descriptions of properties
defined in Japanese locales.
Property Keywords Defined by the XSH Standard
The following keywords defined by XSH are recognized in the LC_CTYPE
category. In the descriptions, the term "automatically included" means
that an error does not occur if the referenced characters are included
or omitted. The characters will be provided if they are missing and
will be accepted if they are present.
copy
    Specifies the name of an existing locale to be used as the
    definition of this category

    If you include a copy statement, no other keyword can be
    specified.

upper
    Defines uppercase letter characters

    No character defined by the cntrl, digit, punct, or space
    keyword can be specified. If upper is not defined, A through Z
    default to upper.

lower
    Defines lowercase letter characters

    No character defined by the cntrl, digit, punct, or space
    keyword can be specified. If lower is not defined, a through z
    default to lower.

alpha
    Defines all letter characters

    No character defined by the cntrl, digit, punct, or space
    keyword can be specified. Characters defined by the upper and
    lower keywords are automatically included in this character
    class.

digit
    Defines numeric digit characters

    Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be
    specified. If digit is not defined, 0 through 9 default to digit.

space
    Defines white-space characters

    No character defined by the upper, lower, alpha, digit, graph,
    or xdigit keyword can be specified. If space is not defined,
    the space, formfeed, newline, carriage-return, tab, and vertical
    tab characters default to space.

cntrl
    Defines control characters

    No character defined by the upper, lower, alpha, digit, punct,
    graph, print, or xdigit keyword can be specified.

punct
    Defines punctuation characters

    The space character and characters defined by the upper, lower,
    alpha, digit, cntrl, or xdigit keywords cannot be specified.

graph
    Defines printable characters, excluding the space character

    If this keyword is not specified, characters defined by the
    upper, lower, alpha, digit, xdigit, and punct keywords are
    automatically included in this character class. No character
    defined by the cntrl keyword can be specified.

print
    Defines printable characters, including the space character

    If this keyword is not specified, the space character and
    characters defined by the upper, lower, alpha, digit, xdigit, and
    punct keywords are automatically included in this character
    class. No character defined by the cntrl keyword can be
    specified.

xdigit
    Defines hexadecimal digit characters

    Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be
    specified. Any character can be specified for the hexadecimal
    values for 10 to 15, however. These alternate hexadecimal digits
    are not used by standard conversion routines when converting
    digit strings from hexadecimal to numeric quantities. If xdigit
    is not defined, the numbers 0 through 9 and the letters A
    through F and a through f default to xdigit.

blank
    Defines blank characters

    If this keyword is not specified, the space and horizontal tab
    characters are included in this character class. Any characters
    defined by this statement are automatically included in the
    space class.

toupper
    Defines the mapping of lowercase characters to uppercase
    characters

    Operands for this keyword consist of comma-separated character
    pairs. Each character pair is enclosed in () (parentheses) and
    separated from the next pair by a ; (semicolon). The first
    character in each pair is considered a lowercase character; the
    second character is considered an uppercase character. Only
    characters defined by the lower and upper keywords can be
    specified. If toupper is not defined, a through z are mapped to
    A through Z by default.

tolower
    Defines the mapping of uppercase characters to lowercase
    characters

    Operands for this keyword consist of comma-separated character
    pairs. Each character pair is enclosed in () (parentheses) and
    separated from the next pair by a ; (semicolon). The first
    character in each pair is considered an uppercase character; the
    second character is considered a lowercase character. Only
    characters defined by the lower and upper keywords can be
    specified.

    The tolower keyword is optional. If this keyword is not
    specified, the mapping defaults to the reverse mapping of the
    toupper keyword, if specified. If the toupper and tolower
    keywords are both unspecified, the mapping for each defaults to
    that of the C locale.

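
The default for tolower, the reverse mapping of toupper, can be sketched by parsing the pair list and inverting it. The following Python fragment is illustrative only; the parse_pairs and default_tolower functions are hypothetical and operate on symbolic names rather than encoded characters.

```python
import re

# One (<first>,<second>) pair of symbolic names.
PAIR = re.compile(r"\(\s*(<[^>]+>)\s*,\s*(<[^>]+>)\s*\)")


def parse_pairs(operand):
    """Turn "(<a>,<A>);(<b>,<B>)" into a dictionary of first -> second."""
    return dict(PAIR.findall(operand))


def default_tolower(toupper):
    """Reverse a toupper mapping, as is done when tolower is omitted."""
    return {upper: lower for lower, upper in toupper.items()}


toupper = parse_pairs("(<a>,<A>);(<b>,<B>)")
print(default_tolower(toupper))
```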
Additional keywords can be specified to define supplemental character
classifications. For example:

charclass   vowel
vowel       <a>;<e>;<i>;<o>;<u>;<y>

Within the context of the XSH standard, the Unicode character
properties discussed in the next subsection fall into the category of
supplemental property definitions. Note that a supplemental property
definition can be accessed in programs only by using the wctype() and
iswctype() interfaces.
The LC_CTYPE category does not support multicharacter elements. For
example, the German Eszet character is traditionally classified as a
lowercase letter. There is no corresponding uppercase letter; in
proper capitalization of German text, the Eszet character is replaced
by the two characters SS. This kind of conversion is outside of the
scope of the toupper and tolower keywords.
The following is an example of a possible LC_CTYPE category listed in a
locale definition source file:

LC_CTYPE
#"alpha" is by default "upper" and "lower"
#"alnum" is by definition "alpha" and "digit"
#"print" is by default "alnum", "punct" and the space character
#"graph" is by default "alnum" and "punct"
#"tolower" is by default the reverse mapping of "toupper"
#
upper   <A>;<B>;<C>;<D>;<E>;<F>;<G>;<H>;<I>;<J>;<K>;<L>;<M>;\
        <N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>
#
lower   <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;\
        <n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>
#
digit   <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
        <seven>;<eight>;<nine>
#
space   <tab>;<newline>;<vertical-tab>;<form-feed>;\
        <carriage-return>;<space>
#
cntrl   <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;\
        <form-feed>;<carriage-return>;<NUL>;<SOH>;<STX>;\
        <ETX>;<EOT>;<ENQ>;<ACK>;<SO>;<SI>;<DLE>;<DC1>;<DC2>;\
        <DC3>;<DC4>;<NAK>;<SYN>;<ETB>;<CAN>;<EM>;<SUB>;\
        <ESC>;<IS4>;<IS3>;<IS2>;<IS1>;<DEL>
#
punct   <exclamation-mark>;<quotation-mark>;<number-sign>;\
        <dollar-sign>;<percent-sign>;<ampersand>;<asterisk>;\
        <apostrophe>;<left-parenthesis>;<right-parenthesis>;\
        <plus-sign>;<comma>;<hyphen>;<period>;<slash>;\
        <colon>;<semicolon>;<less-than-sign>;<equals-sign>;\
        <greater-than-sign>;<question-mark>;<commercial-at>;\
        <left-square-bracket>;<backslash>;<circumflex>;\
        <right-square-bracket>;<underline>;<grave-accent>;\
        <left-curly-bracket>;<vertical-line>;<tilde>;\
        <right-curly-bracket>
#
xdigit  <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
        <seven>;<eight>;<nine>;<A>;<B>;<C>;<D>;<E>;<F>;\
        <a>;<b>;<c>;<d>;<e>;<f>
#
blank   <space>;<tab>
#
toupper (<a>,<A>);(<b>,<B>);(<c>,<C>);(<d>,<D>);(<e>,<E>);\
        (<f>,<F>);(<g>,<G>);(<h>,<H>);(<i>,<I>);(<j>,<J>);\
        (<k>,<K>);(<l>,<L>);(<m>,<M>);(<n>,<N>);(<o>,<O>);\
        (<p>,<P>);(<q>,<Q>);(<r>,<R>);(<s>,<S>);(<t>,<T>);\
        (<u>,<U>);(<v>,<V>);(<w>,<W>);(<x>,<X>);(<y>,<Y>);\
        (<z>,<Z>)
#
END LC_CTYPE

Property Keywords Defined by the Unicode Standard
Property keywords defined by the Unicode standard can be normative or
informative. For example, a normative property might tell you whether a
character is a letter, a digit, or something else while an informative
property might tell you whether a letter is uppercase or lowercase.
There is also a set of properties, all normative, that applies only to
languages whose scripts are bidirectional (like Arabic and Hebrew).

Mn    Mark, non-spacing
Mc    Mark, spacing combining
Me    Mark, enclosing
Nd    Number, decimal digit
Nl    Number, letter
No    Number, other
Zs    Separator, space
Zl    Separator, line
Zp    Separator, paragraph
Cc    Other, control
Cf    Other, format
Cs    Other, surrogate
Co    Other, private use
Cn    Other, not assigned
Lu    Letter, uppercase
Ll    Letter, lowercase
Lt    Letter, titlecase
Lm    Letter, modifier
Lo    Letter, other
Pc    Punctuation, connector
Pd    Punctuation, dash
Pf    Punctuation, final quote
Pi    Punctuation, initial quote
Ps    Punctuation, open
Pe    Punctuation, close
Po    Punctuation, other
Sm    Symbol, math
Sc    Symbol, currency
Sk    Symbol, modifier
So    Symbol, other

L     Left-right; for most alphabetic, syllabic, and logographic
      characters (such as ideographs in Asian languages)
R     Right-left; for Arabic, Hebrew, and punctuation in those
      languages
EN    European number
ES    European number separator
ET    European number terminator
AN    Arabic number
CS    Common number separator
B     Block separator
S     Segment separator
WS    Whitespace
ON    Other neutrals: all other characters like punctuation and
      symbols

For locales included with the Tru64 UNIX product, only the locales
that use a Unicode character encoding (for example, the *.UTF-8
locales) include Unicode property keywords in addition to those
specified in the XSH standard. Programmers who want to use specific
Unicode keywords with locales to determine a character's
classification use the wctype() and iswctype() functions. Other
functions, such as iswdigit(), iswalpha(), and toupper(), access only
definitions of properties specified in the XSH standard. When
equivalence exists between an XSH property and one or more Unicode
properties, locales support properties as defined by both standards.
XSH property keywords can be mapped to Unicode property keywords as
follows:
• Uppercase letter: maps to Lu
• Lowercase letter: maps to Ll
• Digit: maps to Nd, Nl, and No combined
• Hexadecimal digit: includes specific characters (0-9, a-f, and A-F)
• A control or format character: maps to Cc and Cf
• Any letter: maps to Lu, Ll, Lt, Lm, and Lo combined
• Any letter or number: maps to Lu, Ll, Lt, Lm, Lo, Nd, Nl, and No
  combined
• Any punctuation character: maps to Pc, Pd, Ps, Pe, Pi, Pf, and Po
  combined
• Any graphical character: maps to Lu, Ll, Lt, Lm, Lo, Nd, Nl, No, Pc,
  Pd, Ps, Pe, Pi, Pf, Po, Sm, Sc, Sk, and So combined
• Any printable character: maps to a combination of all Unicode
  properties with the exception of Cc, Cf, Cn, Co, and Cs
• A space separator: maps to Zs
• Any separator: maps to Zl, Zp, and Zs
When operating in a *.UTF-8 locale, functions that test for a property
defined in the XSH standard implicitly test a character for any of the
Unicode properties that map to the XSH property. For example, the
iswdigit() function implicitly tests for the Nd, Nl, and No properties
as defined by the Unicode standard.
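
The implicit test can be pictured as a membership check against the set of Unicode properties that an XSH property maps to. The following Python sketch is illustrative only and does not call the C library: it uses the general category reported by Python's unicodedata module, and the has_property function and the keys of XSH_TO_UNICODE (assumed here to be the XSH keyword names for the descriptions in the preceding list) are hypothetical.

```python
import unicodedata

# Part of the XSH-to-Unicode mapping described above.
XSH_TO_UNICODE = {
    "upper": {"Lu"},
    "lower": {"Ll"},
    "digit": {"Nd", "Nl", "No"},
    "cntrl": {"Cc", "Cf"},
    "alpha": {"Lu", "Ll", "Lt", "Lm", "Lo"},
    "punct": {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"},
}


def has_property(char, xsh_property):
    """Test char against every Unicode property mapped to xsh_property."""
    return unicodedata.category(char) in XSH_TO_UNICODE[xsh_property]


# U+2167 ROMAN NUMERAL EIGHT has the Nl property, so under this mapping
# it satisfies the digit test.
print(has_property("\u2167", "digit"))
```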
The LC_MESSAGES Category
The LC_MESSAGES category of a locale definition source file defines the
format for affirmative and negative system responses. This category
begins with an LC_MESSAGES category header and terminates with an END
LC_MESSAGES category trailer.
All operands for the LC_MESSAGES category are defined as strings or
extended regular expressions bounded by " " (double quotes). These op‐
erands are separated from the keyword they define by one or more blank
characters (spaces or tabs). Two adjacent " " (double quotes) indicate
an undefined value.
The following keywords are recognized in the LC_MESSAGES category:

copy
    Specifies the name of an existing locale to be used as the
    definition of this category

    If you include a copy statement, you cannot include other
    keywords.

yesexpr
    Specifies an extended regular expression that describes the
    acceptable affirmative response to a question expecting an
    affirmative or negative response

noexpr
    Specifies an extended regular expression that describes the
    acceptable negative response to a question expecting an
    affirmative or negative response

yesstr
    Specifies the locale's equivalent of an acceptable affirmative
    response

    This string is accessible to applications through the
    nl_langinfo subroutine as nl_langinfo (YESSTR). Note that yesstr
    is likely to be withdrawn from the XPG4 standard; yesexpr is the
    recommended alternative.

nostr
    Specifies the locale's equivalent of an acceptable negative
    response

    This string is accessible to applications through the
    nl_langinfo subroutine as nl_langinfo (NOSTR). Note that nostr
    is likely to be withdrawn from the XPG4 standard; noexpr is the
    recommended alternative.

The following is an example of a possible LC_MESSAGES category listed
in a locale definition source file:

LC_MESSAGES
#
yesexpr "<circumflex><left-square-bracket><y><Y>\
<right-square-bracket>"
noexpr  "<circumflex><left-square-bracket><n><N>\
<right-square-bracket>"
yesstr  "<y><e><s>"
nostr   "<n><o>"
#
END LC_MESSAGES
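
Once the symbolic names are resolved, the yesexpr and noexpr values in the example correspond to the expressions ^[yY] and ^[nN]. The following Python sketch shows how an application might classify a response with them. It is illustrative only: Python's re module is not a POSIX extended regular expression engine, although it treats these two expressions the same way, and the classify_response function is hypothetical.

```python
import re

YESEXPR = "^[yY]"  # <circumflex><left-square-bracket><y><Y><right-square-bracket>
NOEXPR = "^[nN]"   # <circumflex><left-square-bracket><n><N><right-square-bracket>


def classify_response(answer):
    """Return "yes", "no", or None for an unrecognized response."""
    if re.search(YESEXPR, answer):
        return "yes"
    if re.search(NOEXPR, answer):
        return "no"
    return None


print(classify_response("Yes"), classify_response("no"), classify_response("maybe"))
```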
The LC_MONETARY Category
The LC_MONETARY category of a locale definition source file defines
rules and symbols for formatting monetary numeric information. This
category begins with an LC_MONETARY category header and terminates with
an END LC_MONETARY category trailer.
All operands for the LC_MONETARY category keywords are defined as
string or integer values. String values are bounded by " " (double
quotes). All values are separated from the keyword they define by one
or more blank characters (spaces or tabs). Two adjacent " " (double
quotes) indicate an undefined string value. A -1 (negative one) indi‐
cates an undefined integer value.
The following keywords are recognized in the LC_MONETARY category:

copy
    Specifies the name of an existing locale to be used as the
    definition of this category

    If you include a copy statement, no other keyword can be
    specified.

int_curr_symbol
    Specifies the string used for the international currency symbol

    The operand for the int_curr_symbol keyword is a 4-character
    string. The first three characters contain the alphabetic
    international currency symbol. The fourth character specifies a
    character separator between the international currency symbol
    and a monetary quantity.

currency_symbol
    Specifies the string used for the local currency symbol

mon_decimal_point
    Specifies the string used for the decimal delimiter that is used
    to format monetary quantities

mon_thousands_sep
    Specifies the character separator used for grouping digits to
    the left of the decimal delimiter in formatted monetary
    quantities

mon_grouping
    Specifies a string that defines the size of each group of digits
    in formatted monetary quantities

    The operand for the mon_grouping keyword consists of a sequence
    of semicolon-separated integers. Each integer specifies the
    number of digits in a group. The initial integer defines the
    size of the group immediately to the left of the decimal
    delimiter. The subsequent integers define succeeding groups to
    the left of the previous group. If the last integer is not -1,
    grouping for any remaining digits is performed using that
    integer. If the last integer is -1, no further grouping is
    performed.

    The following is an example of the interpretation of the
    mon_grouping statement. Assuming the value to be formatted is
    123456789 and the operand for the mon_thousands_sep keyword is '
    (single quotation mark), the following results occur:

    mon_grouping    Formatted Value
    3;-1            123456'789
    3               123'456'789
    3;2;-1          1234'56'789
    3;2             12'34'56'789

positive_sign
    Specifies the string used to indicate a nonnegative-valued
    formatted monetary quantity

negative_sign
    Specifies the string used to indicate a negative-valued
    formatted monetary quantity

int_frac_digits
    Specifies an integer value representing the number of fractional
    digits (those after the decimal delimiter) to be displayed in a
    formatted monetary quantity using the int_curr_symbol value

frac_digits
    Specifies an integer value representing the number of fractional
    digits (those after the decimal delimiter) to be displayed in a
    formatted monetary quantity using the currency_symbol value

p_cs_precedes
    Specifies an integer value indicating whether the
    int_curr_symbol or currency_symbol string precedes or follows
    the value for a nonnegative-formatted monetary quantity

    The following integer values are recognized:

    0    Indicates that the currency symbol follows the monetary
         quantity
    1    Indicates that the currency symbol precedes the monetary
         quantity

p_sep_by_space
    Specifies an integer value indicating whether the
    int_curr_symbol or currency_symbol string is separated by a
    space from a nonnegative-formatted monetary quantity

    The following integer values are recognized:

    0    Indicates that no space separates the currency symbol from
         the monetary quantity
    1    Indicates that a space separates the currency symbol from
         the monetary quantity
    2    Indicates that a space separates the currency symbol and
         the positive_sign string, if adjacent

n_cs_precedes
    Specifies an integer value indicating whether the
    int_curr_symbol or currency_symbol string precedes or follows
    the value for a negative-formatted monetary quantity

    The following integer values are recognized:

    0    Indicates that the currency symbol follows the monetary
         quantity
    1    Indicates that the currency symbol precedes the monetary
         quantity

n_sep_by_space
    Specifies an integer value indicating whether the
    int_curr_symbol or currency_symbol string is separated by a
    space from a negative-formatted monetary quantity

    The following integer values are recognized:

    0    Indicates that no space separates the currency symbol from
         the monetary quantity
    1    Indicates that a space separates the currency symbol from
         the monetary quantity
    2    Indicates that a space separates the currency symbol and
         the negative_sign string, if adjacent

p_sign_posn
    Specifies an integer value indicating the positioning of the
    positive_sign string for a nonnegative-formatted monetary
    quantity

    The following integer values are recognized:

    0    Indicates that a left_parenthesis and right_parenthesis
         symbol enclose both the monetary quantity and the
         int_curr_symbol or currency_symbol string
    1    Indicates that the positive_sign string precedes the
         quantity and the int_curr_symbol or currency_symbol string
    2    Indicates that the positive_sign string follows the
         quantity and the int_curr_symbol or currency_symbol string
    3    Indicates that the positive_sign string immediately
         precedes the int_curr_symbol or currency_symbol string
    4    Indicates that the positive_sign string immediately follows
         the int_curr_symbol or currency_symbol string

n_sign_posn
    Specifies an integer value indicating the positioning of the
    negative_sign string for a negative-formatted monetary quantity

    The following integer values are recognized:

    0    Indicates that a left_parenthesis and right_parenthesis
         symbol enclose both the monetary quantity and the
         int_curr_symbol or currency_symbol string
    1    Indicates that the negative_sign string precedes the
         quantity and the int_curr_symbol or currency_symbol string
    2    Indicates that the negative_sign string follows the
         quantity and the int_curr_symbol or currency_symbol string
    3    Indicates that the negative_sign string immediately
         precedes the int_curr_symbol or currency_symbol string
    4    Indicates that the negative_sign string immediately follows
         the int_curr_symbol or currency_symbol string

debit_sign
    Specifies the string used for the debit symbol (DB) to indicate
    a negative-formatted monetary quantity

    The debit_sign keyword is an extension to the X/Open Portability
    Guide and may not be portable to all systems that conform to
    that standard.

credit_sign
    Specifies the string used for the credit symbol (CR) to indicate
    a nonnegative-formatted monetary quantity

    The credit_sign keyword is an extension to the X/Open
    Portability Guide and may not be portable to all systems that
    conform to that standard.

left_parenthesis
    Specifies the character, equivalent to a ( (left parenthesis),
    used by the p_sign_posn and n_sign_posn statements to enclose a
    monetary quantity and currency symbol

    The left_parenthesis keyword is an extension to the X/Open
    Portability Guide and may not be portable to all systems that
    conform to that standard.

right_parenthesis
    Specifies the character, equivalent to a ) (right parenthesis),
    used by the p_sign_posn and n_sign_posn statements to enclose a
    monetary quantity and currency symbol

    The right_parenthesis keyword is an extension to the X/Open
    Portability Guide and may not be portable to all systems that
    conform to that standard.

A unique customized monetary format can be produced by changing the
value of a single statement. For example, the following table shows
the results of using all combinations of defined values for the
p_cs_precedes, p_sep_by_space, and p_sign_posn statements:
────────────────────────────────────────────────────────────────────
                    p_sep_by_space =  2         1         0
────────────────────────────────────────────────────────────────────
p_cs_precedes = 1   p_sign_posn = 0   ($1.25)   ($ 1.25)  ($1.25)
                    p_sign_posn = 1   + $1.25   +$ 1.25   +$1.25
                    p_sign_posn = 2   $1.25 +   $ 1.25+   $1.25+
                    p_sign_posn = 3   + $1.25   +$ 1.25   +$1.25
                    p_sign_posn = 4   $ +1.25   $+ 1.25   $+1.25
p_cs_precedes = 0   p_sign_posn = 0   (1.25$)   (1.25 $)  (1.25$)
                    p_sign_posn = 1   +1.25 $   +1.25 $   +1.25$
                    p_sign_posn = 2   1.25$ +   1.25 $+   1.25$+
                    p_sign_posn = 3   1.25+ $   1.25 +$   1.25+$
                    p_sign_posn = 4   1.25$ +   1.25 $+   1.25$+
────────────────────────────────────────────────────────────────────
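The combinations in the table can be reproduced with a short sketch. The following Python function is illustrative only: the name format_positive, the fixed "$" currency symbol, and the "+" positive sign are assumptions made for this example. It models just the three statements shown in the table and is not the formatting routine of any particular system.

```python
def format_positive(quantity, cs_precedes, sep_by_space, sign_posn,
                    symbol="$", sign="+"):
    """Lay out a nonnegative quantity the way the table above shows it.

    quantity is an already formatted digit string such as "1.25".
    """
    # Value 1: a space between the currency symbol and the quantity.
    a = " " if sep_by_space == 1 else ""
    # Value 2: a space between the symbol and the sign when they are
    # adjacent, otherwise between the sign and the quantity.
    b = " " if sep_by_space == 2 else ""

    if cs_precedes:
        layouts = {
            0: "(" + symbol + a + quantity + ")",
            1: sign + b + symbol + a + quantity,
            2: symbol + a + quantity + b + sign,
            3: sign + b + symbol + a + quantity,
            4: symbol + b + sign + a + quantity,
        }
    else:
        layouts = {
            0: "(" + quantity + a + symbol + ")",
            # In the table, values 1 and 2 both put the space
            # before the trailing symbol in this row.
            1: sign + quantity + (a or b) + symbol,
            2: quantity + a + symbol + b + sign,
            3: quantity + a + sign + b + symbol,
            4: quantity + a + symbol + b + sign,
        }
    return layouts[sign_posn]


print(format_positive("1.25", cs_precedes=1, sep_by_space=2, sign_posn=3))
# + $1.25
```

Keeping one layout per p_sign_posn value makes each row of the table visible in the code, so a single changed statement maps to a single changed argument.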
The following is an example of a possible LC_MONETARY category in a
locale definition source file:
LC_MONETARY
#
int_curr_symbol    "<U><S><D>"
currency_symbol    "<dollar-sign>"
mon_decimal_point  "<period>"
mon_thousands_sep  "<comma>"
mon_grouping       <3>
positive_sign      "<plus-sign>"
negative_sign      "<hyphen>"
int_frac_digits    <2>
frac_digits        <2>
p_cs_precedes      <1>
p_sep_by_space     <2>
n_cs_precedes      <1>
n_sep_by_space     <2>
p_sign_posn        <3>
n_sign_posn        <3>
debit_sign         "<D><B>"
credit_sign        "<C><R>"
left_parenthesis   "<left-parenthesis>"
right_parenthesis  "<right-parenthesis>"
#
END LC_MONETARY
The LC_NUMERIC Category
The LC_NUMERIC category of a locale definition source file defines
rules and symbols for formatting nonmonetary numeric information. This
category begins with an LC_NUMERIC category header and terminates with
an END LC_NUMERIC category trailer.
All operands for the LC_NUMERIC category keywords are defined as string
or integer values. String values are bounded by " " (double quotes).
All values are separated from the keyword they define by one or more
blank characters (spaces or tabs). Two adjacent double quote characters
("") indicate an undefined string value. A -1 (negative one) indicates
an undefined integer value.
The following keywords are recognized in the LC_NUMERIC category:
copy
    Specifies the name of an existing locale to be used as the definition
    of this category
    If you include a copy statement, no other keyword can be specified.
decimal_point
    Specifies the decimal delimiter string used to format nonmonetary
    numeric quantities
    This keyword cannot be omitted and cannot be set to the undefined
    string value.
thousands_sep
    Specifies the string separator used for grouping digits to the left
    of the decimal delimiter in formatted nonmonetary numeric quantities
grouping
    Defines the size of each group of digits in formatted nonmonetary
    numeric quantities
    The operand for the grouping keyword consists of a sequence of
    semicolon-separated integers. Each integer specifies the number of
    digits in a group. The initial integer defines the size of the group
    immediately to the left of the decimal delimiter. The subsequent
    integers define succeeding groups to the left of the previous group.
    Grouping is performed for each integer specified for the grouping
    keyword. If the last integer is not -1, the last group size is used
    repeatedly to group any remaining digits. If the last integer is -1,
    no more grouping is performed.
The following is an example of the interpretation of the grouping
statement. Assuming the value to be formatted is 123456789 and the
operand for the thousands_sep keyword is ' (single quote), the following
results occur:
Grouping Value   Formatted Value
3;-1             123456'789
3                123'456'789
3;2;-1           1234'56'789
3;2              12'34'56'789
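The grouping rules can be checked against these results with a small sketch. The Python function below is a hypothetical helper written for this example (group_digits is not part of any locale interface); it takes the grouping operand as a semicolon-separated string and applies the rules described for the grouping keyword.

```python
def group_digits(digits, grouping, sep):
    """Insert sep into a digit string according to a grouping operand.

    grouping is a string such as "3;2;-1"; the first integer sizes the
    group nearest the decimal delimiter.
    """
    sizes = [int(field) for field in grouping.split(";")]
    groups = []              # collected from right to left
    rest = digits
    index = 0
    while rest:
        # After the list is exhausted, the last size is used repeatedly.
        size = sizes[min(index, len(sizes) - 1)]
        # -1 ends grouping (this sketch treats any non-positive size the
        # same way); also stop when too few digits remain to split.
        if size <= 0 or size >= len(rest):
            groups.append(rest)
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
        index += 1
    return sep.join(reversed(groups))


for operand in ("3;-1", "3", "3;2;-1", "3;2"):
    print(operand, group_digits("123456789", operand, "'"))
```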
The following is an example of a possible LC_NUMERIC category listed in
a locale definition source file:
LC_NUMERIC
#
decimal_point   "<period>"
thousands_sep   "<comma>"
grouping        <3>
#
END LC_NUMERIC
The LC_TIME Category
The LC_TIME category of a locale definition source file defines rules
and symbols for formatting time and date information. This category
begins with an LC_TIME category header and terminates with an END
LC_TIME category trailer.
All operands for the LC_TIME category keywords are defined as string or
integer values. String values are bounded by " " (double quotes). All
values are separated from the keyword they define by one or more blank
characters (spaces or tabs). Two adjacent double quote characters ("")
indicate an undefined string value. Field descriptors are used by
commands and subroutines that query the LC_TIME category to represent
elements of time and date formats. The field descriptors used by commands
and subroutines that query the LC_TIME category for time formatting are
described in this section, immediately following the descriptions of
valid keywords.
The following keywords are recognized in the LC_TIME category:
copy
    Specifies the name of an existing locale to be used as the definition
    of this category
    If you include a copy statement, no other keyword can be specified.
abday
    Defines the abbreviated weekday names corresponding to the %a field
    descriptor
    Recognized values consist of 7 semicolon-separated strings. The first
    string corresponds to the abbreviated name for the first day of the
    week (for example, Sun), the second to the abbreviated name for the
    second day of the week, and so on.
day
    Defines the full spelling of the weekday names corresponding to the
    %A field descriptor
    Recognized values consist of 7 semicolon-separated strings. The first
    string corresponds to the full spelling of the name of the first day
    of the week (for example, Sunday), the second to the name of the
    second day of the week, and so on.
abmon
    Defines the abbreviated month names corresponding to the %b field
    descriptor
    Recognized values consist of 12 semicolon-separated strings. The
    first string corresponds to the abbreviated name for the first month
    of the year (for example, Jan), the second to the abbreviated name
    for the second month of the year, and so on.
mon
    Defines the full spelling of the month names corresponding to the %B
    field descriptor
    Recognized values consist of 12 semicolon-separated strings. The
    first string corresponds to the full spelling of the name for the
    first month of the year (for example, January), the second to the
    full spelling of the name for the second month of the year, and so
    on.
d_t_fmt
    Defines the string used for the standard date-and-time format
    corresponding to the %c field descriptor
    The string can contain any combination of characters and field
    descriptors.
d_fmt
    Defines the string used for the standard date format corresponding
    to the %x field descriptor
    The string can contain any combination of characters and field
    descriptors.
t_fmt
    Defines the string used for the standard time format corresponding
    to the %X field descriptor
    The string can contain any combination of characters and field
    descriptors.
am_pm
    Defines the strings used to represent a.m. (before noon) and p.m.
    (after noon) corresponding to the %p field descriptor
    Recognized values consist of two semicolon-separated strings. The
    first string corresponds to the a.m. designation, the last string to
    the p.m. designation.
t_fmt_ampm
    Defines the string used for the standard 12-hour time format that
    includes an am_pm value (%p field descriptor)
    This statement corresponds to the %r field descriptor. The string
    can contain any combination of characters and field descriptors. If
    the string is empty, the 12-hour format is not supported by the
    locale.
era
    Defines how the years are counted and displayed for each era in a
    locale, corresponding to the %E field descriptor modifier
    For each era, there must be one string in the following format:
    direction:offset:start_date:end_date:name:format
    The variables for the era string format are defined as follows:
    direction
        Specifies a - (minus) or + (plus) character
        The - character indicates that years count in the negative
        direction when moving from the start date to the end date. The +
        character indicates that years count in the positive direction
        when moving from the start date to the end date.
    offset
        Specifies a number representing the first year of the era
    start_date
        Specifies the starting date of the era in yyyy/mm/dd format,
        where yyyy, mm, and dd are the year, month, and day,
        respectively, on the Gregorian calendar
        Years prior to the year AD 1 are represented as negative
        numbers. For example, an era beginning March 5th in the year 100
        BC would be represented as -100/03/05.
    end_date
        Specifies the ending date of the era in the same form used for
        the start_date variable or one of the two special values -* or
        +*. A -* value indicates that the ending date of the era extends
        backward to the beginning of time
        A +* value indicates that the ending date of the era extends
        forward to the end of time. Therefore, the ending date can be
        chronologically before or after the starting date of the era.
        For example, the strings for the Christian eras AD and BC would
        be entered as follows:
        +:0:0000/01/01:+*:AD:%o %N
        +:1:-0001/12/31:-*:BC:%o %N
    name
        Specifies a string representing the name of the era that is
        substituted for the %N field descriptor
    format
        Specifies a strftime() format string to use when formatting the
        %EY field descriptor
        This string can contain any strftime() format control characters
        (except %EY) and locale-dependent multibyte characters.
    An era value consists of one string (enclosed in quotes) for each
    era. If more than one era is specified, each era string is separated
    by a ; (semicolon).
era_year
    Defines the string used to represent the year in alternate-era format
    corresponding to the %Ey field descriptor
    The string can contain any combination of characters and field
    descriptors.
era_d_fmt
    Defines the string used to represent the date in alternate-era format
    corresponding to the %Ex field descriptor
    The string can contain any combination of characters and field
    descriptors.
era_t_fmt
    Defines the locale's alternative time format, as represented by the
    %EX field descriptor for strftime()
era_d_t_fmt
    Defines the locale's alternative date-and-time format, as represented
    by the %Ec field descriptor for strftime()
alt_digits
    Defines alternate strings for digits corresponding to the %O field
    descriptor
    Recognized values consist of a group of semicolon-separated strings.
    The first string represents the alternate string for 0 (zero), the
    second string represents the alternate string for 1, and so on. A
    maximum of 100 alternate strings can be specified.
Defines the string used to print out the month/date/time format
for some commands (ls, find, who, ar)
This format corresponds to the "%b %e %H:%M" format for the
POSIX locale. (Optional) This format is an extension to the
X/Open Portability Guide and may not be supported on all systems
that conform to that standard. Defines the string used to print
out the month/date/year format for some commands (ls, find, who,
ar)
This format corresponds to the "%b %e %Y" format for the POSIX
locale. (Optional) This format is an extension to the X/Open
Portability Guide and may not be supported on all systems that
conform to that standard.
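The era string layout described earlier (direction:offset:start_date:end_date:name:format) can be illustrated by splitting one era string into its six fields. The sketch below is in Python; the names Era and parse_era are hypothetical and exist only for this example. It performs no date validation and is not how localedef itself processes the operand.

```python
from collections import namedtuple

# One field per variable in the era string format.
Era = namedtuple("Era", "direction offset start_date end_date name format")


def parse_era(era_string):
    """Split one era string into its six colon-separated fields."""
    # The format field comes last and may itself contain colons
    # (for example "%H:%M"), so split at most five times.
    fields = era_string.split(":", 5)
    if len(fields) != 6:
        raise ValueError("expected six fields in era string: " + era_string)
    direction, offset, start_date, end_date, name, fmt = fields
    if direction not in ("+", "-"):
        raise ValueError("direction must be + or -")
    return Era(direction, int(offset), start_date, end_date, name, fmt)


# The two era strings used in the AD/BC example.
for text in ("+:0:0000/01/01:+*:AD:%o %N", "+:1:-0001/12/31:-*:BC:%o %N"):
    print(parse_era(text))
```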
The LC_TIME locale definition source file uses field descriptors to
represent elements of time and date formats. Combinations of these
field descriptors create other field descriptors or create time and
date format strings. When used in format strings that contain field
descriptors and other characters, field descriptors are replaced by
their current values. All other characters are copied without change.
The following field descriptors are used by commands and subroutines
that query the LC_TIME category for time formatting:
%a  Represents the abbreviated weekday name (for example, Sun) defined
    by the abday statement
%A  Represents the full weekday name (for example, Sunday) defined by
    the day statement
%b  Represents the abbreviated month name (for example, Jan) defined by
    the abmon statement
%B  Represents the full month name (for example, January) defined by
    the mon statement
%c  Represents the date-and-time format defined by the d_t_fmt statement
%C  Represents the century as a decimal number (00 to 99)
%d  Represents the day of the month as a decimal number (01 to 31)
%D  Represents the date in %m/%d/%y format (for example, 01/31/91)
%e  Represents the day of the month as a decimal number (1 to 31)
    The %e field descriptor uses a 2-digit field. If the day of the
    month is not a 2-digit number, the leading digit is filled with a
    space character.
%Ec Specifies the locale's alternate appropriate date-and-time
    representation
%EC Specifies the name of the base year (period) in the locale's
    alternate representation
%Ex Specifies the locale's alternate date representation
%Ey Specifies the offset from %EC (year only) in the locale's alternate
    representation
%EY Specifies the full alternate year representation
%h  Represents the abbreviated month name (for example, Jan) defined by
    the abmon statement
    This field descriptor is a synonym for the %b field descriptor.
%H  Represents the 24-hour clock hour as a decimal number (00 to 23)
%I  Represents the 12-hour clock hour as a decimal number (01 to 12)
%j  Represents the day of the year as a decimal number (001 to 366)
%m  Represents the month of the year as a decimal number (01 to 12)
%M  Represents the minutes of the hour as a decimal number (00 to 59)
%n  Specifies a newline character
%N  Represents the alternate era name
%o  Represents the alternate era year
%Od Specifies the day of the month by using the locale's alternate
    numeric symbols
%Oe Specifies the day of the month by using the locale's alternate
    numeric symbols
%OH Specifies the hour (24-hour clock) by using the locale's alternate
    numeric symbols
%OI Specifies the hour (12-hour clock) by using the locale's alternate
    numeric symbols
%Om Specifies the month by using the locale's alternate numeric symbols
%OM Specifies the minutes by using the locale's alternate numeric
    symbols
%OS Specifies the seconds by using the locale's alternate numeric
    symbols
%OU Specifies the week number of the year (Sunday as the first day of
    the week) by using the locale's alternate numeric symbols
%Ow Specifies the weekday as a number in the locale's alternate
    representation (Sunday = 0)
%OW Specifies the week number of the year (Monday as the first day of
    the week) by using the locale's alternate numeric symbols
%Oy Specifies the year (offset from %C) in alternate representation
%p  Represents the a.m. or p.m. string defined by the am_pm statement
%r  Represents the 12-hour clock time with a.m./p.m. notation as defined
    by the t_fmt_ampm statement
%S  Represents the seconds of the minute as a decimal number (00 to 59)
%t  Specifies a tab character
%T  Represents 24-hour clock time in the format %H:%M:%S (for example,
    16:55:15)
%U  Represents the week of the year as a decimal number (00 to 53)
    Sunday, or its equivalent as defined by the day statement, is the
    first day of the week for calculating the value of this field
    descriptor.
%w  Represents the day of the week as a decimal number (0 to 6)
    Sunday, or its equivalent as defined by the day statement, is 0
    (zero) for calculating the value of this field descriptor.
%W  Represents the week of the year as a decimal number (00 to 53)
    Monday, or its equivalent as defined by the day statement, is the
    first day of the week for calculating the value of this field
    descriptor.
%x  Represents the date format defined by the d_fmt statement
%X  Represents the time format defined by the t_fmt statement
%y  Represents the year of the century (00 to 99)
%Y  Represents the year as a decimal number (for example, 1989)
%Z  Represents the time zone name, if one can be determined (for
    example, EST)
    No characters are displayed if a time zone cannot be determined.
%%  Specifies a % (percent sign) character
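The substitution rule (field descriptors are replaced by their values, all other characters are copied unchanged) can be tried with any strftime-style interface. The sketch below uses Python's datetime.strftime as a stand-in, which is an assumption of this example rather than something this reference page describes. It sticks to numeric descriptors whose output does not depend on the locale; the %N and %o descriptors listed above should not be expected to work there.

```python
from datetime import datetime

# A fixed moment, so the output does not depend on the current time:
# 31 January 1991, 16:55:15.
moment = datetime(1991, 1, 31, 16, 55, 15)


def render(format_string):
    """Expand field descriptors in format_string for the fixed moment."""
    return moment.strftime(format_string)


# Literal text is copied; each field descriptor is replaced by its value.
print(render("Date %m/%d/%y, time %H:%M:%S, day %j of %Y"))
# Date 01/31/91, time 16:55:15, day 031 of 1991

# %% produces a single percent sign.
print(render("100%%"))
# 100%
```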
The following is an example of a possible LC_TIME category listed in a
locale definition source file:
LC_TIME
#
#Abbreviated weekday names (%a)
abday   "<S><u><n>";"<M><o><n>";"<T><u><e>";"<W><e><d>";\
        "<T><h><u>";"<F><r><i>";"<S><a><t>"
#Full weekday names (%A)
day     "<S><u><n><d><a><y>";"<M><o><n><d><a><y>";\
        "<T><u><e><s><d><a><y>";"<W><e><d><n><e><s><d><a><y>";\
        "<T><h><u><r><s><d><a><y>";"<F><r><i><d><a><y>";\
        "<S><a><t><u><r><d><a><y>"
#Abbreviated month names (%b)
abmon   "<J><a><n>";"<F><e><b>";"<M><a><r>";"<A><p><r>";\
        "<M><a><y>";"<J><u><n>";"<J><u><l>";"<A><u><g>";\
        "<S><e><p>";"<O><c><t>";"<N><o><v>";"<D><e><c>"
#Full month names (%B)
mon     "<J><a><n><u><a><r><y>";"<F><e><b><r><u><a><r><y>";\
        "<M><a><r><c><h>";"<A><p><r><i><l>";"<M><a><y>";\
        "<J><u><n><e>";"<J><u><l><y>";"<A><u><g><u><s><t>";\
        "<S><e><p><t><e><m><b><e><r>";"<O><c><t><o><b><e><r>";\
        "<N><o><v><e><m><b><e><r>";"<D><e><c><e><m><b><e><r>"
#Date-and-time format (%c)
#Note that for improved readability, this section uses actual
#characters, rather than symbolic names, and is inconsistent with
#the other sections in this example. This is bad form.
#In practice, symbolic names should be used.
d_t_fmt "%a %b %e %H:%M:%S %Y"
#
#Date format (%x)
d_fmt   "%m/%d/%y"
#
#Time format (%X)
t_fmt   "%H:%M:%S"
#
#Equivalent of AM/PM (%p)
am_pm   "<A><M>";"<P><M>"
#
#12-hour time format (%r)
#Note that for improved readability, this section uses actual
#characters, rather than symbolic names, and is inconsistent with
#the other sections in this example. This is bad form.
#In practice, symbolic names should be used.
t_fmt_ampm "%I:%M:%S %p"
#
era     "+:0:0000/01/01:+*:AD:%o %N";\
        "+:1:-0001/12/31:-*:BC:%o %N"
era_year ""
era_d_fmt ""
alt_digits "<0><t><h>";"<1><s><t>";"<2><n><d>";"<3><r><d>";\
        "<4><t><h>";"<5><t><h>";"<6><t><h>";"<7><t><h>";\
        "<8><t><h>";"<9><t><h>";"<1><0><t><h>"
#
END LC_TIME
FILES
/usr/lib/nls/loc/src
    Locale definition source files for supported locales.
/usr/lib/nls/loc/charmap
    Character set description (charmap) source files for supported
    locales.
/usr/lib/nls/loc
    Locale binary files.
    By default, the setlocale() routine searches for locales in the
    /usr/lib/nls/loc directory. The value of the LOCPATH variable, if
    set, overrides this search path. Note that the LOCPATH variable is
    an extension to the XPG4 standard and may not be supported on all
    systems that conform to that standard.
Two types of locales are installed on the system when you install
Worldwide Language Support: Unicode locales and dense code locales.
Unicode locales conform to Unicode and ISO/IEC 10646 standards and use
UTF-32 as the wide character encoding. Under UTF-32 wide character
encoding, wchar_t values represent the same characters regardless of
locale and, because Unicode standards prevail, implementation is con‐
sistent across platforms. See Unicode(5) for more information about
encoding formats.
Dense code locales use dense code for wide character encoding to mini‐
mize table size (that is, codepoints are assigned consecutively with no
empty positions). Under dense code locales, a wchar_t value for one
locale may not represent the same character in another locale and,
thus, is locale specific. Dense code locales are appropriate for
applications that have no dependencies on the internal process code or
that require better performance, because dense code locales are slightly
more efficient than Unicode locales.
The Unicode locales are installed in /usr/i18n/lib/nls/ucsloc/... Dense
code locales are installed in /usr/i18n/lib/nls/loc/... A symbolic link,
/usr/i18n/lib/nls/dloc, points to the default locales. For example, the
Japanese locale filename, /usr/lib/nls/loc/ja_JP.eucJP, is a symbolic
link to /usr/i18n/lib/nls/dloc/ja_JP.eucJP, where /dloc is a symbolic
link to either /ucsloc for the Unicode version, or /loc for the dense
code version, of the Japanese locale. Keep in mind that the same locale
name can refer to a Unicode locale or a dense code locale, depending on
the setting of the symbolic link. Thus, if running an application in a
locale is problematic, check the symbolic link.
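One way to check the symbolic link from a script is sketched below in Python. The function name locale_flavor is hypothetical, and the mapping from the link target's last path component (ucsloc or loc) to a locale type is an assumption drawn from the directory names given above. On a system without this link, the function simply returns None.

```python
import os


def locale_flavor(link="/usr/i18n/lib/nls/dloc"):
    """Report which locale directory the dloc symbolic link selects."""
    if not os.path.islink(link):
        return None  # link absent, for example on a different system
    # Reduce the link target to its last path component.
    target = os.path.basename(os.path.normpath(os.readlink(link)))
    # Unknown targets are returned unchanged rather than guessed at.
    return {"ucsloc": "Unicode", "loc": "dense code"}.get(target, target)


print(locale_flavor())
```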
Because Unicode locales use consistent values for characters in wchar_t
form, a link to Unicode locales can increase consistency across locales
and platforms. However, some users may prefer the older, dense code
locales that use proprietary algorithms to convert characters to
wchar_t form. To switch between Unicode and dense code locales, the
system administrator, as root, uses i18nconfig to change the systemwide
default or manually changes the symbolic link (/usr/i18n/lib/nls/dloc)
to point to the other locale directory.
The /usr/lib/nls/loc/src and /usr/lib/nls/loc/charmap directories do
not exist when source files are not provided for installed locales.
See l10n_intro(5) for more information about Unicode locales and dense
code locales.
SEE ALSO
Commands: locale(1), localedef(1), Unicode(5)
Files: charmap(4)
Using International Software