localedef(4)
NAME
localedef - format and semantics of locale definition file
DESCRIPTION
This is a description of the syntax and meaning of the locale
definition that is provided as input to the command to create a locale
(see localedef(1M)).
The following is a list of category tags, keywords and subsequent
expressions which are recognized by localedef. The order of keywords
within a category is irrelevant with the exception of the keyword and
other exceptions noted under the description. (Note that, as a
convention, the category tags are composed of uppercase characters,
while the keywords are composed of lowercase characters.)
Category Tags and Keywords
The following keywords do not belong to any category and should appear
in the beginning of the locale definition file:
Single character indicating the character to be interpreted as
starting a comment line within the locale definition file. This
character should be in the first column of a comment line. The
default comment_char is the pound sign (#). All lines with a
comment_char in the first column are ignored.
A single character indicating the character to be interpreted as an
escape character within the script. The default escape_char is the
backslash (\). The escape_char is used to escape localedef
metacharacters to remove special meaning and in the character
constant decimal, octal, and hexadecimal formats. It is also used to
continue a line onto the next, if escape_char is the last character
on the line (before the new-line character).
The following keywords can be used in any category:
A string naming another valid locale available on the system.
This causes the category in the locale being created to
be a copy of the same category in the named locale.
Since the keyword defines the entire category, if used,
it must be the only keyword in the category.
The following six categories are recognized:
This category defines character classification, case conversion and
other character attributes. The following predefined character
classifications are recognized:
Character codes classified as uppercase letters. Characters
specified
in the or classifications cannot be specified
in this category.
Character codes classified as lowercase letters. Same
restrictions
applicable to the category apply to this clas‐
sification.
Character codes classified as numeric. Only ten characters in
contiguous ascending sequence by numerical value can be specified.
Alternative digits cannot be specified here.
Character codes classified as white-space. No character spec‐
ified for
the or categories can be included in this
classification.
Character codes classified as punctuation characters. No
character
included in the or categories can be speci‐
fied.
Character codes classified as control characters. No charac‐
ter included in
the or can be included here.
Character codes classified as blank characters. The <space> and
<tab> characters are automatically included.
Character codes classified as hexadecimal digits. Only the
characters
defined for the class can be specified, fol‐
lowed by one or more sets of six characters,
with each set in ascending order.
Character codes classified as letters. Characters classified
as
or cannot be specified. Characters specified
as and classes are automatically included in
this class.
Character codes classified as printable characters.
Characters specified for and classes and the
<space> character are automatically included.
No character from the category can be speci‐
fied.
Character codes classified as printable characters, except the
<space> character. In all other respects this classification is
similar to the category.
The following two are special classifications, used to designate
valid first-of-two and second-of-two bytes of two-byte characters.
Note that these are byte classifications and not character
classifications; hence, they cannot be used with the iswctype
interface (see wctype(3C)) in the same manner as the other
classifications.
Valid first bytes of two-byte characters.
Valid second bytes of two-byte characters.
Character case conversion definitions:
Lowercase to uppercase character relationships.
Uppercase to lowercase character relationships.
Miscellaneous character attribute and classifications:
String mapped into the ASCII
equivalent string
``b!"#$%&'()*+,-./:;<=>?@[\]^_`{}~'', where b
is a blank (a langinfo(5) item).
Defines one or more locale-specific character class names as
strings separated by semicolons. Each named
character class can then be defined subse‐
quently in the definition. The first character
of a character class name must be a letter and
the class name cannot match any of the prede‐
fined classifications (for example,
String operand indicates text direction (a langinfo(5) item). String
operand "1" indicates right-to-left text direction.
String operand indicates character context analysis. String "1"
indicates Arabic context analysis is required.
The category provides collation sequence definition for relative
ordering between collating elements (single and multi-character
collating elements) in the locale. The following keywords
belong to this category and should come between the category tag
and The first two keywords can be in any order, but must come
before the keyword. Any number of the first two keywords can be
specified.
Defines a multi-character collating element,
symbol, composed of the characters in string.
String is limited to two characters.
Makes symbol a collating symbol which can be used to
define a place in the collating sequence.
Symbol does not represent any actual charac‐
ter.
Denotes the start of the collation sequence.
The directives have an effect on string colla‐
tion.
The lines following the keyword and before the keyword contain
collating element entries, one per line.
Operands can optionally appear after the keyword to define rules for
string comparison using a multiple-weight scheme (if no operands are
specified, a single operand is assumed). The possible operands are:
Specifies that comparison operations proceed from start of string
towards the end of it.
Specifies that comparison operations proceed from end of string
towards the beginning of it.
Marks the end of the list of collating element entries.
The category defines the rules and symbols used to format monetary
numeric information. The following keywords belong to this cate‐
gory and should come between the category tag and
The operand is a four-character string used to designate the
international currency symbol. The first three characters should
contain the alphabetic international currency symbol in accordance
with those specified in the ISO 4217 standard. The fourth character
is the character used to separate the international currency symbol
from the monetary quantity.
The operand is a string used as the local currency symbol.
The operand is a string containing the symbol used as the decimal
delimiter (radix character).
The operand is a string containing the symbol used as a separator
for groups of digits to the left of the decimal delimiter.
The operand is a semicolon-separated list of integers. The initial
integer defines the size of the group immediately preceding the
decimal delimiter, and the following integers define the preceding
groups. If the last integer is not -1, then the size of the previous
group (if any) will be repeatedly used for the remainder of the
digits. If the last integer is -1, then no further grouping will be
performed.
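The grouping rule above can be illustrated with a short Python sketch. This is not part of localedef; the function name group_digits and its separator parameter are hypothetical, and the code only models the rule as described in this entry (sizes are consumed right to left, the last size repeats, and -1 stops grouping).

```python
def group_digits(digits, grouping, separator=","):
    """Insert separators into a string of integer digits.

    `grouping` is the list of integers from the operand, in the order
    written (first entry = group nearest the decimal delimiter).
    This is an illustrative model, not the libc implementation.
    """
    groups = []          # collected right to left
    remaining = digits
    last_size = None     # most recent real group size seen
    index = 0
    while remaining:
        if index < len(grouping):
            size = grouping[index]
            index += 1
            if size == -1:
                break            # -1: no further grouping is performed
            last_size = size
        else:
            size = last_size     # list exhausted: repeat the previous size
        if size is None or size <= 0:
            break                # no usable size: leave the rest ungrouped
        groups.append(remaining[-size:])
        remaining = remaining[:-size]
    if remaining:
        groups.append(remaining)
    return separator.join(reversed(groups))


print(group_digits("1234567", [3]))      # size 3 repeated
print(group_digits("1234567", [3, 2]))   # 3, then 2 repeated
print(group_digits("1234567", [3, -1]))  # one group of 3, then stop
```

With an operand of 3;2 the last size (2) is reused for all remaining digits, while 3;-1 produces a single group of three and leaves the rest unseparated.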
The operand is a string to indicate a non-negative monetary quantity.
The operand is a string to indicate a negative monetary quantity.
The operand is an integer representing the number of frac‐
tional digits
used in formatted monetary values using
The operand is an integer representing the number of frac‐
tional digits
used in formatted monetary values using
The operand is an integer which if set to 1 indicates the
precedes a monetary quantity, and if set to 0
the symbol succeeds the value.
The operand is an integer which indicates the separation of
the
the sign string, and the value for a non-nega‐
tive formatted monetary quantity.
The value of and are interpreted according to
the following:
No space separates the currency symbol and
value.
If the currency symbol and sign string are
adjacent, a space separates
them from the value; otherwise, a
space separates the currency symbol
from the value.
If the currency symbol and sign string are
adjacent, a space separates them;
otherwise, a space separates the sign
string from the value.
The operand is an integer which if set to 1 indicates the
precedes a negative monetary quantity, and if
set to 0 the symbol succeeds the negative
value.
The operand is an integer which indicates the separation of
the
the sign string, and the value for a negative
formatted monetary quantity.
The operand is an integer which indicates the positioning of the
for a positive monetary quantity. The possible values are:
Parentheses surround the quantity and the
or
The sign string precedes the quantity and
the
or
The sign string succeeds the quantity and
the
or
The sign string precedes the
or
The sign string succeeds the
or
The operand is an integer set to a value indicating the posi‐
tioning of
the negative_sign for a negative formatted
monetary quantity.
The operand is an integer which if set to 1 indicates the
precedes a monetary quantity, and if set to 0
the symbol succeeds the value.
The operand is an integer which indicates the separation of
the
the sign string, and the value for a non-nega‐
tive internationally formatted monetary quan‐
tity.
The operand is an integer which if set to 1 indicates the
precedes a negative monetary quantity, and if
set to 0 the symbol succeeds the negative
value.
The operand is an integer which indicates the separation of
the
the sign string, and the value for a negative
internationally formatted monetary quantity.
The operand is an integer which indicates the positioning of
the
for a positive monetary quantity formatted
with the international format.
The operand is an integer which indicates the positioning of
the
for a negative monetary quantity formatted
with the international format.
The category defines rules and symbols used to format non-monetary
numeric information. The following keywords belong to this cat‐
egory and should come between the category tag and
The operand is a string containing the symbol used as the decimal
delimiter (radix character) in numeric, non-monetary formatted
quantities. This keyword cannot be omitted and cannot be set to the
empty string.
The operand is a string containing the symbol used as a separator
for groups of digits to the left of the decimal delimiter.
The operand is a semicolon-separated list of integers. The initial
integer defines the size of the group immediately preceding the
decimal delimiter, and the following integers define the preceding
groups. If the last integer is not -1, then the size of the previous
group (if any) will be repeatedly used for the remainder of the
digits. If the last integer is -1, then no further grouping will be
performed.
String mapped into the ASCII
equivalent string "", where b is a blank (a
langinfo(5) item). The keyword is an HP
extension to the POSIX standards and it has a
different meaning than the defined in POSIX
standards.
The category defines the rules for generating locale-specific
formatted date strings. The following mandatory keywords belong to
this category and should come between the category tag and
Seven semicolon-separated strings
giving abbreviated names for the days of the
week beginning with Sunday.
Seven semicolon-separated strings
giving full names for the days of the week
beginning with Sunday.
Twelve semicolon-separated strings giving abbreviated names
for the months,
beginning with January.
Twelve semicolon-separated strings giving full names for the
months,
beginning with January.
The operand is a string defining the appropriate date and
time
representation.
The operand is a string defining the appropriate date
representation.
The operand is a string defining the appropriate time
representation.
The operand is two semicolon-separated strings giving
the representations for and
The operand is a string defining the appropriate time repre‐
sentation
in the 12-hour clock format with
The operand is a semicolon-separated list of strings. Each string
defines the name and date of an era or emperor for a locale. Each
string should conform to the following format:
direction:offset:start_date:end_date:name:format
where:
direction Either a or character. The character indicates the time
axis should be such that the years count in the positive direction
when moving from the starting date towards the ending date. The
character indicates the time axis should be such that the years
count in the negative direction when moving from the starting date
towards the ending date.
offset A number in the range indicating the number of the first
year of the era.
start_date A date in the form where yyyy, mm, and dd are the year,
month and day numbers, respectively, of the start of the era. Years
prior to the year 0 A.D. are represented as negative numbers. For
example, an era beginning March 5th in the year 100 B.C. would be
represented as Years in the range are supported.
end_date The ending date of the era in the same form as the
start_date above or one of the two special values or A value of
indicates the ending date of the era extends to the beginning of
time while indicates it extends to the end of time. The ending date
can be chronologically either before or after the starting date of
an era. For example, the expressions for the Christian eras A.D.
and B.C. would be:
name A string representing the name of the era which is substituted
for the directive of and (see date(1) and strftime(3C)).
format A string for formatting the directive of and This string is
usually a function of the and directives. If format is not
specified, the string specified for the category keyword (see below)
is used as a default.
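As an illustration of the six-field layout above, the following Python sketch splits one era string into its named fields. The names EraEntry and split_era are hypothetical, the sample values in the usage line are placeholders rather than real era data, and accepting an entry with the format field left off is an assumption based on the statement that format may be unspecified.

```python
from collections import namedtuple

# Field names follow the layout given in the text:
# direction:offset:start_date:end_date:name:format
EraEntry = namedtuple(
    "EraEntry", "direction offset start_date end_date name format"
)


def split_era(entry):
    """Split one era string into its six colon-separated fields.

    Only the first five colons are treated as separators, so a format
    field that itself contains a colon is kept intact. All fields are
    returned as uninterpreted strings.
    """
    fields = entry.split(":", 5)
    if len(fields) == 5:
        fields.append("")  # assumption: format may be omitted entirely
    if len(fields) != 6:
        raise ValueError("expected 5 or 6 colon-separated fields")
    return EraEntry(*fields)


# Placeholder values only; they are not valid dates or directions.
era = split_era("DIR:1:START:END:Example Era:FMT")
print(era.name, era.offset)
```

The sketch deliberately does not interpret the direction, date, or format fields.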
The operand is a string defining the format of date in era
notation.
The operand is a string defining the format of time in era
notation.
The operand is a string defining the format of date and
time in era notation.
The operand is a semicolon-separated list of strings. The first
string is the alternative symbol corresponding to zero, the second
string is the alternative symbol corresponding to one, and so on.
Note that if the HP-UX-proprietary keyword has been specified in the
same locale, the first ten symbols should be identical for these two
keywords.
In addition to the above, the following HP-UX-proprietary keywords
are recognized (these are provided for backward compatibility and
their use is otherwise not recommended):
The category defines the format and values for affirmative and
negative responses. The following keywords belong to this category
and should come between the category tag and
The string operand is
an Extended Regular Expression matching
acceptable affirmative responses to yes/no
queries.
The string operand is
an Extended Regular Expression matching
acceptable negative responses to yes/no
queries.
The string operand identifies the affirmative response for
yes/no questions.
This keyword is now obsolete and should be
used instead.
The string operand identifies the negative response for yes/no
questions. This keyword is now obsolete and should be used instead.
Keyword Operands
Keyword operands consist of character-code constants and symbols,
strings, and metacharacters. The types of legal expressions are: and
operands consist of single character-code constants or symbolic
names separated by semicolons, or a character-code range consisting
of a constant or symbolic name followed by an ellipsis followed by
another constant or symbolic name. The constant preceding the
ellipsis must have a smaller code value than the constant following
the ellipsis. A range represents a set of consecutive character
codes. If the list is longer than a single line, the escape
character must be used at the end of each line as a continuation
character. It is an error to use any symbolic name that is not
defined in an accompanying charmap file (see charmap(4)).
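The range rule can be modeled in a few lines of Python. The function name expand_range is hypothetical, and the sketch works on numeric code values only; resolving symbolic names through a charmap file is outside its scope.

```python
def expand_range(first, last):
    """Return the consecutive character codes covered by a range.

    `first` is the code value before the ellipsis and `last` the code
    value after it. The text requires first < last, so anything else
    is rejected.
    """
    if first >= last:
        raise ValueError(
            "constant before the ellipsis must be smaller than the one after"
        )
    return list(range(first, last + 1))


codes = expand_range(0x41, 0x44)
print([hex(c) for c in codes])
```

Both endpoints are included, since a range stands for the full set of consecutive codes between the two constants.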
operands consist of strings separated by semicolons. If longer
than one line, the escape character must be used for
continuation.
operands consist of a sequence of zero or more characters
surrounded by double quotes ("). Within a string, the
double-quote character must be preceded by an escape
character. The following escape sequences also can be
used:
newline
horizontal tab
backspace
carriage return
form feed
backslash
single quote
bit pattern
The escape consists of the escape character
followed by 1, 2, or 3 octal digits specifying
the value of the desired character (for other
possible bit pattern specification, see
below). Also, an escape character (\) and an
immediately-following newline are ignored.
Although the backslash (\) has been used for illustration, another
escape character can be substituted by the keyword.
Constants represent character codes in the operands.
They can be used in the following forms:
decimal constants An escape character followed by a followed by up
to three decimal digits.
octal constants An escape character followed by up to three octal
digits.
hexadecimal constants An escape character followed by a followed by
two hexadecimal digits.
Unicode constants An escape character followed by a followed by four
to eight hexadecimal digits which specify a Unicode scalar value in
a charmap file to be used with the option of the command.
character constants A single character (for example, A) having the
numerical value of the character in the machine's character set.
symbolic names A string enclosed between and is a symbolic name.
input files are recommended to be written entirely in symbolic
names, utilizing a user-defined or system-supplied charmap file.
This aids portability of input files between different encoded
character sets (see charmap(4)).
Symbolic names can be defined within a locale definition file by the
and keywords. These are not character constants. It is an error if
such an internally defined symbolic name collides with one defined
in a charmap file.
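The decimal, octal, hexadecimal and character forms above can be decoded as in the following Python sketch. The function decode_constant is hypothetical. The letters that mark decimal and hexadecimal constants are not shown in the text above, so they are parameters here; the defaults "d" and "x" are an assumption, not something this page confirms. Unicode constants and symbolic names are not handled.

```python
import re


def decode_constant(text, escape="\\", dec_marker="d", hex_marker="x"):
    """Return the numeric code value of a character-code constant.

    dec_marker and hex_marker are ASSUMED marker letters; replace them
    if the real syntax differs. A bare single character is treated as a
    character constant (ord() matches the machine value only for
    character sets that agree with Unicode, such as ASCII).
    """
    if not text.startswith(escape):
        if len(text) == 1:
            return ord(text)
        raise ValueError("not a character-code constant: %r" % text)
    body = text[len(escape):]
    # Decimal: marker followed by up to three decimal digits.
    if re.fullmatch(re.escape(dec_marker) + r"[0-9]{1,3}", body):
        return int(body[len(dec_marker):], 10)
    # Hexadecimal: marker followed by exactly two hexadecimal digits.
    if re.fullmatch(re.escape(hex_marker) + r"[0-9A-Fa-f]{2}", body):
        return int(body[len(hex_marker):], 16)
    # Octal: up to three octal digits directly after the escape character.
    if re.fullmatch(r"[0-7]{1,3}", body):
        return int(body, 8)
    raise ValueError("unrecognized constant: %r" % text)


for sample in ("\\d065", "\\x41", "\\101", "A"):
    print(sample, decode_constant(sample))
```

If the escape character is redefined, pass the new character as the escape argument.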
operands consist of one or more decimal digits separated by
semicolons.
operands follow keywords and and must consist of two character-code
constants enclosed by left and right parentheses and separated by a
comma. Each such character pair is separated from the next by a
semicolon. For the first constant represents an uppercase character
and the second the corresponding lowercase character. For the first
constant represents a lowercase character and the second the
corresponding uppercase character.
The keyword is followed by collating element entries, one
per line, in ascending order by collating position.
The collating element entries have the form:
collation_element can be a character, a collating symbol enclosed in
angle brackets representing a character or collating element, the
special symbol or an ellipsis
A character stands for itself; a collating symbol can be a symbolic
name for a character that is interpreted by the charmap file, a
multi-character collating element defined by a keyword, or a
collating symbol defined by the
The special symbol specifies the collating position of
any characters not explicitly defined by collating
element entries. For example, if some group of char‐
acters is to be omitted from the collation sequence
and just collate after all defined characters, a col‐
lating symbol might be defined before the keyword:
Then somewhere in the list of collating element
entries:
Notice that there is no second weight. This means
that on a second pass all characters collate by their
encoded value.
An ellipsis is interpreted as a list of characters with an encoded
value higher than that of the character on the preceding line and
lower than that on the following line. Because it is tied to the
encoded value of characters, the ellipsis is inherently
non-portable. If it is used, a warning is issued and no output
generated unless the option was given.
The weight operands provide information about how the collating
element is to be collated on first and subsequent passes. Weight can
be a two-character string, the special symbol or a collating element
of any of the forms specified for collating_element except If there
are no weights, the character collates strictly by its position in
the list. If there is only one weight given, the character sorts by
its relative position in the list on the second collation pass.
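The defaulting rules for weights can be sketched in Python as follows. This is a simplified model under stated assumptions: only single-character elements, at most two passes, forward comparison, and no special symbols or ellipses. The names build_weights, collation_key and the sample entries are hypothetical and do not come from any shipped locale.

```python
def build_weights(entries):
    """Map each element to (first-pass, second-pass) numeric weights.

    `entries` lists (element, weight1, weight2) tuples in collating
    order; either weight may be left out. A weight names another
    element and is resolved to that element's position in the list.
    A missing weight defaults to the element's own position.
    """
    position = {entry[0]: i for i, entry in enumerate(entries)}
    table = {}
    for element, *weights in entries:
        own = position[element]
        first = position[weights[0]] if len(weights) > 0 else own
        second = position[weights[1]] if len(weights) > 1 else own
        table[element] = (first, second)
    return table


def collation_key(text, table):
    """Sort key: compare all first-pass weights, then all second-pass."""
    return ([table[ch][0] for ch in text], [table[ch][1] for ch in text])


# Hypothetical entries: "a" shares the first weight of "A", and "b" of "B".
entries = [("A",), ("a", "A"), ("B",), ("b", "B")]
table = build_weights(entries)
print(sorted(["b", "a", "B", "A"], key=lambda s: collation_key(s, table)))
```

Because "a" and "A" share a first-pass weight, they compare equal on the first pass and are separated only by list position on the second.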
An equivalence class is defined by a series of collating element
entries all having the same character or symbol in the first weight
position. For example, in many locales all forms of the character
'A' collate equal on the first pass. This is represented in the
collating element entries as:
Two-to-one collating elements are specified by collat‐
ing-elements defined before the keyword. For example,
the two-to-one collating element in Spanish, would be
defined before the keyword as
It would then be used in a collating element entry as
A one-to-two collating element is defined by having a
two-character string in one of the weight positions.
For example, if the character collates equal to the
pair "AE", the collating element entry would be:
A don't-care character is defined by the special sym‐
bol For example, the dash character, may be a don't
care on the first collation pass. The collating ele‐
ment entry is:
Symbols defined by the keyword can be used to indicate that a given
character collates higher or lower than some position in the
sequence. For example, if all characters with an encoded value less
than that of are to collate lower than all other characters on the
first pass, and in relative order on the second pass, define a
collating symbol before the keyword:
The first two collating element entries are then:
This also illustrates the use of the ellipsis to indi‐
cate a range. The first ellipsis is interpreted as
"all characters in the encoded character set with a
value lower than '0'"; the second ellipsis means that
all characters in the range defined by the first col‐
late in relative order.
operands conform to
the Extended Regular Expressions specifications as
described in regexp(5).
Metacharacters
Metacharacters are characters having a special meaning to localedef in
operands. To escape the special meaning of these characters, surround
them with single quotes or precede them by an escape character.
localedef metacharacters include:
Indicates the beginning of a symbolic name.
Indicates the end of a symbolic name.
Indicates the beginning of a character shift pair following the
and keywords.
Indicates the end of a character shift pair.
Used to separate the characters of a character shift pair.
Used to quote strings.
Used as a separator in list operands.
escape character
Used to escape special meaning from other metacharacters
and itself. It is backslash (\) by default, but can be
redefined by the keyword.
Comments
Comments are lines beginning with a comment character. The comment
character is pound sign (#) by default, but can be redefined by the
keyword. Comments and blank lines are ignored.
Separators
Separator characters include blanks and tabs. Any number of separators
can be used to delimit the keywords, metacharacters, constants and
strings that comprise a localedef script, except that all characters
between and are considered to be part of the symbolic name even if
they are <blank>s.
EXAMPLES
Please see the files under for examples of locale description files.
These files were used to create the various locales which are delivered
with HP-UX.