regexp(5)
NAME
regexp - regular expression and pattern matching notation definitions
DESCRIPTION
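A regular expression is a mechanism supported by many utilities for
locating and manipulating patterns in text. Pattern matching notation
is used by shells and other utilities for file name expansion. This
manual entry defines two forms of regular expressions, Basic Regular
Expressions and Extended Regular Expressions, and the one form of
Pattern Matching Notation.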
BASIC REGULAR EXPRESSIONS
Basic regular expression (RE) notation and construction rules apply to
utilities defined as using basic REs. Any exceptions to the following
rules are noted in the descriptions of the specific utilities that use
REs.
REs Matching a Single Character
The following REs match a single character or a single collating
element:
Ordinary Characters
      An ordinary character is an RE that matches itself. An ordinary
      character is any character in the supported character set
      except newline and the regular expression special characters
      listed in Special Characters below. An ordinary character
      preceded by a backslash (\) is treated as the ordinary
      character itself, except when the character is (, ), {, or },
      or one of the digits 1 through 9 (see REs Matching Multiple
      Characters). Matching is based on the bit pattern used for
      encoding the character, not on the graphic representation of
      the character.
Special Characters
      A regular expression special character preceded by a backslash
      is a regular expression that matches the special character
      itself. When not preceded by a backslash, such characters have
      special meaning in the specification of REs. Regular
      expression special characters and the contexts in which they
      have special meaning are:
      . [ \       The period, left square bracket, and backslash are
                  special except when used in a bracket expression
                  (see RE Bracket Expression).
      *           The asterisk is special except when used in a
                  bracket expression, as the first character of a
                  regular expression, or as the first character
                  following the character pair \( (see REs Matching
                  Multiple Characters).
      ^           The circumflex is special when used as the first
                  character of an entire RE (see Expression
                  Anchoring) or as the first character of a bracket
                  expression.
      $           The dollar sign is special when used as the last
                  character of an entire RE (see Expression
                  Anchoring).
      delimiter   Any character used to bound (i.e., delimit) an
                  entire RE is special for that RE.
A period (.), when used outside of a bracket expression, is an RE that
matches any printable or nonprintable character except newline.
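The single-character rules above can be tried out with a short Python
sketch. Note the assumption: Python's re module implements a Perl-style
syntax, not POSIX basic REs, but ordinary characters, backslash-escaped
special characters, and the period behave the same way in these cases.
The helper name matches_whole is made up for this illustration.

```python
import re

def matches_whole(pattern, text):
    """Return True if pattern matches all of text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# An ordinary character matches itself.
ordinary = matches_whole("a", "a")
# A special character preceded by a backslash matches itself only.
escaped_period = matches_whole(r"\.", ".")
escaped_not_any = matches_whole(r"\.", "x")
# An unescaped period matches any single character except newline.
period_any = matches_whole(".", "x")
period_newline = matches_whole(".", "\n")

print(ordinary, escaped_period, escaped_not_any, period_any, period_newline)
# prints: True True False True False
```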
RE Bracket Expression
A bracket expression enclosed in square brackets is an RE that matches
a single collating element contained in the nonempty set of collating
elements represented by the bracket expression.
The following rules apply to bracket expressions:
A bracket expression is either a matching list expression or a
      non-matching list expression and consists of one or more
      expressions in any order. Expressions can be: collating
      elements, collating symbols, noncollating characters,
      equivalence classes, range expressions, or character classes.
      The right bracket (]) loses its special meaning and represents
      itself in a bracket expression if it occurs first in the list
      (after an initial circumflex (^), if any). Otherwise, it
      terminates the bracket expression (unless it is the ending
      right bracket for a valid collating symbol, equivalence class,
      or character class, or it is the collating element within a
      collating symbol or equivalence class expression). The special
      characters . * [ \ (period, asterisk, left bracket, and
      backslash) lose their special meaning within a bracket
      expression.
The character sequences [. [= [: (left bracket followed by a period,
      equal sign, or colon) are special inside a bracket expression
      and are used to delimit collating symbols, equivalence class
      expressions, and character class expressions. These symbols
      must be followed by a valid expression and the matching
      terminating .], =], or :].
A matching list expression specifies a list that matches any one of
      the characters represented in the list. The first character in
      the list cannot be the circumflex. For example, [abc] is an RE
      that matches any of a, b, or c.
A non-matching list expression begins with a circumflex (^) and
      specifies a list that matches any character or collating
      element except newline and the characters represented in the
      list. For example, [^abc] is an RE that matches any character
      except newline or a, b, or c. The circumflex has this special
      meaning only when it occurs first in the list, immediately
      following the left square bracket.
A collating element is a sequence of one or more characters that
      represents a single element in the collating sequence as
      identified via the most current setting of the locale variable
      LC_COLLATE (see setlocale(3C)).
A collating symbol is a collating element enclosed within
      bracket-period delimiters ([. and .]). Multicharacter collating
      elements must be represented as collating symbols to
      distinguish them from single-character collating elements. For
      example, if the string ch is a valid collating element, then
      [[.ch.]] is treated as an element matching the same string of
      characters, while [ch] is treated as a simple list of the
      characters c and h. If the string within the bracket-period
      delimiters is not a valid collating element in the current
      collating sequence definition, the symbol is treated as an
      invalid expression.
A noncollating character is a character that is ignored for collating
      purposes. By definition, such characters cannot participate in
      equivalence classes or range expressions.
An equivalence class expression represents the set of collating
      elements belonging to an equivalence class. It is expressed by
      enclosing any one of the collating elements in the equivalence
      class within bracket-equal delimiters ([= and =]). For example,
      if a and A belong to the same equivalence class, then [[=a=]b]
      and [[=A=]b] are each equivalent to [aAb].
A range expression represents the set of collating elements that fall
      between two elements in the current collation sequence as
      defined via the most current setting of the locale variable
      LC_COLLATE (see setlocale(3C)). It is expressed as the starting
      point and the ending point separated by a hyphen (-).
      The starting range point and the ending range point must each
      be a collating element, collating symbol, or equivalence class
      expression. An equivalence class expression used as an end
      point of a range expression is interpreted such that all
      collating elements within the equivalence class are included in
      the range. For example, if the collating order is A, a, B, b,
      C, c, and d, and the characters A and a belong to the same
      equivalence class, then the expression [[=a=]-C] is treated as
      [AaBbC].
      Both starting and ending range points must be valid collating
      elements, collating symbols, or equivalence class expressions,
      and the ending range point must collate equal to or higher than
      the starting range point; otherwise the expression is invalid.
      For example, with the above collating order and assuming that E
      is a noncollating character, both the expressions [[=A=]-E] and
      [d-a] are invalid.
      An ending range point can also be the starting range point in a
      subsequent range expression. Each such range expression is
      evaluated separately. For example, the bracket expression
      [a-m-o] is treated as [a-mm-o].
      The hyphen character is treated as itself if it occurs first
      (after an initial ^, if any) or last in the list, or as the
      rightmost symbol in a range expression. As examples, the
      expressions [-ac] and [ac-] are equivalent and match any of the
      characters a, c, or -; the expressions [^-ac] and [^ac-] are
      equivalent and match any characters except newline, a, c, or -;
      the expression [%--] matches any of the characters in the
      defined collating sequence between % and - inclusive; the
      expression [--@] matches any of the characters in the defined
      collating sequence between - and @ inclusive; and the
      expression [a--] is invalid, assuming - precedes a in the
      collating sequence.
      If a bracket expression must specify both - and ], the ] must
      be placed first (after the ^, if any) and the - last within the
      bracket expression.
A character class expression represents the set of characters
      belonging to a character class, as defined via the most current
      setting of the locale variable LC_CTYPE. It is expressed as a
      character class name enclosed within bracket-colon delimiters
      ([: and :]).
      Standard character class expressions supported in all locales
      are:
      [:alpha:]    letters
      [:upper:]    upper-case letters
      [:lower:]    lower-case letters
      [:digit:]    decimal digits
      [:xdigit:]   hexadecimal digits
      [:alnum:]    letters or decimal digits
      [:space:]    characters producing white-space in displayed text
      [:print:]    printing characters
      [:punct:]    punctuation characters
      [:graph:]    characters with a visible representation
      [:cntrl:]    control characters
      [:blank:]    blank characters
      For example, if the locale variable LC_CTYPE is set to C, the
      expression [[:digit:]] is equivalent to [0123456789].
      Similarly, the expression [[:xdigit:]] is the same as
      [0123456789ABCDEFabcdef].
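The positional rules for bracket expressions can be illustrated with
the following Python sketch. It rests on stated assumptions: Python's
re module is not a POSIX implementation, it orders ranges by character
code rather than by the locale's collating sequence, and it does not
support collating symbols, equivalence classes, or character class
expressions. Only the list, range, and placement rules are shown. The
helper first_match and the sample strings are invented for this
example.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

matching = first_match("[abc]", "xyzb")        # matching list
non_matching = first_match("[^abc]", "abcd")   # non-matching list
in_range = first_match("[0-9]", "room 42")     # range expression
bracket_first = first_match("[]x]", "a]b")     # ] first in the list is literal
hyphen_first = first_match("[-ac]", "b-c")     # - first in the list is literal
hyphen_last = first_match("[ac-]", "b-c")      # - last in the list is literal
period_literal = first_match("[.*]", "a.b")    # . and * lose special meaning

print(matching, non_matching, in_range, bracket_first,
      hyphen_first, hyphen_last, period_literal)
# prints: b d 4 ] - - .
```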
REs Matching Multiple Characters
The following rules may be used to construct REs matching multiple
characters from REs matching a single character:
RERE        The concatenation of REs is an RE that matches the first
            encountered concatenation of the strings matched by each
            component of the RE. For example, the RE bc matches the
            second and third characters of the string abcdefabcdef.
RE*         An RE matching a single character followed by an asterisk
            (*) is an RE that matches zero or more occurrences of the
            RE preceding the asterisk. The first encountered string
            that permits a match is chosen, and the matched string
            will encompass the maximum number of characters permitted
            by the RE. For example, in the string abbbcdeabbbbbbcde,
            both the RE b*c and the RE bbb*c are matched by the
            substring bbbc in the second through fifth positions. An
            asterisk as the first character of an RE loses this
            special meaning and is treated as itself.
\(RE\)      A subexpression can be defined within an RE by enclosing
            it between the character pairs \( and \). Such a
            subexpression matches whatever it would have matched
            without the \( and \). Subexpressions can be arbitrarily
            nested. An asterisk immediately following the \( loses
            its special meaning and is treated as itself. An asterisk
            immediately following the \) is treated as an invalid
            character.
\n          The expression \n matches the same string of characters
            as was matched by a subexpression enclosed between \( and
            \) preceding the \n. The character n must be a digit from
            1 through 9, specifying the n-th subexpression (the one
            that begins with the n-th \( and ends with the
            corresponding paired \)). For example, the expression
            ^\(.*\)\1$ matches a line consisting of two adjacent
            appearances of the same string.
            If the \n is followed by an asterisk, it matches zero or
            more occurrences of the subexpression referred to. For
            example, the expression \(ab\(cd\)ef\)Z\2*Z\1 matches the
            string abcdefZcdcdZabcdef.
RE\{m\}, RE\{m,\}, RE\{m,n\}
            An RE matching a single character followed by \{m\},
            \{m,\}, or \{m,n\} is an RE that matches repeated
            occurrences of the RE. The values of m and n must be
            decimal integers in the range 0 through 255, with m
            specifying the exact or minimum number of occurrences and
            n specifying the maximum number of occurrences. \{m\}
            matches exactly m occurrences of the preceding RE, \{m,\}
            matches at least m occurrences, and \{m,n\} matches any
            number of occurrences between m and n, inclusive.
            The first encountered string that matches the expression
            is chosen; it will contain as many occurrences of the RE
            as possible. For example, in the string abbbbbbbc the RE
            b\{3\} is matched by characters two through four, the RE
            b\{3,\} is matched by characters two through eight, and
            the RE b\{3,5\}c is matched by characters four through
            nine.
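The construction rules above (concatenation, asterisk, subexpressions
with back-references, and intervals) can be checked with the sketch
below. It assumes Python's re module as a stand-in: re writes
subexpressions and intervals without backslashes, so the basic RE
spelling is given in the comments. The helper span_1based and the
sample strings are chosen for this sketch only.

```python
import re

def span_1based(pattern, text):
    """Return 1-based (first, last) positions of the first match, or None."""
    m = re.search(pattern, text)
    return (m.start() + 1, m.end()) if m else None

# Concatenation: the first encountered occurrence is the one matched.
concat = span_1based("bc", "abcdefabcdef")
# Asterisk: zero or more occurrences, as many as possible.
star_a = span_1based("b*c", "abbbcdeabbbbbbcde")
star_b = span_1based("bbb*c", "abbbcdeabbbbbbcde")
# Back-reference (basic RE spelling: ^\(.*\)\1$).
doubled = re.search(r"^(.*)\1$", "abcabc") is not None
not_doubled = re.search(r"^(.*)\1$", "abcabd") is not None
# Nested subexpressions (basic RE spelling: \(ab\(cd\)ef\)Z\2*Z\1).
nested = re.fullmatch(r"(ab(cd)ef)Z\2*Z\1", "abcdefZcdcdZabcdef") is not None
# Intervals (basic RE spelling: b\{3\}, b\{3,\}, b\{3,5\}c).
exact = span_1based("b{3}", "abbbbbbbc")
at_least = span_1based("b{3,}", "abbbbbbbc")
bounded = span_1based("b{3,5}c", "abbbbbbbc")

print(concat, star_a, star_b, doubled, not_doubled, nested)
print(exact, at_least, bounded)
```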
Expression Anchoring
An RE can be limited to matching strings that begin or end a line
(i.e., anchored) according to the following rules:
· A circumflex (^) as the first character of an RE anchors the
  expression to the beginning of a line; only strings starting at the
  first character of a line are matched by the RE. For example, the
  RE ^ab matches the string ab in the line abcdef, but not the same
  string in the line cdefab.
· A dollar sign ($) as the last character of an RE anchors the
  expression to the end of a line; only strings ending at the last
  character of a line are matched by the RE. For example, the RE ab$
  matches the string ab in the line cdefab, but not the same string
  in the line abcdef.
· An RE anchored by both ^ and $ matches only strings that are lines.
  For example, the RE ^abcdef$ matches only lines consisting of the
  string abcdef.
The use of duplication characters (+,*) following anchors is illegal.
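The anchoring rules can be illustrated with a small Python sketch. It
assumes that each sample string stands for one line without its
trailing newline. Python's re module uses the same ^ and $ notation
for these simple cases. The helper name found is hypothetical.

```python
import re

def found(pattern, line):
    """Return True if pattern matches somewhere in line."""
    return re.search(pattern, line) is not None

start_hit = found("^ab", "abcdef")         # ab begins the line
start_miss = found("^ab", "cdefab")        # ab is present, but not at the start
end_hit = found("ab$", "cdefab")           # ab ends the line
end_miss = found("ab$", "abcdef")          # ab is present, but not at the end
whole_hit = found("^abcdef$", "abcdef")    # the RE must match the whole line
whole_miss = found("^abcdef$", "abcdefg")

print(start_hit, start_miss, end_hit, end_miss, whole_hit, whole_miss)
# prints: True False True False True False
```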
EXTENDED REGULAR EXPRESSIONS
The extended regular expression (ERE) notation and construction rules
apply to utilities defined as using extended REs. Any exceptions to
the following rules are noted in the descriptions of the specific util‐
ities using EREs.
EREs Matching a Single Character
The following EREs match a single character or a single collating
element:
Ordinary Characters
      An ordinary character is an ERE that matches itself. An
      ordinary character is any character in the supported character
      set except newline and the regular expression special
      characters listed in Special Characters below. An ordinary
      character preceded by a backslash (\) is treated as the
      ordinary character itself. Matching is based on the bit pattern
      used for encoding the character, not on the graphic
      representation of the character.
Special Characters
      A regular expression special character preceded by a backslash
      is a regular expression that matches the special character
      itself. When not preceded by a backslash, such characters have
      special meaning in the specification of EREs. The extended
      regular expression special characters and the contexts in which
      they have their special meaning are:
      . [ \ ( ) * + ? $ |
                  The period, left square bracket, backslash, left
                  parenthesis, right parenthesis, asterisk, plus
                  sign, question mark, dollar sign, and vertical bar
                  are special except when used in a bracket
                  expression (see ERE Bracket Expression).
      ^           The circumflex is special except when used in a
                  bracket expression in a non-leading position.
      delimiter   Any character used to bound (i.e., delimit) an
                  entire ERE is special for that ERE.
A period (.), when used outside of a bracket expression, is an ERE that
matches any printable or nonprintable character except newline.
ERE Bracket Expression
The syntax and rules for ERE bracket expressions are the same as for RE
bracket expressions found above.
EREs Matching Multiple Characters
The following rules may be used to construct EREs matching multiple
characters from EREs matching a single character:
EREERE      A concatenation of EREs matches the first encountered
            concatenation of the strings matched by each component of
            the ERE. Such a concatenation of EREs enclosed in
            parentheses matches whatever the concatenation without
            the parentheses matches. For example, both the ERE bc and
            the ERE (bc) match the second and third characters of the
            string abcdefabcdef. The longest overall string is
            matched.
ERE+        The special character plus (+), when following an ERE
            matching a single character, or a concatenation of EREs
            enclosed in parentheses, is an ERE that matches one or
            more occurrences of the ERE preceding the plus sign. The
            string matched will contain as many occurrences as
            possible. For example, the ERE b+(bc) matches the fourth
            through seventh characters in the string acabbbcde.
ERE*        The special character asterisk (*), when following an ERE
            matching a single character, or a concatenation of EREs
            enclosed in parentheses, is an ERE that matches zero or
            more occurrences of the ERE preceding the asterisk. For
            example, the ERE b*c matches the first character in the
            string cabbbcde. If there is any choice, the longest
            leftmost string that permits a match is chosen. For
            example, the ERE b*cd matches the third through seventh
            characters in the string cabbbcdebbbbbbcdbc.
ERE?        The special character question mark (?), when following
            an ERE matching a single character, or a concatenation of
            EREs enclosed in parentheses, is an ERE that matches zero
            or one occurrence of the ERE preceding the question mark.
            The string matched will contain as many occurrences as
            possible. For example, the ERE b?c matches the second
            character in the string acabbbcde.
ERE{m,n}    An interval expression that functions the same way as the
            basic regular expression syntax \{m,n\}.
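The repetition operators above can be tried with the following sketch.
It assumes Python's re module, whose syntax for +, *, ?, and {m,n}
coincides with the ERE notation for these cases, although re is not a
POSIX matcher. The helper span_1based and the sample strings are
chosen for this sketch.

```python
import re

def span_1based(pattern, text):
    """Return 1-based (first, last) positions of the first match, or None."""
    m = re.search(pattern, text)
    return (m.start() + 1, m.end()) if m else None

plus = span_1based("b+(bc)", "acabbbcde")            # one or more b, then bc
star_empty = span_1based("b*c", "cabbbcde")          # zero b's, then c
star_long = span_1based("b*cd", "cabbbcdebbbbbbcdbc")
optional = span_1based("b?c", "acabbbcde")           # zero or one b, then c
interval = span_1based("b{2,3}c", "abbbbc")          # two or three b's, then c

print(plus, star_empty, star_long, optional, interval)
```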
Alternation
Two EREs separated by the special character vertical bar (|) match a
string that is matched by either ERE. For example, the ERE a((bc)|d)
matches the string abc and the string ad. A vertical bar (|) is
subject to the following restrictions:
· It may not appear first or last in an ERE.
· It may not appear immediately following a vertical bar.
· It may not appear immediately after a left parenthesis.
· It may not appear immediately preceding a right parenthesis.
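Alternation can be tried with the sketch below, which again assumes
Python's re module as a stand-in. One difference is worth labeling
loudly: when several alternatives match at the same position, re
takes the first alternative that succeeds, not the longest overall
string, so the last line shows Python behavior rather than the rule
described in this entry. The helper name whole is hypothetical.

```python
import re

def whole(pattern, text):
    """Return True if pattern matches all of text."""
    return re.fullmatch(pattern, text) is not None

alt_abc = whole("a((bc)|d)", "abc")   # a followed by bc
alt_ad = whole("a((bc)|d)", "ad")     # a followed by d
alt_abd = whole("a((bc)|d)", "abd")   # neither alternative fits

# Python-specific: the first successful alternative wins, so only "a"
# is matched here even though "ab" would be the longer match.
python_choice = re.search("a|ab", "ab").group(0)

print(alt_abc, alt_ad, alt_abd, python_choice)
# prints: True True False a
```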
Precedence
The order of precedence is as follows, from high to low:
      [ ]         square brackets
      * + ?       asterisk, plus sign, question mark
      ^ $         anchoring
                  concatenation
      |           alternation
For example, the ERE abc|def is interpreted as "match either abc or
def." It does not mean "match ab followed by c or d, followed in turn
by ef" (because concatenation has a higher order of precedence than
alternation).
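The effect of precedence can be seen by comparing an expression with
its explicitly parenthesized variants. This sketch assumes Python's
re module, which uses the same relative precedence for repetition,
concatenation, and alternation. The helper name whole and the sample
strings are made up for the illustration.

```python
import re

def whole(pattern, text):
    """Return True if pattern matches all of text."""
    return re.fullmatch(pattern, text) is not None

# Concatenation binds more tightly than alternation.
either_abc = whole("abc|def", "abc")
either_def = whole("abc|def", "def")
not_grouped = whole("abc|def", "abdef")      # would need ab(c|d)ef
grouped = whole("ab(c|d)ef", "abdef")

# Repetition binds more tightly than concatenation.
star_on_b = whole("ab*", "abbb")             # a followed by zero or more b
star_not_ab = whole("ab*", "abab")           # * does not apply to ab
star_group = whole("(ab)*", "abab")          # parentheses change that

print(either_abc, either_def, not_grouped, grouped)
print(star_on_b, star_not_ab, star_group)
```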
Expression Anchoring
An ERE can be limited to matching strings that begin or end a line
(i.e., anchored) according to the following rules:
· A circumflex (^) matches the beginning of a line (anchors the
  expression to the beginning of a line). For example, the ERE ^ab
  matches the string ab in the line abcdef, but not the same string
  in the line cdefab.
· A dollar sign ($) matches the end of a line (anchors the expression
  to the end of a line). For example, the ERE ab$ matches the string
  ab in the line cdefab, but not the same string in the line abcdef.
· An ERE anchored by both ^ and $ matches only strings that are
  lines. For example, the ERE ^abcdef$ matches only lines consisting
  of the string abcdef. Only empty lines match the ERE ^$.
The use of duplication characters (+,*) following anchors is illegal.
PATTERN MATCHING NOTATION
The following rules apply to pattern matching notation except as noted
in the descriptions of the specific utilities using pattern matching.
Patterns Matching a Single Character
The following patterns match a single character or a single collating
element: An ordinary character is a pattern that matches itself. An
ordinary character is any character in the supported character set
except newline and the pattern matching special characters listed in
Special Characters below. Matching is based on the bit pattern used
for encoding the character, not on the graphic representation of the
character. A pattern matching special character preceded by a back‐
slash is a pattern that matches the special character itself. When not
preceded by a backslash, such characters have special meaning in the
specification of patterns. The pattern matching special characters and
the contexts in which they have their special meaning are:
The question mark, asterisk, and left square bracket are special
except when
used in a bracket expression (see Pattern Bracket
Expression).
A question mark when used outside of a bracket expression, is a pattern
that matches any printable or nonprintable character except newline.
Pattern Bracket Expression
The syntax and rules for pattern bracket expressions are the same as
for RE bracket expressions found above with the following exceptions:
· The exclamation point character (!) replaces the circumflex
  character (^) in its role in a non-matching list in the regular
  expression notation.
· The backslash (\) is used as an escape character within bracket
  expressions.
Patterns Matching Multiple Characters
The following rules may be used to construct patterns matching multiple
characters from patterns matching a single character:
*           The asterisk (*) is a pattern that matches any string,
            including the null string.
RERE        The concatenation of patterns matching a single character
            is a valid pattern that matches the concatenation of the
            single characters or collating elements matched by each
            of the concatenated patterns. For example, the pattern
            a[bc] matches the strings ab and ac.
            The concatenation of one or more patterns matching a
            single character with one or more asterisks is a valid
            pattern. In such patterns, each asterisk matches a string
            of zero or more characters, up to the first character
            that matches the character following the asterisk in the
            pattern. For example, the pattern a*d matches the strings
            ad, abd, and abcd, but not the string abc. When an
            asterisk is the first or last character in a pattern, it
            matches zero or more characters that precede or follow
            the characters matched by the remainder of the pattern.
            For example, the pattern *a*d matches the strings ad,
            abcd, efabcd, aaaad, and adddd; the pattern *a*d* matches
            the strings ad, abcd, abcdef, aaaad, and adddd.
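The pattern rules above can be tried with Python's fnmatch module, as
sketched below. This is an approximation under stated assumptions:
fnmatch.fnmatchcase takes the string first and the pattern second, it
does not treat the backslash as an escape character, and it applies
none of the filename expansion qualifications. The helper
matches_all and the sample strings are chosen for this sketch.

```python
from fnmatch import fnmatchcase

def matches_all(pattern, strings):
    """Return True if pattern matches every string in strings."""
    return all(fnmatchcase(s, pattern) for s in strings)

one_char = fnmatchcase("a", "?") and not fnmatchcase("ab", "?")
list_ok = matches_all("a[bc]", ["ab", "ac"])
list_miss = fnmatchcase("ad", "a[bc]")
negated = fnmatchcase("d", "[!abc]") and not fnmatchcase("a", "[!abc]")
middle_star = matches_all("a*d", ["ad", "abd", "abcd"])
middle_miss = fnmatchcase("abc", "a*d")
leading_star = matches_all("*a*d", ["ad", "abcd", "efabcd", "aaaad", "adddd"])
both_ends = matches_all("*a*d*", ["ad", "abcd", "abcdef", "aaaad", "adddd"])

print(one_char, list_ok, list_miss, negated)
print(middle_star, middle_miss, leading_star, both_ends)
```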
Rule Qualification for Patterns Used for Filename Expansion
The rules described above for pattern matching are qualified by the
following rules when the pattern matching notation is used for filename
expansion by sh(1), csh(1), ksh(1), and make(1).
If a filename (including the component of a pathname that follows the
      slash character) begins with a period (.), the period must be
      explicitly matched by using a period as the first character of
      the pattern; it cannot be matched by the asterisk special
      character, the question mark special character, or a bracket
      expression. This rule does not apply to make(1).
The slash character (/) in a pathname must be explicitly matched by
      using a slash in the pattern; it cannot be matched by the
      asterisk special character, the question mark special
      character, or a bracket expression. For make(1), only the part
      of the pathname following the last slash character can be
      matched by a special character. That is, all special characters
      preceding the last slash character lose their special meaning.
Specified patterns are matched against existing filenames and
pathnames, as appropriate. If the pattern matches any existing
filenames or pathnames, the pattern is replaced with those file‐
names and pathnames, sorted according to the collating sequence
in effect. If the pattern does not match any existing filenames
or pathnames, the pattern string is left unchanged.
If the pattern begins with a tilde character (~), all of the ordinary
      characters preceding the first slash (or all characters if
      there is no slash) are treated as a possible login name. If the
      login name is null (i.e., the pattern contains only the tilde
      or the tilde is immediately followed by a slash), the tilde is
      replaced by a pathname of the process's home directory,
      followed by a slash. Otherwise, the combination of tilde and
      login name is replaced by a pathname of the home directory
      associated with the login name, followed by a slash. If the
      system cannot identify the login name, the result is
      implementation-defined. This rule does not apply to sh(1) or
      make(1).
If the pattern contains a $ character, variable substitution can take
      place. Environment variables can be embedded within patterns
      as:
            $name
      or:
            ${name}
      Braces are used to guarantee that characters following name are
      not interpreted as belonging to name. Substitution occurs in
      the order specified and only once; that is, the resulting
      string is not examined again for new names that occurred
      because of the substitution.
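The filename expansion rules above can be approximated with Python's
glob and os.path modules, as in the sketch below. Several things are
assumptions of this sketch and not part of the utilities described
here: the helper expand, the file names, and the variable DEMO_DIR
are invented; sorted() orders by character code instead of the
collating sequence in effect; and os.path.expandvars leaves an unset
name unchanged, which a shell would not do.

```python
import glob
import os
import tempfile

def expand(pattern, directory):
    """Expand pattern relative to directory (hypothetical helper).

    Returns the sorted matches, or the unchanged pattern when nothing
    matches; glob.glob by itself would return an empty list.
    """
    matches = glob.glob(os.path.join(glob.escape(directory), pattern))
    names = sorted(os.path.relpath(m, directory) for m in matches)
    return names if names else [pattern]

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for name in (".hidden", "notes.txt", os.path.join("sub", "inner.txt")):
        open(os.path.join(top, name), "w").close()

    star = expand("*", top)                  # a leading period is not matched
    dot_star = expand(".*", top)             # the period is matched explicitly
    no_slash = expand("*inner.txt", top)     # * does not match the slash
    with_slash = expand("*/inner.txt", top)  # the slash is matched explicitly
    unmatched = expand("*.none", top)        # no match: pattern left unchanged

# Variable substitution; braces delimit the name.
os.environ["DEMO_DIR"] = "docs"
os.environ.pop("DEMO_DIR_old", None)
plain = os.path.expandvars("$DEMO_DIR/*.txt")
braced = os.path.expandvars("${DEMO_DIR}_old/*.txt")
unbraced = os.path.expandvars("$DEMO_DIR_old/*.txt")  # name is DEMO_DIR_old

# Tilde expansion to the home directory of the current process.
home_notes = os.path.expanduser("~/notes")

print(star, dot_star, no_slash, with_slash, unmatched)
print(plain, braced, unbraced)
```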
Rule Qualification for Patterns Used in the case Command
The rules described above for pattern matching are qualified by the
following rule when the pattern matching notation is used in the case
command of sh(1) and ksh(1).
Multiple alternative patterns in a single clause can be specified by
      separating individual patterns with the vertical bar character
      (|); strings matching any of the patterns separated this way
      will cause the corresponding command list to be selected.
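The selection rule for alternative patterns can be sketched in Python
as follows. This is an illustration only, not how sh(1) or ksh(1)
implement the case command: the helper select_clause and the clause
table are hypothetical, the clause text is split naively on every
vertical bar, and fnmatch.fnmatchcase stands in for the pattern
matching notation.

```python
from fnmatch import fnmatchcase

def select_clause(word, clauses):
    """Return the command list of the first clause matching word, or None."""
    for patterns, commands in clauses:
        # A clause is selected if any one of its alternatives matches.
        if any(fnmatchcase(word, p) for p in patterns.split("|")):
            return commands
    return None

clauses = [
    ("*.c|*.h", "compile"),
    ("*.txt|README*", "format"),
    ("*", "ignore"),
]

source_file = select_clause("main.c", clauses)
readme = select_clause("README.md", clauses)
other = select_clause("a.out", clauses)

print(source_file, readme, other)
# prints: compile format ignore
```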
SEE ALSO
ksh(1), sh(1), fnmatch(3C), glob(3C), regcomp(3C), setlocale(3C),
environ(5).
STANDARDS CONFORMANCE
regexp(5)