Regexp::Common man page on OpenServer

Regexp::Common(3)     User Contributed Perl Documentation    Regexp::Common(3)

NAME
       Regexp::Common - Provide commonly requested regular expressions

SYNOPSIS
	# STANDARD USAGE

	use Regexp::Common;

	while (<>) {
	    /$RE{num}{real}/		   and print q{a number};
	    /$RE{quoted}/		   and print q{a ['"`] quoted string};
	    /$RE{delimited}{-delim=>'/'}/  and print q{a /.../ sequence};
	    /$RE{balanced}{-parens=>'()'}/ and print q{balanced parentheses};
	    /$RE{profanity}/		   and print q{a #*@%-ing word};
	}

	# SUBROUTINE-BASED INTERFACE

	use Regexp::Common 'RE_ALL';

	while (<>) {
	    $_ =~ RE_num_real()		     and print q{a number};
	    $_ =~ RE_quoted()		     and print q{a ['"`] quoted string};
	    $_ =~ RE_delimited(-delim=>'/')  and print q{a /.../ sequence};
	    $_ =~ RE_balanced(-parens=>'()') and print q{balanced parentheses};
	    $_ =~ RE_profanity()	     and print q{a #*@%-ing word};
	}

	# IN-LINE MATCHING...

	if ( $RE{num}{int}->matches($text) ) {...}

	# ...AND SUBSTITUTION

	my $cropped = $RE{ws}{crop}->subs($uncropped);

	# ROLL-YOUR-OWN PATTERNS

	use Regexp::Common 'pattern';

	pattern name   => ['name', 'mine'],
		create => '(?i:J[.]?\s+A[.]?\s+Perl-Hacker)',
		;

	my $name_matcher = $RE{name}{mine};

	pattern name	=> [ 'lineof', '-char=_' ],
		create	=> sub {
			       my ($self, $flags) = @_;
			       my $char = quotemeta $flags->{-char};
			       return "(?:^$char+\$)";
			   },
		matches => sub {
			       my ($self, $str) = @_;
			       return $str !~ /[^$self->{flags}{-char}]/;
			   },
		subs   => sub {
			       my ($self, $str, $replacement) = @_;
			       $_[1] =~ s/^$self->{flags}{-char}+$//g;
			  },
		;

	my $asterisks = $RE{lineof}{-char=>'*'};

	# DECIDING WHICH PATTERNS TO LOAD.

	use Regexp::Common qw /comment number/;	 # Comment and number patterns.
	use Regexp::Common qw /no_defaults/;	 # Don't load any patterns.
	use Regexp::Common qw /!delimited/;	 # All, but delimited patterns.

DESCRIPTION
       By default, this module exports a single hash (%RE) that stores or gen-
       erates commonly needed regular expressions (see "List of available pat-
       terns").

       There is an alternative, subroutine-based syntax described in "Subrou-
       tine-based interface".

       General syntax for requesting patterns

       To access a particular pattern, %RE is treated as a hierarchical hash
       of hashes (of hashes...), with each successive key being an identifier.
       For example, to access the pattern that matches real numbers, you spec-
       ify:

	       $RE{num}{real}

       and to access the pattern that matches integers:

	       $RE{num}{int}

       Deeper layers of the hash are used to specify flags: arguments that
       modify the resulting pattern in some way. The keys used to access these
       layers are prefixed with a minus sign and may have a value; if a value
       is given, it's done by using a multidimensional key.  For example, to
       access the pattern that matches base-2 real numbers with embedded com-
       mas separating groups of three digits (e.g. 10,101,110.110101101):

	       $RE{num}{real}{-base => 2}{-sep => ','}{-group => 3}

       Through the magic of Perl, these flag layers may be specified in any
       order (and even interspersed through the identifier keys!)  so you
       could get the same pattern with:

	       $RE{num}{real}{-sep => ','}{-group => 3}{-base => 2}

       or:

	       $RE{num}{-base => 2}{real}{-group => 3}{-sep => ','}

       or even:

	       $RE{-base => 2}{-group => 3}{-sep => ','}{num}{real}

       etc.
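
       The claim that flag layers may appear in any order can be checked
       with a short script. This is only a sketch: it assumes Regexp::Common
       is installed and relies on pattern objects stringifying to their
       regex text; the variable names are illustrative.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);

# Three spellings of the same request; only the flag positions differ.
my @forms = (
    $RE{num}{real}{-base => 2}{-sep => ','}{-group => 3},
    $RE{num}{-base => 2}{real}{-group => 3}{-sep => ','},
    $RE{-base => 2}{-group => 3}{-sep => ','}{num}{real},
);

# Stringify each pattern object and count the distinct regex texts.
my %distinct = map { ("$_" => 1) } @forms;
print scalar(keys %distinct) == 1 ? "identical\n" : "different\n";
```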

       Note, however, that the relative order of the identifier keys is
       significant. That is:

	       $RE{list}{set}

       would not be the same as:

	       $RE{set}{list}

       Flag syntax

       In versions prior to 2.113, flags could also be written as
       "{"-flag=value"}". This no longer works, although "{"-flag$;value"}"
       still does. However, "{-flag => 'value'}" is the preferred syntax.

       Universal flags

       Normally, flags are specific to a single pattern.  However, there are
       two flags that all patterns may specify.

       "-keep"
	   By default, the patterns provided by %RE contain no capturing
	   parentheses. However, if the "-keep" flag is specified (it requires
	   no value) then any significant substrings that the pattern matches
	   are captured. For example:

		   if ($str =~ $RE{num}{real}{-keep}) {
			   $number   = $1;
			   $whole    = $3;
			   $decimals = $5;
		   }

	   Special care is needed if a "kept" pattern is interpolated into a
	   larger regular expression, as the presence of other capturing
	   parentheses is likely to change the "number variables" into which
	   significant substrings are saved.

	   See also "Adding new regular expressions", which describes how to
	   create new patterns with "optional" capturing brackets that respond
	   to "-keep".

       "-i"
	   Some patterns or subpatterns only match lowercase or uppercase
	   letters.  If one wants to do case-insensitive matching, one option
	   is to use the "/i" regexp modifier, or the special sequence
	   "(?i)".  But if the functional interface is used, one does not
	   have this option. The "-i" switch solves this problem; by using
	   it, the pattern will do case-insensitive matching.
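
       As a sketch of the "-keep" caveat above, the script below embeds a
       kept pattern in a larger regular expression. It assumes
       Regexp::Common is installed; the sample string and the variable
       names are made up for illustration.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);

my $real = $RE{num}{real}{-keep};   # pattern with capturing parentheses
my $line = 'width = 3.25';

# On its own, the kept pattern puts the whole number in $1.
# Here (\w+) claims $1, so the whole number shifts to $2.
my ($name, $number);
if ($line =~ /(\w+)\s*=\s*$real/) {
    ($name, $number) = ($1, $2);
    print "$name is $number\n";
}
```

       Every capture of the kept pattern shifts by the number of capturing
       groups that precede it in the enclosing expression.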

       OO interface and inline matching/substitution

       The patterns returned from %RE are objects, so rather than writing:

	       if ($str =~ /$RE{some}{pattern}/ ) {...}

       you can write:

	       if ( $RE{some}{pattern}->matches($str) ) {...}

       For matching this would seem to have no great advantage apart from
       readability (but see below).

       For substitutions, it has other significant benefits. Frequently you
       want to perform a substitution on a string without changing the origi-
       nal. Most people use this:

	       $changed = $original;
	       $changed =~ s/$RE{some}{pattern}/$replacement/;

       The more adept use:

	       ($changed = $original) =~ s/$RE{some}{pattern}/$replacement/;

       Regexp::Common allows you to write this:

	       $changed = $RE{some}{pattern}->subs($original=>$replacement);

       Apart from reducing precedence-angst, this approach has the added
       advantages that the substitution behaviour can be optimized from the
       regular expression, and the replacement string can be provided by
       default (see "Adding new regular expressions").

       For example, in the implementation of this substitution:

	       $cropped = $RE{ws}{crop}->subs($uncropped);

       the default empty string is provided automatically, and the substitu-
       tion is optimized to use:

	       $uncropped =~ s/^\s+//;
	       $uncropped =~ s/\s+$//;

       rather than:

	       $uncropped =~ s/^\s+|\s+$//g;
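
       The copy-then-substitute behaviour described above can be pictured
       in plain Perl. The helper crop_copy is hypothetical and is not the
       module's implementation; it only illustrates returning a changed
       copy by way of two anchored substitutions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a cropped copy and leaves the caller's
# string untouched, as ->subs() is described as doing.
sub crop_copy {
    my ($text) = @_;       # @_ is copied here, so the original is safe
    $text =~ s/^\s+//;     # strip leading whitespace
    $text =~ s/\s+$//;     # strip trailing whitespace
    return $text;
}

my $uncropped = "   some text \t";
my $cropped   = crop_copy($uncropped);
print "[$cropped]\n";
```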

       Subroutine-based interface

       The hash-based interface was chosen because it allows regexes to be
       effortlessly interpolated, and because it also allows them to be "cur-
       ried". For example:

	       my $num = $RE{num}{int};

	       my $commad     = $num->{-sep=>','}{-group=>3};
	       my $duodecimal = $num->{-base=>12};

       However, the use of tied hashes does make the access to Regexp::Common
       patterns slower than it might otherwise be. In contexts where impa-
       tience overrules laziness, Regexp::Common provides an additional sub-
       routine-based interface.

       For each (sub-)entry in the %RE hash ($RE{key1}{key2}{etc}), there is a
       corresponding exportable subroutine: "RE_key1_key2_etc()". The name of
       each subroutine is the underscore-separated concatenation of the non-
       flag keys that locate the same pattern in %RE. Flags are passed to the
       subroutine in its argument list. Thus:

	       use Regexp::Common qw( RE_ws_crop RE_num_real RE_profanity );

	       $str =~ RE_ws_crop() and die "Surrounded by whitespace";

	       $str =~ RE_num_real(-base=>8, -sep=>" ") or next;

	       $offensive = RE_profanity(-keep);
	       $str =~ s/$offensive/$bad{$1}++; "<expletive deleted>"/ge;

       Note that, unlike the hash-based interface (which returns objects),
       these subroutines return ordinary "qr"'d regular expressions. Hence
       they do not curry, nor do they provide the OO match and substitution
       inlining described in the previous section.

       It is also possible to export subroutines for all available patterns
       like so:

	       use Regexp::Common 'RE_ALL';

       Or you can export all subroutines with a common prefix of keys like so:

	       use Regexp::Common 'RE_num_ALL';

       which will export "RE_num_int" and "RE_num_real" (and if you have
       created more patterns whose first key is num, those will be exported
       as well). In general, RE_key1_..._keyn_ALL will export all subroutines
       whose pattern names have first keys key1 ... keyn.
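
       A minimal sketch of the subroutine-based interface, assuming
       Regexp::Common is installed. The flags are the same "-sep" and
       "-group" flags shown earlier for the hash-based interface; the
       sample strings are illustrative.

```perl
use strict;
use warnings;
use Regexp::Common qw(RE_num_int);

# Flags go in the argument list rather than in extra hash layers.
my $grouped_int = RE_num_int(-sep => ',', -group => 3);

for my $candidate ('1,234,567', '12,34') {
    my $verdict = $candidate =~ /\A$grouped_int\z/ ? 'matches' : 'does not match';
    print "$candidate $verdict\n";
}
```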

       Adding new regular expressions

       You can add your own regular expressions to the %RE hash at run-time,
       using the exportable "pattern" subroutine. It expects a hash-like list
       of key/value pairs that specify the behaviour of the pattern. The vari-
       ous possible argument pairs are:

	   "name => [ @list ]"
	       A required argument that specifies the name of the pattern, and
	       any flags it may take, via a reference to a list of strings.
	       For example:

			pattern name => [qw( line of -char )],
				# other args here
				;

	       This specifies an entry $RE{line}{of}, which may take a "-char"
	       flag.

	       Flags may also be specified with a default value, which is then
	       used whenever the flag is omitted, or specified without an
	       explicit value. For example:

			pattern name => [qw( line of -char=_ )],
				# default char is '_'
				# other args here
				;

	   "create => $sub_ref_or_string"
	       A required argument that specifies either a string that is to
	       be returned as the pattern:

		       pattern name    => [qw( line of underscores )],
			       create  => q/(?:^_+$)/
			       ;

	       or a reference to a subroutine that will be called to create
	       the pattern:

		       pattern name    => [qw( line of -char=_ )],
			       create  => sub {
					       my ($self, $flags) = @_;
					       my $char = quotemeta $flags->{-char};
					       return "(?:^$char+\$)";
					   },
			       ;

	       If the subroutine version is used, the subroutine will be
	       called with three arguments: a reference to the pattern object
	       itself, a reference to a hash containing the flags and their
	       values, and a reference to an array containing the non-flag
	       keys.

	       Whatever the subroutine returns is stringified as the pattern.

	       No matter how the pattern is created, it is immediately post-
	       processed to include or exclude capturing parentheses (accord-
	       ing to the value of the "-keep" flag). To specify such
	       "optional" capturing parentheses within the regular expression
	       associated with "create", use the notation "(?k:...)". Any
	       parentheses of this type will be converted to "(...)"  when the
	       "-keep" flag is specified, or "(?:...)" when it is not.	It is
	       a Regexp::Common convention that the outermost capturing paren-
	       theses always capture the entire pattern, but this is not
	       enforced.

	   "matches => $sub_ref"
	       An optional argument that specifies a subroutine that is to be
	       called when the "$RE{...}->matches(...)" method of this pattern
	       is invoked.

	       The subroutine should expect two arguments: a reference to the
	       pattern object itself, and the string to be matched against.

	       It should return the same types of values as a "m/.../" does.

		    pattern name    => [qw( line of -char )],
			    create  => sub {...},
			    matches => sub {
					    my ($self, $str) = @_;
					    $str !~ /[^$self->{flags}{-char}]/;
				       },
			    ;

	   "subs => $sub_ref"
	       An optional argument that specifies a subroutine that is to be
	       called when the "$RE{...}->subs(...)" method of this pattern is
	       invoked.

	       The subroutine should expect three arguments: a reference to
	       the pattern object itself, the string to be changed, and the
	       value to be substituted into it.	 The third argument may be
	       "undef", indicating the default substitution is required.

	       The subroutine should return the same types of values as an
	       "s/.../.../" does.

	       For example:

		    pattern name    => [ 'lineof', '-char=_' ],
			    create  => sub {...},
			    subs    => sub {
					 my ($self, $str, $ignore_replacement) = @_;
					 $_[1] =~ s/^$self->{flags}{-char}+$//g;
				       },
			    ;

	       Note that such a subroutine will almost always need to modify
	       $_[1] directly.

	   "version => $minimum_perl_version"
	       If this argument is given, it specifies the minimum version of
	       perl required to use the new pattern. Attempts to use the pat-
	       tern with earlier versions of perl will generate a fatal diag-
	       nostic.
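
	   The "(?k:...)" post-processing described under "create" can be
	   pictured with a few lines of plain Perl. The function expand_keep
	   is a hypothetical stand-in, not the module's own code; it only
	   shows how one pattern string yields both the capturing and the
	   non-capturing form.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the post-processing step: rewrite every
# "(?k:" to "(" when keeping captures, or to "(?:" otherwise.
sub expand_keep {
    my ($pattern, $keep) = @_;
    my $open = $keep ? '(' : '(?:';
    (my $expanded = $pattern) =~ s/\(\?k:/$open/g;
    return $expanded;
}

my $template = '(?k:(?k:\d+)-(?k:\d+))';
my $plain    = expand_keep($template, 0);   # (?:(?:\d+)-(?:\d+))
my $kept     = expand_keep($template, 1);   # ((\d+)-(\d+))

# By convention the outermost group captures the entire match.
if ('10-20' =~ /\A$kept\z/) {
    print "whole=$1 from=$2 to=$3\n";
}
```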

	   Loading specific sets of patterns.

	   By default, all the sets of patterns listed below are made
	   available.  However, it is possible to indicate which sets of
	   patterns should be made available: the wanted sets are given as
	   arguments to "use". Alternatively, it is possible to indicate
	   which sets of patterns should not be made available: those sets
	   are also given as arguments to the "use" statement, but are
	   preceded by an exclamation mark. The argument no_defaults
	   indicates that none of the default patterns should be made
	   available. This is useful, for instance, if all you want is the
	   "pattern()" subroutine.

	   Examples:

	    use Regexp::Common qw /comment number/;  # Comment and number patterns.
	    use Regexp::Common qw /no_defaults/;     # Don't load any patterns.
	    use Regexp::Common qw /!delimited/;	     # All, but delimited patterns.

	   It's also possible to load your own set of patterns. If you have a
	   module "Regexp::Common::my_patterns" that makes patterns available,
	   you can have it made available with

	    use Regexp::Common qw /my_patterns/;

	   Note that the default patterns will still be made available. The
	   defaults that you do not mention are left out only if you use
	   no_defaults or explicitly mention one of the default sets.

	   List of available patterns

	   The patterns listed below are currently available. Each set of pat-
	   terns has its own manual page describing the details. For each pat-
	   tern set named name, the manual page Regexp::Common::name describes
	   the details.

	   Currently available are:

	   Regexp::Common::balanced
	       Provides regexes for strings with balanced parenthesized delim-
	       iters.

	   Regexp::Common::comment
	       Provides regexes for comments of various languages (43 lan-
	       guages currently).

	   Regexp::Common::delimited
	       Provides regexes for delimited strings.

	   Regexp::Common::lingua
	       Provides regexes for palindromes.

	   Regexp::Common::list
	       Provides regexes for lists.

	   Regexp::Common::net
	       Provides regexes for IPv4 addresses and MAC addresses.

	   Regexp::Common::number
	       Provides regexes for numbers (integers and reals).

	   Regexp::Common::profanity
	       Provides regexes for profanity.

	   Regexp::Common::whitespace
	       Provides regexes for leading and trailing whitespace.

	   Regexp::Common::zip
	       Provides regexes for zip codes.

	   Forthcoming patterns and features

	   Future releases of the module will also provide patterns for the
	   following:

		   * email addresses
		   * HTML/XML tags
		   * more numerical matchers
		   * mail headers (including multiline ones)
		   * more URLs
		   * telephone numbers of various countries
		   * currency (universal 3 letter format, Latin-1, currency names)
		   * dates
		   * binary formats (e.g. UUencoded, MIMEd)

	   If you have other patterns or pattern generators that you think
	   would be generally useful, please send them to the maintainer --
	   preferably as source code using the "pattern" subroutine. Submis-
	   sions that include a set of tests will be especially welcome.

DIAGNOSTICS
       "Can't export unknown subroutine %s"
	   The subroutine-based interface didn't recognize the requested sub-
	   routine.  Often caused by a spelling mistake or an incompletely
	   specified name.

       "Can't create unknown regex: $RE{...}"
	   Regexp::Common doesn't have a generator for the requested pattern.
	   Often indicates a misspelt or missing parameter.

       "Perl %f does not support the pattern $RE{...}. You need Perl %f or
       later"
	   The requested pattern requires advanced regex features (e.g.
	   recursion) that are not available in your version of Perl. Time to
	   upgrade.

       "pattern() requires argument: name => [ @list ]"
	   Every user-defined pattern specification must have a name.

       "pattern() requires argument: create => $sub_ref_or_string"
	   Every user-defined pattern specification must provide a pattern
	   creation mechanism: either a pattern string or a reference to a
	   subroutine that returns the pattern string.

       "Base must be between 1 and 36"
	   The $RE{num}{real}{-base=>'N'} pattern uses the characters [0-9A-Z]
	   to represent the digits of various bases. Hence it only produces
	   regular expressions for bases up to hexatricensimal.

       "Must specify delimiter in $RE{delimited}"
	   The pattern has no default delimiter.  You need to write:
	   $RE{delimited}{-delim=>'X'} for some character X.

ACKNOWLEDGEMENTS
       Deepest thanks to the many people who have encouraged and contributed
       to this project, especially: Elijah, Jarkko, Tom, Nat, Ed, and Vivek.

HISTORY
	 $Log: Common.pm,v $
	 Revision 2.120	 2005/03/16 00:24:45  abigail
	 Load Carp only on demand

	 Revision 2.119	 2005/01/01 16:35:14  abigail
	 - Updated copyright notice. New release.

	 Revision 2.118	 2004/12/14 23:17:57  abigail
	 Fixed the generic OO routines.

	 Revision 2.117	 2004/06/30 15:01:35  abigail
	 Pod nits. (Jim Cromie)

	 Revision 2.116	 2004/06/30 09:37:36  abigail
	 New version

	 Revision 2.115	 2004/06/09 21:58:01  abigail
	 - 'SEN'
	 - New release.

	 Revision 2.114	 2003/05/25 21:34:56  abigail
	 POD nits from Bryan C. Warnock

	 Revision 2.113	 2003/04/02 21:23:48  abigail
	 Removed anything related to $; being '='

	 Revision 2.112	 2003/03/25 23:27:27  abigail
	 New release

	 Revision 2.111	 2003/03/12 22:37:13  abigail
	 +  The -i switch.
	 +  New release.

	 Revision 2.110	 2003/02/21 14:55:31  abigail
	 New release

	 Revision 2.109	 2003/02/10 21:36:58  abigail
	 New release

	 Revision 2.108	 2003/02/09 21:45:07  abigail
	 New release

	 Revision 2.107	 2003/02/07 15:23:03  abigail
	 New release

	 Revision 2.106	 2003/02/02 17:44:58  abigail
	 New release

	 Revision 2.105	 2003/02/02 03:20:32  abigail
	 New release

	 Revision 2.104	 2003/01/24 15:43:40  abigail
	 New release

	 Revision 2.103	 2003/01/23 02:19:01  abigail
	 New release

	 Revision 2.102	 2003/01/22 17:32:34  abigail
	 New release

	 Revision 2.101	 2003/01/21 23:52:18  abigail
	 POD fix.

	 Revision 2.100	 2003/01/21 23:19:40  abigail
	 The whole world understands RCS/CVS version numbers, that 1.9 is an
	 older version than 1.10. Except CPAN. Curse the idiot(s) who think
	 that version numbers are floats (in which universe do floats have
	 more than one decimal dot?).
	 Everything is bumped to version 2.100 because CPAN couldn't deal
	 with the fact one file had version 1.10.

	 Revision 1.30	2003/01/17 13:19:04  abigail
	 New release

	 Revision 1.29	2003/01/16 11:08:41  abigail
	 New release

	 Revision 1.28	2003/01/01 23:03:53  abigail
	 New distribution

	 Revision 1.27	2003/01/01 17:09:07  abigail
	 lingua class added

	 Revision 1.26	2002/12/30 23:08:28  abigail
	 New module Regexp::Common::zip

	 Revision 1.25	2002/12/27 23:34:44  abigail
	 New release

	 Revision 1.24	2002/12/24 00:00:04  abigail
	 New release

	 Revision 1.23	2002/11/06 13:50:23  abigail
	 Minor POD changes.

	 Revision 1.22	2002/10/01 18:25:46  abigail
	 POD buglets.

	 Revision 1.21	2002/09/18 17:46:11  abigail
	 POD Typo fix (Douglas Hunter)

	 Revision 1.20	2002/08/27 17:04:29  abigail
	 VERSION is now extracted from the CVS revision number.

	 Revision 1.19	2002/08/06 14:46:49  abigail
	 Upped version number to 0.09.

	 Revision 1.18	2002/08/06 13:50:08  abigail
	 - Added HISTORY section with CVS log.
	 - Upped version number to 0.08.

	 Revision 1.17	2002/08/05 12:21:46  abigail
	 Upped version number to 0.07.

	 Revision 1.16	2002/08/05 12:16:30  abigail
	 Fixed 'Regex::' typo to 'Regexp::' (Found by Mike Castle).

	 Revision 1.15	2002/08/04 22:56:02  abigail
	 Upped version number to 0.06.

	 Revision 1.14	2002/08/04 19:33:33  abigail
	 Loaded URI by default.

	 Revision 1.13	2002/08/01 10:02:42  abigail
	 Upped version number.

	 Revision 1.12	2002/07/31 23:26:06  abigail
	 Upped version number.

	 Revision 1.11	2002/07/31 13:11:20  abigail
	 Removed URL from the list of default loaded regexes, as this one isn't
	 ready yet.

	 Upped the version number to 0.03.

	 Revision 1.10	2002/07/29 13:16:38  abigail
	 Introduced 'use strict' (which uncovered a bug, \@non_flags was used
	 when $spec{create} was called instead of \@nonflags).

	 Turned warnings on (using local $^W = 1; "use warnings" isn't available
	 in pre 5.6).

	 Revision 1.9  2002/07/28 23:02:54  abigail
	 Split out the remaining pattern groups to separate files.

	 Fixed a bug in _decache, changed the regex /$fpat=(.+)/ to
	 /$fpat=(.*)/, to be able to distinguish the case of a flag
	 set to the empty string, or a flag without an argument.

	 Added 'undef' to @_ in the sub_interface setting to avoid a warning
	 of setting a hash with an odd number of arguments.

	 POD fixes.

	 Revision 1.8  2002/07/25 23:55:54  abigail
	 Moved balanced, net and URL to separate files.

	 Revision 1.7  2002/07/25 20:01:40  abigail
	 Modified import() to deal with factoring out groups of related regexes.
	 Factored out comments into Common/comment.

	 Revision 1.6  2002/07/23 21:20:43  abigail
	 Upped version number to 0.02.

	 Revision 1.5  2002/07/23 21:14:55  abigail
	 Added $RE{comment}{HTML}.

	 Revision 1.4  2002/07/23 17:01:09  abigail
	 Added lines about new maintainer, and an email address to submit bugs
	 and new regexes to.

	 Revision 1.3  2002/07/23 13:58:58  abigail
	 Changed various occurences of C<... => ...> into C<< ... => ... >>.

	 Revision 1.2  2002/07/23 12:27:07  abigail
	 Line 733 was missing the closing > of a C<> in the POD.

	 Revision 1.1  2002/07/23 12:22:51  abigail
	 Initial revision

AUTHOR
       Damian Conway (damian@conway.org)

MAINTENANCE
       This package is maintained by Abigail (regexp-common@abigail.nl).

BUGS AND IRRITATIONS
       Bound to be plenty.

       For a start, there are many common regexes missing.  Send them in to
       regexp-common@abigail.nl.

COPYRIGHT
	  Copyright (c) 2001 - 2005, Damian Conway and Abigail. All Rights
	Reserved. This module is free software. It may be used, redistributed
	    and/or modified under the terms of the Perl Artistic License
		  (see http://www.perl.com/perl/misc/Artistic.html)

perl v5.8.8			  2003-03-23		     Regexp::Common(3)