Parse::RecDescent man page on Peanut

Man page or keyword search:  
man Server   7435 pages
apropos Keyword Search (all sections)
Output format
Peanut logo
[printable version]

Parse::RecDescent(3)  User Contributed Perl Documentation Parse::RecDescent(3)

NAME
       Parse::RecDescent - Generate Recursive-Descent Parsers

VERSION
       This document describes version 1.94 of Parse::RecDescent, released
       April  9, 2003.

SYNOPSIS
	use Parse::RecDescent;

	# Generate a parser from the specification in $grammar:

		$parser = new Parse::RecDescent ($grammar);

	# Generate a parser from the specification in $othergrammar

		$anotherparser = new Parse::RecDescent ($othergrammar);

	# Parse $text using rule 'startrule' (which must be
	# defined in $grammar):

	       $parser->startrule($text);

	# Parse $text using rule 'otherrule' (which must also
	# be defined in $grammar):

		$parser->otherrule($text);

	# Change the universal token prefix pattern
	# (the default is: '\s*'):

	       $Parse::RecDescent::skip = '[ \t]+';

	# Replace productions of existing rules (or create new ones)
	# with the productions defined in $newgrammar:

	       $parser->Replace($newgrammar);

	# Extend existing rules (or create new ones)
	# by adding extra productions defined in $moregrammar:

	       $parser->Extend($moregrammar);

	# Global flags (useful as command line arguments under -s):

	       $::RD_ERRORS	  # unless undefined, report fatal errors
	       $::RD_WARN	  # unless undefined, also report non-fatal problems
	       $::RD_HINT	  # if defined, also suggestion remedies
	       $::RD_TRACE	  # if defined, also trace parsers' behaviour
	       $::RD_AUTOSTUB	  # if defined, generates "stubs" for undefined rules
	       $::RD_AUTOACTION	  # if defined, appends specified action to productions

DESCRIPTION
       Overview

       Parse::RecDescent incrementally generates top-down recursive-descent
       text parsers from simple yacc-like grammar specifications. It provides:

       ·   Regular expressions or literal strings as terminals (tokens),

       ·   Multiple (non-contiguous) productions for any rule,

       ·   Repeated and optional subrules within productions,

       ·   Full access to Perl within actions specified as part of the
	   grammar,

       ·   Simple automated error reporting during parser generation and
	   parsing,

       ·   The ability to commit to, uncommit to, or reject particular
	   productions during a parse,

       ·   The ability to pass data up and down the parse tree ("down" via
	   subrule argument lists, "up" via subrule return values)

       ·   Incremental extension of the parsing grammar (even during a parse),

       ·   Precompilation of parser objects,

       ·   User-definable reduce-reduce conflict resolution via "scoring" of
	   matching productions.

       Using "Parse::RecDescent"

       Parser objects are created by calling "Parse::RecDescent::new", passing
       in a grammar specification (see the following subsections). If the
       grammar is correct, "new" returns a blessed reference which can then be
       used to initiate parsing through any rule specified in the original
       grammar. A typical sequence looks like this:

	       $grammar = q {
			       # GRAMMAR SPECIFICATION HERE
			    };

	       $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";

	       # acquire $text

	       defined $parser->startrule($text) or print "Bad text!\n";

       The rule through which parsing is initiated must be explicitly defined
       in the grammar (i.e. for the above example, the grammar must include a
       rule of the form: "startrule: <subrules>".

       If the starting rule succeeds, its value (see below) is returned.
       Failure to generate the original parser or failure to match a text is
       indicated by returning "undef". Note that it's easy to set up grammars
       that can succeed, but which return a value of 0, "0", or "".  So don't
       be tempted to write:

	       $parser->startrule($text) or print "Bad text!\n";

       Normally, the parser has no effect on the original text. So in the
       previous example the value of $text would be unchanged after having
       been parsed.

       If, however, the text to be matched is passed by reference:

	       $parser->startrule(\$text)

       then any text which was consumed during the match will be removed from
       the start of $text.

       Rules

       In the grammar from which the parser is built, rules are specified by
       giving an identifier (which must satisfy /[A-Za-z]\w*/), followed by a
       colon on the same line, followed by one or more productions, separated
       by single vertical bars. The layout of the productions is entirely
       free-format:

	       rule1:  production1
		    |  production2 |
		       production3 | production4

       At any point in the grammar previously defined rules may be extended
       with additional productions. This is achieved by redeclaring the rule
       with the new productions. Thus:

	       rule1: a | b | c
	       rule2: d | e | f
	       rule1: g | h

       is exactly equivalent to:

	       rule1: a | b | c | g | h
	       rule2: d | e | f

       Each production in a rule consists of zero or more items, each of which
       may be either: the name of another rule to be matched (a "subrule"), a
       pattern or string literal to be matched directly (a "token"), a block
       of Perl code to be executed (an "action"), a special instruction to the
       parser (a "directive"), or a standard Perl comment (which is ignored).

       A rule matches a text if one of its productions matches. A production
       matches if each of its items match consecutive substrings of the text.
       The productions of a rule being matched are tried in the same order
       that they appear in the original grammar, and the first matching
       production terminates the match attempt (successfully). If all
       productions are tried and none matches, the match attempt fails.

       Note that this behaviour is quite different from the "prefer the longer
       match" behaviour of yacc. For example, if yacc were parsing the rule:

	       seq : 'A' 'B'
		   | 'A' 'B' 'C'

       upon matching "AB" it would look ahead to see if a 'C' is next and, if
       so, will match the second production in preference to the first. In
       other words, yacc effectively tries all the productions of a rule
       breadth-first in parallel, and selects the "best" match, where "best"
       means longest (note that this is a gross simplification of the true
       behaviour of yacc but it will do for our purposes).

       In contrast, "Parse::RecDescent" tries each production depth-first in
       sequence, and selects the "best" match, where "best" means first. This
       is the fundamental difference between "bottom-up" and "recursive
       descent" parsing.

       Each successfully matched item in a production is assigned a value,
       which can be accessed in subsequent actions within the same production
       (or, in some cases, as the return value of a successful subrule call).
       Unsuccessful items don't have an associated value, since the failure of
       an item causes the entire surrounding production to immediately fail.
       The following sections describe the various types of items and their
       success values.

       Subrules

       A subrule which appears in a production is an instruction to the parser
       to attempt to match the named rule at that point in the text being
       parsed. If the named subrule is not defined when requested the
       production containing it immediately fails (unless it was "autostubbed"
       - see Autostubbing).

       A rule may (recursively) call itself as a subrule, but not as the left-
       most item in any of its productions (since such recursions are usually
       non-terminating).

       The value associated with a subrule is the value associated with its
       $return variable (see "Actions" below), or with the last successfully
       matched item in the subrule match.

       Subrules may also be specified with a trailing repetition specifier,
       indicating that they are to be (greedily) matched the specified number
       of times. The available specifiers are:

		       subrule(?)      # Match one-or-zero times
		       subrule(s)      # Match one-or-more times
		       subrule(s?)     # Match zero-or-more times
		       subrule(N)      # Match exactly N times for integer N > 0
		       subrule(N..M)   # Match between N and M times
		       subrule(..M)    # Match between 1 and M times
		       subrule(N..)    # Match at least N times

       Repeated subrules keep matching until either the subrule fails to
       match, or it has matched the minimal number of times but fails to
       consume any of the parsed text (this second condition prevents the
       subrule matching forever in some cases).

       Since a repeated subrule may match many instances of the subrule
       itself, the value associated with it is not a simple scalar, but rather
       a reference to a list of scalars, each of which is the value associated
       with one of the individual subrule matches. In other words in the rule:

		       program: statement(s)

       the value associated with the repeated subrule "statement(s)" is a
       reference to an array containing the values matched by each call to the
       individual subrule "statement".

       Repetition modifieres may include a separator pattern:

		       program: statement(s /;/)

       specifying some sequence of characters to be skipped between each
       repetition.  This is really just a shorthand for the <leftop:...>
       directive (see below).

       Tokens

       If a quote-delimited string or a Perl regex appears in a production,
       the parser attempts to match that string or pattern at that point in
       the text. For example:

		       typedef: "typedef" typename identifier ';'

		       identifier: /[A-Za-z_][A-Za-z0-9_]*/

       As in regular Perl, a single quoted string is uninterpolated, whilst a
       double-quoted string or a pattern is interpolated (at the time of
       matching, not when the parser is constructed). Hence, it is possible to
       define rules in which tokens can be set at run-time:

		       typedef: "$::typedefkeyword" typename identifier ';'

		       identifier: /$::identpat/

       Note that, since each rule is implemented inside a special namespace
       belonging to its parser, it is necessary to explicitly quantify
       variables from the main package.

       Regex tokens can be specified using just slashes as delimiters or with
       the explicit "m<delimiter>......<delimiter>" syntax:

		       typedef: "typedef" typename identifier ';'

		       typename: /[A-Za-z_][A-Za-z0-9_]*/

		       identifier: m{[A-Za-z_][A-Za-z0-9_]*}

       A regex of either type can also have any valid trailing parameter(s)
       (that is, any of [cgimsox]):

		       typedef: "typedef" typename identifier ';'

		       identifier: / [a-z_]	       # LEADING ALPHA OR UNDERSCORE
				     [a-z0-9_]*	       # THEN DIGITS ALSO ALLOWED
				   /ix		       # CASE/SPACE/COMMENT INSENSITIVE

       The value associated with any successfully matched token is a string
       containing the actual text which was matched by the token.

       It is important to remember that, since each grammar is specified in a
       Perl string, all instances of the universal escape character '\' within
       a grammar must be "doubled", so that they interpolate to single '\'s
       when the string is compiled. For example, to use the grammar:

		       word:	   /\S+/ | backslash
		       line:	   prefix word(s) "\n"
		       backslash:  '\\'

       the following code is required:

		       $parser = new Parse::RecDescent (q{

			       word:	   /\\S+/ | backslash
			       line:	   prefix word(s) "\\n"
			       backslash:  '\\\\'

		       });

       Terminal Separators

       For the purpose of matching, each terminal in a production is
       considered to be preceded by a "prefix" - a pattern which must be
       matched before a token match is attempted. By default, the prefix is
       optional whitespace (which always matches, at least trivially), but
       this default may be reset in any production.

       The variable $Parse::RecDescent::skip stores the universal prefix,
       which is the default for all terminal matches in all parsers built with
       "Parse::RecDescent".

       The prefix for an individual production can be altered by using the
       "<skip:...>" directive (see below).

       Actions

       An action is a block of Perl code which is to be executed (as the block
       of a "do" statement) when the parser reaches that point in a
       production. The action executes within a special namespace belonging to
       the active parser, so care must be taken in correctly qualifying
       variable names (see also "Start-up Actions" below).

       The action is considered to succeed if the final value of the block is
       defined (that is, if the implied "do" statement evaluates to a defined
       value - even one which would be treated as "false"). Note that the
       value associated with a successful action is also the final value in
       the block.

       An action will fail if its last evaluated value is "undef". This is
       surprisingly easy to accomplish by accident. For instance, here's an
       infuriating case of an action that makes its production fail, but only
       when debugging isn't activated:

	       description: name rank serial_number
			       { print "Got $item[2] $item[1] ($item[3])\n"
				       if $::debugging
			       }

       If $debugging is false, no statement in the block is executed, so the
       final value is "undef", and the entire production fails. The solution
       is:

	       description: name rank serial_number
			       { print "Got $item[2] $item[1] ($item[3])\n"
				       if $::debugging;
				 1;
			       }

       Within an action, a number of useful parse-time variables are available
       in the special parser namespace (there are other variables also
       accessible, but meddling with them will probably just break your
       parser. As a general rule, if you avoid referring to unqualified
       variables - especially those starting with an underscore - inside an
       action, things should be okay):

       @item and %item
	   The array slice @item[1..$#item] stores the value associated with
	   each item (that is, each subrule, token, or action) in the current
	   production. The analogy is to $1, $2, etc. in a yacc grammar.  Note
	   that, for obvious reasons, @item only contains the values of items
	   before the current point in the production.

	   The first element ($item[0]) stores the name of the current rule
	   being matched.

	   @item is a standard Perl array, so it can also be indexed with
	   negative numbers, representing the number of items back from the
	   current position in the parse:

		   stuff: /various/ bits 'and' pieces "then" data 'end'
				   { print $item[-2] }	# PRINTS data
							# (EASIER THAN: $item[6])

	   The %item hash complements the <@item> array, providing named
	   access to the same item values:

		   stuff: /various/ bits 'and' pieces "then" data 'end'
				   { print $item{data}	# PRINTS data
							# (EVEN EASIER THAN USING @item)

	   The results of named subrules are stored in the hash under each
	   subrule's name (including the repetition specifier, if any), whilst
	   all other items are stored under a "named positional" key that
	   indictates their ordinal position within their item type:
	   __STRINGn__, __PATTERNn__, __DIRECTIVEn__, __ACTIONn__:

		   stuff: /various/ bits 'and' pieces "then" data 'end' { save }
				   { print $item{__PATTERN1__}, # PRINTS 'various'
					   $item{__STRING2__},	# PRINTS 'then'
					   $item{__ACTION1__},	# PRINTS RETURN
								# VALUE OF save
				   }

	   If you want proper named access to patterns or literals, you need
	   to turn them into separate rules:

		   stuff: various bits 'and' pieces "then" data 'end'
				   { print $item{various}  # PRINTS various
				   }

		   various: /various/

	   The special entry $item{__RULE__} stores the name of the current
	   rule (i.e. the same value as $item[0].

	   The advantage of using %item, instead of @items is that it removes
	   the need to track items positions that may change as a grammar
	   evolves. For example, adding an interim "<skip>" directive of
	   action can silently ruin a trailing action, by moving an @item
	   element "down" the array one place. In contrast, the named entry of
	   %item is unaffected by such an insertion.

	   A limitation of the %item hash is that it only records the last
	   value of a particular subrule. For example:

		   range: '(' number '..' number )'
				   { $return = $item{number} }

	   will return only the value corresponding to the second match of the
	   "number" subrule. In other words, successive calls to a subrule
	   overwrite the corresponding entry in %item. Once again, the
	   solution is to rename each subrule in its own rule:

		   range: '(' from_num '..' to_num )'
				   { $return = $item{from_num} }

		   from_num: number
		   to_num:   number

       @arg and %arg
	   The array @arg and the hash %arg store any arguments passed to the
	   rule from some other rule (see ""Subrule argument lists"). Changes
	   to the elements of either variable do not propagate back to the
	   calling rule (data can be passed back from a subrule via the
	   $return variable - see next item).

       $return
	   If a value is assigned to $return within an action, that value is
	   returned if the production containing the action eventually matches
	   successfully. Note that setting $return doesn't cause the current
	   production to succeed. It merely tells it what to return if it does
	   succeed.  Hence $return is analogous to $$ in a yacc grammar.

	   If $return is not assigned within a production, the value of the
	   last component of the production (namely: $item[$#item]) is
	   returned if the production succeeds.

       $commit
	   The current state of commitment to the current production (see
	   "Directives" below).

       $skip
	   The current terminal prefix (see "Directives" below).

       $text
	   The remaining (unparsed) text. Changes to $text do not propagate
	   out of unsuccessful productions, but do survive successful
	   productions. Hence it is possible to dynamically alter the text
	   being parsed - for example, to provide a "#include"-like facility:

		   hash_include: '#include' filename
					   { $text = ::loadfile($item[2]) . $text }

		   filename: '<' /[a-z0-9._-]+/i '>'  { $return = $item[2] }
			   | '"' /[a-z0-9._-]+/i '"'  { $return = $item[2] }

       $thisline and $prevline
	   $thisline stores the current line number within the current parse
	   (starting from 1). $prevline stores the line number for the last
	   character which was already successfully parsed (this will be
	   different from $thisline at the end of each line).

	   For efficiency, $thisline and $prevline are actually tied hashes,
	   and only recompute the required line number when the variable's
	   value is used.

	   Assignment to $thisline adjusts the line number calculator, so that
	   it believes that the current line number is the value being
	   assigned. Note that this adjustment will be reflected in all
	   subsequent line numbers calculations.

	   Modifying the value of the variable $text (as in the previous
	   "hash_include" example, for instance) will confuse the line
	   counting mechanism. To prevent this, you should call
	   "Parse::RecDescent::LineCounter::resync($thisline)" immediately
	   after any assignment to the variable $text (or, at least, before
	   the next attempt to use $thisline).

	   Note that if a production fails after assigning to or resync'ing
	   $thisline, the parser's line counter mechanism will usually be
	   corrupted.

	   Also see the entry for @itempos.

	   The line number can be set to values other than 1, by calling the
	   start rule with a second argument. For example:

		   $parser = new Parse::RecDescent ($grammar);

		   $parser->input($text, 10);	   # START LINE NUMBERS AT 10

       $thiscolumn and $prevcolumn
	   $thiscolumn stores the current column number within the current
	   line being parsed (starting from 1). $prevcolumn stores the column
	   number of the last character which was actually successfully
	   parsed. Usually "$prevcolumn == $thiscolumn-1", but not at the end
	   of lines.

	   For efficiency, $thiscolumn and $prevcolumn are actually tied
	   hashes, and only recompute the required column number when the
	   variable's value is used.

	   Assignment to $thiscolumn or $prevcolumn is a fatal error.

	   Modifying the value of the variable $text (as in the previous
	   "hash_include" example, for instance) may confuse the column
	   counting mechanism.

	   Note that $thiscolumn reports the column number before any
	   whitespace that might be skipped before reading a token. Hence if
	   you wish to know where a token started (and ended) use something
	   like this:

		   rule: token1 token2 startcol token3 endcol token4
				   { print "token3: columns $item[3] to $item[5]"; }

		   startcol: '' { $thiscolumn }	   # NEED THE '' TO STEP PAST TOKEN SEP
		   endcol:	{ $prevcolumn }

	   Also see the entry for @itempos.

       $thisoffset and $prevoffset
	   $thisoffset stores the offset of the current parsing position
	   within the complete text being parsed (starting from 0).
	   $prevoffset stores the offset of the last character which was
	   actually successfully parsed. In all cases "$prevoffset ==
	   $thisoffset-1".

	   For efficiency, $thisoffset and $prevoffset are actually tied
	   hashes, and only recompute the required offset when the variable's
	   value is used.

	   Assignment to $thisoffset or <$prevoffset> is a fatal error.

	   Modifying the value of the variable $text will not affect the
	   offset counting mechanism.

	   Also see the entry for @itempos.

       @itempos
	   The array @itempos stores a hash reference corresponding to each
	   element of @item. The elements of the hash provide the following:

		   $itempos[$n]{offset}{from}	   # VALUE OF $thisoffset BEFORE $item[$n]
		   $itempos[$n]{offset}{to}	   # VALUE OF $prevoffset AFTER $item[$n]
		   $itempos[$n]{line}{from}	   # VALUE OF $thisline BEFORE $item[$n]
		   $itempos[$n]{line}{to}	   # VALUE OF $prevline AFTER $item[$n]
		   $itempos[$n]{column}{from}	   # VALUE OF $thiscolumn BEFORE $item[$n]
		   $itempos[$n]{column}{to}	   # VALUE OF $prevcolumn AFTER $item[$n]

	   Note that the various "$itempos[$n]...{from}" values record the
	   appropriate value after any token prefix has been skipped.

	   Hence, instead of the somewhat tedious and error-prone:

		   rule: startcol token1 endcol
			 startcol token2 endcol
			 startcol token3 endcol
				   { print "token1: columns $item[1]
							 to $item[3]
					    token2: columns $item[4]
							 to $item[6]
					    token3: columns $item[7]
							 to $item[9]" }

		   startcol: '' { $thiscolumn }	   # NEED THE '' TO STEP PAST TOKEN SEP
		   endcol:	{ $prevcolumn }

	   it is possible to write:

		   rule: token1 token2 token3
				   { print "token1: columns $itempos[1]{column}{from}
							 to $itempos[1]{column}{to}
					    token2: columns $itempos[2]{column}{from}
							 to $itempos[2]{column}{to}
					    token3: columns $itempos[3]{column}{from}
							 to $itempos[3]{column}{to}" }

	   Note however that (in the current implementation) the use of
	   @itempos anywhere in a grammar implies that item positioning
	   information is collected everywhere during the parse. Depending on
	   the grammar and the size of the text to be parsed, this may be
	   prohibitively expensive and the explicit use of $thisline,
	   $thiscolumn, etc. may be a better choice.

       $thisparser
	   A reference to the "Parse::RecDescent" object through which parsing
	   was initiated.

	   The value of $thisparser propagates down the subrules of a parse
	   but not back up. Hence, you can invoke subrules from another parser
	   for the scope of the current rule as follows:

		   rule: subrule1 subrule2
		       | { $thisparser = $::otherparser } <reject>
		       | subrule3 subrule4
		       | subrule5

	   The result is that the production calls "subrule1" and "subrule2"
	   of the current parser, and the remaining productions call the named
	   subrules from $::otherparser. Note, however that "Bad Things" will
	   happen if "::otherparser" isn't a blessed reference and/or doesn't
	   have methods with the same names as the required subrules!

       $thisrule
	   A reference to the "Parse::RecDescent::Rule" object corresponding
	   to the rule currently being matched.

       $thisprod
	   A reference to the "Parse::RecDescent::Production" object
	   corresponding to the production currently being matched.

       $score and $score_return
	   $score stores the best production score to date, as specified by an
	   earlier "<score:...>" directive. $score_return stores the
	   corresponding return value for the successful production.

	   See "Scored productions".

       Warning: the parser relies on the information in the various "this..."
       objects in some non-obvious ways. Tinkering with the other members of
       these objects will probably cause Bad Things to happen, unless you
       really know what you're doing. The only exception to this advice is
       that the use of "$this...->{local}" is always safe.

       Start-up Actions

       Any actions which appear before the first rule definition in a grammar
       are treated as "start-up" actions. Each such action is stripped of its
       outermost brackets and then evaluated (in the parser's special
       namespace) just before the rules of the grammar are first compiled.

       The main use of start-up actions is to declare local variables within
       the parser's special namespace:

	       { my $lastitem = '???'; }

	       list: item(s)   { $return = $lastitem }

	       item: book      { $lastitem = 'book'; }
		     bell      { $lastitem = 'bell'; }
		     candle    { $lastitem = 'candle'; }

       but start-up actions can be used to execute any valid Perl code within
       a parser's special namespace.

       Start-up actions can appear within a grammar extension or replacement
       (that is, a partial grammar installed via "Parse::RecDescent::Extend()"
       or "Parse::RecDescent::Replace()" - see "Incremental Parsing"), and
       will be executed before the new grammar is installed. Note, however,
       that a particular start-up action is only ever executed once.

       Autoactions

       It is sometimes desirable to be able to specify a default action to be
       taken at the end of every production (for example, in order to easily
       build a parse tree). If the variable $::RD_AUTOACTION is defined when
       "Parse::RecDescent::new()" is called, the contents of that variable are
       treated as a specification of an action which is to appended to each
       production in the corresponding grammar. So, for example, to construct
       a simple parse tree:

	   $::RD_AUTOACTION = q { [@item] };

	   parser = new Parse::RecDescent (q{
	       expression: and_expr '||' expression | and_expr
	       and_expr:   not_expr '&&' and_expr   | not_expr
	       not_expr:   '!' brack_expr	    | brack_expr
	       brack_expr: '(' expression ')'	    | identifier
	       identifier: /[a-z]+/i
	       });

       which is equivalent to:

	   parser = new Parse::RecDescent (q{
	       expression: and_expr '||' expression
			       { [@item] }
			 | and_expr
			       { [@item] }

	       and_expr:   not_expr '&&' and_expr
			       { [@item] }
		       |   not_expr
			       { [@item] }

	       not_expr:   '!' brack_expr
			       { [@item] }
		       |   brack_expr
			       { [@item] }

	       brack_expr: '(' expression ')'
			       { [@item] }
			 | identifier
			       { [@item] }

	       identifier: /[a-z]+/i
			       { [@item] }
	       });

       Alternatively, we could take an object-oriented approach, use different
       classes for each node (and also eliminating redundant intermediate
       nodes):

	   $::RD_AUTOACTION = q
	     { $#item==1 ? $item[1] : new ${"$item[0]_node"} (@item[1..$#item]) };

	   parser = new Parse::RecDescent (q{
	       expression: and_expr '||' expression | and_expr
	       and_expr:   not_expr '&&' and_expr   | not_expr
	       not_expr:   '!' brack_expr	    | brack_expr
	       brack_expr: '(' expression ')'	    | identifier
	       identifier: /[a-z]+/i
	       });

       which is equivalent to:

	   parser = new Parse::RecDescent (q{
	       expression: and_expr '||' expression
			       { new expression_node (@item[1..3]) }
			 | and_expr

	       and_expr:   not_expr '&&' and_expr
			       { new and_expr_node (@item[1..3]) }
		       |   not_expr

	       not_expr:   '!' brack_expr
			       { new not_expr_node (@item[1..2]) }
		       |   brack_expr

	       brack_expr: '(' expression ')'
			       { new brack_expr_node (@item[1..3]) }
			 | identifier

	       identifier: /[a-z]+/i
			       { new identifer_node (@item[1]) }
	       });

       Note that, if a production already ends in an action, no autoaction is
       appended to it. For example, in this version:

	   $::RD_AUTOACTION = q
	     { $#item==1 ? $item[1] : new ${"$item[0]_node"} (@item[1..$#item]) };

	   parser = new Parse::RecDescent (q{
	       expression: and_expr '&&' expression | and_expr
	       and_expr:   not_expr '&&' and_expr   | not_expr
	       not_expr:   '!' brack_expr	    | brack_expr
	       brack_expr: '(' expression ')'	    | identifier
	       identifier: /[a-z]+/i
			       { new terminal_node($item[1]) }
	       });

       each "identifier" match produces a "terminal_node" object, not an
       "identifier_node" object.

       A level 1 warning is issued each time an "autoaction" is added to some
       production.

       Autotrees

       A commonly needed autoaction is one that builds a parse-tree. It is
       moderately tricky to set up such an action (which must treat terminals
       differently from non-terminals), so Parse::RecDescent simplifies the
       process by providing the "<autotree>" directive.

       If this directive appears at the start of grammar, it causes
       Parse::RecDescent to insert autoactions at the end of any rule except
       those which already end in an action. The action inserted depends on
       whether the production is an intermediate rule (two or more items), or
       a terminal of the grammar (i.e. a single pattern or string item).

       So, for example, the following grammar:

	       <autotree>

	       file    : command(s)
	       command : get | set | vet
	       get     : 'get' ident ';'
	       set     : 'set' ident 'to' value ';'
	       vet     : 'check' ident 'is' value ';'
	       ident   : /\w+/
	       value   : /\d+/

       is equivalent to:

	       file    : command(s)		       { bless \%item, $item[0] }
	       command : get			       { bless \%item, $item[0] }
		       | set			       { bless \%item, $item[0] }
		       | vet			       { bless \%item, $item[0] }
	       get     : 'get' ident ';'	       { bless \%item, $item[0] }
	       set     : 'set' ident 'to' value ';'    { bless \%item, $item[0] }
	       vet     : 'check' ident 'is' value ';'  { bless \%item, $item[0] }

	       ident   : /\w+/		{ bless {__VALUE__=>$item[1]}, $item[0] }
	       value   : /\d+/		{ bless {__VALUE__=>$item[1]}, $item[0] }

       Note that each node in the tree is blessed into a class of the same
       name as the rule itself. This makes it easy to build object-oriented
       processors for the parse-trees that the grammar produces. Note too that
       the last two rules produce special objects with the single attribute
       '__VALUE__'. This is because they consist solely of a single terminal.

       This autoaction-ed grammar would then produce a parse tree in a data
       structure like this:

	       {
		 file => {
			   command => {
					[ get => {
						   identifier => { __VALUE__ => 'a' },
						 },
					  set => {
						   identifier => { __VALUE__ => 'b' },
						   value      => { __VALUE__ => '7' },
						 },
					  vet => {
						   identifier => { __VALUE__ => 'b' },
						   value      => { __VALUE__ => '7' },
						 },
					 ],
				      },
			 }
	       }

       (except, of course, that each nested hash would also be blessed into
       the appropriate class).

       Autostubbing

       Normally, if a subrule appears in some production, but no rule of that
       name is ever defined in the grammar, the production which refers to the
       non-existent subrule fails immediately. This typically occurs as a
       result of misspellings, and is a sufficiently common occurance that a
       warning is generated for such situations.

       However, when prototyping a grammar it is sometimes useful to be able
       to use subrules before a proper specification of them is really
       possible.  For example, a grammar might include a section like:

	       function_call: identifier '(' arg(s?) ')'

	       identifier: /[a-z]\w*/i

       where the possible format of an argument is sufficiently complex that
       it is not worth specifying in full until the general function call
       syntax has been debugged. In this situation it is convenient to leave
       the real rule "arg" undefined and just slip in a placeholder (or
       "stub"):

	       arg: 'arg'

       so that the function call syntax can be tested with dummy input such
       as:

	       f0()
	       f1(arg)
	       f2(arg arg)
	       f3(arg arg arg)

       et cetera.

       Early in prototyping, many such "stubs" may be required, so
       "Parse::RecDescent" provides a means of automating their definition.
       If the variable $::RD_AUTOSTUB is defined when a parser is built, a
       subrule reference to any non-existent rule (say, "sr"), causes a "stub"
       rule of the form:

	       sr: 'sr'

       to be automatically defined in the generated parser.  A level 1 warning
       is issued for each such "autostubbed" rule.

       Hence, with $::AUTOSTUB defined, it is possible to only partially
       specify a grammar, and then "fake" matches of the unspecified
       (sub)rules by just typing in their name.

       Look-ahead

       If a subrule, token, or action is prefixed by "...", then it is treated
       as a "look-ahead" request. That means that the current production can
       (as usual) only succeed if the specified item is matched, but that the
       matching does not consume any of the text being parsed. This is very
       similar to the "/(?=...)/" look-ahead construct in Perl patterns. Thus,
       the rule:

	       inner_word: word ...word

       will match whatever the subrule "word" matches, provided that match is
       followed by some more text which subrule "word" would also match
       (although this second substring is not actually consumed by
       "inner_word")

       Likewise, a "...!" prefix, causes the following item to succeed
       (without consuming any text) if and only if it would normally fail.
       Hence, a rule such as:

	       identifier: ...!keyword ...!'_' /[A-Za-z_]\w*/

       matches a string of characters which satisfies the pattern
       "/[A-Za-z_]\w*/", but only if the same sequence of characters would not
       match either subrule "keyword" or the literal token '_'.

       Sequences of look-ahead prefixes accumulate, multiplying their positive
       and/or negative senses. Hence:

	       inner_word: word ...!......!word

       is exactly equivalent the the original example above (a warning is
       issued in cases like these, since they often indicate something left
       out, or misunderstood).

       Note that actions can also be treated as look-aheads. In such cases,
       the state of the parser text (in the local variable $text) after the
       look-ahead action is guaranteed to be identical to its state before the
       action, regardless of how it's changed within the action (unless you
       actually undefine $text, in which case you get the disaster you deserve
       :-).

       Directives

       Directives are special pre-defined actions which may be used to alter
       the behaviour of the parser. There are currently eighteen directives:
       "<commit>", "<uncommit>", "<reject>", "<score>", "<autoscore>",
       "<skip>", "<resync>", "<error>", "<rulevar>", "<matchrule>",
       "<leftop>", "<rightop>", "<defer>", "<nocheck>", "<perl_quotelike>",
       "<perl_codeblock>", "<perl_variable>", and "<token>".

       Committing and uncommitting
	   The "<commit>" and "<uncommit>" directives permit the recursive
	   descent of the parse tree to be pruned (or "cut") for efficiency.
	   Within a rule, a "<commit>" directive instructs the rule to ignore
	   subsequent productions if the current production fails. For
	   example:

		   command: 'find' <commit> filename
			  | 'open' <commit> filename
			  | 'move' filename filename

	   Clearly, if the leading token 'find' is matched in the first
	   production but that production fails for some other reason, then
	   the remaining productions cannot possibly match. The presence of
	   the "<commit>" causes the "command" rule to fail immediately if an
	   invalid "find" command is found, and likewise if an invalid "open"
	   command is encountered.

	   It is also possible to revoke a previous commitment. For example:

		   if_statement: 'if' <commit> condition
					   'then' block <uncommit>
					   'else' block
			       | 'if' <commit> condition
					   'then' block

	   In this case, a failure to find an "else" block in the first
	   production shouldn't preclude trying the second production, but a
	   failure to find a "condition" certainly should.

	   As a special case, any production in which the first item is an
	   "<uncommit>" immediately revokes a preceding "<commit>" (even
	   though the production would not otherwise have been tried). For
	   example, in the rule:

		   request: 'explain' expression
			  | 'explain' <commit> keyword
			  | 'save'
			  | 'quit'
			  | <uncommit> term '?'

	   if the text being matched was "explain?", and the first two
	   productions failed, then the "<commit>" in production two would
	   cause productions three and four to be skipped, but the leading
	   "<uncommit>" in the production five would allow that production to
	   attempt a match.

	   Note in the preceding example, that the "<commit>" was only placed
	   in production two. If production one had been:

		   request: 'explain' <commit> expression

	   then production two would be (inappropriately) skipped if a leading
	   "explain..." was encountered.

	   Both "<commit>" and "<uncommit>" directives always succeed, and
	   their value is always 1.

       Rejecting a production
	   The "<reject>" directive immediately causes the current production
	   to fail (it is exactly equivalent to, but more obvious than, the
	   action "{undef}"). A "<reject>" is useful when it is desirable to
	   get the side effects of the actions in one production, without
	   prejudicing a match by some other production later in the rule. For
	   example, to insert tracing code into the parse:

		   complex_rule: { print "In complex rule...\n"; } <reject>

		   complex_rule: simple_rule '+' 'i' '*' simple_rule
			       | 'i' '*' simple_rule
			       | simple_rule

	   It is also possible to specify a conditional rejection, using the
	   form "<reject:condition>", which only rejects if the specified
	   condition is true. This form of rejection is exactly equivalent to
	   the action "{(condition)?undef:1}>".	 For example:

		   command: save_command
			  | restore_command
			  | <reject: defined $::tolerant> { exit }
			  | <error: Unknown command. Ignored.>

	   A "<reject>" directive never succeeds (and hence has no associated
	   value). A conditional rejection may succeed (if its condition is
	   not satisfied), in which case its value is 1.

	   As an extra optimization, "Parse::RecDescent" ignores any
	   production which begins with an unconditional "<reject>" directive,
	   since any such production can never successfully match or have any
	   useful side-effects. A level 1 warning is issued in all such cases.

	   Note that productions beginning with conditional "<reject:...>"
	   directives are never "optimized away" in this manner, even if they
	   are always guaranteed to fail (for example: "<reject:1>")

	   Due to the way grammars are parsed, there is a minor restriction on
	   the condition of a conditional "<reject:...>": it cannot contain
	   any raw '<' or '>' characters. For example:

		   line: cmd <reject: $thiscolumn > max> data

	   results in an error when a parser is built from this grammar (since
	   the grammar parser has no way of knowing whether the first > is a
	   "less than" or the end of the "<reject:...>".

	   To overcome this problem, put the condition inside a do{} block:

		   line: cmd <reject: do{$thiscolumn > max}> data

	   Note that the same problem may occur in other directives that take
	   arguments. The same solution will work in all cases.

       Skipping between terminals
	   The "<skip>" directive enables the terminal prefix used in a
	   production to be changed. For example:

		   OneLiner: Command <skip:'[ \t]*'> Arg(s) /;/

	   causes only blanks and tabs to be skipped before terminals in the
	   "Arg" subrule (and any of its subrules>, and also before the final
	   "/;/" terminal.  Once the production is complete, the previous
	   terminal prefix is reinstated. Note that this implies that distinct
	   productions of a rule must reset their terminal prefixes
	   individually.

	   The "<skip>" directive evaluates to the previous terminal prefix,
	   so it's easy to reinstate a prefix later in a production:

		   Command: <skip:","> CSV(s) <skip:$item[1]> Modifier

	   The value specified after the colon is interpolated into a pattern,
	   so all of the following are equivalent (though their efficiency
	   increases down the list):

		   <skip: "$colon|$comma">   # ASSUMING THE VARS HOLD THE OBVIOUS VALUES

		   <skip: ':|,'>

		   <skip: q{[:,]}>

		   <skip: qr/[:,]/>

	   There is no way of directly setting the prefix for an entire rule,
	   except as follows:

		   Rule: <skip: '[ \t]*'> Prod1
		       | <skip: '[ \t]*'> Prod2a Prod2b
		       | <skip: '[ \t]*'> Prod3

	   or, better:

		   Rule: <skip: '[ \t]*'>
		       (
			   Prod1
			 | Prod2a Prod2b
			 | Prod3
		       )

	   Note: Up to release 1.51 of Parse::RecDescent, an entirely
	   different mechanism was used for specifying terminal prefixes. The
	   current method is not backwards-compatible with that early
	   approach. The current approach is stable and will not to change
	   again.

       Resynchronization
	   The "<resync>" directive provides a visually distinctive means of
	   consuming some of the text being parsed, usually to skip an
	   erroneous input. In its simplest form "<resync>" simply consumes
	   text up to and including the next newline ("\n") character,
	   succeeding only if the newline is found, in which case it causes
	   its surrounding rule to return zero on success.

	   In other words, a "<resync>" is exactly equivalent to the token
	   "/[^\n]*\n/" followed by the action "{ $return = 0 }" (except that
	   productions beginning with a "<resync>" are ignored when generating
	   error messages). A typical use might be:

		   script : command(s)

		   command: save_command
			  | restore_command
			  | <resync> # TRY NEXT LINE, IF POSSIBLE

	   It is also possible to explicitly specify a resynchronization
	   pattern, using the "<resync:pattern>" variant. This version
	   succeeds only if the specified pattern matches (and consumes) the
	   parsed text. In other words, "<resync:pattern>" is exactly
	   equivalent to the token "/pattern/" (followed by a
	   "{ $return = 0 }" action). For example, if commands were terminated
	   by newlines or semi-colons:

		   command: save_command
			  | restore_command
			  | <resync:[^;\n]*[;\n]>

	   The value of a successfully matched "<resync>" directive (of either
	   type) is the text that it consumed. Note, however, that since the
	   directive also sets $return, a production consisting of a lone
	   "<resync>" succeeds but returns the value zero (which a calling
	   rule may find useful to distinguish between "true" matches and
	   "tolerant" matches).	 Remember that returning a zero value
	   indicates that the rule succeeded (since only an "undef" denotes
	   failure within "Parse::RecDescent" parsers.

       Error handling
	   The "<error>" directive provides automatic or user-defined
	   generation of error messages during a parse. In its simplest form
	   "<error>" prepares an error message based on the mismatch between
	   the last item expected and the text which cause it to fail. For
	   example, given the rule:

		   McCoy: curse ',' name ', I'm a doctor, not a' a_profession '!'
			| pronoun 'dead,' name '!'
			| <error>

	   the following strings would produce the following messages:

	   "Amen, Jim!"
		      ERROR (line 1): Invalid McCoy: Expected curse or pronoun
				      not found

	   "Dammit, Jim, I'm a doctor!"
		      ERROR (line 1): Invalid McCoy: Expected ", I'm a doctor, not a"
				      but found ", I'm a doctor!" instead

	   "He's dead,\n"
		      ERROR (line 2): Invalid McCoy: Expected name not found

	   "He's alive!"
		      ERROR (line 1): Invalid McCoy: Expected 'dead,' but found
				      "alive!" instead

	   "Dammit, Jim, I'm a doctor, not a pointy-eared Vulcan!"
		      ERROR (line 1): Invalid McCoy: Expected a profession but found
				      "pointy-eared Vulcan!" instead

	   Note that, when autogenerating error messages, all underscores in
	   any rule name used in a message are replaced by single spaces (for
	   example "a_production" becomes "a production"). Judicious choice of
	   rule names can therefore considerably improve the readability of
	   automatic error messages (as well as the maintainability of the
	   original grammar).

	   If the automatically generated error is not sufficient, it is
	   possible to provide an explicit message as part of the error
	   directive. For example:

		   Spock: "Fascinating ',' (name | 'Captain') '.'
			| "Highly illogical, doctor."
			| <error: He never said that!>

	   which would result in all failures to parse a "Spock" subrule
	   printing the following message:

		  ERROR (line <N>): Invalid Spock:  He never said that!

	   The error message is treated as a "qq{...}" string and interpolated
	   when the error is generated (not when the directive is specified!).
	   Hence:

		   <error: Mystical error near "$text">

	   would correctly insert the ambient text string which caused the
	   error.

	   There are two other forms of error directive: "<error?>" and
	   "<error?: msg>". These behave just like "<error>" and
	   "<error: msg>" respectively, except that they are only triggered if
	   the rule is "committed" at the time they are encountered. For
	   example:

		   Scotty: "Ya kenna change the Laws of Phusics," <commit> name
			 | name <commit> ',' 'she's goanta blaw!'
			 | <error?>

	   will only generate an error for a string beginning with "Ya kenna
	   change the Laws o' Phusics," or a valid name, but which still fails
	   to match the corresponding production. That is,
	   "$parser->Scotty("Aye, Cap'ain")" will fail silently (since neither
	   production will "commit" the rule on that input), whereas
	   "$parser->Scotty("Mr Spock, ah jest kenna do'ut!")"	will fail with
	   the error message:

		  ERROR (line 1): Invalid Scotty: expected 'she's goanta blaw!'
				  but found 'I jest kenna do'ut!' instead.

	   since in that case the second production would commit after
	   matching the leading name.

	   Note that to allow this behaviour, all "<error>" directives which
	   are the first item in a production automatically uncommit the rule
	   just long enough to allow their production to be attempted (that
	   is, when their production fails, the commitment is reinstated so
	   that subsequent productions are skipped).

	   In order to permanently uncommit the rule before an error message,
	   it is necessary to put an explicit "<uncommit>" before the
	   "<error>". For example:

		   line: 'Kirk:'  <commit> Kirk
		       | 'Spock:' <commit> Spock
		       | 'McCoy:' <commit> McCoy
		       | <uncommit> <error?> <reject>
		       | <resync>

	   Error messages generated by the various "<error...>" directives are
	   not displayed immediately. Instead, they are "queued" in a buffer
	   and are only displayed once parsing ultimately fails. Moreover,
	   "<error...>" directives that cause one production of a rule to fail
	   are automatically removed from the message queue if another
	   production subsequently causes the entire rule to succeed.  This
	   means that you can put "<error...>" directives wherever useful
	   diagnosis can be done, and only those associated with actual parser
	   failure will ever be displayed. Also see "Gotchas".

	   As a general rule, the most useful diagnostics are usually
	   generated either at the very lowest level within the grammar, or at
	   the very highest. A good rule of thumb is to identify those
	   subrules which consist mainly (or entirely) of terminals, and then
	   put an "<error...>" directive at the end of any other rule which
	   calls one or more of those subrules.

	   There is one other situation in which the output of the various
	   types of error directive is suppressed; namely, when the rule
	   containing them is being parsed as part of a "look-ahead" (see
	   "Look-ahead"). In this case, the error directive will still cause
	   the rule to fail, but will do so silently.

	   An unconditional "<error>" directive always fails (and hence has no
	   associated value). This means that encountering such a directive
	   always causes the production containing it to fail. Hence an
	   "<error>" directive will inevitably be the last (useful) item of a
	   rule (a level 3 warning is issued if a production contains items
	   after an unconditional "<error>" directive).

	   An "<error?>" directive will succeed (that is: fail to fail :-), if
	   the current rule is uncommitted when the directive is encountered.
	   In that case the directive's associated value is zero. Hence, this
	   type of error directive can be used before the end of a production.
	   For example:

		   command: 'do' <commit> something
			  | 'report' <commit> something
			  | <error?: Syntax error> <error: Unknown command>

	   Warning: The "<error?>" directive does not mean "always fail (but
	   do so silently unless committed)". It actually means "only fail
	   (and report) if committed, otherwise succeed". To achieve the "fail
	   silently if uncommitted" semantics, it is necessary to use:

		   rule: item <commit> item(s)
		       | <error?> <reject>	# FAIL SILENTLY UNLESS COMMITTED

	   However, because people seem to expect a lone "<error?>" directive
	   to work like this:

		   rule: item <commit> item(s)
		       | <error?: Error message if committed>
		       | <error:  Error message if uncommitted>

	   Parse::RecDescent automatically appends a "<reject>" directive if
	   the "<error?>" directive is the only item in a production. A level
	   2 warning (see below) is issued when this happens.

	   The level of error reporting during both parser construction and
	   parsing is controlled by the presence or absence of four global
	   variables: $::RD_ERRORS, $::RD_WARN, $::RD_HINT, and <$::RD_TRACE>.
	   If $::RD_ERRORS is defined (and, by default, it is) then fatal
	   errors are reported.

	   Whenever $::RD_WARN is defined, certain non-fatal problems are also
	   reported.  Warnings have an associated "level": 1, 2, or 3. The
	   higher the level, the more serious the warning. The value of the
	   corresponding global variable ($::RD_WARN) determines the lowest
	   level of warning to be displayed. Hence, to see all warnings, set
	   $::RD_WARN to 1.  To see only the most serious warnings set
	   $::RD_WARN to 3.  By default $::RD_WARN is initialized to 3,
	   ensuring that serious but non-fatal errors are automatically
	   reported.

	   See "DIAGNOSTICS" for a list of the varous error and warning
	   messages that Parse::RecDescent generates when these two variables
	   are defined.

	   Defining any of the remaining variables (which are not defined by
	   default) further increases the amount of information reported.
	   Defining $::RD_HINT causes the parser generator to offer more
	   detailed analyses and hints on both errors and warnings.  Note that
	   setting $::RD_HINT at any point automagically sets $::RD_WARN to 1.

	   Defining $::RD_TRACE causes the parser generator and the parser to
	   report their progress to STDERR in excruciating detail (although,
	   without hints unless $::RD_HINT is separately defined). This detail
	   can be moderated in only one respect: if $::RD_TRACE has an integer
	   value (N) greater than 1, only the N characters of the "current
	   parsing context" (that is, where in the input string we are at any
	   point in the parse) is reported at any time.

	   Note that the four variables belong to the "main" package, which
	   makes them easier to refer to in the code controlling the parser,
	   and also makes it easy to turn them into command line flags
	   ("-RD_ERRORS", "-RD_WARN", "-RD_HINT", "-RD_TRACE") under perl -s.

       Specifying local variables
	   It is occasionally convenient to specify variables which are local
	   to a single rule. This may be achieved by including a
	   "<rulevar:...>" directive anywhere in the rule. For example:

		   markup: <rulevar: $tag>

		   markup: tag {($tag=$item[1]) =~ s/^<|>$//g} body[$tag]

	   The example "<rulevar: $tag>" directive causes a "my" variable
	   named $tag to be declared at the start of the subroutine
	   implementing the "markup" rule (that is, before the first
	   production, regardless of where in the rule it is specified).

	   Specifically, any directive of the form: "<rulevar:text>" causes a
	   line of the form "my text;" to be added at the beginning of the
	   rule subroutine, immediately after the definitions of the following
	   local variables:

		   $thisparser	   $commit
		   $thisrule	   @item
		   $thisline	   @arg
		   $text	   %arg

	   This means that the following "<rulevar>" directives work as
	   expected:

		   <rulevar: $count = 0 >

		   <rulevar: $firstarg = $arg[0] || '' >

		   <rulevar: $myItems = \@item >

		   <rulevar: @context = ( $thisline, $text, @arg ) >

		   <rulevar: ($name,$age) = $arg{"name","age"} >

	   If a variable that is also visible to subrules is required, it
	   needs to be "local"'d, not "my"'d. "rulevar" defaults to "my", but
	   if "local" is explicitly specified:

		   <rulevar: local $count = 0 >

	   then a "local"-ized variable is declared instead, and will be
	   available within subrules.

	   Note however that, because all such variables are "my" variables,
	   their values do not persist between match attempts on a given rule.
	   To preserve values between match attempts, values can be stored
	   within the "local" member of the $thisrule object:

		   countedrule: { $thisrule->{"local"}{"count"}++ }
				<reject>
			      | subrule1
			      | subrule2
			      | <reject: $thisrule->{"local"}{"count"} == 1>
				subrule3

	   When matching a rule, each "<rulevar>" directive is matched as if
	   it were an unconditional "<reject>" directive (that is, it causes
	   any production in which it appears to immediately fail to match).
	   For this reason (and to improve readability) it is usual to specify
	   any "<rulevar>" directive in a separate production at the start of
	   the rule (this has the added advantage that it enables
	   "Parse::RecDescent" to optimize away such productions, just as it
	   does for the "<reject>" directive).

       Dynamically matched rules
	   Because regexes and double-quoted strings are interpolated, it is
	   relatively easy to specify productions with "context sensitive"
	   tokens. For example:

		   command:  keyword  body  "end $item[1]"

	   which ensures that a command block is bounded by a "<keyword>...end
	   <same keyword>" pair.

	   Building productions in which subrules are context sensitive is
	   also possible, via the "<matchrule:...>" directive. This directive
	   behaves identically to a subrule item, except that the rule which
	   is invoked to match it is determined by the string specified after
	   the colon. For example, we could rewrite the "command" rule like
	   this:

		   command:  keyword  <matchrule:body>	"end $item[1]"

	   Whatever appears after the colon in the directive is treated as an
	   interpolated string (that is, as if it appeared in "qq{...}"
	   operator) and the value of that interpolated string is the name of
	   the subrule to be matched.

	   Of course, just putting a constant string like "body" in a
	   "<matchrule:...>" directive is of little interest or benefit.  The
	   power of directive is seen when we use a string that interpolates
	   to something interesting. For example:

		   command:	   keyword <matchrule:$item[1]_body> "end $item[1]"

		   keyword:	   'while' | 'if' | 'function'

		   while_body:	   condition block

		   if_body:	   condition block ('else' block)(?)

		   function_body:  arglist block

	   Now the "command" rule selects how to proceed on the basis of the
	   keyword that is found. It is as if "command" were declared:

		   command:	   'while'    while_body    "end while"
			  |	   'if'	      if_body	    "end if"
			  |	   'function' function_body "end function"

	   When a "<matchrule:...>" directive is used as a repeated subrule,
	   the rule name expression is "late-bound". That is, the name of the
	   rule to be called is re-evaluated each time a match attempt is
	   made. Hence, the following grammar:

		   { $::species = 'dogs' }

		   pair:   'two' <matchrule:$::species>(s)

		   dogs:   /dogs/ { $::species = 'cats' }

		   cats:   /cats/

	   will match the string "two dogs cats cats" completely, whereas it
	   will only match the string "two dogs dogs dogs" up to the eighth
	   letter. If the rule name were "early bound" (that is, evaluated
	   only the first time the directive is encountered in a production),
	   the reverse behaviour would be expected.

	   Note that the "matchrule" directive takes a string that is to be
	   treated as a rule name, not as a rule invocation. That is, it's
	   like a Perl symbolic reference, not an "eval". Just as you can say:

		   $subname = 'foo';

		   # and later...

		   &{$foo}(@args);

	   but not:

		   $subname = 'foo(@args)';

		   # and later...

		   &{$foo};

	   likewise you can say:

		   $rulename = 'foo';

		   # and in the grammar...

		   <matchrule:$rulename>[@args]

	   but not:

		   $rulename = 'foo[@args]';

		   # and in the grammar...

		   <matchrule:$rulename>

       Deferred actions
	   The "<defer:...>" directive is used to specify an action to be
	   performed when (and only if!) the current production ultimately
	   succeeds.

	   Whenever a "<defer:...>" directive appears, the code it specifies
	   is converted to a closure (an anonymous subroutine reference) which
	   is queued within the active parser object. Note that, because the
	   deferred code is converted to a closure, the values of any "local"
	   variable (such as $text, <@item>, etc.) are preserved until the
	   deferred code is actually executed.

	   If the parse ultimately succeeds and the production in which the
	   "<defer:...>" directive was evaluated formed part of the successful
	   parse, then the deferred code is executed immediately before the
	   parse returns. If however the production which queued a deferred
	   action fails, or one of the higher-level rules which called that
	   production fails, then the deferred action is removed from the
	   queue, and hence is never executed.

	   For example, given the grammar:

		   sentence: noun trans noun
			   | noun intrans

		   noun:     'the dog'
				   { print "$item[1]\t(noun)\n" }
		       |     'the meat'
				   { print "$item[1]\t(noun)\n" }

		   trans:    'ate'
				   { print "$item[1]\t(transitive)\n" }

		   intrans:  'ate'
				   { print "$item[1]\t(intransitive)\n" }
			  |  'barked'
				   { print "$item[1]\t(intransitive)\n" }

	   then parsing the sentence "the dog ate" would produce the output:

		   the dog  (noun)
		   ate	    (transitive)
		   the dog  (noun)
		   ate	    (intransitive)

	   This is because, even though the first production of "sentence"
	   ultimately fails, its initial subrules "noun" and "trans" do match,
	   and hence they execute their associated actions.  Then the second
	   production of "sentence" succeeds, causing the actions of the
	   subrules "noun" and "intrans" to be executed as well.

	   On the other hand, if the actions were replaced by "<defer:...>"
	   directives:

		   sentence: noun trans noun
			   | noun intrans

		   noun:     'the dog'
				   <defer: print "$item[1]\t(noun)\n" >
		       |     'the meat'
				   <defer: print "$item[1]\t(noun)\n" >

		   trans:    'ate'
				   <defer: print "$item[1]\t(transitive)\n" >

		   intrans:  'ate'
				   <defer: print "$item[1]\t(intransitive)\n" >
			  |  'barked'
				   <defer: print "$item[1]\t(intransitive)\n" >

	   the output would be:

		   the dog  (noun)
		   ate	    (intransitive)

	   since deferred actions are only executed if they were evaluated in
	   a production which ultimately contributes to the successful parse.

	   In this case, even though the first production of "sentence" caused
	   the subrules "noun" and "trans" to match, that production
	   ultimately failed and so the deferred actions queued by those
	   subrules were subsequently disgarded. The second production then
	   succeeded, causing the entire parse to succeed, and so the deferred
	   actions queued by the (second) match of the "noun" subrule and the
	   subsequent match of "intrans" are preserved and eventually
	   executed.

	   Deferred actions provide a means of improving the performance of a
	   parser, by only executing those actions which are part of the final
	   parse-tree for the input data.

	   Alternatively, deferred actions can be viewed as a mechanism for
	   building (and executing) a customized subroutine corresponding to
	   the given input data, much in the same way that autoactions (see
	   "Autoactions") can be used to build a customized data structure for
	   specific input.

	   Whether or not the action it specifies is ever executed, a
	   "<defer:...>" directive always succeeds, returning the number of
	   deferred actions currently queued at that point.

       Parsing Perl
	   Parse::RecDescent provides limited support for parsing subsets of
	   Perl, namely: quote-like operators, Perl variables, and complete
	   code blocks.

	   The "<perl_quotelike>" directive can be used to parse any Perl
	   quote-like operator: 'a string', "m/a pattern/", "tr{ans}{lation}",
	   etc.	 It does this by calling Text::Balanced::quotelike().

	   If a quote-like operator is found, a reference to an array of eight
	   elements is returned. Those elements are identical to the last
	   eight elements returned by Text::Balanced::extract_quotelike() in
	   an array context, namely:

	   [0] the name of the quotelike operator -- 'q', 'qq', 'm', 's', 'tr'
	       -- if the operator was named; otherwise "undef",

	   [1] the left delimiter of the first block of the operation,

	   [2] the text of the first block of the operation (that is, the
	       contents of a quote, the regex of a match, or substitution or
	       the target list of a translation),

	   [3] the right delimiter of the first block of the operation,

	   [4] the left delimiter of the second block of the operation if
	       there is one (that is, if it is a "s", "tr", or "y"); otherwise
	       "undef",

	   [5] the text of the second block of the operation if there is one
	       (that is, the replacement of a substitution or the translation
	       list of a translation); otherwise "undef",

	   [6] the right delimiter of the second block of the operation (if
	       any); otherwise "undef",

	   [7] the trailing modifiers on the operation (if any); otherwise
	       "undef".

	   If a quote-like expression is not found, the directive fails with
	   the usual "undef" value.

	   The "<perl_variable>" directive can be used to parse any Perl
	   variable: $scalar, @array, %hash, $ref->{field}[$index], etc.  It
	   does this by calling Text::Balanced::extract_variable().

	   If the directive matches text representing a valid Perl variable
	   specification, it returns that text. Otherwise it fails with the
	   usual "undef" value.

	   The "<perl_codeblock>" directive can be used to parse curly-brace-
	   delimited block of Perl code, such as: { $a = 1; f() =~ m/pat/; }.
	   It does this by calling Text::Balanced::extract_codeblock().

	   If the directive matches text representing a valid Perl code block,
	   it returns that text. Otherwise it fails with the usual "undef"
	   value.

	   You can also tell it what kind of brackets to use as the outermost
	   delimiters. For example:

		   arglist: <perl_codeblock ()>

	   causes an arglist to match a perl code block whose outermost
	   delimiters are "(...)" (rather than the default "{...}").

       Constructing tokens
	   Eventually, Parse::RecDescent will be able to parse tokenized
	   input, as well as ordinary strings. In preparation for this joyous
	   day, the "<token:...>" directive has been provided.	This directive
	   creates a token which will be suitable for input to a
	   Parse::RecDescent parser (when it eventually supports tokenized
	   input).

	   The text of the token is the value of the immediately preceding
	   item in the production. A "<token:...>" directive always succeeds
	   with a return value which is the hash reference that is the new
	   token. It also sets the return value for the production to that
	   hash ref.

	   The "<token:...>" directive makes it easy to build a
	   Parse::RecDescent-compatible lexer in Parse::RecDescent:

		   my $lexer = new Parse::RecDescent q
		   {
			   lex:	   token(s)

			   token:  /a\b/		      <token:INDEF>
				|  /the\b/		      <token:DEF>
				|  /fly\b/		      <token:NOUN,VERB>
				|  /[a-z]+/i { lc $item[1] }  <token:ALPHA>
				|  <error: Unknown token>

		   };

	   which will eventually be able to be used with a regular
	   Parse::RecDescent grammar:

		   my $parser = new Parse::RecDescent q
		   {
			   startrule: subrule1 subrule 2

			   # ETC...
		   };

	   either with a pre-lexing phase:

		   $parser->startrule( $lexer->lex($data) );

	   or with a lex-on-demand approach:

		   $parser->startrule( sub{$lexer->token(\$data)} );

	   But at present, only the "<token:...>" directive is actually
	   implemented. The rest is vapourware.

       Specifying operations
	   One of the commonest requirements when building a parser is to
	   specify binary operators. Unfortunately, in a normal grammar, the
	   rules for such things are awkward:

		   disjunction:	   conjunction ('or' conjunction)(s?)
					   { $return = [ $item[1], @{$item[2]} ] }

		   conjunction:	   atom ('and' atom)(s?)
					   { $return = [ $item[1], @{$item[2]} ] }

	   or inefficient:

		   disjunction:	   conjunction 'or' disjunction
					   { $return = [ $item[1], @{$item[2]} ] }
			      |	   conjunction
					   { $return = [ $item[1] ] }

		   conjunction:	   atom 'and' conjunction
					   { $return = [ $item[1], @{$item[2]} ] }
			      |	   atom
					   { $return = [ $item[1] ] }

	   and either way is ugly and hard to get right.

	   The "<leftop:...>" and "<rightop:...>" directives provide an easier
	   way of specifying such operations. Using "<leftop:...>" the above
	   examples become:

		   disjunction:	   <leftop: conjunction 'or' conjunction>
		   conjunction:	   <leftop: atom 'and' atom>

	   The "<leftop:...>" directive specifies a left-associative binary
	   operator.  It is specified around three other grammar elements
	   (typically subrules or terminals), which match the left operand,
	   the operator itself, and the right operand respectively.

	   A "<leftop:...>" directive such as:

		   disjunction:	   <leftop: conjunction 'or' conjunction>

	   is converted to the following:

		   disjunction:	   ( conjunction ('or' conjunction)(s?)
					   { $return = [ $item[1], @{$item[2]} ] } )

	   In other words, a "<leftop:...>" directive matches the left operand
	   followed by zero or more repetitions of both the operator and the
	   right operand. It then flattens the matched items into an anonymous
	   array which becomes the (single) value of the entire "<leftop:...>"
	   directive.

	   For example, an "<leftop:...>" directive such as:

		   output:  <leftop: ident '<<' expr >

	   when given a string such as:

		   cout << var << "str" << 3

	   would match, and $item[1] would be set to:

		   [ 'cout', 'var', '"str"', '3' ]

	   In other words:

		   output:  <leftop: ident '<<' expr >

	   is equivalent to a left-associative operator:

		   output:  ident				   { $return = [$item[1]]	}
			 |  ident '<<' expr			   { $return = [@item[1,3]]	}
			 |  ident '<<' expr '<<' expr		   { $return = [@item[1,3,5]]	}
			 |  ident '<<' expr '<<' expr '<<' expr	   { $return = [@item[1,3,5,7]] }
			 #  ...etc...

	   Similarly, the "<rightop:...>" directive takes a left operand, an
	   operator, and a right operand:

		   assign:  <rightop: var '=' expr >

	   and converts them to:

		   assign:  ( (var '=' {$return=$item[1]})(s?) expr
					   { $return = [ @{$item[1]}, $item[2] ] } )

	   which is equivalent to a right-associative operator:

		   assign:  var				   { $return = [$item[1]]	}
			 |  var '=' expr		   { $return = [@item[1,3]]	}
			 |  var '=' var '=' expr	   { $return = [@item[1,3,5]]	}
			 |  var '=' var '=' var '=' expr   { $return = [@item[1,3,5,7]] }
			 #  ...etc...

	   Note that for both the "<leftop:...>" and "<rightop:...>"
	   directives, the directive does not normally return the operator
	   itself, just a list of the operands involved. This is particularly
	   handy for specifying lists:

		   list: '(' <leftop: list_item ',' list_item> ')'
				   { $return = $item[2] }

	   There is, however, a problem: sometimes the operator is itself
	   significant.	 For example, in a Perl list a comma and a "=>" are
	   both valid separators, but the "=>" has additional stringification
	   semantics.  Hence it's important to know which was used in each
	   case.

	   To solve this problem the "<leftop:...>" and "<rightop:...>"
	   directives do return the operator(s) as well, under two
	   circumstances.  The first case is where the operator is specified
	   as a subrule. In that instance, whatever the operator matches is
	   returned (on the assumption that if the operator is important
	   enough to have its own subrule, then it's important enough to
	   return).

	   The second case is where the operator is specified as a regular
	   expression. In that case, if the first bracketed subpattern of the
	   regular expression matches, that matching value is returned (this
	   is analogous to the behaviour of the Perl "split" function, except
	   that only the first subpattern is returned).

	   In other words, given the input:

		   ( a=>1, b=>2 )

	   the specifications:

		   list:      '('  <leftop: list_item separator list_item>  ')'

		   separator: ',' | '=>'

	   or:

		   list:      '('  <leftop: list_item /(,|=>)/ list_item>  ')'

	   cause the list separators to be interleaved with the operands in
	   the anonymous array in $item[2]:

		   [ 'a', '=>', '1', ',', 'b', '=>', '2' ]

	   But the following version:

		   list:      '('  <leftop: list_item /,|=>/ list_item>	 ')'

	   returns only the operators:

		   [ 'a', '1', 'b', '2' ]

	   Of course, none of the above specifications handle the case of an
	   empty list, since the "<leftop:...>" and "<rightop:...>" directives
	   require at least a single right or left operand to match. To
	   specify that the operator can match "trivially", it's necessary to
	   add a "(?)" qualifier to the directive:

		   list:      '('  <leftop: list_item /(,|=>)/ list_item>(?)  ')'

	   Note that in almost all the above examples, the first and third
	   arguments of the "<leftop:...>" directive were the same subrule.
	   That is because "<leftop:...>"'s are frequently used to specify
	   "separated" lists of the same type of item. To make such lists
	   easier to specify, the following syntax:

		   list:   element(s /,/)

	   is exactly equivalent to:

		   list:   <leftop: element /,/ element>

	   Note that the separator must be specified as a raw pattern (i.e.
	   not a string or subrule).

       Scored productions
	   By default, Parse::RecDescent grammar rules always accept the first
	   production that matches the input. But if two or more productions
	   may potentially match the same input, choosing the first that does
	   so may not be optimal.

	   For example, if you were parsing the sentence "time flies like an
	   arrow", you might use a rule like this:

		   sentence: verb noun preposition article noun { [@item] }
			   | adjective noun verb article noun	{ [@item] }
			   | noun verb preposition article noun { [@item] }

	   Each of these productions matches the sentence, but the third one
	   is the most likely interpretation. However, if the sentence had
	   been "fruit flies like a banana", then the second production is
	   probably the right match.

	   To cater for such situtations, the "<score:...>" can be used.  The
	   directive is equivalent to an unconditional "<reject>", except that
	   it allows you to specify a "score" for the current production. If
	   that score is numerically greater than the best score of any
	   preceding production, the current production is cached for later
	   consideration. If no later production matches, then the cached
	   production is treated as having matched, and the value of the item
	   immediately before its "<score:...>" directive is returned as the
	   result.

	   In other words, by putting a "<score:...>" directive at the end of
	   each production, you can select which production matches using
	   criteria other than specification order. For example:

		   sentence: verb noun preposition article noun { [@item] } <score: sensible(@item)>
			   | adjective noun verb article noun	{ [@item] } <score: sensible(@item)>
			   | noun verb preposition article noun { [@item] } <score: sensible(@item)>

	   Now, when each production reaches its respective "<score:...>"
	   directive, the subroutine "sensible" will be called to evaluate the
	   matched items (somehow). Once all productions have been tried, the
	   one which "sensible" scored most highly will be the one that is
	   accepted as a match for the rule.

	   The variable $score always holds the current best score of any
	   production, and the variable $score_return holds the corresponding
	   return value.

	   As another example, the following grammar matches lines that may be
	   separated by commas, colons, or semi-colons. This can be tricky if
	   a colon-separated line also contains commas, or vice versa. The
	   grammar resolves the ambiguity by selecting the rule that results
	   in the fewest fields:

		   line: seplist[sep=>',']  <score: -@{$item[1]}>
		       | seplist[sep=>':']  <score: -@{$item[1]}>
		       | seplist[sep=>" "]  <score: -@{$item[1]}>

		   seplist: <skip:""> <leftop: /[^$arg{sep}]*/ "$arg{sep}" /[^$arg{sep}]*/>

	   Note the use of negation within the "<score:...>" directive to
	   ensure that the seplist with the most items gets the lowest score.

	   As the above examples indicate, it is often the case that all
	   productions in a rule use exactly the same "<score:...>" directive.
	   It is tedious to have to repeat this identical directive in every
	   production, so Parse::RecDescent also provides the
	   "<autoscore:...>" directive.

	   If an "<autoscore:...>" directive appears in any production of a
	   rule, the code it specifies is used as the scoring code for every
	   production of that rule, except productions that already end with
	   an explicit "<score:...>" directive. Thus the rules above could be
	   rewritten:

		   line: <autoscore: -@{$item[1]}>
		   line: seplist[sep=>',']
		       | seplist[sep=>':']
		       | seplist[sep=>" "]

		   sentence: <autoscore: sensible(@item)>
			   | verb noun preposition article noun { [@item] }
			   | adjective noun verb article noun	{ [@item] }
			   | noun verb preposition article noun { [@item] }

	   Note that the "<autoscore:...>" directive itself acts as an
	   unconditional "<reject>", and (like the "<rulevar:...>" directive)
	   is pruned at compile-time wherever possible.

       Dispensing with grammar checks
	   During the compilation phase of parser construction,
	   Parse::RecDescent performs a small number of checks on the grammar
	   it's given. Specifically it checks that the grammar is not left-
	   recursive, that there are no "insatiable" constructs of the form:

		   rule: subrule(s) subrule

	   and that there are no rules missing (i.e. referred to, but never
	   defined).

	   These checks are important during development, but can slow down
	   parser construction in stable code. So Parse::RecDescent provides
	   the <nocheck> directive to turn them off. The directive can only
	   appear before the first rule definition, and switches off checking
	   throughout the rest of the current grammar.

	   Typically, this directive would be added when a parser has been
	   thoroughly tested and is ready for release.

       Subrule argument lists

       It is occasionally useful to pass data to a subrule which is being
       invoked. For example, consider the following grammar fragment:

	       classdecl: keyword decl

	       keyword:	  'struct' | 'class';

	       decl:	  # WHATEVER

       The "decl" rule might wish to know which of the two keywords was used
       (since it may affect some aspect of the way the subsequent declaration
       is interpreted). "Parse::RecDescent" allows the grammar designer to
       pass data into a rule, by placing that data in an argument list (that
       is, in square brackets) immediately after any subrule item in a
       production. Hence, we could pass the keyword to "decl" as follows:

	       classdecl: keyword decl[ $item[1] ]

	       keyword:	  'struct' | 'class';

	       decl:	  # WHATEVER

       The argument list can consist of any number (including zero!) of comma-
       separated Perl expressions. In other words, it looks exactly like a
       Perl anonymous array reference. For example, we could pass the keyword,
       the name of the surrounding rule, and the literal 'keyword' to "decl"
       like so:

	       classdecl: keyword decl[$item[1],$item[0],'keyword']

	       keyword:	  'struct' | 'class';

	       decl:	  # WHATEVER

       Within the rule to which the data is passed ("decl" in the above
       examples) that data is available as the elements of a local variable
       @arg. Hence "decl" might report its intentions as follows:

	       classdecl: keyword decl[$item[1],$item[0],'keyword']

	       keyword:	  'struct' | 'class';

	       decl:	  { print "Declaring $arg[0] (a $arg[2])\n";
			    print "(this rule called by $arg[1])" }

       Subrule argument lists can also be interpreted as hashes, simply by
       using the local variable %arg instead of @arg. Hence we could rewrite
       the previous example:

	       classdecl: keyword decl[keyword => $item[1],
				       caller  => $item[0],
				       type    => 'keyword']

	       keyword:	  'struct' | 'class';

	       decl:	  { print "Declaring $arg{keyword} (a $arg{type})\n";
			    print "(this rule called by $arg{caller})" }

       Both @arg and %arg are always available, so the grammar designer may
       choose whichever convention (or combination of conventions) suits best.

       Subrule argument lists are also useful for creating "rule templates"
       (especially when used in conjunction with the "<matchrule:...>"
       directive). For example, the subrule:

	       list:	 <matchrule:$arg{rule}> /$arg{sep}/ list[%arg]
			       { $return = [ $item[1], @{$item[3]} ] }
		   |	 <matchrule:$arg{rule}>
			       { $return = [ $item[1]] }

       is a handy template for the common problem of matching a separated
       list.  For example:

	       function: 'func' name '(' list[rule=>'param',sep=>';'] ')'

	       param:	 list[rule=>'name',sep=>','] ':' typename

	       name:	 /\w+/

	       typename: name

       When a subrule argument list is used with a repeated subrule, the
       argument list goes before the repetition specifier:

	       list:   /some|many/ thing[ $item[1] ](s)

       The argument list is "late bound". That is, it is re-evaluated for
       every repetition of the repeated subrule.  This means that each
       repeated attempt to match the subrule may be passed a completely
       different set of arguments if the value of the expression in the
       argument list changes between attempts. So, for example, the grammar:

	       { $::species = 'dogs' }

	       pair:   'two' animal[$::species](s)

	       animal: /$arg[0]/ { $::species = 'cats' }

       will match the string "two dogs cats cats" completely, whereas it will
       only match the string "two dogs dogs dogs" up to the eighth letter. If
       the value of the argument list were "early bound" (that is, evaluated
       only the first time a repeated subrule match is attempted), one would
       expect the matching behaviours to be reversed.

       Of course, it is possible to effectively "early bind" such argument
       lists by passing them a value which does not change on each repetition.
       For example:

	       { $::species = 'dogs' }

	       pair:   'two' { $::species } animal[$item[2]](s)

	       animal: /$arg[0]/ { $::species = 'cats' }

       Arguments can also be passed to the start rule, simply by appending
       them to the argument list with which the start rule is called (after
       the "line number" parameter). For example, given:

	       $parser = new Parse::RecDescent ( $grammar );

	       $parser->data($text, 1, "str", 2, \@arr);

	       #	     ^^^^^  ^  ^^^^^^^^^^^^^^^
	       #	       |    |	      |
	       # TEXT TO BE PARSED  |	      |
	       # STARTING LINE NUMBER	      |
	       # ELEMENTS OF @arg WHICH IS PASSED TO RULE data

       then within the productions of the rule "data", the array @arg will
       contain "("str", 2, \@arr)".

       Alternations

       Alternations are implicit (unnamed) rules defined as part of a
       production. An alternation is defined as a series of '|'-separated
       productions inside a pair of round brackets. For example:

	       character: 'the' ( good | bad | ugly ) /dude/

       Every alternation implicitly defines a new subrule, whose
       automatically-generated name indicates its origin:
       "_alternation_<I>_of_production_<P>_of_rule<R>" for the appropriate
       values of <I>, <P>, and <R>. A call to this implicit subrule is then
       inserted in place of the brackets. Hence the above example is merely a
       convenient short-hand for:

	       character: 'the'
			  _alternation_1_of_production_1_of_rule_character
			  /dude/

	       _alternation_1_of_production_1_of_rule_character:
			  good | bad | ugly

       Since alternations are parsed by recursively calling the parser
       generator, any type(s) of item can appear in an alternation. For
       example:

	       character: 'the' ( 'high' "plains"      # Silent, with poncho
				| /no[- ]name/	       # Silent, no poncho
				| vengeance_seeking    # Poncho-optional
				| <error>
				) drifter

       In this case, if an error occurred, the automatically generated message
       would be:

	       ERROR (line <N>): Invalid implicit subrule: Expected
				 'high' or /no[- ]name/ or generic,
				 but found "pacifist" instead

       Since every alternation actually has a name, it's even possible to
       extend or replace them:

	       parser->Replace(
		       "_alternation_1_of_production_1_of_rule_character:
			       'generic Eastwood'"
			       );

       More importantly, since alternations are a form of subrule, they can be
       given repetition specifiers:

	       character: 'the' ( good | bad | ugly )(?) /dude/

       Incremental Parsing

       "Parse::RecDescent" provides two methods - "Extend" and "Replace" -
       which can be used to alter the grammar matched by a parser. Both
       methods take the same argument as "Parse::RecDescent::new", namely a
       grammar specification string

       "Parse::RecDescent::Extend" interprets the grammar specification and
       adds any productions it finds to the end of the rules for which they
       are specified. For example:

	       $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
	       parser->Extend($add);

       adds two productions to the rule "name" (creating it if necessary) and
       one production to the rule "desc".

       "Parse::RecDescent::Replace" is identical, except that it first resets
       are rule specified in the additional grammar, removing any existing
       productions.  Hence after:

	       $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
	       parser->Replace($add);

       are are only valid "name"s and the one possible description.

       A more interesting use of the "Extend" and "Replace" methods is to call
       them inside the action of an executing parser. For example:

	       typedef: 'typedef' type_name identifier ';'
			      { $thisparser->Extend("type_name: '$item[3]'") }
		      | <error>

	       identifier: ...!type_name /[A-Za-z_]w*/

       which automatically prevents type names from being typedef'd, or:

	       command: 'map' key_name 'to' abort_key
			      { $thisparser->Replace("abort_key: '$item[2]'") }
		      | 'map' key_name 'to' key_name
			      { map_key($item[2],$item[4]) }
		      | abort_key
			      { exit if confirm("abort?") }

	       abort_key: 'q'

	       key_name: ...!abort_key /[A-Za-z]/

       which allows the user to change the abort key binding, but not to
       unbind it.

       The careful use of such constructs makes it possible to reconfigure a a
       running parser, eliminating the need for semantic feedback by providing
       syntactic feedback instead. However, as currently implemented,
       "Replace()" and "Extend()" have to regenerate and re-"eval" the entire
       parser whenever they are called. This makes them quite slow for large
       grammars.

       In such cases, the judicious use of an interpolated regex is likely to
       be far more efficient:

	       typedef: 'typedef' type_name/ identifier ';'
			      { $thisparser->{local}{type_name} .= "|$item[3]" }
		      | <error>

	       identifier: ...!type_name /[A-Za-z_]w*/

	       type_name: /$thisparser->{local}{type_name}/

       Precompiling parsers

       Normally Parse::RecDescent builds a parser from a grammar at run-time.
       That approach simplifies the design and implementation of parsing code,
       but has the disadvantage that it slows the parsing process down - you
       have to wait for Parse::RecDescent to build the parser every time the
       program runs. Long or complex grammars can be particularly slow to
       build, leading to unacceptable delays at start-up.

       To overcome this, the module provides a way of "pre-building" a parser
       object and saving it in a separate module. That module can then be used
       to create clones of the original parser.

       A grammar may be precompiled using the "Precompile" class method.  For
       example, to precompile a grammar stored in the scalar $grammar, and
       produce a class named PreGrammar in a module file named PreGrammar.pm,
       you could use:

	       use Parse::RecDescent;

	       Parse::RecDescent->Precompile($grammar, "PreGrammar");

       The first argument is the grammar string, the second is the name of the
       class to be built. The name of the module file is generated
       automatically by appending ".pm" to the last element of the class name.
       Thus

	       Parse::RecDescent->Precompile($grammar, "My::New::Parser");

       would produce a module file named Parser.pm.

       It is somewhat tedious to have to write a small Perl program just to
       generate a precompiled grammar class, so Parse::RecDescent has some
       special magic that allows you to do the job directly from the command-
       line.

       If your grammar is specified in a file named grammar, you can generate
       a class named Yet::Another::Grammar like so:

	       > perl -MParse::RecDescent - grammar Yet::Another::Grammar

       This would produce a file named Grammar.pm containing the full
       definition of a class called Yet::Another::Grammar. Of course, to use
       that class, you would need to put the Grammar.pm file in a directory
       named Yet/Another, somewhere in your Perl include path.

       Having created the new class, it's very easy to use it to build a
       parser. You simply "use" the new module, and then call its "new" method
       to create a parser object. For example:

	       use Yet::Another::Grammar;
	       my $parser = Yet::Another::Grammar->new();

       The effect of these two lines is exactly the same as:

	       use Parse::RecDescent;

	       open GRAMMAR_FILE, "grammar" or die;
	       local $/;
	       my $grammar = <GRAMMAR_FILE>;

	       my $parser = Parse::RecDescent->new($grammar);

       only considerably faster.

       Note however that the parsers produced by either approach are exactly
       the same, so whilst precompilation has an effect on set-up speed, it
       has no effect on parsing speed. RecDescent 2.0 will address that
       problem.

       A Metagrammar for "Parse::RecDescent"

       The following is a specification of grammar format accepted by
       "Parse::RecDescent::new" (specified in the "Parse::RecDescent" grammar
       format!):

	grammar	   : components(s)

	component  : rule | comment

	rule	   : "\n" identifier ":" production(s?)

	production : items(s)

	item	   : lookahead(?) simpleitem
		   | directive
		   | comment

	lookahead  : '...' | '...!'		      # +'ve or -'ve lookahead

	simpleitem : subrule args(?)		      # match another rule
		   | repetition			      # match repeated subrules
		   | terminal			      # match the next input
		   | bracket args(?)		      # match alternative items
		   | action			      # do something

	subrule	   : identifier			      # the name of the rule

	args	   : {extract_codeblock($text,'[]')}  # just like a [...] array ref

	repetition : subrule args(?) howoften

	howoften   : '(?)'			      # 0 or 1 times
		   | '(s?)'			      # 0 or more times
		   | '(s)'			      # 1 or more times
		   | /(\d+)[.][.](/\d+)/	      # $1 to $2 times
		   | /[.][.](/\d*)/		      # at most $1 times
		   | /(\d*)[.][.])/		      # at least $1 times

	terminal   : /[/]([\][/]|[^/])*[/]/	      # interpolated pattern
		   | /"([\]"|[^"])*"/		      # interpolated literal
		   | /'([\]'|[^'])*'/		      # uninterpolated literal

	action	   : { extract_codeblock($text) }     # embedded Perl code

	bracket	   : '(' Item(s) production(s?) ')'   # alternative subrules

	directive  : '<commit>'			      # commit to production
		   | '<uncommit>'		      # cancel commitment
		   | '<resync>'			      # skip to newline
		   | '<resync:' pattern '>'	      # skip <pattern>
		   | '<reject>'			      # fail this production
		   | '<reject:' condition '>'	      # fail if <condition>
		   | '<error>'			      # report an error
		   | '<error:' string '>'	      # report error as "<string>"
		   | '<error?>'			      # error only if committed
		   | '<error?:' string '>'	      #	  "    "    "	 "
		   | '<rulevar:' /[^>]+/ '>'	      # define rule-local variable
		   | '<matchrule:' string '>'	      # invoke rule named in string

	identifier : /[a-z]\w*/i		      # must start with alpha

	comment	   : /#[^\n]*/			      # same as Perl

	pattern	   : {extract_bracketed($text,'<')}   # allow embedded "<..>"

	condition  : {extract_codeblock($text,'{<')}  # full Perl expression

	string	   : {extract_variable($text)}	      # any Perl variable
		   | {extract_quotelike($text)}	      #	  or quotelike string
		   | {extract_bracketed($text,'<')}   #	  or balanced brackets

GOTCHAS
       This section describes common mistakes that grammar writers seem to
       make on a regular basis.

       1. Expecting an error to always invalidate a parse

       A common mistake when using error messages is to write the grammar like
       this:

	       file: line(s)

	       line: line_type_1
		   | line_type_2
		   | line_type_3
		   | <error>

       The expectation seems to be that any line that is not of type 1, 2 or 3
       will invoke the "<error>" directive and thereby cause the parse to
       fail.

       Unfortunately, that only happens if the error occurs in the very first
       line.  The first rule states that a "file" is matched by one or more
       lines, so if even a single line succeeds, the first rule is completely
       satisfied and the parse as a whole succeeds. That means that any error
       messages generated by subsequent failures in the "line" rule are
       quietly ignored.

       Typically what's really needed is this:

	       file: line(s) eofile    { $return = $item[1] }

	       line: line_type_1
		   | line_type_2
		   | line_type_3
		   | <error>

	       eofile: /^\Z/

       The addition of the "eofile" subrule  to the first production means
       that a file only matches a series of successful "line" matches that
       consume the complete input text. If any input text remains after the
       lines are matched, there must have been an error in the last "line". In
       that case the "eofile" rule will fail, causing the entire "file" rule
       to fail too.

       Note too that "eofile" must match "/^\Z/" (end-of-text), not "/^\cZ/"
       or "/^\cD/" (end-of-file).

       And don't forget the action at the end of the production. If you just
       write:

	       file: line(s) eofile

       then the value returned by the "file" rule will be the value of its
       last item: "eofile". Since "eofile" always returns an empty string on
       success, that will cause the "file" rule to return that empty string.
       Apart from returning the wrong value, returning an empty string will
       trip up code such as:

	       $parser->file($filetext) || die;

       (since "" is false).

       Remember that Parse::RecDescent returns undef on failure, so the only
       safe test for failure is:

	       defined($parser->file($filetext)) || die;

DIAGNOSTICS
       Diagnostics are intended to be self-explanatory (particularly if you
       use -RD_HINT (under perl -s) or define $::RD_HINT inside the program).

       "Parse::RecDescent" currently diagnoses the following:

       ·   Invalid regular expressions used as pattern terminals (fatal
	   error).

       ·   Invalid Perl code in code blocks (fatal error).

       ·   Lookahead used in the wrong place or in a nonsensical way (fatal
	   error).

       ·   "Obvious" cases of left-recursion (fatal error).

       ·   Missing or extra components in a "<leftop>" or "<rightop>"
	   directive.

       ·   Unrecognisable components in the grammar specification (fatal
	   error).

       ·   "Orphaned" rule components specified before the first rule (fatal
	   error) or after an "<error>" directive (level 3 warning).

       ·   Missing rule definitions (this only generates a level 3 warning,
	   since you may be providing them later via
	   "Parse::RecDescent::Extend()").

       ·   Instances where greedy repetition behaviour will almost certainly
	   cause the failure of a production (a level 3 warning - see "ON-
	   GOING ISSUES AND FUTURE DIRECTIONS" below).

       ·   Attempts to define rules named 'Replace' or 'Extend', which cannot
	   be called directly through the parser object because of the
	   predefined meaning of "Parse::RecDescent::Replace" and
	   "Parse::RecDescent::Extend". (Only a level 2 warning is generated,
	   since such rules can still be used as subrules).

       ·   Productions which consist of a single "<error?>" directive, and
	   which therefore may succeed unexpectedly (a level 2 warning, since
	   this might conceivably be the desired effect).

       ·   Multiple consecutive lookahead specifiers (a level 1 warning only,
	   since their effects simply accumulate).

       ·   Productions which start with a "<reject>" or "<rulevar:...>"
	   directive. Such productions are optimized away (a level 1 warning).

       ·   Rules which are autogenerated under $::AUTOSTUB (a level 1
	   warning).

AUTHOR
       Damian Conway (damian@conway.org)

BUGS AND IRRITATIONS
       There are undoubtedly serious bugs lurking somewhere in this much code
       :-) Bug reports and other feedback are most welcome.

       Ongoing annoyances include:

       ·   There's no support for parsing directly from an input stream.  If
	   and when the Perl Gods give us regular expressions on streams, this
	   should be trivial (ahem!) to implement.

       ·   The parser generator can get confused if actions aren't properly
	   closed or if they contain particularly nasty Perl syntax errors
	   (especially unmatched curly brackets).

       ·   The generator only detects the most obvious form of left recursion
	   (potential recursion on the first subrule in a rule). More subtle
	   forms of left recursion (for example, through the second item in a
	   rule after a "zero" match of a preceding "zero-or-more" repetition,
	   or after a match of a subrule with an empty production) are not
	   found.

       ·   Instead of complaining about left-recursion, the generator should
	   silently transform the grammar to remove it. Don't expect this
	   feature any time soon as it would require a more sophisticated
	   approach to parser generation than is currently used.

       ·   The generated parsers don't always run as fast as might be wished.

       ·   The meta-parser should be bootstrapped using "Parse::RecDescent"
	   :-)

ON-GOING ISSUES AND FUTURE DIRECTIONS
       1.  Repetitions are "incorrigibly greedy" in that they will eat
	   everything they can and won't backtrack if that behaviour causes a
	   production to fail needlessly.  So, for example:

		   rule: subrule(s) subrule

	   will never succeed, because the repetition will eat all the
	   subrules it finds, leaving none to match the second item. Such
	   constructions are relatively rare (and "Parse::RecDescent::new"
	   generates a warning whenever they occur) so this may not be a
	   problem, especially since the insatiable behaviour can be overcome
	   "manually" by writing:

		   rule: penultimate_subrule(s) subrule

		   penultimate_subrule: subrule ...subrule

	   The issue is that this construction is exactly twice as expensive
	   as the original, whereas backtracking would add only 1/N to the
	   cost (for matching N repetitions of "subrule"). I would welcome
	   feedback on the need for backtracking; particularly on cases where
	   the lack of it makes parsing performance problematical.

       2.  Having opened that can of worms, it's also necessary to consider
	   whether there is a need for non-greedy repetition specifiers.
	   Again, it's possible (at some cost) to manually provide the
	   required functionality:

		   rule: nongreedy_subrule(s) othersubrule

		   nongreedy_subrule: subrule ...!othersubrule

	   Overall, the issue is whether the benefit of this extra
	   functionality outweighs the drawbacks of further complicating the
	   (currently minimalist) grammar specification syntax, and (worse)
	   introducing more overhead into the generated parsers.

       3.  An "<autocommit>" directive would be nice. That is, it would be
	   useful to be able to say:

		   command: <autocommit>
		   command: 'find' name
			  | 'find' address
			  | 'do' command 'at' time 'if' condition
			  | 'do' command 'at' time
			  | 'do' command
			  | unusual_command

	   and have the generator work out that this should be "pruned" thus:

		   command: 'find' name
			  | 'find' <commit> address
			  | 'do' <commit> command <uncommit>
				   'at' time
				   'if' <commit> condition
			  | 'do' <commit> command <uncommit>
				   'at' <commit> time
			  | 'do' <commit> command
			  | unusual_command

	   There are several issues here. Firstly, should the "<autocommit>"
	   automatically install an "<uncommit>" at the start of the last
	   production (on the grounds that the "command" rule doesn't know
	   whether an "unusual_command" might start with "find" or "do") or
	   should the "unusual_command" subgraph be analysed (to see if it
	   might be viable after a "find" or "do")?

	   The second issue is how regular expressions should be treated. The
	   simplest approach would be simply to uncommit before them (on the
	   grounds that they might match). Better efficiency would be obtained
	   by analyzing all preceding literal tokens to determine whether the
	   pattern would match them.

	   Overall, the issues are: can such automated "pruning" approach a
	   hand-tuned version sufficiently closely to warrant the extra set-up
	   expense, and (more importantly) is the problem important enough to
	   even warrant the non-trivial effort of building an automated
	   solution?

COPYRIGHT
       Copyright (c) 1997-2000, Damian Conway. All Rights Reserved.  This
       module is free software. It may be used, redistributed and/or modified
       under the terms of the Perl Artistic License
	 (see http://www.perl.com/perl/misc/Artistic.html)

perl v5.10.0			  2003-04-09		  Parse::RecDescent(3)
[top]

List of man pages available for Peanut

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net