yacc man page on DigitalUNIX


yacc(1)								       yacc(1)

NAME
       yacc  -	Generates  an LR(1) parsing program from input consisting of a
       context-free grammar specification

SYNOPSIS
       yacc [-vltds] [-b prefix] [-N number] [-p symbol_prefix]	 [-P pathname]
       grammar

STANDARDS
       Interfaces  documented on this reference page conform to industry stan‐
       dards as follows:

       yacc:  XPG4, XPG4-UNIX

       Refer to the standards(5) reference page	 for  more  information	 about
       industry standards and associated tags.

OPTIONS
       -b prefix
              Uses prefix instead of y as the prefix for all output filenames
              (prefix.tab.c, prefix.tab.h, and prefix.output).

       -d
              Produces the <y.tab.h> file, which contains the #define
              statements that associate the yacc-assigned token codes with
              your token names. This allows source files other than y.tab.c
              to access the token codes by including this header file.

       -l
              Includes no #line constructs in y.tab.c. Use this only after
              the grammar and associated actions are fully debugged.

       -N number
              [Tru64 UNIX]  Provides yacc with extra storage for building its
              LALR tables, which may be necessary when compiling very large
              grammars. The number should be larger than 40,000 when you use
              this option.

       -p symbol_prefix
              Allows multiple yacc parsers to be linked together. Use
              symbol_prefix instead of yy to prefix global symbols.

       -P pathname
              [Tru64 UNIX]  Specifies an alternative parser (instead of
              /usr/ccs/lib/yaccpar). The pathname specifies the filename of
              the skeleton to be used in place of yaccpar.

       -s
              [Tru64 UNIX]  Breaks the yyparse() function into several
              smaller functions. Because its size is somewhat proportional to
              that of the grammar, it is possible for yyparse() to become too
              large to compile, optimize, or execute efficiently.

       -t
              Compiles run-time debugging code. By default, this code is not
              included when y.tab.c is compiled. If YYDEBUG has a nonzero
              value, the C compiler (cc) includes the debugging code, whether
              or not the -t option was used. Without this code compiled in,
              yyparse() runs more quickly.

       -v
              Produces the y.output file, which contains a readable
              description of the parsing tables and a report on conflicts
              generated by grammar ambiguities.

OPERANDS
       grammar
              The pathname of a file containing input instructions. The
              format of this file is described in the DESCRIPTION section.

DESCRIPTION
       The  yacc  command converts a context-free grammar specification into a
       set of tables for a simple automaton that  executes  an	LR(1)  parsing
       algorithm.  The	yacc  grammar  can  be ambiguous; specified precedence
       rules are used to break ambiguities.

       You must compile the y.tab.c output file with a C language compiler  to
       produce	the  yyparse()	function.  This function must be loaded with a
       yylex lexical analyzer function, as well as two routines that you  must
       provide,	 main() and an error-handling routine, yyerror(). The lex com‐
       mand is useful for creating lexical analyzers usable by yacc.

       The  yacc  program  reads   its	 skeleton   parser   from   the	  file
       /usr/ccs/lib/yaccpar.  Use  the environment variable YACCPAR to specify
       another location for the yacc program to read from.  If	you  use  this
       environment variable, the -P option is ignored, if specified.

       The general format of the yacc input file is as follows:

       [definitions] %% rules [%% [user subroutines]]

       where:

       definitions
              Is the section where you define the variables to be used later
              in the grammar, such as in the rules section. It is also where
              files are included (#include) and processing conditions are
              defined. This section is optional.

       rules
              Is the section that contains grammar rules for the parser. A
              yacc input file must have a rules section.

       user subroutines
              Is the section that contains user-supplied subroutines that can
              be used by the actions in the rules section. This section is
              optional.

       Comments, in C syntax, can appear anywhere in the user subroutines sec‐
       tion or the definitions section. In the	rules  section,	 comments  can
       appear wherever a symbol is allowed. Blank lines or lines consisting of
       white space can be inserted anywhere in the file, and are ignored.  The
       NUL (null) character must not be used in grammar rules or literals.

   Definitions Section of Input File
       The definitions section of a yacc input file contains entries that
       perform the following functions:

              Includes standard I/O header file.
              Defines global variables.
              Defines the list rule as the place to start processing.
              Defines the tokens used by the parser.
              Defines the operators and their precedence.

       Each line in the definitions section can be:

       %{
       %}
              When placed on lines by themselves, these enclose C code to be
              passed into the global definitions of the output file. Such
              lines commonly include preprocessor directives and declarations
              of external variables and functions.

       %token [<type>] token [number] ...
              Lists tokens or terminal symbols to be used in the rest of the
              input file. This line is needed for tokens that do not appear
              in other % definitions. If type is present, the C type for all
              tokens on this line is declared to be the type referenced by
              type. If a positive integer number follows a token, that value
              is assigned to the token.

       %left token ...
              Indicates that each token is an operator, that all tokens in
              this definition have equal precedence, and that a succession of
              the operators listed in this definition is evaluated left to
              right.

       %right token ...
              Indicates that each token is an operator, that all tokens in
              this definition have equal precedence, and that a succession of
              the operators listed in this definition is evaluated right to
              left.

       %nonassoc token ...
              Indicates that each token is an operator, and that the
              operators listed in this definition cannot appear in
              succession. Indicates that the token cannot be used
              associatively.

       %start symbol
              Indicates the highest-level production rule to be reduced; in
              other words, the rule where the parser can consider its work
              done and can terminate processing. If this definition is not
              included, the parser uses the first production rule. The symbol
              must be non-terminal (not a token).

       %type <type> symbol ...
              Defines each symbol as data type type, to resolve ambiguities.
              If this construct is present, yacc performs type checking;
              otherwise, it assumes all symbols to be of type integer.

       %union union-def
              Defines the yylval global variable as a union, where union-def
              is a standard C definition in the format:

              { type member ; [type member ; ...] }

              At least one member should be an int. Any valid C data type can
              be defined, including structures. When you run yacc with the -d
              option, the definition of yylval is placed in the <y.tab.h> file
              and can be referred to in a lex input file.
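
       As a rough illustration of the %union mapping, the following C sketch
       assumes a hypothetical grammar that declares
       %union { int ival; char *sval; }. The member names, the helper
       functions, and the exact shape of the generated typedef are assumptions
       made for this sketch; the header that yacc actually writes may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical union, as if the grammar declared:
 *     %union { int ival; char *sval; }
 * With -d, yacc places an equivalent definition in y.tab.h. */
typedef union {
    int   ival;   /* value carried by numeric tokens   */
    char *sval;   /* value carried by name-like tokens */
} YYSTYPE;

/* Normally defined in y.tab.c; defined here so the sketch stands alone. */
YYSTYPE yylval;

/* Store an integer token value, as a lexer action would. */
void set_int_value(int n)
{
    yylval.ival = n;
}

/* Store a string token value; the caller keeps ownership of s. */
void set_str_value(char *s)
{
    assert(s != NULL);
    yylval.sval = s;
}
```

       A lexical analyzer would call one of these helpers (or assign to the
       member directly) just before returning the matching token code.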

       Every token (terminal symbol) must be listed in one of the preceding %
       definitions. Multiple tokens can be separated by white space or commas.
       All the tokens in %left, %right, and %nonassoc definitions are assigned
       a precedence, with tokens in later definitions having precedence over
       those in earlier definitions.

       In addition to symbols, a token can be a literal character enclosed in
       single quotes. (Multibyte characters are recognized by the lexical
       analyzer and returned as tokens.) The following special characters can
       be used, just as in C programs:

       \a     Alert
       \n     Newline
       \t     Tab
       \v     Vertical tab
       \r     Carriage return
       \b     Backspace
       \f     Form feed
       \\     Backslash
       \'     Single quote
       \?     Question mark
       \ooo   One or more octal digits specifying the integer value of the
              character

   Rules Section of Input File
       The rules section of a yacc input file defines the rules that parse the
       input  stream.  It  consists  of	 a series of production rules that the
       parser tries to reduce. The format of each production rule is:

       symbol : symbol-sequence [action] [| symbol-sequence [action] ...] ;

       A symbol-sequence consists of zero or more symbols separated  by	 white
       space.  The  first  symbol must be the first character of the line, but
       newlines and other white space can appear anywhere else	in  the	 rule.
       All terminal symbols must be declared in %token definitions.

       Each  symbol-sequence  represents  an  alternative  way of reducing the
       rule. A symbol can appear recursively in	 its  own  rule.   Always  use
       left-recursion (where the recursive symbol appears before the terminat‐
       ing case in symbol-sequence).

       The following sequence indicates that the current sequence  of  symbols
       is  to be preferred over others, at the level of precedence assigned to
       token in the definitions section of the input file:

       %prec token

       The specially defined token error matches any unrecognized sequence  of
       input.  This token causes the parser to invoke the yyerror function. By
       default, the parser tries to synchronize with the  input	 and  continue
       processing it by reading and discarding all input up to the symbol fol‐
       lowing error. (You can  override	 this  behavior	 through  the  yyerrok
       action.)	 If  no error token appears in the yacc input file, the parser
       exits with an error message upon encountering unrecognized input.

       The parser always executes action after encountering  the  symbol  that
       precedes	 it.  Thus,  an	 action	 can appear in the middle of a symbol-
       sequence, after each symbol-sequence, or after  multiple	 instances  of
       symbol-sequence.	 In  the last case, action is executed when the parser
       matches any of the sequences.

       The action consists of standard C code within braces and can also take
       the following values, variables, and keywords:

       yylval
              If the token returned by the yylex function is associated with
              a significant value, yylex should place the value in this
              global variable. By default, yylval is of type long. The
              definitions section can include a %union definition to
              associate yylval with other data types, including structures.
              If you run yacc with the -d option, the full yylval definition
              is passed into the <y.tab.h> file for access by lex.

       yyerrok
              Causes the parser to start parsing tokens immediately after an
              erroneous sequence, instead of performing the default action of
              reading and discarding tokens up to a synchronization token.
              The yyerrok action should appear immediately after the error
              token.

       $[<type>]n
              Refers to the value of symbol n, where n is an index into the
              production, counting from the beginning of the production rule;
              the first symbol after the colon is $1. The type variable is
              the name of one of the union members listed in the %union
              directive in the declaration section. The <type> syntax
              (non-standard) allows the value to be cast to a specific data
              type. Note that you will rarely need to use the type syntax.

       $[<type>]$
              Refers to the value returned by the matched symbol-sequence and
              used for the matched symbol when reducing other rules. The
              action for a symbol-sequence generally assigns a value to $$.
              The type variable is the name of one of the union members
              listed in the %union directive in the declaration section. The
              <type> syntax (non-standard) allows the value to be cast to a
              specific data type. Note that you will rarely need to use the
              type syntax.
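
       The relationship between $1, $3, and $$ can be pictured with a small
       value-stack model. The following C sketch is a deliberately simplified
       illustration, not the code that yacc generates: the names vstack,
       push_value, and reduce_add are hypothetical, and a real parser also
       maintains a state stack that is omitted here.

```c
#include <assert.h>

/* Simplified model of a parser value stack; NOT yacc's generated code. */
#define STACK_MAX 64

static long vstack[STACK_MAX];
static int  vtop = 0;              /* number of values currently stacked */

/* Shift: push the semantic value of a symbol. */
void push_value(long v)
{
    assert(vtop < STACK_MAX);
    vstack[vtop++] = v;
}

/* Reduce by the rule
 *     expr : expr '+' expr   { $$ = $1 + $3; }
 * The three right-hand-side values are the top three stack entries. */
long reduce_add(void)
{
    long d1, d3, result;

    assert(vtop >= 3);
    d1 = vstack[vtop - 3];         /* $1: left operand                     */
    d3 = vstack[vtop - 1];         /* $3: right operand ($2 is '+', unused) */
    result = d1 + d3;              /* $$                                   */
    vtop -= 3;                     /* pop the right-hand side              */
    vstack[vtop++] = result;       /* push $$ for the left-hand symbol     */
    return result;
}
```

       After the reduction, the single value left in place of the three
       right-hand-side values is what an enclosing rule sees as its own $n.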

   User Subroutines Section of Input File
       The  user subroutines section of the yacc input file contains user-sup‐
       plied functions. Because these functions are included in this file, you
       do  not	need to use the yacc library when processing this file. If you
       supply a lexical analyzer (yylex) to the parser, it must	 be  contained
       in the user subroutines section.

       The following functions, which are contained in the user subroutines
       section, are invoked within the yyparse function generated by yacc.

       yylex()
              The lexical analyzer called by yyparse to recognize each token
              of input. Usually this function is created by lex. yylex reads
              input, recognizes expressions within the input, and returns a
              token number representing the kind of token read. The function
              returns an int value. A return value of 0 (zero) means the end
              of input.

	      If  the  parser  and  yylex do not agree on these token numbers,
	      reliable communication between them cannot occur. For  one-char‐
	      acter  literals,	the  token  is simply the numeric value of the
	      character in the current character set. The  numbers  for	 other
	      tokens can be chosen by either yacc or the user. In either case,
	      the #define construct of C is used to allow  yylex()  to	return
	      these  numbers symbolically. The #define statements are put into
	      the code file,  and  into	 the  header  file  if	that  file  is
	      requested. The set of characters permitted by yacc in an identi‐
	      fier is larger than that permitted by C. Token  names  found  to
	      contain such characters will not be included in the #define dec‐
	      larations.

	      If the token numbers are chosen by yacc, those tokens other than
	      literals	are  assigned  numbers	greater	 than 256, although no
	      order is implied. A token can be explicitly assigned a number by
	      following its first appearance in the declaration section with a
	      number. Names and literals not defined in this way retain	 their
	      default  definition.  All	 assigned token numbers are unique and
	      distinct from the token numbers used for literals. If  duplicate
	      token numbers cause conflicts in parser generation, yacc reports
	      an error; otherwise, it is unspecified whether the token assign‐
	      ment is accepted or an error is reported.

              The end of the input is marked by a special token called the
              endmarker, which has a token number that is zero or negative.
              All lexical analyzers return zero or negative as a token number
              upon reaching the end of their input. If the tokens up to, but
              not including, the endmarker form a structure that matches the
              start symbol, the parser accepts the input. If the endmarker is
              seen in any other context, it is considered an error.

       yyerror(string)
              The function that the parser calls upon encountering an input
              error. The default function, defined in liby.a, simply prints
              string to the standard error. The user can redefine the
              function. The function's type is void.

       yywrap()
              The wrap-up routine that returns a value of 1 when the end of
              input occurs.
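
       The yylex conventions described above (a token number per token, the
       value in yylval, and 0 at end of input) can be sketched by hand in C.
       The token codes 257 and 258, the in-memory input buffer, and the
       set_input helper are assumptions made for this sketch; in a real build
       the codes come from the header that yacc -d generates, and the
       function is usually produced by lex.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; yacc -d would normally supply these in y.tab.h.
 * Named tokens get numbers above the one-character literal range. */
#define DIGIT  257
#define LETTER 258

long yylval;                        /* default semantic-value type per this page */
static const char *input = "";      /* hypothetical in-memory input buffer */

/* Point the sketch lexer at a new input string. */
void set_input(const char *s)
{
    assert(s != NULL);
    input = s;
}

/* Return the next token number, placing any associated value in yylval. */
int yylex(void)
{
    unsigned char c;

    while (*input == ' ')           /* skip blanks */
        input++;
    if (*input == '\0')
        return 0;                   /* 0 marks the end of input */

    c = (unsigned char)*input++;
    if (c >= '0' && c <= '9') {
        yylval = c - '0';
        return DIGIT;
    }
    if (c >= 'a' && c <= 'z') {
        yylval = c - 'a';
        return LETTER;
    }
    return c;                       /* one-character literal: its own code */
}
```

       Returning the character itself for literals matches the rule that a
       one-character literal's token number is its numeric value in the
       current character set.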

       The liby.a library contains default  main()  and	 yyerror()  functions.
       (main()	is the required main program that calls yyparse() to start the
       program.) These routines look like the following, respectively:

       #include <stdio.h>
       #include <locale.h>

       int main()
       {
            setlocale(LC_ALL, "");
            (void) yyparse();
            return(0);
       }

       int yyerror(s)
            char *s;
       {
            fprintf(stderr,"%s\n",s);
            return (0);
       }

NOTES
       The LANG and LC_* variables affect the execution of the yacc command as
       stated. The main() function defined by yacc issues the following call:

       setlocale(LC_ALL, "")

       As a result, the program generated by yacc will also be affected by the
       contents of these variables at run time.

       The lex program can be compiled as a C program  with  -std0,  -std,  or
       -std1  mode. It can also be compiled as a C++ program. If YY_NOPROTO is
       defined on the compilation command line, function  prototypes  are  not
       generated.

EXIT STATUS
       The following exit values are returned:

       0      Successful completion.

       >0     An error occurred.

EXAMPLES
       This section describes the example programs for the lex and yacc
       commands, which together create a simple desk calculator program that
       performs addition, subtraction, multiplication, and division
       operations. The calculator program also allows you to assign values to
       variables (each designated by a single lowercase ASCII letter), and
       then use the variables in calculations. The files that contain the
       program are as follows:

       calc.l
              The lex specification file that defines the lexical analysis
              rules.

       calc.y
              The yacc grammar file that defines the parsing rules and calls
              the yylex() function created by lex to provide input.

       The remaining text expects that the current directory is the  directory
       that contains the lex and yacc example program files.

   Compiling the Example Program
       Perform the following steps to create the example program using lex and
       yacc:

       1.  Process the yacc grammar file using the -d option. The -d option
           tells yacc to create a file that defines the tokens it uses in
           addition to creating the C language source code file.

                  yacc -d calc.y

           The following files are created:

           y.tab.c   The C language source file that yacc created for the
                     parser.
           y.tab.h   A header file containing #define statements for the
                     tokens used by the parser.

       2.  Process the lex specification file:

                  lex calc.l

           The following file is created:

           lex.yy.c  The C language source file that lex created for the
                     lexical analyzer.

       3.  Compile and link the two C language source files:

                  cc -o calc y.tab.c lex.yy.c

           The following files are created:

           y.tab.o   The object file for y.tab.c.
           lex.yy.o  The object file for lex.yy.c.
           calc      The executable program file.

           (The *.o files are created temporarily and then removed.)

       You can then run the program directly by entering:

       calc

       Then, enter numbers and operators  in  calculator  fashion.  After  you
       press  <Return>,	 the program displays the result of the operation.  If
       you assign a value to a variable as follows, the cursor	moves  to  the
       next line:

       m=4 <Return>
       _

       You  can	 then  use  the	 variable in calculations and it will have the
       value assigned to it:

       m+5 <Return>
       9

   The Parser Source Code
       The file calc.y has entries in all three of  the	 sections  of  a  yacc
       grammar	file--declarations,  rules,  and user subroutines. It contains
       the following source code:

       %{
       #include <stdio.h>

       int regs[26];
       int base;

       %}

       %start list

       %token DIGIT LETTER

       %left '|'
       %left '&'
       %left '+' '-'
       %left '*' '/' '%'
       %left UMINUS      /* supplies precedence for unary minus */

       %%     /* beginning of rules section */

       list   :	     /*empty */
	      |	     list stat '\n'
	      |	     list error '\n'
		     {	      yyerrok;	      }
	      ;

       stat   :	     expr
		     {	      printf("%d\n",$1);	}
	      |	     LETTER '=' expr
		     {	      regs[$1] = $3;  }
	      ;

       expr   :	     '(' expr ')'
		     {	    $$ = $2;	    }
	      |	     expr '*' expr
		     {	      $$ = $1 * $3;	   }
	      |	     expr '/' expr
		     {	      $$ = $1 / $3;	   }
	      |	     expr '%' expr
		     {	      $$ = $1 % $3;	   }
	      |	     expr '+' expr
		     {	      $$ = $1 + $3;	   }
	      |	     expr '-' expr
		     {	      $$ = $1 - $3;	   }
	      |	     expr '&' expr
		     {	      $$ = $1 & $3;	   }
	      |	     expr '|' expr
		     {	      $$ = $1 | $3;	   }
	      |	     '-' expr %prec UMINUS
		     {	      $$ = -$2;	       }
	      |	     LETTER
		     {	      $$ = regs[$1];	    }
	      |	     number
	      ;

       number :	     DIGIT
		     {	      $$ = $1; base = ($1==0) ? 8:10;	     }
	      |	     number	   DIGIT
		     {	      $$ = base * $1 + $2;	  }
	      ;

       %%     /* beginning of user subroutines section */

       int main()
       {
               return(yyparse());
       }

       int yyerror(s)
       char *s;
       {
               fprintf(stderr,"%s\n",s);
               return(0);
       }

       int yywrap()
       {
               return(1);
       }
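
       The two number rules in calc.y build a multi-digit value one DIGIT at
       a time, and a leading 0 switches the base to octal. The following C
       sketch applies the same arithmetic to a digit string so the effect is
       easier to see; parse_number is a hypothetical helper written for this
       illustration and is not part of calc.y.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the 'number' rules of calc.y on a string of digits:
 *     number : DIGIT          { $$ = $1; base = ($1==0) ? 8 : 10; }
 *            | number DIGIT   { $$ = base * $1 + $2; }
 * Like the grammar, it does not reject the digits 8 and 9 in octal input. */
long parse_number(const char *digits)
{
    long value;
    int base;

    assert(digits != NULL && digits[0] >= '0' && digits[0] <= '9');

    value = digits[0] - '0';             /* first rule: $$ = $1          */
    base = (value == 0) ? 8 : 10;        /* leading 0 selects octal      */

    for (digits++; *digits >= '0' && *digits <= '9'; digits++)
        value = base * value + (*digits - '0');   /* second rule         */

    return value;
}
```

       Because the second rule is left-recursive, each reduction combines the
       value accumulated so far with exactly one more digit, which is what
       the loop body does.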

   The Lexical Analyzer Source Code
       The  file calc.l contains the lexical analyzer source code. It contains
       the rules used to generate the tokens from the input stream.   It  also
       contains	 include  statements for standard input and output, as well as
       for the <y.tab.h> file. The yacc program generates the  <y.tab.h>  file
       from  the  yacc grammar file information, if you use the -d option with
       the yacc command. The  file  <y.tab.h>  contains	 definitions  for  the
       tokens that the parser program uses.

       Contents of calc.l:

       %{
       #include <stdio.h>
       #include "y.tab.h"
       int c;
       #if !defined (YYSTYPE)
       #define YYSTYPE long
       #endif
       extern YYSTYPE yylval;
       %}
       %%
       " "       ;
       [a-z]     {
                      c = yytext[0];
                      yylval = c - 'a';
                      return(LETTER);
                 }
       [0-9]     {
                      c = yytext[0];
                      yylval = c - '0';
                      return(DIGIT);
                 }
       [^a-z 0-9]     {
                      c = yytext[0];
                      return(c);
                 }

ENVIRONMENT VARIABLES
       The following environment variables affect the execution of yacc:

       LANG
              Provides a default value for the internationalization variables
              that are unset or null. If LANG is unset or null, the
              corresponding value from the default locale is used. If any of
              the internationalization variables contain an invalid setting,
              the utility behaves as if none of the variables had been
              defined.

       LC_ALL
              If set to a non-empty string value, overrides the values of all
              the other internationalization variables.

       LC_CTYPE
              Determines the locale for the interpretation of sequences of
              bytes of text data as characters (for example, single-byte as
              opposed to multibyte characters in arguments and input files).

       LC_MESSAGES
              Determines the locale for the format and contents of diagnostic
              messages written to standard error.

       NLSPATH
              Determines the location of message catalogs for the processing
              of LC_MESSAGES.

FILES
       y.output
              A readable description of parsing tables and a report on
              conflicts generated by grammar ambiguities.

       y.tab.c
              Output file.

       y.tab.h
              Definitions for token names.

       /usr/ccs/lib/yaccpar
              Default skeleton parser for C programs.

       liby.a
              The yacc library.

       The yacc command also uses three temporary files.

SEE ALSO
       Commands:  lex(1)

       Standards:  standards(5)

       Programming Support Tools

								       yacc(1)