flex man page on DigitalUNIX

Printed from http://www.polarhome.com/service/man/?qf=flex&af=0&tf=2&of=DigitalUNIX

flex(1)								       flex(1)

NAME
       flex - Generates a C Language lexical analyzer

SYNOPSIS
       flex [-bcdfinpstvFILT8] [-C[efmF]] [-Sskeleton] [file...]

OPTIONS
       -b     Generates backtracking information to lex.backtrack. This is a
              list of scanner states that require backtracking and the input
              characters on which they do so. By adding rules you can remove
              backtracking states. If all backtracking states are eliminated
              and -f or -F is used, the generated scanner will run faster.

       -d     Makes the generated scanner run in debug mode. Whenever a
              pattern is recognized and the global yy_lex_debug is nonzero
              (which is the default), the scanner writes to stderr a line of
              the form:

                   --accepting rule at line 53 ("the matched text")

              The line number refers to the location of the rule in the file
              defining the scanner (the input to flex). Messages are also
              generated when the scanner backtracks, accepts the default rule,
              reaches the end of its input buffer (or encounters a NUL), or
              reaches an End-of-File.

       -f     Specifies full table (no table compression is done). The result
              is large but fast. This option is equivalent to -Cf.

       -i     Instructs flex to generate a case-insensitive scanner. The case
              of letters given in the flex input patterns will be ignored, and
              tokens in the input will be matched regardless of case. The
              matched text given in yytext will have the original case (as
              read by the scanner).

       -p     Generates a performance report to stderr. This identifies
              features of the flex input file that will cause a loss of
              performance in the resulting scanner.

       -s     Causes the default rule (that unmatched scanner input is echoed
              to stdout) to be suppressed. If the scanner encounters input
              that does not match any of its rules, it aborts with an error.

       -t     Instructs flex to write the scanner it generates to standard
              output instead of lex.yy.c.

       -v     Specifies that flex should write to stderr a summary of
              statistics regarding the scanner it generates.

       -F     Specifies that the fast scanner table representation should be
              used. This representation is about as fast as the full table
              representation (-f), and for some sets of patterns will be
              considerably smaller (and for others, larger). This option is
              equivalent to -CF.

       -I     Instructs flex to generate an interactive scanner; that is, a
              scanner that stops immediately rather than looking ahead if it
              knows that the currently scanned text cannot be part of a longer
              rule's match. Note that -I cannot be used in conjunction with
              full or fast tables; that is, with the -f, -F, -Cf, or -CF
              options.

       -L     Instructs flex not to generate #line directives in lex.yy.c.
              The default is to generate such directives so that error
              messages in the actions are correctly located with respect to
              the original flex input file.

       -T     Makes flex run in trace mode. It will generate a lot of messages
              to stdout concerning the form of the input and the resultant
              nondeterministic and deterministic finite automata. This option
              is mostly for use in maintaining flex.

       -8     Instructs flex to generate an 8-bit scanner (which is the
              default).

       -C[efmF]
              Controls the degree of table compression. The default setting
              is -Cem, which provides the highest degree of table compression.
              Faster-executing scanners can be traded off at the cost of
              larger tables, with the following generally being true:

              Slowest and smallest

                   -Cem
                   -Cm
                   -Ce
                   -C
                   -C{f,F}e
                   -C{f,F}

              Fastest and largest

              The -C options are not cumulative; whenever the option is
              encountered, the previous -C settings are forgotten. The -f or
              -F and -Cm options do not make sense together; there is no
              opportunity for meta-equivalence classes if the table is not
              being compressed. Otherwise, the options may be freely mixed.
              A lone -C specifies that the scanner tables should be compressed
              and that neither equivalence classes nor meta-equivalence
              classes should be used.

       -Ce    Directs flex to construct equivalence classes, that is, sets of
              characters that have identical lexical properties. Equivalence
              classes usually give dramatic reductions in the final
              table/object file sizes (typically a factor of 2 to 5) and are
              inexpensive performance-wise (one array look-up per character
              scanned).

       -Cm    Directs flex to construct meta-equivalence classes, which are
              sets of equivalence classes (or characters, if equivalence
              classes are not being used) that are commonly used together.
              Meta-equivalence classes are often a big win when using
              compressed tables, but they have a moderate performance impact
              (one or two “if” tests and one array look-up per character
              scanned).

       -Cf    Specifies that the full scanner tables should be generated;
              flex should not compress the tables by taking advantage of
              similar transition functions for different states.

       -CF    Specifies that the alternative fast scanner representation
              should be used.

       -Sskeleton
              Overrides the default skeleton file from which flex constructs
              its scanners. This is useful for flex maintenance or
              development.

       -c     Specifies table-compression options. (Obsolescent)

       -n     Suppresses the statistics summaries that the -v option
              typically generates. (Obsolete)

DESCRIPTION
       The flex command is a tool for generating scanners: programs that
       recognize lexical patterns in text. The flex command reads the given
       input files (or its standard input if no file names are given or if a
       file operand is - (dash)) for a description of a scanner to generate.
       The description is in the form of pairs of regular expressions and C
       code, called rules. The flex command generates as output a C source
       file, lex.yy.c, which defines a routine yylex(). This file is compiled
       and linked with the -ll library to produce an executable. When the
       executable is run, it scans its input for occurrences of the regular
       expressions in its rules, looking for the best match (the longest
       input sequence). When it has selected a rule, it executes the
       associated C code, which has access to the matched input sequence
       (commonly referred to as a token). This process then repeats until
       the input is exhausted.

       The flex command treats multiple input files as one.

   Syntax for Input
       This section contains a description of the flex input file, which is
       normally named with a .l suffix. The section provides a listing of the
       special values, macros, and functions recognized by flex.

       The flex input file consists of three sections, separated by a line
       with just %% in it:

              [ definitions ]
              %%
              [ rules ]
              [ %%
              [ user functions ]]

       definitions
              Contains declarations to simplify the scanner specification,
              and declarations of start states, which are explained below.

       rules  Describes what the scanner is to do.

       user functions
              Contains user-supplied functions that are copied straight
              through to lex.yy.c.

              With the exception of the first %% sequence, all sections are
              optional. The minimal scanner, consisting of just %%, copies
              its input to standard output.

       Each line in the definitions section can be one of the following:

       name regexp
              Defines name to expand to regexp. name is a word beginning with
              a letter or an underscore (_) followed by zero or more letters,
              digits, underscores, or dashes (-). In the regular-expression
              parts of the rules section, flex substitutes regexp wherever you
              refer to {name} (name within braces).

       %s state [state ...]
       %x state [state ...]
              Defines names for states used in the rules section. A rule may
              be made conditionally active based on the current scanner
              state. Multiple lines defining states can appear, and each can
              contain multiple state names, separated by white space. The
              name of a state follows the same syntax as that of regexp
              names, except that dashes (-) are not permitted. Unlike regexp
              names, state names share the C #define namespace. In the rules
              section, states are recognized as <state> (state within angle
              brackets).

              The %x directive names exclusive states. When a scanner is in
              an exclusive state, only rules prefixed with that state are
              active. Inclusive states are named with the %s directive; in an
              inclusive state, rules with no state prefix remain active as
              well.

       %{
       %}     When placed on lines by themselves, these symbols enclose C
              code to be passed verbatim into the global definitions of the
              output file. Such lines commonly include preprocessor
              directives and declarations of external variables and
              functions.

              Lines beginning with a space or tab in the definitions section
              are passed directly into the lex.yy.c output file, as part of
              the initial global definitions.

       The rules section follows the definitions, separated by a line
       consisting of %%. The rules section contains rules for matching input
       and taking actions, in the following format:

              pattern   [action]

       The pattern starts in the first column of the line and extends until
       the first non-escaped white space character. The flex command attempts
       to find the pattern that matches the longest input sequence and
       executes the associated action. If two or more patterns match the same
       (longest) input, the one that appears first in the rules section is
       chosen. If no action exists, the matched input is discarded. If no
       pattern matches the input, the default is to copy it to standard
       output.

       All action code is placed in the yylex() function. Text (C code or
       declarations) placed at the beginning of the rules section is copied
       to the beginning of the yylex() function and may be used in actions.
       This text must begin with a space or a tab (to distinguish it from
       rules). In addition, any input (beginning with a space or within %{
       and %} delimiter lines) appearing at the beginning of the rules
       section before any rules are specified will be written to lex.yy.c
       after the declarations of variables for the yylex() function and
       before the first line of code in yylex().

       Elements of each rule are:

       state  A pattern may begin with a comma-separated list of state names
              enclosed by angle brackets (<state[,state...]>). These states
              are entered via the BEGIN statement. If a pattern begins with a
              state list, the scanner can recognize it only when in one of
              those states. The initial state is 0 (zero).

       pattern
              A regular expression to match against the input stream. The
              regular expressions in flex provide a rich character matching
              syntax.

              The following characters, shown in order of decreasing
              precedence, have special meanings:

              x      Matches the character x.

              "x"    Encloses characters and treats them as a literal string.
                     For example, "*+" is treated as the asterisk character
                     followed by the plus character.

              \str   If str is one of the characters a, b, f, n, r, t, or v,
                     then the ANSI C interpretation is adopted (for example,
                     \n is a newline). If str is a string of octal digits, it
                     is interpreted as a character with octal value str. If
                     str is a string of hexadecimal digits with a leading x,
                     it is interpreted as a character with that value.
                     Otherwise, it is interpreted literally with no special
                     meaning. For example, x\*yz represents the four
                     characters x*yz.

              [ ]    Represents a character class in the enclosed range
                     ([.-.]) or the enclosed list ([...]). The dash character
                     is used to define a range of characters from the ASCII
                     value or the 8-bit class of the character that comes
                     before it to the ASCII value or the 8-bit class of the
                     character that follows it. For example, [abcx-z]
                     matches a, b, c, x, y, or z.

                     The circumflex, when it appears as the first character
                     in a character class, indicates the complement of the
                     set of characters within that class. For example,
                     [^abc] matches any character except a, b, or c,
                     including special characters like newline.

              ( )    Groups regular expressions. For example, (ab) will be
                     considered as a single regular expression.

              { }    When enclosing numbers, indicates a number of
                     consecutive occurrences of the expression that comes
                     before it. For example, (ab){1,5} indicates a match for
                     from 1 to 5 occurrences of the string ab.

                     When enclosing a name, the name represents a regular
                     expression defined in the definitions section. For
                     example, {digit} is replaced by the defined regular
                     expression for digit. Note that the expansion takes
                     place as if the definition were enclosed in parentheses.

              .      Matches any single character except newline.

              x?     Matches zero or one of the preceding expression. For
                     example, ab?c matches both ac and abc.

              x*     Matches zero or more of the preceding expression. For
                     example, a* is zero or more consecutive a characters.
                     The utility of matching zero occurrences is more
                     obvious in complicated expressions. For example, the
                     expression [A-Za-z][A-Za-z0-9]* indicates all
                     alphanumeric strings with a leading alphabetic
                     character, including strings that are only one
                     alphabetic character.

              x+     Matches one or more of the preceding expression. For
                     example, [a-z]+ is all strings of lowercase letters.

              xy     Matches the expression x followed by the expression y.

              x|y    Matches either the preceding expression or the
                     following expression. For example, ab|cd matches either
                     ab or cd.

              x/y    Matches expression x only if expression y (trailing
                     context) immediately follows it. For example, ab/cd
                     matches the string ab but only if it is followed by cd.
                     Only one trailing context is permitted per pattern.

              ^x     When it appears at the beginning of the pattern, matches
                     the beginning of a line. For example, ^abc will match
                     the string abc if it is found at the beginning of a
                     line.

              x$     When it appears at the end of a pattern, matches the end
                     of a line. It is equivalent to /\n. For example, abc$
                     will match the string abc if it is found at the end of a
                     line.

              <<EOF>>
                     Matches an End-of-File.

              <state>
                     Identifies a state name (see above) and may only appear
                     at the beginning of a pattern. For example,
                     <done><<EOF>> matches an End-of-File, but only if it is
                     in state done.

              In addition, the following rules apply for bracket
              expressions:

              [= =]  These represent the set of collating elements in an
                     equivalence class and are enclosed within bracket-equal
                     delimiters ([= =]). An equivalence class generally is
                     designed to deal with primary-secondary sorting; that
                     is, for languages like French that define groups of
                     characters as sorting to the same primary location, and
                     then have a tie-breaking, secondary sort. For example,
                     if a, `, and ^ belong to the same equivalence class,
                     then [[=a=]b], [[=`=]b], and [[=^=]b] are each
                     equivalent to [a`^b].

              [: :]  These represent the set of characters in the current
                     locale belonging to the named ctype class. These are
                     expressed as a ctype class name enclosed in
                     bracket-colon delimiters ([: :]).

              In the C or POSIX locale, this operating system supports the
              following character class expressions: [:alpha:], [:upper:],
              [:lower:], [:digit:], [:alnum:], [:xdigit:], [:space:],
              [:print:], [:punct:], [:graph:], [:cntrl:].

              Other locales may define additional character classes.

              Letters and digits never have special meanings. A character
              such as ^ or -, which has a special meaning in particular
              contexts, refers simply to itself when found outside that
              context. Spaces and tabs must be escaped to appear in a regular
              expression; otherwise they indicate the end of the expression.

       action Each pattern in a rule has a corresponding action, which can be
              any arbitrary C statement. The pattern ends at the first
              non-escaped white space character; the remainder of the line is
              its action. If the action is empty, then when the pattern is
              matched, the input that matched it is discarded.

              If the action contains a {, then the action spans until the
              balancing } is found, and the action may cross multiple lines.
              Using a return statement in an action returns from yylex().

              An action consisting solely of a vertical bar (|) means the
              same as the action for the next rule.

              The flex variables that can be used within actions are:

              yytext A string (char *) containing the current matched input.
                     It cannot be modified.

              yyleng The length (int) of the current matched input. It
                     cannot be modified.

              yyin   A stream (FILE *) that flex reads from (stdin by
                     default). It may be changed, but because of the
                     buffering flex uses, this makes sense only before
                     scanning begins. Once scanning terminates because an
                     End-of-File was seen, void yyrestart(FILE *new_file) may
                     be called to point yyin at a new input file.
                     Alternatively, yyin may be changed whenever a new or
                     different buffer is selected (see
                     yy_switch_to_buffer()).

              yyout  A stream (FILE *) to which ECHO output is written (stdout
                     by default). It can be changed by the user.

              YY_CURRENT_BUFFER
                     Returns the current buffer (YY_BUFFER_STATE) used for
                     scanner input.

              The flex command macros and functions that may be used within
              actions are:

              ECHO   Copies yytext to the scanner's output.

              BEGIN state
                     Changes the scanner state to be state. This affects
                     which rules are active. The state must be defined in a
                     %s or %x definition. The initial state of the scanner is
                     INITIAL or 0 (zero).

              REJECT Directs the scanner to proceed immediately to the next
                     best pattern that matches the input (which may be a
                     prefix of the current match). yytext and yyleng are
                     reset appropriately. Note that REJECT is a particularly
                     expensive feature in terms of scanner performance; if it
                     is used in any of the scanner's actions, it will slow
                     down all of the scanner's pattern matching operations.
                     REJECT cannot be used if flex is invoked with either the
                     -f or -F option.

              yymore()
                     Indicates that the next matched text should be appended
                     to the currently matched text in yytext (rather than
                     replacing it).

              yyless(n)
                     Returns all but the first n characters of the current
                     token back to the input stream, where they will be
                     rescanned when the scanner looks for the next match.
                     yytext and yyleng are adjusted accordingly.

              yywrap()
                     Returns 0 (zero) if there is more input to scan or 1 if
                     there is not. The default yywrap() always returns 1.
                     Currently it is implemented as a macro; however, in
                     future implementations it may become a function.

              yyterminate()
                     Can be used in lieu of a return statement in an action.
                     It terminates the scanner and returns a 0 (zero) to the
                     scanner's caller.

                     yyterminate() is automatically called when an
                     End-of-File is encountered. It is a macro and may be
                     redefined.

              yy_create_buffer(file, size)
                     Returns a YY_BUFFER_STATE handle to a new input buffer
                     large enough to accommodate size characters and
                     associated with the given file. When in doubt, use
                     YY_BUF_SIZE for the size.

              yy_switch_to_buffer(new_buffer)
                     Switches the scanner's processing to scan for tokens
                     from the given buffer, which must be a YY_BUFFER_STATE.

              yy_delete_buffer(buffer)
                     Deletes the given buffer.

              YY_NEW_FILE
                     Enables scanning to continue after yyin has been pointed
                     at a new file to process.

              YY_DECL
                     Controls how the scanning function, yylex(), is
                     declared. By default, it is int yylex(), or, if
                     prototypes are being used, int yylex(void). This
                     definition may be changed by redefining the YY_DECL
                     macro. This macro is expanded immediately before the
                     {...} (braces) that delimit the scanner function body.

              YY_INPUT(buf, result, max_size)
                     Controls scanner input. By default, YY_INPUT reads from
                     the file pointer yyin. Its action is to place up to
                     max_size characters in the character array buf and
                     return in the integer variable result either the number
                     of characters read or the constant YY_NULL to indicate
                     EOF. Following is a sample redefinition of YY_INPUT, in
                     the definitions section of the input file:

                     %{
                     #undef YY_INPUT
                     #define YY_INPUT(buf,result,max_size) \
                             { \
                             int c = getchar(); \
                             result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
                             }
                     %}

                     When the scanner receives an End-of-File indication
                     from YY_INPUT, it checks the yywrap() function. If
                     yywrap() returns zero, it is assumed that yyin has been
                     set up to point to another input file, and scanning
                     continues. If it returns nonzero, then the scanner
                     terminates, returning zero to its caller.

              YY_USER_ACTION
                     Redefinable to provide an action that is always executed
                     prior to the matched pattern's action.

              YY_USER_INIT
                     Redefinable to provide an action that is always executed
                     before the first scan.

              YY_BREAK
                     Is used in the scanner to separate different actions. By
                     default, it is simply a break, but it may be redefined if
                     necessary.

       The user functions section consists of complete C functions, which are
       passed directly into the lex.yy.c output file (the effect is similar
       to defining the functions in separate files and linking them with
       lex.yy.c). This section is separated from the rules section by the %%
       delimiter.

       Comments, in C syntax, can appear anywhere in the user functions or
       definitions sections. In the rules section, comments can be embedded
       within actions. Empty lines or lines consisting of white space are
       ignored.

       The following macros are not normally called explicitly within an
       action, but are used internally by flex to handle the input and output
       streams. input() reads the next character from the input stream; you
       cannot redefine input(). A companion output macro writes the next
       character to the output stream. unput(c) puts the character c back
       onto the input stream, where it will be the next character scanned;
       you cannot redefine unput().

              The libl.a library contains default functions to support
              testing or quick use of a flex program without yacc; these
              functions can be linked in through -ll. They can also be
              provided by the user.

              main() A simple wrapper that calls setlocale() and then calls
                     the yylex() function.

              yywrap()
                     The function called when the scanner reaches the end of
                     an input stream. The default definition simply returns
                     1, which causes the scanner in turn to return 0 (zero).

NOTES
       Some trailing context patterns cannot be properly matched and generate
       warning messages:

              Dangerous trailing context

              These are patterns where the ending of the first part of the
              rule matches the beginning of the second part, such as zx*/xy*,
              where the x* matches the x at the beginning of the trailing
              context.

              For some trailing context rules, parts that are actually fixed
              length are not recognized as such, leading to the performance
              loss associated with variable trailing context. In particular,
              patterns using {n} (such as test{3}) are always considered
              variable length.

              Combining trailing context with the special | (vertical bar)
              action can result in fixed trailing context being turned into
              the more expensive variable trailing context. This happens in
              the following example:

                   %%
                   abc      |
                   xyz/def

              Use of unput() invalidates the contents of yytext and yyleng
              within the current flex action.

              Use of unput() to push back more text than was matched can
              result in the pushed-back text matching a beginning-of-line (^)
              rule even though it did not come at the beginning of the line.

              Pattern matching of NULs is substantially slower than matching
              other characters.

              The flex command does not generate correct #line directives for
              code internal to the scanner; thus, bugs in flex.skel yield
              invalid line numbers.

              Due to both buffering of input and read-ahead, you cannot
              intermix calls to <stdio.h> routines, such as getchar(), with
              flex rules and expect it to work. Call input() instead.

              The total table entries listed by the -v option exclude the
              number of table entries needed to determine what rule was
              matched. The number of entries is equal to the number of
              deterministic finite-state automaton (DFA) states if the
              scanner does not use REJECT, and somewhat greater than the
              number of states if it does.

              REJECT cannot be used with the -f or -F options.

EXAMPLES
       The following command processes the file lexcommands to produce the
       scanner file lex.yy.c:

              flex lexcommands

              This is then compiled and linked by the command:

              cc -o scanner lex.yy.c -ll

              This produces a program named scanner. The scanner program
              converts uppercase to lowercase letters, removes spaces at the
              end of a line, and replaces multiple spaces with single spaces.
              The lexcommands file contains:

              %{
              #include <ctype.h>
              %}
              %%
              [A-Z]     putchar(tolower(yytext[0]));
              [ ]+$
              [ ]+      putchar(' ');

FILES
       flex.skel
              Skeleton scanner.

       lex.yy.c
              Generated scanner C source.

       lex.backtrack
              Backtracking information generated by the -b option.

SEE ALSO
       Commands:  yacc(1), sed(1), awk(1)

       Files:  locale(4)
