agrep man page on Ultrix

Man page or keyword search:  
man Server   3690 pages
apropos Keyword Search (all sections)
Output format
Ultrix logo
[printable version]

AGREP(l)							      AGREP(l)

NAME
       agrep - search a file for a string or regular expression, with approxi‐
       mate matching capabilities

SYNOPSIS
       agrep [ -#cdehiklnpstvwxBDGIS ] pattern [  -f  patternfile  ]  [	 file‐
       name... ]

DESCRIPTION
       agrep  searches the input filenames (standard input is the default, but
       see a warning under LIMITATIONS) for records containing	strings	 which
       either  exactly	or  approximately  match  a  pattern.	A record is by
       default a line, but it can be defined differently using the  -d	option
       (see  below).   Normally,  each	record found is copied to the standard
       output.	Approximate matching allows finding records that  contain  the
       pattern	with  several  errors including substitutions, insertions, and
       deletions.  For example, Massechusets matches  Massachusetts  with  two
       errors	(one  substitution  and	 one  insertion).   Running  agrep  -2
       Massechusets foo outputs all lines in foo containing any string with at
       most 2 errors from Massechusets.

       agrep  supports	many  kinds of queries including arbitrary wild cards,
       sets of patterns, and in general, regular  expressions.	 See  PATTERNS
       below.	It  supports  most of the options supported by the grep family
       plus several more (but it is not 100% compatible with grep).  For  more
       information  on	the  algorithms used by agrep see Wu and Manber, "Fast
       Text Searching With Errors," Technical  report  #91-11,	Department  of
       Computer Science, University of Arizona, June 1991 (available by anony‐
       mous ftp from cs.arizona.edu in agrep/agrep.ps.1), and Wu  and  Manber,
       "Agrep  --  A  Fast  Approximate	 Pattern Searching Tool", To appear in
       USENIX Conference 1992 January (available by anonymous ftp from cs.ari‐
       zona.edu in agrep/agrep.ps.2).

       As with the rest of the grep family, the characters `$', `^', `∗', `[',
       `]', `^', `|', `(', `)', `!', and `\' can cause unexpected results when
       included in the pattern, as these characters are also meaningful to the
       shell.  To avoid these problems, one should always enclose  the	entire
       pattern	argument in single quotes, i.e., 'pattern'.  Do not use double
       quotes (").

       When agrep is applied to more than one input file, the name of the file
       is  displayed preceding each line which matches the pattern.  The file‐
       name is not displayed when processing a single file, so if you actually
       want  the  filename  to	appear,	 use /dev/null as a second file in the
       list.

OPTIONS
       -#     # is a non-negative integer (at most 8) specifying  the  maximum
	      number  of  errors  permitted in finding the approximate matches
	      (defaults to zero).  Generally,  each  insertion,	 deletion,  or
	      substitution  counts as one error.  It is possible to adjust the
	      relative cost of insertions, deletions and substitutions (see -I
	      -D and -S options).

       -c     Display only the count of matching records.

       -d 'delim'
	      Define  delim  to	 be  the  separator  between two records.  The
	      default value is '$', namely a record  is	 by  default  a	 line.
	      delim  can be a string of size at most 8 (with possible use of ^
	      and $), but not a regular expression.  Text between two delim's,
	      before  the  first delim, and after the last delim is considered
	      as one record.  For  example,  -d	 '$$'  defines	paragraphs  as
	      records and -d '^From ' defines mail messages as records.	 agrep
	      matches each record separately.  This option does not  currently
	      work with regular expressions.

       -e pattern
	      Same  as	a simple pattern argument, but useful when the pattern
	      begins with a `-'.

       -f patternfile
	      patternfile contains a set of (simple) patterns.	The output  is
	      all  lines  that	match at least one of the patterns in pattern‐
	      file.  Currently, the -f option works only for exact  match  and
	      for simple patterns (any meta symbol is interpreted as a regular
	      character); it is compatible only with -c, -h, -i, -l,  -s,  -v,
	      -w, and -x options.  see LIMITATIONS for size bounds.

       -h     Do not display filenames.

       -i     Case-insensitive	search	—  e.g.,  "A"  and  "a" are considered
	      equivalent.

       -k     No symbol in the pattern is treated as a	meta  character.   For
	      example,	agrep  -k  'a(b|c)*d' foo will find the occurrences of
	      a(b|c)*d in foo whereas agrep  'a(b|c)*d'	 foo  will  find  sub‐
	      strings in foo that match the regular expression 'a(b|c)*d'.

       -l     List only the files that contain a match.	 This option is useful
	      for looking for files containing a certain pattern.   For	 exam‐
	      ple,  "  agrep  -l 'wonderful'  * " will list the names of those
	      files in current directory that contain the word 'wonderful'.

       -n     Each line that is printed is prefixed by its  record  number  in
	      the file.

       -p     Find  records  in	 the  text that contain a supersequence of the
	      pattern.	For example,
	       agrep -p DCS foo will match "Department of Computer Science."

       -s     Work silently, that is, display nothing except  error  messages.
	      This is useful for checking the error status.

       -t     Output the record starting from the end of delim to (and includ‐
	      ing) the next delim.  This  is  useful  for  cases  where	 delim
	      should come at the end of the record.

       -v     Inverse  mode  —	display only those records that do not contain
	      the pattern.

       -w     Search for the pattern as a word	—  i.e.,  surrounded  by  non-
	      alphanumeric characters.	The non-alphanumeric must surround the
	      match;  they cannot be counted as errors.	 For example, agrep -w
	      -1 car will match cars, but not characters.

       -x     The pattern must match the whole line.

       -y     Used with -B option. When -y is on, agrep will always output the
	      best matches without giving a prompt.

       -B     Best match mode.	When -B is specified and no exact matches  are
	      found,  agrep  will continue to search until the closest matches
	      (i.e., the ones with minimum number of  errors)  are  found,  at
	      which point the following message will be shown: "the best match
	      contains x errors, there are y matches, output them? (y/n)"  The
	      best match mode is not supported for standard input, e.g., pipe‐
	      line input.  When the -#, -c, or -l options are  specified,  the
	      -B option is ignored.  In general, -B may be slower than -#, but
	      not by very much.

       -Dk    Set the cost of a deletion to k (k is a positive integer).  This
	      option does not currently work with regular expressions.

       -G     Output the files that contain a match.

       -Ik    Set  the	cost  of  an insertion to k (k is a positive integer).
	      This option does not currently work with regular expressions.

       -Sk    Set the cost of a substitution to k (k is a  positive  integer).
	      This option does not currently work with regular expressions.

PATTERNS
       agrep  supports	a large variety of patterns, including simple strings,
       strings with classes of characters, sets of strings,  wild  cards,  and
       regular expressions.

       Strings
	      any  sequence  of	 characters, including the special symbols `^'
	      for beginning of line and `$' for	 end  of  line.	  The  special
	      characters  listed  above	 (  `$', `^', `∗', `[', `^', `|', `(',
	      `)', `!', and `\' ) should be preceded by `\' if they are to  be
	      matched as regular characters.  For example, \^abc\\ corresponds
	      to the string ^abc\, whereas ^abc corresponds to the string  abc
	      at the beginning of a line.

       Classes of characters
	      a	 list  of  characters  inside [] (in order) corresponds to any
	      character from the list.	For example, [a-ho-z] is any character
	      between  a  and  h or between o and z.  The symbol `^' inside []
	      complements the list.  For example, [^i-n] denote any  character
	      in  the  character  set except character 'i' to 'n'.  The symbol
	      `^' thus has two meanings, but this is  consistent  with	egrep.
	      The  symbol  `.'	(don't care) stands for any symbol (except for
	      the newline symbol).

       Boolean operations
	      agrep supports an `and' operation `;' and an `or' operation `,',
	      but  not	a  combination	of  both.  For example, 'fast;network'
	      searches for all records containing both words.

       Wild cards
	      The symbol '#' is used to denote a wild card.  # matches zero or
	      any  number  of arbitrary characters.  For example, ex#e matches
	      example.	The symbol # is equivalent to .* in egrep.   In	 fact,
	      .*  will work too, because it is a valid regular expression (see
	      below), but unless this is part of an actual regular expression,
	      # will work faster.

       Combination of exact and approximate matching
	      any pattern inside angle brackets <> must match the text exactly
	      even if the match is with errors.	  For  example,	 <mathemat>ics
	      matches  mathematical  with one error (replacing the last s with
	      an a), but mathe<matics> does not match mathematical  no	matter
	      how many errors we allow.

       Regular expressions
	      The  syntax  of  regular	expressions in agrep is in general the
	      same as that for egrep.  The union operation `|', Kleene closure
	      `*', and parentheses () are all supported.  Currently '+' is not
	      supported.  Regular expressions are currently limited to approx‐
	      imately  30  characters  (generally  excluding meta characters).
	      Some options (-d, -w, -f, -t, -x, -D, -I, -S) do	not  currently
	      work with regular expressions.  The maximal number of errors for
	      regular expressions that use '*' or '|' is 4.

EXAMPLES
       agrep -2 -c ABCDEFG foo
	      gives the number of lines	 in  file  foo	that  contain  ABCDEFG
	      within two errors.

       agrep -1 -D2 -S2 'ABCD#YZ' foo
	      outputs  the  lines  containing  ABCD followed, within arbitrary
	      distance, by YZ, with up to one additional  insertion  (-D2  and
	      -S2 make deletions and substitutions too "expensive").

       agrep -5 -p abcdefghij /usr/dict/words
	      outputs the list of all words containing at least 5 of the first
	      10 letters of the alphabet in order.  (Try it:  any list	start‐
	      ing  with	 academia and ending with sacrilegious must mean some‐
	      thing!)

       agrep -1 'abc[0-9](de|fg)*[x-z]' foo
	      outputs the lines containing, within up to one error, the string
	      that  starts with abc followed by one digit, followed by zero or
	      more repetitions of either de or fg, followed by either x, y, or
	      z.

       agrep -d '^From ' 'breakdown;internet' mbox
	      outputs  all  mail messages (the pattern '^From ' separates mail
	      messages in a mail file) that contain keywords  'breakdown'  and
	      'internet'.

       agrep -d '$$' -1 '<word1> <word2>' foo
	      finds  all  paragraphs that contain word1 followed by word2 with
	      one error in place of the blank.	In particular, if word1 is the
	      last  word  in  a	 line  and word2 is the first word in the next
	      line, then the space will be substituted by a newline symbol and
	      it  will match.  Thus, this is a way to overcome separation by a
	      newline.	Note that -d '$$' (or another delim which  spans  more
	      than  one	 line)	is necessary, because otherwise agrep searches
	      only one line at a time.

       agrep '^agrep' <this manual>
	      outputs all the examples of the use of agrep in this man pages.

SEE ALSO
       ed(1), ex(1), grep(1V), sh(1), csh(1).

BUGS/LIMITATIONS
       Any bug reports or comments will be appreciated!	 Please mail  them  to
       sw@cs.arizona.edu or udi@cs.arizona.edu

       Regular	expressions  do	 not support the '+' operator (match 1 or more
       instances of the preceding token).  These can be searched for by	 using
       this syntax in the pattern:

	  'pattern(pattern)*'

       (search for strings containing one instance of the pattern, followed by
       0 or more instances of the pattern).

       The following can cause an  infinite  loop:  agrep  pattern  *  >  out‐
       put_file.   If  the number of matches is high, they may be deposited in
       output_file before it is completely read leading to more matches of the
       pattern	within	output_file  (the matches are against the whole direc‐
       tory).  It's not clear whether this is a "bug" (grep will do the same),
       but be warned.

       The  maximum  size  of  the patternfile is limited to be 250Kb, and the
       maximum number of patterns is limited to be 30,000.

       Standard input is the default if no input file is given.	  However,  if
       standard	 input is keyed in directly (as opposed to through a pipe, for
       example) agrep may not work for some non-simple patterns.

       There is no size limit for simple patterns.  More complicated  patterns
       are  currently  limited to approximately 30 characters.	Lines are lim‐
       ited to 1024 characters.	 Records are limited to 48K, and may be	 trun‐
       cated  if they are larger than that.  The limit of record length can be
       changed by modifying the parameter Max_record in agrep.h.

DIAGNOSTICS
       Exit status is 0 if any matches are found, 1  if	 none,	2  for	syntax
       errors or inaccessible files.

AUTHORS
       Sun  Wu	and  Udi Manber, Department of Computer Science, University of
       Arizona, Tucson, AZ 85721.  {sw|udi}@cs.arizona.edu.

				 Jan 17, 1992			      AGREP(l)
[top]

List of man pages available for Ultrix

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net