INDEXER(1)	       mnoGoSearch 3.2 reference manual		    INDEXER(1)

NAME
       indexer - indexing WWW space.

SYNOPSIS
       indexer	[ -a ] [ -b ] [ -n number ] [ -e ] [ -m ] [ -q ] [ -o ] [ -r ]
       [ -i ] [ -w ] [ -R ] [ -N number ] [ -p seconds ] [ -t tag ] [ -u  pat‐
       tern ] [ -s status ] [ -y content-type ] [ configfile ]

       indexer -C [ -R ] [ -t tag ] [ -u pattern ] [ -s status ] [ -y content-
       type ] [ configfile ]

       indexer -S [ -R ] [ -t tag ] [ -u pattern ] [ -s status ] [ -y content-
       type ] [ configfile ]

       indexer -I [ -R ] [ -t tag ] [ -u pattern ] [ -s status ] [ -y content-
       type ] [ configfile ]

       indexer -h|-?

DESCRIPTION
       indexer is part of the mnoGoSearch search engine. Its purpose is to
       walk through HTTP, HTTPS, FTP and NEWS servers, as well as the local
       file system, recursively fetching documents and storing metadata about
       them efficiently in an SQL or built-in database. Since every document
       is referenced by its corresponding URL, the metadata collected by
       indexer is used later in the search process.

       The behaviour of indexer is controlled mainly by the configuration file
       indexer.conf(5), which it reads on startup. The name and location of
       the default configuration file are compiled in, so you do not need to
       specify it every time you run indexer, but you can give an alternative
       configuration file as the last argument.

       indexer supports HTML-formatted (text/html MIME type), XML-formatted
       (text/xml MIME type) and plain text (text/plain MIME type) documents.
       Support for other data types is provided by external programs called
       "parsers". A parser should read data of some type from stdin and write
       text/html or text/plain data to stdout. See indexer.conf(5) for
       details.
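
       The parser contract described above can be sketched in Python. This
       is only an illustration: the JSON input format and the names
       extract_text and main are assumptions made for the example, and how a
       parser is registered with indexer is covered in indexer.conf(5), not
       here.

```python
#!/usr/bin/env python3
"""Hypothetical parser sketch: structured data in, text/plain out."""
import json
import sys


def extract_text(raw: str) -> str:
    """Collect every string value found in a JSON document, one per line."""
    if not raw.strip():
        return ""
    found = []

    def walk(node):
        # Descend through objects and arrays, keeping only string values.
        if isinstance(node, str):
            found.append(node)
        elif isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(raw))
    return "\n".join(found)


def main(src, dst) -> None:
    """Read the whole document from src and write plain text to dst."""
    dst.write(extract_text(src.read()) + "\n")
```

       Calling main(sys.stdin, sys.stdout) at the end of the script would
       turn it into a stdin-to-stdout filter of the kind described above.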

       You may run indexer regularly from cron(8) to keep the metadata up to
       date.

       indexer is also used to manipulate the database. It can clear some
       data from the database, output statistics and calculate the popularity
       ranking.

OPTIONS
       Indexing

       -a     Reindex all documents even if not expired.

              By default indexer reindexes only those documents that are
              "expired", i.e. the time since their last reindexing is greater
              than "Period" from the indexer.conf(5) file. This option
              disables that check, so all documents will be reindexed
              regardless of their state. To achieve this, indexer first marks
              all URLs as "expired". This has the following side effect: if
              you start indexer -a, terminate it (for example, by pressing
              Ctrl-C) and start it again, all URLs will be considered
              "expired" and will be reindexed again.
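
              The side effect can be illustrated with a small Python model.
              This is not indexer's implementation: the dictionary standing
              in for the URL table, the one-week PERIOD and all function
              names are assumptions, as is the idea that a successfully
              reindexed URL receives a fresh timestamp.

```python
PERIOD = 7 * 24 * 3600  # assumed "Period" of one week, in seconds


def is_expired(last_indexed, now, period=PERIOD):
    """A document is expired when more than `period` seconds have passed."""
    return now - last_indexed > period


def mark_all_expired(last_indexed_by_url):
    """Model of the first step of -a: make every URL look expired."""
    for url in last_indexed_by_url:
        last_indexed_by_url[url] = 0.0


def reindex(last_indexed_by_url, now, limit):
    """Reindex up to `limit` expired URLs; a small limit models Ctrl-C."""
    done = []
    for url, last in last_indexed_by_url.items():
        if len(done) == limit:
            break
        if is_expired(last, now):
            last_indexed_by_url[url] = now  # assumed: fresh timestamp
            done.append(url)
    return done


now = 1_000_000_000.0  # fixed timestamp so the example is repeatable
urls = {
    "http://a.example/": now - 60,
    "http://b.example/": now - 60,
    "http://c.example/": now - 60,
}
nothing = reindex(urls, now, limit=10)     # all fresh, nothing to do
mark_all_expired(urls)                     # what -a does first
first_run = reindex(urls, now, limit=1)    # interrupted after one URL
second_run = reindex(urls, now, limit=10)  # restart without -a
```

              In this model the restarted run still picks up the URLs that
              the interrupted run left marked as expired, even though they
              had been indexed only a minute earlier.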

       -m     Forces indexer to reindex documents even if their content has
              not been modified. This is achieved by disabling the
              If-Modified-Since HTTP header and the MD5 hash check. It is
              useful if you have changed Allow, Disallow, MaxHops or other
              directives in your indexer.conf(5) file. The new set of rules
              for storing document URLs then yields a different set of URLs,
              and finding those URLs requires reindexing documents even if
              they have not changed.
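
              The two checks that -m is described as disabling might look
              roughly like this in Python. The function names and the
              `force` parameter are hypothetical, and the sketch makes no
              claim about how indexer implements either check internally.

```python
import hashlib
from typing import Optional


def request_headers(last_modified: Optional[str], force: bool) -> dict:
    """Build conditional-request headers; force (as with -m) omits them."""
    if force or last_modified is None:
        return {}
    return {"If-Modified-Since": last_modified}


def needs_processing(body: bytes, stored_md5: Optional[str], force: bool) -> bool:
    """Decide whether a fetched document should be parsed again.

    With force=True (as with -m) the stored checksum is ignored.
    """
    if force or stored_md5 is None:
        return True
    return hashlib.md5(body).hexdigest() != stored_md5
```

              With force set, an unchanged document is still fetched in full
              and parsed again, so links that were previously rejected by
              the old rules get another chance to be discovered.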

       -n     number Reindex only the given number of URLs and exit.

       -c     seconds Limit indexing time to the given number of seconds.

       -e     Reindex the most expired documents first. This option sorts the
              list of documents to reindex by last reindexing time, so the
              most "expired" documents are reindexed first. You may not
              notice any delay with this option, but at least in theory it
              should slow indexer down a bit.

              Combining -e with -n number can be useful. For example,
              indexer -e -n 100 reindexes just the 100 most expired
              documents.
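
              The selection made by combining -e and -n can be pictured
              with a few lines of Python. The function name and the
              dictionary of last reindexing times are assumptions for the
              example; the real ordering is done by indexer against its
              database.

```python
def most_expired_first(last_indexed_by_url, number):
    """Return up to `number` URLs, oldest last-reindexing time first."""
    ordered = sorted(last_indexed_by_url, key=last_indexed_by_url.get)
    return ordered[:number]


# Last reindexing times as plain numbers; smaller means longer ago.
urls = {
    "http://a.example/": 300,
    "http://b.example/": 100,
    "http://c.example/": 200,
}
selected = most_expired_first(urls, 2)
```

              The sort is the extra work that, as noted above, may in theory
              slow indexer down a little.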

       -q     Quick startup. This mode is useful if you have not added or
              modified Server commands. indexer will not insert the URLs
              given in Server commands into the database, which speeds up
              startup somewhat.

       -k     Skip locking (this option affects MySQL and PostgreSQL only).

       -i     Insert new URLs. New URLs must be specified using the -u or -f
              options.

       -p     seconds Specifies time in seconds to pause after each URL.

       -w     Turns off warnings before clearing database.

       -o     Index documents with less depth (hops value) first.

       -r     Do not try to reduce the load on remote servers by randomising
              the URL fetch list before indexing (recommended for a very
              large number of URLs).

       -b     Prevent more than one indexer instance from starting.

       -N     number Run number threads, if a multithreaded version of
              mnoGoSearch was compiled.

       -R     Calculate popularity rank before program exit.

       Subsection control

       -t tag
       -u pattern
       -s status
       -g category
       -y content-type

              Set URL filters on tag, pattern, status, category and
              content-type respectively.

              tag is a server tag that you can set arbitrarily in the
              configuration file indexer.conf(5).

              pattern is an SQL LIKE wildcard pattern for the URL. In short,
              an underscore ( _ ) matches any single character, a percent
              sign ( % ) matches any sequence of characters, and the
              comparison is case insensitive. For example, indexer -u
              %izhcom.ru% will reindex all documents whose URLs contain the
              string "izhcom.ru".

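              The wildcard rules for pattern can be checked with a small
              Python translation of LIKE patterns into regular expressions.
              This is a sketch of the matching rules only: the function
              names are made up, escape characters are ignored, and the
              actual comparison is performed by the database.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a case-insensitive regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any run of characters, including none
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def url_matches(url: str, pattern: str) -> bool:
    """True when the whole URL matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(url) is not None
```

              Because the whole URL has to match, a pattern without
              wildcards such as izhcom.ru selects only a URL that is exactly
              that string, which is why the example above wraps it in % on
              both sides.
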
              status is a filter on the HTTP status a document received
              during its last reindexing. For example, -s 0 selects all
              documents that have not been indexed before, -s 200 selects
              all documents that were retrieved with "HTTP 200 OK" status,
              and -s 301 selects all documents that were retrieved with
              "HTTP 301 Redirect" status. See the HTTP protocol
              specifications for details on HTTP status codes and their
              meanings.

              category is a filter for documents that match a specific
              category. Categories are almost like tags, but nested.

              content-type is a filter for documents with the given MIME
              Content-Type.

              You can freely combine any number of -t, -u, -s, -g and -y
              options. Filters of the same class (tag, pattern, status) are
              combined using logical OR, and filters of different classes
              are combined using logical AND. That means that if you type
              indexer -u %izhcom.ru% -u %udm.net% -t 1 -s 200, the documents
              to index will be those with tag 1 and HTTP status 200 whose
              URLs contain the string "izhcom.ru" or "udm.net".
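
              The OR-within-a-class, AND-across-classes rule can be written
              out as a Python predicate. The Doc record, the helper names
              and the sample URLs are assumptions for illustration; only
              the combination logic is taken from the description above.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    tag: str
    status: int


def like(url: str, pattern: str) -> bool:
    """Minimal case-insensitive SQL LIKE: % is any run, _ is one character."""
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, url, re.IGNORECASE | re.DOTALL) is not None


def is_selected(doc: Doc, patterns=(), tags=(), statuses=()) -> bool:
    """OR within each filter class, AND across classes.

    A class with no filters places no restriction on the document.
    """
    return (
        (not patterns or any(like(doc.url, p) for p in patterns))
        and (not tags or doc.tag in tags)
        and (not statuses or doc.status in statuses)
    )


# Model of: indexer -u %izhcom.ru% -u %udm.net% -t 1 -s 200
filters = dict(patterns=["%izhcom.ru%", "%udm.net%"], tags=["1"], statuses=[200])
docs = [
    Doc("http://www.izhcom.ru/", "1", 200),
    Doc("http://www.udm.net/news", "1", 200),
    Doc("http://www.udm.net/old", "1", 404),
    Doc("http://www.izhcom.ru/x", "2", 200),
    Doc("http://other.example/", "1", 200),
]
chosen = [d.url for d in docs if is_selected(d, **filters)]
```

              Only the first two sample documents pass all three classes of
              filter; each of the others fails exactly one class.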

       -f     filename Read the URLs to be indexed, inserted or cleared from
              a file. With the -a or -C option, the SQL LIKE wildcard '%' is
              supported. This option has no effect when combined with the -m
              option.

       -f     - Use STDIN instead of a file to read the URL list.

       Logging options

       -l     Do not log to stdout/stderr.

       -v     level Verbose level, can be set to 0-5.

       Misc.

       -C     Clear databases.

              This will erase data previously collected by indexer from the
              mnoGoSearch databases. You can use the -t, -u and -s options
              described above to select what you want to delete.

	      WARNING: Use this option with extreme caution!

       -S     Show statistics.

              This option outputs brief statistics on how many documents
              there are in the database, their HTTP status, and how many
              documents are expired. You can use the -t, -u and -s options
              described above to select which documents you want statistics
              on.
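
              The kind of summary described here can be approximated in
              Python as a count per HTTP status. The input layout, the
              function name and the returned structure are assumptions;
              this sketch does not reproduce the actual output format of
              indexer -S.

```python
from collections import Counter


def statistics(docs, now, period):
    """Summarise (status, last_indexed) pairs.

    Returns {status: (expired_count, total_count)}, ordered by status.
    """
    total = Counter()
    expired = Counter()
    for status, last_indexed in docs:
        total[status] += 1
        if now - last_indexed > period:
            expired[status] += 1
    return {status: (expired[status], total[status]) for status in sorted(total)}


# Status 0 stands for documents that have not been indexed yet.
sample = [(200, 90), (200, 10), (404, 0), (0, 0)]
summary = statistics(sample, now=100, period=50)
```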

       -I     Show referrers.

              This option shows the referrers of URLs or, in other words,
              all hyperlinks from the document. You can use the -t, -u and
              -s options described above to select which documents to show
              referrers for.

       -h
       -?     Shows a help screen with a brief overall description of indexer
              options.

BUGS
       If you think you have found a bug in indexer, please report it to the
       mnoGoSearch bug reporting system at http://www.mnogosearch.org/bugs/
       (please post in English only).

COPYRIGHT
       Copyright © 1998-2015 Lavtech.Com Corp. (http://www.mnogosearch.org/).

       This program is free software; you can redistribute it and/or modify it
       under  the  terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at  your
       option) any later version.

       This  program  is  distributed  in the hope that it will be useful, but
       WITHOUT ANY  WARRANTY;  without	even  the  implied  warranty  of  MER‐
       CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO
       indexer.conf(5)

mnoGoSearch 3.2		       23 December 2002			    INDEXER(1)