Bio::DB::GFF::Adaptor::dbi::pg man page on Fedora


Bio::DB::GFF::Adaptor::dbi::pg(3)   User Contributed Perl Documentation   Bio::DB::GFF::Adaptor::dbi::pg(3)

NAME
       Bio::DB::GFF::Adaptor::dbi::pg -- Database adaptor for a specific
       postgres schema

NOTES
       SQL commands that need to be executed before this adaptor will work:

	 CREATE DATABASE <dbname>;

       Also, select permission needs to be granted for each table in the
       database to the owner of the httpd process (usually 'nobody', but for
       some RedHat systems it is 'apache') if this adaptor is to be used with
       the Generic Genome Browser (gbrowse):

	 CREATE USER nobody;
	 GRANT SELECT ON TABLE fmeta		     TO nobody;
	 GRANT SELECT ON TABLE fgroup		     TO nobody;
	 GRANT SELECT ON TABLE fdata		     TO nobody;
	 GRANT SELECT ON TABLE fattribute_to_feature TO nobody;
	 GRANT SELECT ON TABLE fdna		     TO nobody;
	 GRANT SELECT ON TABLE fattribute	     TO nobody;
	 GRANT SELECT ON TABLE ftype		     TO nobody;

   Optimizing the database
       PostgreSQL generally requires some tuning before you get very good
       performance for large databases.	 For general information on tuning a
       PostgreSQL server, see
       http://www.varlena.com/GeneralBits/Tidbits/perf.html.  Of particular
       importance is executing VACUUM FULL ANALYZE whenever you change the
       database.

       Additionally, for a GFF database, there are a few items you can tune.
       For each automatic class in your GBrowse conf file, there will be one
       or two searches done when searching for a feature.  If there are lots
       of features, these searches can take several seconds.  To speed these
       searches, do two things:

       1.  Set 'enable_seqscan = false' in your postgresql.conf file (and
	   restart your server).

       2.  Create 'partial' indexes for each automatic class, doing this for
	   the example class 'Allele':

             CREATE INDEX partial_allele_gclass ON
               fgroup (lower(gname)) WHERE gclass='Allele';

           And be sure to run VACUUM FULL ANALYZE after creating the indexes
           (a short DBI sketch follows below).
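
       A minimal DBI sketch of step 2, assuming a PostgreSQL superuser
       connection and a handful of automatic classes ('Allele', 'Gene' and
       'Predicted_gene' here are placeholders; substitute the classes from
       your own GBrowse conf file):

         use DBI;

         # Connect as a user allowed to create indexes on the GFF database.
         my $dbh = DBI->connect('dbi:Pg:dbname=ens0040', 'postgres', undef,
                                { RaiseError => 1, AutoCommit => 1 });

         # One partial, case-insensitive index per automatic class.
         for my $class (qw(Allele Gene Predicted_gene)) {
             (my $name = lc $class) =~ s/\W+/_/g;
             $dbh->do("CREATE INDEX partial_${name}_gclass "
                    . "ON fgroup (lower(gname)) WHERE gclass = '$class'");
         }

         # Re-plan and reclaim space after the schema change.
         $dbh->do('VACUUM FULL ANALYZE');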

DESCRIPTION
       This adaptor implements a specific postgres database schema that is
       compatible with Bio::DB::GFF.  It inherits from
       Bio::DB::GFF::Adaptor::dbi, which itself inherits from Bio::DB::GFF.

       The schema uses several tables:

       fdata
	   This is the feature data table.  Its columns are:

	       fid	      feature ID (integer)
	       fref	      reference sequence name (string)
	       fstart	      start position relative to reference (integer)
               fstop          stop position relative to reference (integer)
	       ftypeid	      feature type ID (integer)
	       fscore	      feature score (float); may be null
	       fstrand	      strand; one of "+" or "-"; may be null
	       fphase	      phase; one of 0, 1 or 2; may be null
	       gid	      group ID (integer)
	       ftarget_start  for similarity features, the target start position (integer)
	       ftarget_stop   for similarity features, the target stop position (integer)

	   Note that it would be desirable to normalize the reference sequence
	   name, since there are usually many features that share the same
	   reference feature.  However, in the current schema, query
	   performance suffers dramatically when this additional join is
	   added.

       fgroup
	   This is the group table. There is one row for each group.  Columns:

	       gid	 the group ID (integer)
	       gclass	 the class of the group (string)
	       gname	 the name of the group (string)

	   The group table serves multiple purposes.  As you might expect, it
	   is used to cluster features that logically belong together, such as
	   the multiple exons of the same transcript.  It is also used to
	   assign a name and class to a singleton feature.  Finally, the group
	   table is used to identify the target of a similarity hit.  This is
	   consistent with the way in which the group field is used in the GFF
	   version 2 format.

	   The fgroup.gid field joins with the fdata.gid field.

	   Examples:

	     sql> select * from fgroup where gname='sjj_2L52.1';
	     +-------+-------------+------------+
	     | gid   | gclass	   | gname	|
	     +-------+-------------+------------+
	     | 69736 | PCR_product | sjj_2L52.1 |
	     +-------+-------------+------------+
	     1 row in set (0.70 sec)

	     sql> select fref,fstart,fstop from fdata,fgroup
		       where gclass='PCR_product' and gname = 'sjj_2L52.1'
			     and fdata.gid=fgroup.gid;
	     +---------------+--------+-------+
	     | fref	     | fstart | fstop |
	     +---------------+--------+-------+
	     | CHROMOSOME_II |	 1586 |	 2355 |
	     +---------------+--------+-------+
	     1 row in set (0.03 sec)

       ftype
	   This table contains the feature types, one per row.	Columns are:

	       ftypeid	    the feature type ID (integer)
	       fmethod	    the feature type method name (string)
	       fsource	    the feature type source name (string)

	   The ftype.ftypeid field joins with the fdata.ftypeid field.
	   Example:

	     sql> select fref,fstart,fstop,fmethod,fsource from fdata,fgroup,ftype
		    where gclass='PCR_product'
			  and gname = 'sjj_2L52.1'
			  and fdata.gid=fgroup.gid
			  and fdata.ftypeid=ftype.ftypeid;
	     +---------------+--------+-------+-------------+-----------+
	     | fref	     | fstart | fstop | fmethod	    | fsource	|
	     +---------------+--------+-------+-------------+-----------+
	     | CHROMOSOME_II |	 1586 |	 2355 | PCR_product | GenePairs |
	     +---------------+--------+-------+-------------+-----------+
	     1 row in set (0.08 sec)

       fdna
	   This table holds the raw DNA of the reference sequences.  It has
	   three columns:

	       fref	     reference sequence name (string)
	       foffset	     offset of this sequence
	       fdna	     the DNA sequence (longblob)

	   To overcome problems loading large blobs, DNA is automatically
	   fragmented into multiple segments when loading, and the position of
	   each segment is stored in foffset.  The fragment size is controlled
	   by the -clump_size argument during initialization.
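
           The offsets make it easy to pull sequence back out.  A rough
           illustration (not the module's own code) that rebuilds a region
           of a reference sequence from its fragments:

             use DBI;

             my $dbh = DBI->connect('dbi:Pg:dbname=ens0040', 'nobody');

             # Fetch all fragments of the reference in offset order,
             # concatenate them, then trim to a 1-based range.
             my ($ref, $start, $stop) = ('CHROMOSOME_II', 1586, 2355);
             my $rows = $dbh->selectall_arrayref(
                 'SELECT foffset, fdna FROM fdna WHERE fref = ? ORDER BY foffset',
                 undef, $ref);
             my $seq    = join '', map { $_->[1] } @$rows;
             my $region = substr($seq, $start - 1, $stop - $start + 1);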

       fattribute_to_feature
	   This table holds "attributes", which are tag/value pairs stuffed
	   into the GFF line.  The first tag/value pair is treated as the
	   group, and anything else is treated as an attribute (weird, huh?).

	    CHR_I assembly_tag Finished	    2032 2036 . + . Note "Right: cTel33B"
	    CHR_I assembly_tag Polymorphism 668	 668  . + . Note "A->C in cTel33B"

	   The columns of this table are:

	       fid		   feature ID (integer)
	       fattribute_id	   ID of the attribute (integer)
	       fattribute_value	   text of the attribute (text)

	   The fdata.fid column joins with fattribute_to_feature.fid.

       fattribute
	   This table holds the normalized names of the attributes.  Fields
	   are:

	     fattribute_id	ID of the attribute (integer)
	     fattribute_name	Name of the attribute (varchar)
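
           Taken together, the two attribute tables join through fdata to
           recover the tag/value pairs for a feature.  A sketch (illustrative
           only; the adaptor builds these joins internally), reusing the
           group from the earlier examples:

             use DBI;

             my $dbh = DBI->connect('dbi:Pg:dbname=ens0040', 'nobody');

             # All attributes attached to the features of one group.
             my $sth = $dbh->prepare(q{
                 SELECT a.fattribute_name, af.fattribute_value
                   FROM fdata d, fgroup g, fattribute_to_feature af, fattribute a
                  WHERE g.gclass = ? AND g.gname = ?
                    AND d.gid = g.gid
                    AND af.fid = d.fid
                    AND af.fattribute_id = a.fattribute_id
             });
             $sth->execute('PCR_product', 'sjj_2L52.1');
             while (my ($tag, $value) = $sth->fetchrow_array) {
                 print "$tag\t$value\n";
             }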

   Data Loading Methods
       In addition to implementing the abstract SQL-generating methods of
       Bio::DB::GFF::Adaptor::dbi, this module also implements the data
       loading functionality of Bio::DB::GFF.

   new
	Title	: new
	Usage	: $db = Bio::DB::GFF->new(@args)
	Function: create a new adaptor
	Returns : a Bio::DB::GFF object
	Args	: see below
	Status	: Public

       The new constructor is identical to the "dbi" adaptor's new() method,
       except that the prefix "dbi:Pg" is added to the database DSN identifier
       automatically if it is not there already.

	 Argument	Description
	 --------	-----------

         -dsn           the DBI data source, e.g. 'dbi:Pg:dbname=ens0040' or "ens0040"

	 -user		username for authentication

	 -pass		the password for authentication
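
       For example, a minimal connection to the example database used
       elsewhere in this page might look like this:

         use Bio::DB::GFF;

         # 'dbi::pg' selects this adaptor; the dsn may be a full DBI string
         # or just the bare database name.
         my $db = Bio::DB::GFF->new(
             -adaptor => 'dbi::pg',
             -dsn     => 'dbi:Pg:dbname=ens0040',
             -user    => 'nobody',
         );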

   schema
	Title	: schema
	Usage	: $schema = $db->schema
	Function: return the CREATE script for the schema
        Returns : a list of CREATE statements
	Args	: none
	Status	: protected

       This method returns a list containing the various CREATE statements
       needed to initialize the database tables.

   setup_load
	Title	: setup_load
	Usage	: $db->setup_load
	Function: called before load_gff_line()
	Returns : void
	Args	: none
	Status	: protected

       This method performs schema-specific initialization prior to loading a
       set of GFF records.  It prepares a set of DBI statement handles to be
       used in loading the data.

   load_gff_line
	Title	: load_gff_line
	Usage	: $db->load_gff_line($fields)
	Function: called to load one parsed line of GFF
	Returns : true if successfully inserted
	Args	: hashref containing GFF fields
	Status	: protected

       This method is called once per line of the GFF and passed a series of
       parsed data items that are stored into the hashref $fields.  The keys
       are:

	ref	     reference sequence
	source	     annotation source
	method	     annotation method
	start	     annotation start
	stop	     annotation stop
	score	     annotation score (may be undef)
	strand	     annotation strand (may be undef)
	phase	     annotation phase (may be undef)
	group_class  class of annotation's group (may be undef)
	group_name   ID of annotation's group (may be undef)
	target_start start of target of a similarity hit
	target_stop  stop of target of a similarity hit
	attributes   array reference of attributes, each of which is a [tag=>value] array ref
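
       For illustration only (values loosely modeled on the PCR_product
       example used earlier in this page, not taken from a real load), the
       hashref for one line might look like:

         # Roughly what load_gff_line() receives for one GFF line;
         # all values below are illustrative.
         my $fields = {
             ref          => 'CHROMOSOME_II',
             source       => 'GenePairs',
             method       => 'PCR_product',
             start        => 1586,
             stop         => 2355,
             score        => undef,
             strand       => '+',
             phase        => undef,
             group_class  => 'PCR_product',
             group_name   => 'sjj_2L52.1',
             target_start => undef,
             target_stop  => undef,
             attributes   => [],    # e.g. [ [ Note => 'some text' ] ]
         };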

   get_table_id
	Title	: get_table_id
	Usage	: $integer = $db->get_table_id($table,@ids)
	Function: get the ID of a group or type
	Returns : an integer ID or undef
        Args    : table name and two string identifiers
	Status	: private

       This internal method is called by load_gff_line to look up the integer
       ID of an existing feature type or group.	 The arguments are the name of
       the table, and two string identifiers.  For feature types, the
       identifiers are the method and source.  For groups, the identifiers are
       group name and class.

       This method requires that a statement handle named lookup_$table has
       been created previously by setup_load().  It is here to overcome
       deficiencies in mysql's INSERT syntax.

   range_query
	Title	: range_query
	Usage	: $db->range_query($range_type,$refseq,$refclass,$start,$stop,$types,$order_by_group,$attributes,$binsize)
	Function: create statement handle for range/overlap queries
	Returns : a DBI statement handle
	Args	: see below
	Status	: Protected

       This method constructs the statement handle for this module's central
       query: given a range and/or a list of feature types, fetch their GFF
       records.	 It overrides a method in dbi.pm so that the overlaps query
       can write SQL optimized for Postgres.  Specifically, instead of
       writing the bin-related section as a set of ORs, each bin piece is
       placed in a separate SELECT and the SELECTs are then UNIONed together
       (a sketch follows the argument list below).  This subroutine
       requires several replacements for other subroutines in dbi.pm.  In this
       module, they are named the same as those in dbi.pm but prefixed with
       "pg_".

       The positional arguments are as follows:

	 Argument		Description

         $isrange               A flag indicating that this is a range
                                query.  Otherwise an overlap query is
                                assumed.

	 $refseq		The reference sequence name (undef if no range).

	 $refclass		The reference sequence class (undef if no range).

	 $start			The start of the range (undef if none).

	 $stop			The stop of the range (undef if none).

         $types                 Array ref containing zero or more feature types in the
				format [method,source].

         $order_by_group        A flag indicating that the statement handle should
                                group the features by group id (handy for iterative fetches)

         $attributes            A hash reference of attribute name/value pairs
                                used to filter the features.

	 $binsize		A bin size for generating tables of feature density.
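
       As a schematic illustration of the UNION rewrite described above (the
       column name "fbin" and the numeric ranges below are placeholders; the
       real bin arithmetic lives in Bio::DB::GFF::Adaptor::dbi):

         # OR form as produced by the generic dbi adaptor (sketch only).
         my $or_form = q{
             SELECT fid FROM fdata
              WHERE fref = ?
                AND (   fbin BETWEEN 1000.000000 AND 1000.000100
                     OR fbin BETWEEN  100.000000 AND  100.000010 )
         };

         # Equivalent UNION form emitted instead by this adaptor (sketch only).
         my $union_form = q{
             SELECT fid FROM fdata WHERE fref = ?
                AND fbin BETWEEN 1000.000000 AND 1000.000100
             UNION
             SELECT fid FROM fdata WHERE fref = ?
                AND fbin BETWEEN  100.000000 AND  100.000010
         };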

   search_notes
       This PostgreSQL adaptor does not implement the search_notes method
       because it can be very slow (the code for the method is present in
       this module but commented out).  There is, however, a
       PostgreSQL adaptor that does implement it in a more efficient way:
       Bio::DB::GFF::Adaptor::dbi::pg_fts, which inherits from this adaptor
       and uses the optional PostgreSQL module TSearch2 for full text
       indexing.  See that adaptor's documentation for more information.

       See also Bio::DB::GFF

	Title	: search_notes
	Usage	: @search_results = $db->search_notes("full text search string",$limit)
        Function: Search the notes for a text string using an ILIKE query
	Returns : array of results
	Args	: full text search string, and an optional row limit
	Status	: public

       This is a replacement for the mysql-specific method.  Given a search
       string, it performs an ILIKE search of the notes table and returns an
       array of results.  Each row of the returned array is an arrayref
       containing the following fields:

	 column 1     A Bio::DB::GFF::Featname object, suitable for passing to segment()
	 column 2     The text of the note
	 column 3     A relevance score.

       Note that for large databases this can be very slow and may result in
       timeouts or CGI 500 errors.  If this is happening on a regular basis,
       you should look into using Bio::DB::GFF::Adaptor::dbi::pg_fts which
       implements the TSearch2 full text indexing scheme.
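
       A small usage sketch ($db is an opened Bio::DB::GFF handle; the
       search string and row limit are arbitrary):

         # Each result row is [ Bio::DB::GFF::Featname, note text, relevance ].
         my @results = $db->search_notes('polymerase', 20);
         for my $row (@results) {
             my ($feature, $note, $score) = @$row;
             print join("\t", $feature, $note, $score), "\n";
         }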

   make_meta_set_query
	Title	: make_meta_set_query
	Usage	: $sql = $db->make_meta_set_query
	Function: return SQL fragment for setting a meta parameter
	Returns : SQL fragment
	Args	: none
	Status	: public

       By default this does nothing; meta parameters are not stored or
       retrieved.

   make_features_by_name_where_part
	Title	: make_features_by_name_where_part
	Usage	: $db->make_features_by_name_where_part
        Function: Overrides a function in Bio::DB::GFF::Adaptor::dbi to ensure
		  that searches will be case insensitive. It creates the SQL
		  fragment needed to select a feature by its group name & class
	Returns : a SQL fragment and bind arguments
	Args	: see below
	Status	: Protected

perl v5.14.1			  2011-07-22 Bio::DB::GFF::Adaptor::dbi::pg(3)