Bio::DB::SeqFeature::Store::Loader(3)  User Contributed Perl Documentation  Bio::DB::SeqFeature::Store::Loader(3)

NAME
       Bio::DB::SeqFeature::Store::Loader -- Loader

SYNOPSIS
	# non-instantiable base class

DESCRIPTION
       This is the base class for Bio::DB::SeqFeature::Store::GFF3Loader,
       Bio::DB::SeqFeature::Store::GFF2Loader, and
       Bio::DB::SeqFeature::Store::FeatureFileLoader. Please see the manual
       pages for these modules.

   new
	Title	: new
	Usage	: $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(@options)
	Function: create a new parser
	Returns : a Bio::DB::SeqFeature::Store::GFF3Loader gff3 parser and loader
	Args	: several - see below
	Status	: public

       This method creates a new GFF3 loader and establishes its connection
       with a Bio::DB::SeqFeature::Store database. Arguments are -name=>$value
       pairs as described in this table:

	Name		   Value
	----		   -----

	-store		   A writeable Bio::DB::SeqFeature::Store database handle.

	-seqfeature_class  The name of the type of Bio::SeqFeatureI object to create
			     and store in the database (Bio::DB::SeqFeature by default)

	-sf_class	   A shorter alias for -seqfeature_class

	-verbose	   Send progress information to standard error.

	-fast		   If true, activate fast loading (see below)

	-chunk_size	   Set the storage chunk size for nucleotide/protein sequences
			      (default 2000 bytes)

	-tmp		   Indicate a temporary directory to use when loading non-normalized
			      features.

	-map_coords	   A code ref that will transform a list of ($ref,[$start1,$end1]...)
			      coordinates into a list of ($newref,[$newstart1,$newend1]...)

	-index_subfeatures Indicate true if subfeatures should be indexed. Default is true.
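
       As an illustration of the -map_coords option, the following is a
       minimal sketch of a code ref with the argument and return shape given
       in the table. The contig-to-chromosome table, the offset, and the sub
       name map_contig_to_chr are assumptions made up for this example; they
       are not part of the loader.

```perl
use strict;
use warnings;

# Hypothetical mapping table: contig name => [chromosome name, offset].
my %CONTIG_TO_CHR = (
    ctgA => [ 'Chr1', 10_000 ],
);

# Receives ($ref, [$start1,$end1], ...) and returns
# ($newref, [$newstart1,$newend1], ...), the shape described in the table.
sub map_contig_to_chr {
    my ($ref, @ranges) = @_;
    my $entry = $CONTIG_TO_CHR{$ref}
        or return ($ref, @ranges);    # unknown reference: pass through unchanged
    my ($newref, $offset) = @$entry;
    return ($newref, map { [ $_->[0] + $offset, $_->[1] + $offset ] } @ranges);
}

my ($newref, @newranges) = map_contig_to_chr('ctgA', [ 1, 100 ], [ 201, 300 ]);
print "$newref: ", join(', ', map { "$_->[0]..$_->[1]" } @newranges), "\n";
```

       A reference to such a sub (\&map_contig_to_chr) is what would be
       passed as the value of -map_coords.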

       When you call new(), a connection to a Bio::DB::SeqFeature::Store
       database should already have been established and the database
       initialized (if appropriate).

       Some combinations of Bio::SeqFeatures and Bio::DB::SeqFeature::Store
       databases support a fast loading mode. Currently the only reliable
       implementation of fast loading is the combination of DBI::mysql with
       Bio::DB::SeqFeature. The other important restriction on fast loading is
       the requirement that a feature that contains subfeatures must occur in
       the GFF3 file before any of its subfeatures. Otherwise the subfeatures
       that occurred before the parent feature will not be attached to the
       parent correctly. This restriction does not apply to normal (slow)
       loading.
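
       The ordering restriction can be checked before choosing fast loading.
       The sketch below is a standalone pre-flight check, not something the
       loader provides; the sub name parents_seen_too_late is hypothetical.
       It reads the ID and Parent attributes from column 9 of each GFF3 line
       and reports parents that are referenced before they are defined.

```perl
use strict;
use warnings;

# Return the IDs of parents that are referenced before they are defined.
# GFF3 column 9 holds attributes as tag=value pairs separated by ';'.
sub parents_seen_too_late {
    my @lines = @_;
    my (%seen_id, %late);
    for my $line (@lines) {
        last if $line =~ /^##FASTA/;               # sequence section follows
        next if $line =~ /^#/ || $line !~ /\S/;    # skip directives, comments, blanks
        my @cols = split /\t/, $line;
        next unless @cols >= 9;
        chomp(my $attributes = $cols[8]);
        my ($id)      = $attributes =~ /(?:^|;)ID=([^;]+)/;
        my ($parents) = $attributes =~ /(?:^|;)Parent=([^;]+)/;
        if (defined $parents) {
            for my $parent (split /,/, $parents) {
                $late{$parent} = 1 unless $seen_id{$parent};
            }
        }
        $seen_id{$id} = 1 if defined $id;
    }
    return sort keys %late;
}

my @gff3 = (
    "ctgA\texample\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=mRNA1\n",
    "ctgA\texample\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1\n",
);
my @late = parents_seen_too_late(@gff3);
print "Not safe for fast loading; late parents: @late\n" if @late;
```

       A file for which the list comes back empty satisfies the ordering
       requirement described above.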

       If you use an unnormalized feature class, such as
       Bio::SeqFeature::Generic, then the loader needs to create a temporary
       database in which to cache features until all their parts and subparts
       have been seen. This temporary database uses the "berkeleydb" adaptor.
       The -tmp option specifies the directory in which that database will be
       created. If not present, it defaults to the system default tmp
       directory specified by File::Spec->tmpdir().
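
       The fallback described above can be pictured with the short sketch
       below. The helper choose_tmp_dir is hypothetical and only mirrors the
       documented behaviour; it is not the loader's own code.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: use -tmp when given, else the system default.
sub choose_tmp_dir {
    my %args = @_;
    return defined $args{-tmp} ? $args{-tmp} : File::Spec->tmpdir();
}

print "Temporary database directory: ", choose_tmp_dir(), "\n";
```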

       The -chunk_size option allows you to tune the representation of
       DNA/Protein sequence in the Store database. By default, sequences are
       split into 2000 base/residue chunks and then reassembled as needed.
       This avoids the problem of pulling a whole chromosome into memory in
       order to fetch a short subsequence from somewhere in the middle.
       Depending on your usage patterns, you may wish to tune this parameter
       using a chunk size that is larger or smaller than the default.
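
       To make the chunking idea concrete, here is a minimal in-memory
       sketch. It is not the Store's storage code: the hash keyed by offset
       and the sub names chunk_sequence and fetch_subseq are assumptions for
       illustration. It shows that a subsequence can be rebuilt from only the
       chunks that overlap the requested range.

```perl
use strict;
use warnings;

# Split a sequence into fixed-size chunks keyed by their 0-based offset.
sub chunk_sequence {
    my ($seq, $chunk_size) = @_;
    my %chunks;
    for (my $offset = 0; $offset < length($seq); $offset += $chunk_size) {
        $chunks{$offset} = substr($seq, $offset, $chunk_size);
    }
    return \%chunks;
}

# Fetch a 1-based inclusive range, touching only the chunks that overlap it.
sub fetch_subseq {
    my ($chunks, $chunk_size, $start, $end) = @_;
    my $first = int(($start - 1) / $chunk_size) * $chunk_size;
    my $last  = int(($end   - 1) / $chunk_size) * $chunk_size;
    my $joined = '';
    for (my $offset = $first; $offset <= $last; $offset += $chunk_size) {
        $joined .= $chunks->{$offset} // '';
    }
    return substr($joined, $start - 1 - $first, $end - $start + 1);
}

my $chunks = chunk_sequence('gattacagattaca', 4);
print fetch_subseq($chunks, 4, 3, 10), "\n";
```

       A smaller chunk size means less unneeded sequence is read for a short
       fetch but more chunks per long fetch; a larger one reverses that
       trade-off.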

   load
	Title	: load
	Usage	: $count = $loader->load(@ARGV)
	Function: load the indicated files or filehandles
	Returns : number of feature lines loaded
	Args	: list of files or filehandles
	Status	: public

       Once the loader is created, invoke its load() method with a list of
       GFF3 or FASTA file paths or previously-opened filehandles in order to
       load them into the database. Compressed files ending with .gz, .Z and
       .bz2 are automatically recognized and uncompressed on the fly. Paths
       beginning with http: or ftp: are treated as URLs and opened using the
       LWP GET program (which must be on your path).

       FASTA files are recognized by their initial ">" character. Do not feed
       the loader a file that is neither GFF3 nor FASTA; I don't know what
       will happen, but it will probably not be what you expect.
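
       The recognition rules above can be summarised in a small sketch. The
       sub names and the returned labels are hypothetical, and the order in
       which the rules are tried is an assumption; the loader's own logic may
       differ in detail.

```perl
use strict;
use warnings;

# Classify a path by the rules in the text. The labels are hypothetical.
sub classify_source {
    my $path = shift;
    return 'url'        if $path =~ /^(?:http|ftp):/;
    return 'compressed' if $path =~ /\.(?:gz|Z|bz2)$/;
    return 'plain';
}

# A FASTA record announces itself with a leading ">" character.
sub looks_like_fasta {
    my $first_line = shift;
    return $first_line =~ /^>/ ? 1 : 0;
}

for my $path (qw(genes.gff3 genes.gff3.gz http://example.org/genes.gff3)) {
    printf "%-32s %s\n", $path, classify_source($path);
}
```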

   accessors
       The following read-only accessors return values passed or created
       during new():

	store()		 the long-term Bio::DB::SeqFeature::Store object

	tmp_store()	 the temporary Bio::DB::SeqFeature::Store object used
			   during loading

	sfclass()	 the Bio::SeqFeatureI class

	fast()		 whether fast loading is active

	seq_chunk_size() the sequence chunk size

	verbose()	 verbose progress messages

   Internal Methods
       The following methods are used internally and may be overridden by
       subclasses.

       default_seqfeature_class
	     $class = $loader->default_seqfeature_class

	   Return the default SeqFeatureI class (Bio::DB::SeqFeature).

       subfeatures_normalized
	     $flag = $loader->subfeatures_normalized([$new_flag])

	   Get or set a flag that indicates that the subfeatures are
	   normalized. This is deduced from the SeqFeature class information.

       subfeatures_in_table
	     $flag = $loader->subfeatures_in_table([$new_flag])

	   Get or set a flag that indicates that feature/subfeature
	   relationships are stored in a table. This is deduced from the
	   SeqFeature class and Store information.

       load_fh
	     $count = $loader->load_fh($filehandle)

	   Load the GFF3 data at the other end of the filehandle and return
	   true if successful. Internally, load_fh() invokes:

	     start_load();
	     do_load($filehandle);
	     finish_load();
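
	   The three-step sequence lends itself to subclassing: a subclass
	   overrides the hooks and inherits the driver. The sketch below uses
	   a hypothetical ToyLoader class that is unrelated to BioPerl, and it
	   assumes that load_fh() passes back whatever do_load() returns.

```perl
use strict;
use warnings;

package ToyLoader;    # hypothetical class, not part of BioPerl

sub new { my $class = shift; return bless { events => [] }, $class }

# Hooks: record that they ran so the call order is visible.
sub start_load  { push @{ $_[0]{events} }, 'start' }
sub finish_load { push @{ $_[0]{events} }, 'finish' }

# Count every non-blank line that is not a comment.
sub do_load {
    my ($self, $fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        next if $line =~ /^#/ || $line !~ /\S/;
        $count++;
    }
    return $count;
}

# Driver: the same three calls listed for load_fh().
sub load_fh {
    my ($self, $fh) = @_;
    $self->start_load;
    my $count = $self->do_load($fh);
    $self->finish_load;
    return $count;
}

package main;

my $data = "##gff-version 3\nline one\nline two\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my $loader = ToyLoader->new;
print $loader->load_fh($fh), " lines loaded\n";
```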

       start_load, finish_load
	   These methods are called at the start and end of a filehandle load.

       do_load
	     $count = $loader->do_load($fh)

	   This is called by load_fh() to load the GFF3 file's filehandle and
	   return the number of lines loaded.

       load_line
	       $loader->load_line($data);

	   Load a line of a GFF3 file. You must bracket this with calls to
	   start_load() and finish_load()!

	       $loader->start_load();
	       $loader->load_line($_) while <FH>;
	       $loader->finish_load();

       handle_feature
	     $loader->handle_feature($data_line)

	   This method is called to process a single data line. It manipulates
	   information stored in a data structure called $self->{load_data}.

       handle_meta
	     $loader->handle_meta($data_line)

	   This method is called to process a single data line. It manipulates
	   information stored in a data structure called $self->{load_data}.

       store_current_feature
	     $loader->store_current_feature()

	   This method is called to store the currently active feature in the
	   database. It uses a data structure stored in $self->{load_data}.

       parse_attributes
	    ($reserved,$unreserved) = $loader->parse_attributes($attribute_line)

	   This method parses the information contained in the $attribute_line
	   into two hashrefs, one containing the values of reserved attribute
	   tags (e.g. ID) and the other containing the values of unreserved
	   ones.
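
	   The reserved/unreserved split can be sketched as follows. This is
	   an independent illustration, not the module's implementation: the
	   sub name split_attributes is hypothetical, the reserved tag list is
	   taken from the GFF3 specification, values are stored as array refs
	   by assumption, and percent-encoded text is left undecoded.

```perl
use strict;
use warnings;

# Tags reserved by the GFF3 specification.
my %RESERVED = map { $_ => 1 } qw(
    ID Name Alias Parent Target Gap Derives_from
    Note Dbxref Ontology_term Is_circular
);

# Split "tag=value;tag=v1,v2" into two hashrefs of array refs.
sub split_attributes {
    my $attribute_line = shift;
    my (%reserved, %unreserved);
    for my $pair (split /;/, $attribute_line) {
        $pair =~ s/^\s+|\s+$//g;                    # trim stray whitespace
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined($tag) && length($tag) && defined($value);
        my $bucket = $RESERVED{$tag} ? \%reserved : \%unreserved;
        push @{ $bucket->{$tag} }, split /,/, $value;
    }
    return (\%reserved, \%unreserved);
}

my ($reserved, $unreserved) =
    split_attributes('ID=mRNA1;Parent=gene1;Name=hsp70;color=blue,red');
print "ID: $reserved->{ID}[0]\n";
print "color: @{ $unreserved->{color} }\n";
```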

       start_or_finish_sequence
	     $loader->start_or_finish_sequence('Chr9')

	   This method is called at the beginning and end of a fasta section.

       load_sequence
	     $loader->load_sequence('gatttcccaaa')

	   This method is called to load some amount of sequence after
	   start_or_finish_sequence() is first called.

       open_fh
	    my $io_file = $loader->open_fh($filehandle_or_path)

	   This method opens up the indicated file or pipe, using some
	   intelligence to recognize compressed files and URLs and doing the
	   right thing.

       loaded_ids
	    my $ids    = $loader->loaded_ids;
	    my $id_cnt = @$ids;

	   After performing a load, this returns an array ref containing all
	   the feature primary ids that were created during the load.

       local_ids
	    my $ids    = $loader->local_ids;
	    my $id_cnt = @$ids;

	   After performing a load, this returns an array ref containing all
	   the load file IDs that were contained within the file just loaded.

       time
	    my $time = $loader->time

	   This method returns the current time in seconds, using Time::HiRes
	   if available.
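
	   The "if available" fallback is a common Perl idiom and might look
	   like the sketch below. The sub name current_time is hypothetical;
	   this is not the module's own code.

```perl
use strict;
use warnings;

# Prefer Time::HiRes::time() when the module loads; otherwise use the built-in.
my $HAVE_HIRES = eval { require Time::HiRes; 1 } ? 1 : 0;

sub current_time {
    return $HAVE_HIRES ? Time::HiRes::time() : CORE::time();
}

my $begin   = current_time();
my $elapsed = current_time() - $begin;
printf "elapsed: %.6f s\n", $elapsed;
```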

       unescape
	    my $unescaped = GFF3Loader::unescape($escaped)

	   This is an internal utility.	 It is the same as
	   CGI::Util::unescape, but doesn't change pluses into spaces and
	   ignores unicode escapes.
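
	   The behaviour described above can be approximated by the sketch
	   below: only two-digit %XX escapes are decoded, "+" is left alone,
	   and %uXXXX sequences pass through untouched. The sub name
	   unescape_sketch is hypothetical and the code is a reconstruction
	   from the description, not a copy of the module's routine.

```perl
use strict;
use warnings;

# Decode %XX escapes only; leave '+' and %uXXXX sequences as they are.
sub unescape_sketch {
    my $text = shift;
    return unless defined $text;
    $text =~ s/%([0-9a-fA-F]{2})/chr hex $1/ge;
    return $text;
}

print unescape_sketch('heat%20shock+protein%3B%u00e9'), "\n";
```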

BUGS
       This is an early version, so there are certainly some bugs. Please use
       the BioPerl bug tracking system to report bugs.

SEE ALSO
       bioperl, Bio::DB::SeqFeature::Store, Bio::DB::SeqFeature::Segment,
       Bio::DB::SeqFeature::NormalizedFeature,
       Bio::DB::SeqFeature::Store::GFF3Loader,
       Bio::DB::SeqFeature::Store::DBI::mysql,
       Bio::DB::SeqFeature::Store::berkeleydb

AUTHOR
       Lincoln Stein <lstein@cshl.org>.

       Copyright (c) 2006 Cold Spring Harbor Laboratory.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.14.1			  2011-07		Bio::DB::SeqFeature::Store::Loader(3)