Boulder::Genbank man page on Fedora

Boulder::Genbank(3)   User Contributed Perl Documentation  Boulder::Genbank(3)

NAME
       Boulder::Genbank - Fetch Genbank data records as parsed Boulder Stones

SYNOPSIS
	 use Boulder::Genbank;

	 # network access via Entrez
	  $gb = Boulder::Genbank->newFh( qw(M57939 M28274 L36028) );

	  while ($data = <$gb>) {
	      print $data->Accession;

	      @introns = $data->features->Intron;
	      print "There are ",scalar(@introns)," introns.\n";
	      $dna = $data->Sequence;
	      print "The dna is ",length($dna)," bp long.\n";

	      my @features = $data->features(-type=>[ qw(Exon Source Satellite) ],
					     -pos=>[90,310] );
	      foreach (@features) {
		 print $_->Type,"\n";
		 print $_->Position,"\n";
		 print $_->Gene,"\n";
	     }
	   }

	 # another syntax
	 $gb = new Boulder::Genbank(-accessor=>'Entrez',
				    -fetch => [qw/M57939 M28274 L36028/]);

	 # local access via Yank
	 $gb = new Boulder::Genbank(-accessor=>'Yank',
				    -fetch=>[qw/M57939 M28274 L36028/]);
	 while (my $s = $gb->get) {
	    # etc.
	 }

	 # parse a file of Genbank records
	 $gb = new Boulder::Genbank(-accessor=>'File',
				    -fetch => '/usr/local/db/gbpri3.seq');
	 while (my $s = $gb->get) {
	    # etc.
	 }

	 # parse flatfile records yourself
	 open (GB,"/usr/local/db/gbpri3.seq") or die "Can't open file: $!";
	 local $/ = "//\n";
	 while (<GB>) {
	    my $s = Boulder::Genbank->parse($_);
	    # etc.
	 }

DESCRIPTION
       Boulder::Genbank provides retrieval and parsing services for NCBI
       Genbank-format records.  It returns Genbank entries in Stone format,
       allowing easy access to the various fields and values.
       Boulder::Genbank is a descendant of Boulder::Stream, and provides a
       stream-like interface to a series of Stone objects.

       >> IMPORTANT NOTE <<

       As of January 2002, NCBI has changed their Batch Entrez interface.  I
       have modified Boulder::Genbank so as to use a "demo" interface, which
       fixes things, but this isn't guaranteed in the long run.

       I have written to NCBI, and they may fix this -- or they may not.

       >> IMPORTANT NOTE <<

       Access to Genbank is provided by three different accessors, which
       together give access to remote and local Genbank databases.  When you
       create a new Boulder::Genbank stream, you provide one of the three
       accessors, along with accessor-specific parameters that control what
       entries to fetch.  The three accessors are:

       Entrez
	   This provides access to NetEntrez, accessing the most recent
	   Genbank information directly from NCBI's Web site.  The parameters
	   passed to this accessor are either a series of Genbank accession
	   numbers, or an Entrez query (see
	   http://www.ncbi.nlm.nih.gov/Entrez/linking.html).  If you provide a
	   list of accession numbers, the stream will return a series of
	   stones corresponding to the numbers.  Otherwise, if you provide an
	   Entrez query, the entries returned will be in the order returned by
	   Entrez.

       File
	   This provides access to local Genbank entries by reading from a
	   flat file (typically one of the .seq files downloadable from NCBI's
	   Web site).  The stream will return a Stone corresponding to each of
	   the entries in the file, starting from the top of the file and
	   working downward.  The parameter in this case is the path to the
	   local file.

       Yank
	   This provides access to local Genbank entries using Will Fitzhugh's
	   Yank program.  Yank provides fast indexed access to a Genbank flat
	   file using the accession number as the key.  The parameter passed
	   to the Yank accessor is a list of accession numbers.  Stones will
	   be returned in the requested order.  By default the yank binary
	   lives in /usr/local/bin/yank.  To support other locations, you may
	   define the environment variable YANK to contain the full path.
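The lookup order described for the Yank accessor (the YANK environment variable first, then the default location) can be sketched in plain Perl. The helper name yank_path is hypothetical and is not part of Boulder::Genbank; this only illustrates the documented rule.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Boulder::Genbank): choose the yank binary
# from a YANK setting if one is present, else use the documented default.
sub yank_path {
    my $env  = shift;            # hash reference, normally \%ENV
    my $path = $env->{YANK};
    return (defined($path) && length($path)) ? $path : '/usr/local/bin/yank';
}

print "yank binary: ", yank_path(\%ENV), "\n";
```

Passing the environment as a hash reference keeps the helper easy to exercise without changing the real environment.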

       It is also possible to parse a single Genbank entry from a text string
       stored in a scalar variable, returning a Stone object.
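As a minimal sketch of how one entry ends up in a scalar, the following reads records from an in-memory flat file by setting the input record separator to "//\n", as the SYNOPSIS does. The second record and its accession are made-up placeholders, and the regular expression stands in for the call to Boulder::Genbank->parse, which this sketch does not make so that it runs without the module installed.

```perl
use strict;
use warnings;

# Two abbreviated records in Genbank flat-file layout. The first borrows
# values from the example entry in this page; the second is a placeholder.
my $flatfile = <<'END';
LOCUS       HUMRNP7011   2155 bp    DNA             PRI       03-JUL-1991
ACCESSION   M57939 J04772 M57733
//
LOCUS       PLACEHOLDER   100 bp    DNA             PRI       01-JAN-1990
ACCESSION   X00001
//
END

my @accessions;
{
    open(my $fh, '<', \$flatfile) or die "Can't open in-memory file: $!";
    local $/ = "//\n";              # each read now returns one whole record
    while (my $record = <$fh>) {
        # With the module installed, Boulder::Genbank->parse($record) would
        # go here; this sketch only extracts the first accession number.
        my ($first) = $record =~ /^ACCESSION\s+(\S+)/m;
        push @accessions, $first if defined $first;
    }
    close $fh;
}
print join(',', @accessions), "\n";
```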

   Boulder::Genbank methods
       This section lists the public methods that the Boulder::Genbank class
       makes available.

       new()
	      # Network fetch via Entrez, with accession numbers
	      $gb=new Boulder::Genbank(-accessor  =>  'Entrez',
				       -fetch	  =>  [qw/M57939 M28274 L36028/]);

	      # Same, but shorter and uses -> operator
	      $gb = Boulder::Genbank->new(qw(M57939 M28274 L36028));

	      # Network fetch via Entrez, with a query
	      $query = 'Homo sapiens[Organism] AND EST[Keyword]';
	      $gb=new Boulder::Genbank(-accessor  =>  'Entrez',
				       -fetch	  =>  $query);

	      # Local fetch via Yank, with accession numbers
	      $gb=new Boulder::Genbank(-accessor  =>  'Yank',
				       -fetch	  =>  [qw/M57939 M28274 L36028/]);

	      # Local fetch via File
	      $gb=new Boulder::Genbank(-accessor  =>  'File',
				       -fetch	  =>  '/usr/local/genbank/gbpri3.seq');

	   The new() method creates a new Boulder::Genbank stream on the
	   accessor provided.  The three possible accessors are Entrez, Yank
	   and File.  If successful, the method returns the stream object.
	   Otherwise it returns undef.

	   new() takes the following arguments:

		   -accessor	   Name of the accessor to use
		   -fetch	   Parameters to pass to the accessor
		   -proxy	   URL of an HTTP proxy, for use with the
				    Entrez accessor through a firewall.

	   Specify the accessor to use with the -accessor argument.  If not
	   specified, it defaults to Entrez.

	   -fetch is an accessor-specific argument.  The possibilities are:

	   For Entrez, the -fetch argument may point to a scalar, in which
	   case it is interpreted as an Entrez query string.  See
	   http://www.ncbi.nlm.nih.gov/Entrez/linking.html for a description
	   of the query syntax.  Alternatively, -fetch may point to an array
	   reference, in which case it is interpreted as a list of accession
	   numbers to retrieve.  If -fetch points to a hash, it is interpreted
	   as extended information.  See "Extended Entrez Parameters" below.

	   For Yank, the -fetch argument must point to an array reference
	   containing the accession numbers to retrieve.

	   For File, the -fetch argument must point to a string-valued scalar,
	   which will be interpreted as the path to the file to read Genbank
	   entries from.

	   For Entrez (and Entrez only) Boulder::Genbank allows you to use a
	   shortcut syntax in which you provide new() with a list of accession
	   numbers:

	     $gb = new Boulder::Genbank('M57939','M28274','L36028');
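The rules above for how the Entrez accessor interprets -fetch (scalar, array reference, or hash reference) amount to a dispatch on the reference type. The following is a sketch of that idea only; describe_fetch is a hypothetical name and this is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical classifier mirroring the documented -fetch rules for Entrez.
sub describe_fetch {
    my $fetch = shift;
    my $type  = ref $fetch;      # '' for a plain scalar
    return 'query string'        if $type eq '';
    return 'accession list'      if $type eq 'ARRAY';
    return 'extended parameters' if $type eq 'HASH';
    die "unsupported -fetch value of type $type\n";
}

print describe_fetch('Homo sapiens[Organism] AND EST[Keyword]'), "\n";
print describe_fetch([qw/M57939 M28274 L36028/]), "\n";
print describe_fetch({ -db => 'n' }), "\n";
```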

       newFh()
	   This works like new(), but returns a filehandle.  To recover each
	   Genbank record, read from the filehandle with the <> operator:

	     $fh = Boulder::Genbank->newFh('M57939','M28274','L36028');
	     while ($record = <$fh>) {
		print $record->asString;
	     }

       get()
	   The get() method is inherited from Boulder::Stream, and simply
	   returns the next parsed Genbank Stone, or undef if there is nothing
	   more to fetch.  It has the same semantics as the parent class,
	   including the ability to restrict access to certain top-level tags.

	   The object returned is a Stone::GB_Sequence object, which is a
	   descendant of Stone.

       put()
	   The put() method is inherited from the parent Boulder::Stream
	   class, and will write the passed Stone to standard output in
	   Boulder format.  This means that it is currently not possible to
	   write a Boulder::Genbank object back into Genbank flatfile form.

   Extended Entrez Parameters
       The Entrez accessor recognizes extended parameters that let you
       customize the search.  Instead of passing a query string scalar or a
       list of accession numbers as the -fetch argument, pass a hash
       reference.  The hashref should contain one or more of the following
       keys:

       -query
	   The Entrez query to process.

       -accession
	   The list of accession numbers to fetch, as an array ref.

       -db The database to search.  This is a single-letter database code
	   selected from the following list:

	     m	MEDLINE
	     p	Protein
	     n	Nucleotide
	     s	Popset

       -proxy
	   An HTTP proxy to use.  For example:

	      -proxy => 'http://www.firewall.com:9000'

	   If you think you need this, get the correct URL from your system
	   administrator.

       As an example, here's how to search for ESTs from Oryza sativa that
       have been entered or modified since 1999.

	 my $gb = new Boulder::Genbank( -accessor => 'Entrez',
					-fetch    => {
					    -query => 'Oryza sativa[Organism] AND EST[Keyword] AND 1999[MDAT]',
					    -db    => 'n'
					});

METHODS DEFINED BY THE GENBANK STONE OBJECT
       Each record returned from the Boulder::Genbank stream defines a set of
       methods that correspond to features and other fields in the Genbank
       flat file record.  Stone::GB_Sequence gives the full details, but they
       are listed for reference here:

   $length = $entry->length
       Get the length of the sequence.

   $start = $entry->start
       Get the start position of the sequence, currently always "1".

   $end = $entry->end
       Get the end position of the sequence, currently always the same as the
       length.

   @feature_list = $entry->features(-pos=>[50,450],-type=>['CDS','Exon'])
       features() will search the entry feature list for those features that
       meet certain criteria.  The criteria are specified using the -pos
       and/or -type argument names, as shown below.

       -pos
	   Provide a position or range of positions which the feature must
	   overlap.  A single position is specified in this way:

	      -pos => 1500;	    # feature must overlap position 1500

	   or a range of positions in this way:

	      -pos => [1000,1500];  # 1000 to 1500 inclusive

	   If no criteria are provided, then features() returns all the
	   features, and is equivalent to calling the Features() accessor.

       -type, -types
	   Filter the list of features by type or a set of types.  Matches are
	   case-insensitive, so "exon", "Exon" and "EXON" are all equivalent.
	   You may call with a single type as in:

	      -type => 'Exon'

	   or with a list of types, as in

	      -types => ['Exon','CDS']

	   The names "-type" and "-types" can be used interchangeably.
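To make the -pos and -type semantics concrete, here is a small self-contained sketch that applies the same two tests (inclusive range overlap and case-insensitive type matching) to plain hashes. The hashes stand in for feature Stones, and filter_features is a hypothetical function, not the implementation used by Stone::GB_Sequence.

```perl
use strict;
use warnings;

# Plain hashes standing in for feature Stones; coordinates are taken from
# the example record in this page.
my @features = (
    { type => 'Exon',         start => 1012, end => 1099 },
    { type => 'Intron',       start => 1100, end => 1181 },
    { type => 'Polya_signal', start => 1970, end => 1975 },
);

# Hypothetical filter: -pos is a single position or [low, high];
# -type is an array reference of type names compared case-insensitively.
sub filter_features {
    my ($features, %args) = @_;
    my $pos = $args{'-pos'};
    my ($lo, $hi) = ref($pos) ? @$pos : ($pos, $pos);
    my %wanted = map { lc($_) => 1 } @{ $args{'-type'} || [] };
    return grep {
        my $f = $_;
        my $pos_ok  = !defined($lo) || ($f->{start} <= $hi && $f->{end} >= $lo);
        my $type_ok = !%wanted || $wanted{ lc $f->{type} };
        $pos_ok && $type_ok;
    } @$features;
}

my @hits = filter_features(\@features, -pos => [1000, 1100], -type => ['EXON', 'intron']);
print $_->{type}, "\n" for @hits;
```

With no criteria at all, the sketch returns every feature, matching the behavior described for features().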

   $seqObj = $entry->bioSeq;
       Returns a Bio::Seq object from the Bioperl project.  Dies with an error
       message unless the Bio::Seq module is installed.

OUTPUT TAGS
       The tags returned by the parsing operation are taken from the NCBI
       ASN.1 schema.  For consistency, they are normalized so that the initial
       letter is capitalized, and all subsequent letters are lowercase.  This
       section contains an abbreviated list of the most useful/common tags.
       See "The NCBI Data Model", by James Ostell and Jonathan Kans in
       "Bioinformatics: A Practical Guide to the Analysis of Genes and
       Proteins" (Eds. A. Baxevanis and F. Ouellette), pp 121-144 for the full
       listing.

   Top-Level Tags
       These are tags that appear at the top level of the parsed Genbank
       entry.

       Accession
	   The accession number of this entry.  Because of the vagaries of the
	   Genbank data model, an entry may have multiple accession numbers
	   (e.g. after a merging operation).  Accession may therefore be a
	   multi-valued tag.

	   Example:

		 my $accessionNo = $s->Accession;

       Authors
	   The list of authors, as they appear on the AUTHORS line of the
	   Genbank record.  No attempt is made to parse them into individual
	   authors.

       Basecount
	   The nucleotide basecount for the entry.  It is presented as a
	   Boulder Stone with keys "a", "c", "t" and "g".  Example:

		my $A = $s->Basecount->A;
		my $C = $s->Basecount->C;
		my $G = $s->Basecount->G;
		my $T = $s->Basecount->T;
		print "GC content is ",($G+$C)/($A+$C+$G+$T),"\n";

       Blob
	   The entire flatfile record as an unparsed chunk of text (a "blob").
	   This is a handy way of reassembling the record for human
	   inspection.

       Comment
	   The COMMENT line from the Genbank record.

       Definition
	   The DEFINITION line from the Genbank record, unmodified.

       Features
	   The FEATURES table.  This is a complex stone object with multiple
	   subtags.  See "The Features Tag" for details.

       Journal
	   The JOURNAL line from the Genbank record, unmodified.

       Keywords
	   The KEYWORDS line from the Genbank record, unmodified.  No attempt
	   is made to parse the keywords into separate values.

	   Example:

	       my $keywords = $s->Keywords;

       Locus
	   The LOCUS line from the Genbank record.  It is not further parsed.

       Medline, Nid
	   References to other database accession numbers.

       Organism
	   The taxonomic name of the organism from which this entry was
	   derived. This line is taken from the Genbank entry unmodified.  See
	   the NCBI data model documentation for an explanation of their
	   taxonomic syntax.

       Reference
	   The REFERENCE line from the Genbank entry.  There are often
	   multiple Reference lines.  Example:

	     my @references = $s->Reference;

       Sequence
	   The DNA or RNA sequence of the entry.  This is presented as a
	   single lower-case string, with all base numbers and formatting
	   characters removed.

       Source
	   The entry's SOURCE field; often giving clues on how the sequencing
	   was performed.

       Title
	   The TITLE field from the paper describing this entry, if any.
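The Basecount example above relies on counts already stored in the record. As a hedged alternative, the same GC figure can be derived from the string returned by the Sequence tag. The helper names base_counts and gc_content are hypothetical, and the input here is the first 20 bases of the example record's Sequence line rather than a value fetched through the module.

```perl
use strict;
use warnings;

# Count a, c, g and t in a sequence string; other characters are ignored.
sub base_counts {
    my $seq = lc(shift);
    my %count = map { $_ => 0 } qw(a c g t);
    $count{$_}++ for grep { exists $count{$_} } split //, $seq;
    return \%count;
}

# Fraction of counted bases that are g or c; 0 for an empty sequence.
sub gc_content {
    my $c     = base_counts(shift);
    my $total = $c->{a} + $c->{c} + $c->{g} + $c->{t};
    return $total ? ($c->{g} + $c->{c}) / $total : 0;
}

my $dna = 'aagcttttccaggcagtgcg';
printf "GC content is %.2f\n", gc_content($dna);
```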

   The Features Tag
       The Features tag points to a Stone record that contains multiple
       subtags.  Each subtag is the name of a feature which points, in turn,
       to a Stone that describes the feature's location and other attributes.
       The full list of features is beyond the scope of this document, but the
       following are the features that are most often seen:

	       Cds	       a CDS
	       Intron	       an intron
	       Exon	       an exon
	       Gene	       a gene
	       Mrna	       an mRNA
	       Polya_site      a putative polyadenylation signal
	       Repeat_unit     a repetitive region
	       Source	       More information about the organism and cell
			       type the sequence was derived from
	       Satellite       a microsatellite (dinucleotide repeat)

       Each feature will contain one or more of the following subtags:

       Db_xref
	   A cross-reference to another database in the form
	   DB_NAME:accession_number.  See the NCBI Web site for a description
	   of these cross references.

       Evidence
	   The evidence for this feature, either "experimental" or
	   "predicted".

       Gene
	   If the feature involves a gene, this will be the gene's name (or
	   one of its names).  This subtag is often seen in "Gene" and "Cds"
	   features.

	   Example:

		   foreach ($s->Features->Cds) {
		      my $gene = $_->Gene;
		      my $position = $_->Position;
		      print "Gene $gene ($position)\n";
		   }

       Map If the feature is mapped, this provides a map position, usually as
	   a cytogenetic band.

       Note
	   A grab-bag for various text notes.

       Number
	   When multiple features of this type occur, this field is used to
	   number them.  Ordinarily this field is not needed because
	   Boulder::Genbank preserves the order of features.

       Organism
	   If the feature is Source, this provides the source organism.

       Position
	   The position of this feature, usually expressed as a range
	   (1970..1975).

       Product
	   The protein product of the feature, if applicable, as a text
	   string.

       Translation
	   The protein translation of the feature, if applicable.
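Position values are returned as text, so code that needs numeric coordinates has to parse them. The sketch below handles the simpler shapes seen in the example record: a single base, a start..end range, and join(...) or order(...) lists. The function local_ranges is hypothetical; it skips segments that refer to another accession and leaves uncertain endpoints such as 1182..(1989.1998) unparsed.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, end] pairs for the segments of a
# Position value that lie on the entry itself.
sub local_ranges {
    my $position = shift;
    my @ranges;
    # Strip a join(...) or order(...) wrapper if there is one.
    (my $inner = $position) =~ s/^\w+\((.*)\)$/$1/;
    for my $segment (split /\s*,\s*/, $inner) {
        next if $segment =~ /:/;                 # segment on another accession
        if ($segment =~ /^<?(\d+)\.\.>?(\d+)$/) {
            push @ranges, [$1, $2];              # start..end, fuzzy marks dropped
        }
        elsif ($segment =~ /^(\d+)$/) {
            push @ranges, [$1, $1];              # single base
        }
    }
    return @ranges;
}

for my $pos ('1970..1975', '1989', 'order(M57937:702..921,1..1011)') {
    print "$pos => ", join(' ', map { "$_->[0]-$_->[1]" } local_ranges($pos)), "\n";
}
```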

SEE ALSO
       Boulder, Boulder::Blast

AUTHOR
       Lincoln Stein <lstein@cshl.org>.

       Copyright (c) 1997-2000 Lincoln D. Stein

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.  See DISCLAIMER.txt for
       disclaimers of warranty.

EXAMPLE GENBANK OBJECT
       The following is an excerpt from a moderately complex Genbank Stone.
       The Sequence line and several other long lines have been truncated for
       readability.

	Authors=Spritz,R.A., Strunk,K., Surowy,C.S.O., Hoch,S., Barton,D.E. and Francke,U.
	Authors=Spritz,R.A., Strunk,K., Surowy,C.S. and Mohrenweiser,H.W.
	Locus=HUMRNP7011   2155 bp    DNA	      PRI	03-JUL-1991
	Accession=M57939
	Accession=J04772
	Accession=M57733
	Keywords=ribonucleoprotein antigen.
	Sequence=aagcttttccaggcagtgcgagatagaggagcgcttgagaaggcaggttttgcagcagacggcagtgacagcccag...
	Definition=Human small nuclear ribonucleoprotein (U1-70K) gene, exon 10 and 11.
	Journal=Nucleic Acids Res. 15, 10373-10391 (1987)
	Journal=Genomics 8, 371-379 (1990)
	Nid=g337441
	Medline=88096573
	Medline=91065657
	Features={
	  Polya_site={
	    Evidence=experimental
	    Position=1989
	    Gene=U1-70K
	  }
	  Polya_site={
	    Position=1990
	    Gene=U1-70K
	  }
	  Polya_site={
	    Evidence=experimental
	    Position=1992
	    Gene=U1-70K
	  }
	  Polya_site={
	    Evidence=experimental
	    Position=1998
	    Gene=U1-70K
	  }
	  Source={
	    Organism=Homo sapiens
	    Db_xref=taxon:9606
	    Position=1..2155
	    Map=19q13.3
	  }
	  Cds={
	    Codon_start=1
	    Product=ribonucleoprotein antigen
	    Db_xref=PID:g337445
	    Position=join(M57929:329..475,M57930:183..245,M57930:358..412, ...
	    Gene=U1-70K
	    Translation=MTQFLPPNLLALFAPRDPIPYLPPLEKLPHEKHHNQPYCGIAPYIREFEDPRDAPPPTR...
	  }
	  Cds={
	    Codon_start=1
	    Product=ribonucleoprotein antigen
	    Db_xref=PID:g337444
	    Evidence=experimental
	    Position=join(M57929:329..475,M57930:183..245,M57930:358..412, ...
	    Gene=U1-70K
	    Translation=MTQFLPPNLLALFAPRDPIPYLPPLEKLPHEKHHNQPYCGIAPYIREFEDPR...
	  }
	  Polya_signal={
	    Position=1970..1975
	    Note=putative
	    Gene=U1-70K
	  }
	  Intron={
	    Evidence=experimental
	    Position=1100..1208
	    Gene=U1-70K
	  }
	  Intron={
	    Number=10
	    Evidence=experimental
	    Position=1100..1181
	    Gene=U1-70K
	  }
	  Intron={
	    Number=9
	    Evidence=experimental
	    Position=order(M57937:702..921,1..1011)
	    Note=2.1 kb gap
	    Gene=U1-70K
	  }
	  Intron={
	    Position=order(M57935:272..406,M57936:1..284,M57937:1..599, <1..>1208)
	    Gene=U1-70K
	  }
	  Intron={
	    Evidence=experimental
	    Position=order(M57935:284..406,M57936:1..284,M57937:1..599, <1..>1208)
	    Note=first gap-0.14 kb, second gap-0.62 kb
	    Gene=U1-70K
	  }
	  Intron={
	    Number=8
	    Evidence=experimental
	    Position=order(M57935:272..406,M57936:1..284,M57937:1..599, <1..>1181)
	    Note=first gap-0.14 kb, second gap-0.62 kb
	    Gene=U1-70K
	  }
	  Exon={
	    Number=10
	    Evidence=experimental
	    Position=1012..1099
	    Gene=U1-70K
	  }
	  Exon={
	    Number=11
	    Evidence=experimental
	    Position=1182..(1989.1998)
	    Gene=U1-70K
	  }
	  Exon={
	    Evidence=experimental
	    Position=1209..(1989.1998)
	    Gene=U1-70K
	  }
	  Mrna={
	    Product=ribonucleoprotein antigen
	    Position=join(M57928:358..668,M57929:319..475,M57930:183..245, ...
	    Gene=U1-70K
	  }
	  Mrna={
	    Product=ribonucleoprotein antigen
	    Citation=[2]
	    Evidence=experimental
	    Position=join(M57928:358..668,M57929:319..475,M57930:183..245, ...
	    Gene=U1-70K
	  }
	  Gene={
	    Position=join(M57928:207..719,M57929:1..562,M57930:1..577, ...
	    Gene=U1-70K
	  }
	}
	Reference=1  (sites)
	Reference=2  (bases 1 to 2155)
	=

perl v5.14.1			  2002-11-05		   Boulder::Genbank(3)