Bio::ASN1::EntrezGene::Indexer - Indexes NCBI Sequence files.

Author

       Dr. Mingyi Liu <mingyiliu@gmail.com>

Citation

       Liu, Mingyi, and Andrei Grigoriev. "Fast parsers for Entrez Gene."  Bioinformatics 21, no. 14 (2005):
       3189-3190.

Description

       Bio::ASN1::EntrezGene::Indexer is a Perl indexer for NCBI Entrez Gene genome databases. It processes
       ASN.1-formatted Entrez Gene records and stores the file position of each record in a way compliant with
       the Bioperl standard (in fact, it is a subclass of Bioperl's index objects).

       Note that this module does not parse records, because it needs to run fast and grabs only the gene
       ids. To parse records, use Bio::ASN1::EntrezGene, or better yet, Bio::SeqIO with format 'entrezgene'.
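
       The idea behind such an index can be illustrated with a minimal plain-Perl sketch. It is not this
       module's code: the sub name build_offset_index is hypothetical, and it assumes a text ASN.1 file in
       which every record starts on a line beginning with "Entrezgene ::=" and the first "geneid N" after
       that line is the record's id. It records only the byte offset at which each record starts, keyed by
       gene id, and parses nothing else.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Bio::ASN1::EntrezGene::Indexer's own code.
# Assumes each record starts on a line beginning "Entrezgene ::=" and that
# the first "geneid N" seen after that line is the record's gene id.
sub build_offset_index {
    my ($fh) = @_;
    my %offset_of;          # gene id => byte offset where its record starts
    my $record_start;       # offset of the record currently being scanned
    my $have_id = 0;        # true once the current record's id has been seen

    while (1) {
        my $pos  = tell($fh);       # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;  # end of file

        if ($line =~ /^Entrezgene ::=/) {
            $record_start = $pos;
            $have_id      = 0;
        }
        elsif (!$have_id && defined $record_start
               && $line =~ /\bgeneid\s+(\d+)/) {
            $offset_of{$1} = $record_start;
            $have_id = 1;           # ignore any later geneid lines in this record
        }
    }
    return \%offset_of;
}

# Demo on an in-memory file holding two tiny, made-up records.
my $asn = "Entrezgene ::= {\n  track-info {\n    geneid 1,\n  }\n}\n"
        . "Entrezgene ::= {\n  track-info {\n    geneid 10,\n  }\n}\n";
open my $demo_fh, '<', \$asn or die "cannot open in-memory file: $!";
my $index = build_offset_index($demo_fh);
printf "gene %d starts at byte %d\n", $_, $index->{$_}
    for sort { $a <=> $b } keys %$index;
```

       Bioperl's index classes keep this kind of id-to-position mapping on disk (the 'entrezgene.idx' file
       in the Synopsis), so an index built once can be reused without rescanning the data files.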

       Version 1.07 of this module takes 21 seconds to index the human Entrez Gene file (downloaded Apr. 5,
       2005) on a single 2.4 GHz Intel Xeon processor.

Feedback

   Mailing Lists
       User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments
       and suggestions preferably to the Bioperl mailing list.  Your participation is much appreciated.

         bioperl-l@bioperl.org              - General discussion
         http://bioperl.org/Support.html    - About the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list, bioperl-l@bioperl.org, rather
       than to the module maintainer directly. Many experienced and responsive experts will be able to look
       at the problem and quickly address it. Please include a thorough description of the problem with code
       and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution.
       Bug reports can be submitted via the web:

         https://github.com/bioperl/bio-asn1-entrezgene/issues

Installation

       Same as Bio::ASN1::EntrezGene

Internal Methods

   _version
   _type_stamp
   _index_file
   _file_format
   _file_handle
         Title   : _file_handle
         Usage   : $fh = $index->_file_handle( INT )
         Function: Returns an open filehandle for the file
                   index INT.  On opening a new filehandle it
                   caches it in the @{$index->_filehandle} array.
                   If the requested filehandle is already open,
                   it simply returns it from the array.
         Example : $first_file_indexed = $index->_file_handle( 0 );
         Returns : ref to a filehandle
         Args    : INT
         Notes   : This function is copied from Bio::Index::Abstract. Once that
                     module's filehandle code is changed, as done here, to work
                     with Perl 5.005_03, this sub will be removed from this module.
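
       The caching behaviour described for _file_handle can be sketched in plain Perl. SketchIndex and its
       file_handle method are hypothetical stand-ins, not the module's code. Because lexical filehandles
       ("open my $fh") need Perl 5.6 or later, the sketch creates its handle with Symbol's gensym and uses
       two-argument open, one common way to stay compatible with Perl 5.005_03; whether the module does
       exactly this is an assumption. The demo at the end uses File::Temp and so needs a newer Perl.

```perl
use strict;

# Hypothetical stand-in for an index object; not the module's own code.
package SketchIndex;
use Symbol qw(gensym);

sub new {
    my ($class, @files) = @_;
    # 'files' maps INT => file name; 'filehandle' caches opened handles by INT
    return bless { files => [@files], filehandle => [] }, $class;
}

# Return an open filehandle for file number $i, opening it only on first use.
sub file_handle {
    my ($self, $i) = @_;
    my $file = $self->{files}[$i];
    die "no file with index $i\n" unless defined $file;
    unless (defined $self->{filehandle}[$i]) {
        my $fh = gensym();                # anonymous glob; pre-5.6 friendly
        open($fh, "< $file") or die "cannot open '$file': $!\n";
        $self->{filehandle}[$i] = $fh;    # cache for later calls
    }
    return $self->{filehandle}[$i];
}

package main;
use File::Temp qw(tempfile);

# Demo: asking twice for file 0 yields the same cached handle.
my ($out, $path) = tempfile(UNLINK => 1);
print $out "first record\n";
close $out;
my $index = SketchIndex->new($path);
my $fh_a  = $index->file_handle(0);
my $fh_b  = $index->file_handle(0);
print "cached handle reused\n" if $fh_a == $fh_b;
```

       The cache is simply an array indexed by file number, so repeated fetches from the same file reuse one
       open handle instead of reopening the file on every call.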

Methods

   fetch
         Parameters: $geneid - id for the Entrez Gene record to be retrieved
         Example:    my $seq = $indexer->fetch(10); # get Entrez Gene #10
         Function:   fetch the data for the given Entrez Gene id.
         Returns:    A Bio::Seq object produced by Bio::SeqIO::entrezgene
         Notes:      One needs to have Bio::SeqIO::entrezgene installed before
                       calling this function!

   fetch_hash
         Parameters: $geneid - id for the Entrez Gene record to be retrieved
         Example:    my $hash = $indexer->fetch_hash(10); # get Entrez Gene #10
         Function:   fetch a hash produced by Bio::ASN1::EntrezGene for the given
                       Entrez Gene id.
         Returns:    A data structure containing all data items from the Entrez
                       Gene record.
         Notes:      Alternative to fetch()
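
       Presumably both fetch() and fetch_hash() begin by reading the raw record at the file position stored
       in the index. A minimal plain-Perl sketch of that lookup step follows. read_record_at is a
       hypothetical name, the offsets belong to made-up demo data, and the record-boundary rule (a new
       record starts at a line beginning with "Entrezgene ::=") is an assumption. Turning the returned text
       into a Bio::Seq object or a hash is the job of Bio::SeqIO::entrezgene or Bio::ASN1::EntrezGene, as
       noted for the two methods above.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical sketch, not the module's fetch(): return the raw text of the
# record that starts at byte $offset. Assumes every record begins on a line
# starting "Entrezgene ::=", so a record ends where the next one begins.
sub read_record_at {
    my ($fh, $offset) = @_;
    seek($fh, $offset, SEEK_SET) or die "cannot seek to $offset: $!";
    my $text = <$fh>;                 # the record's own header line
    return unless defined $text;      # offset at or past end of file
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^Entrezgene ::=/;   # next record reached
        $text .= $line;
    }
    return $text;
}

# Demo: offsets as an index would have stored them for two made-up records.
my $asn = "Entrezgene ::= {\n  track-info {\n    geneid 1,\n  }\n}\n"
        . "Entrezgene ::= {\n  track-info {\n    geneid 10,\n  }\n}\n";
my %offset_of = (1 => 0, 10 => 52);
open my $demo_fh, '<', \$asn or die "cannot open in-memory file: $!";
print read_record_at($demo_fh, $offset_of{10});
```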

Name

       Bio::ASN1::EntrezGene::Indexer - Indexes NCBI Sequence files.

Operating Systems Supported

       Any OS that Perl & Bioperl run on.

Prerequisite

       Bio::ASN1::EntrezGene; a Bioperl version that includes Stefan Kirov's entrezgene.pm; and all
       dependencies therein.

See Also

       For details on the various parsers I generated for Entrez Gene, and for example scripts that use or
       benchmark the modules, please see <http://sourceforge.net/projects/egparser/>. Those other parsers
       and scripts are included in the V1.05 download.

Synopsis

         use Bio::ASN1::EntrezGene::Indexer;

         # creating & using the index is just a few lines
         my $inx = Bio::ASN1::EntrezGene::Indexer->new(
           -filename => 'entrezgene.idx',
           -write_flag => 'WRITE'); # needed for make_index call, but if opening
                                    # existing index file, don't set write flag!
         $inx->make_index('Homo_sapiens', 'Mus_musculus', 'Rattus_norvegicus');
         my $seq = $inx->fetch(10); # Bio::Seq obj for Entrez Gene #10
         # alternatively, if one prefers just a data structure instead of objects
         my $hash = $inx->fetch_hash(10); # a hash produced by Bio::ASN1::EntrezGene
                                          # that contains all data in the Entrez Gene record

         # in case you wonder, files such as 'Homo_sapiens' can be obtained
         # from the NCBI Entrez Gene FTP download, DATA/ASN/Mammalia directory

Version

       version 1.73

See Also