
RG::Blast::Parser - fast NCBI BLAST parser

Author

       Laszlo Kajan, <lkajan@rostlab.org>

Constructor

       new( [FILEHANDLE, [NAME]] )
           Creates an "RG::Blast::Parser".  Blast results are read from FILEHANDLE, STDIN by default.  The input
           stream may be named NAME in error messages (default: "STDIN").

Description

       This package contains Perl bindings for a very fast C/C++ library that parses NCBI BLAST -m 0 (default)
       output.  BLAST results are returned in a convenient hash structure.

       Multiple results may be concatenated for input.  One result is parsed and returned at a time.

Methods

       parse( [TRACE_PARSING, [TRACE_SCANNING]] )
           Parses one complete BLAST result and returns it.  If there are no results on the input stream, it
           returns "undef".  On a parser error it die()s with an (at present not very useful) error message.

           The following structure is returned in a hash reference:

             {
               blast_version =>    STRING,
               references =>       [ STRING, ... ],
               rounds => [
                   {
                       oneline_idx =>      NUM,    # index of first one-line description of
                                                   # this round in "onelines" array
                       oneline_cnt =>      NUM,    # number of one-line descriptions of
                                                   # this round
                       hit_idx =>          NUM,    # index of first hit of this round in
                                                   # "hits" array
                       hit_cnt =>          NUM,    # number of hits of this round
                       oneline_new_idx =>  NUM|undef, # index of first new (not-seen-before)
                                                   # one-line description of this round
                                                   # in "onelines" array
                       oneline_new_cnt =>  NUM     # number of new one-line descriptions of
                                                   # this round
                   }, ...
               ],
               q_name =>       STRING,
               q_desc =>       STRING|undef,
               q_length =>     NUM,
               db_name =>      STRING,
               db_nseq =>      NUM,
               db_nletter =>   NUM,
               onelines =>     [                   # one-line descriptions from all rounds
                   {
                       name =>         STRING,
                       desc =>         STRING|undef,
                       bit_score =>    NUM,
                       e_value =>      NUM
                   }, ...
               ],
               converged =>    BOOLEAN,
               hits =>         [                   # hits from all rounds
                   {
                       name =>         STRING,
                       desc =>         STRING|undef,
                       length =>       NUM,
                       hsps =>         [
                           {
                               bit_score =>    NUM,
                               raw_score =>    NUM,
                               e_value =>      NUM,
                               method =>       STRING,
                               identities =>   NUM,
                               positives =>    NUM,
                               gaps =>         NUM,
                               q_strand =>     STRING|undef,
                               s_strand =>     STRING|undef,
                               q_frame =>      NUM|undef,
                               s_frame =>      NUM|undef,
                               q_start =>      NUM,
                               q_ali =>        STRING,
                               q_end =>        NUM,
                               match_line =>   STRING,
                               s_start =>      NUM,
                               s_ali =>        STRING,
                               s_end =>        NUM
                           }, ...
                       ]
                   }, ...
               ],
               tail =>         STRING              # bulk text after the last hit / one-line
                                                   # description
             }
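
The index and count fields in "rounds" point into the flat "onelines" and "hits" arrays, so the entries of one
round can be recovered with an array slice.  The following is a minimal sketch of that idea, not part of the
module: it works on a hand-built hash that only mimics the documented layout, and the helper names slice_of()
and round_summaries() as well as all sample values are invented for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hand-built stand-in for a parse() result. Only the fields used below are
# present, and every name and number is invented for illustration.
my $res = {
    rounds => [
        { oneline_idx => 0, oneline_cnt => 1, hit_idx => 0, hit_cnt => 1 },
        { oneline_idx => 1, oneline_cnt => 2, hit_idx => 1, hit_cnt => 2 },
    ],
    onelines => [
        { name => 'seqA', bit_score => 120, e_value => 1e-30 },
        { name => 'seqA', bit_score => 125, e_value => 1e-32 },
        { name => 'seqB', bit_score => 80,  e_value => 1e-18 },
    ],
    hits => [
        { name => 'seqA', length => 210, hsps => [ { e_value => 1e-30 } ] },
        { name => 'seqA', length => 210, hsps => [ { e_value => 1e-32 }, { e_value => 1e-5 } ] },
        { name => 'seqB', length => 198, hsps => [ { e_value => 1e-18 } ] },
    ],
};

# Return the elements of @$array selected by an (index, count) pair.
sub slice_of {
    my ( $array, $idx, $cnt ) = @_;
    return () unless $cnt;
    return @{$array}[ $idx .. $idx + $cnt - 1 ];
}

# Build one summary per round: the one-line names and each hit's lowest HSP e-value.
sub round_summaries {
    my ($result) = @_;
    my @summaries;
    for my $round ( @{ $result->{rounds} } ) {
        my @onelines = slice_of( $result->{onelines}, $round->{oneline_idx}, $round->{oneline_cnt} );
        my @hits     = slice_of( $result->{hits},     $round->{hit_idx},     $round->{hit_cnt} );
        my @best;
        for my $hit (@hits) {
            my $lowest = min( map { $_->{e_value} } @{ $hit->{hsps} } );
            push @best, { name => $hit->{name}, e_value => $lowest };
        }
        push @summaries, { onelines => [ map { $_->{name} } @onelines ], best => \@best };
    }
    return \@summaries;
}

my $summaries = round_summaries($res);
for my $i ( 0 .. $#{$summaries} ) {
    my $s = $summaries->[$i];
    printf "round %d: %d one-line(s), %d hit(s)\n", $i + 1, scalar @{ $s->{onelines} }, scalar @{ $s->{best} };
    printf "  %s best e-value %g\n", $_->{name}, $_->{e_value} for @{ $s->{best} };
}
```

With a real parser, the hash returned by parse() would take the place of $res.  Guarding on the count before
slicing avoids touching an index field that may be "undef", as "oneline_new_idx" is documented to be.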

           To trace parsing or scanning, pass true values for TRACE_PARSING or TRACE_SCANNING in this call.

       result()
           Returns the last BLAST result parsed, or "undef" if there is none.

       get_trace_scanning()
           Returns scan trace state as a Boolean value.

       set_trace_scanning( BOOLEAN )
           Sets the scan trace state.  This is a debugging aid.
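
Because the scan trace is a debugging aid, it is usually wanted only around one suspicious call.  The sketch
below shows one possible way to switch it on temporarily and restore the earlier state afterwards, using only
get_trace_scanning() and set_trace_scanning().  The helper with_scan_trace() and the class My::StubParser are
hypothetical: the stub exists only so the example runs without the module installed.  Whether parse() called
without arguments keeps a flag set this way is not stated in this page, so treat that as an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in so the sketch runs without the module installed;
# with the real module, pass an RG::Blast::Parser object instead.
{
    package My::StubParser;
    sub new { my ($class) = @_; return bless { trace => 0 }, $class; }
    sub get_trace_scanning { return $_[0]{trace} }
    sub set_trace_scanning { $_[0]{trace} = $_[1] ? 1 : 0; return }
    sub parse { return }    # behaves like an exhausted input stream
}

# Run $code with scan tracing switched on, then restore the previous state.
# $parser is any object offering get_trace_scanning() and set_trace_scanning().
sub with_scan_trace {
    my ( $parser, $code ) = @_;
    my $previous = $parser->get_trace_scanning();
    $parser->set_trace_scanning(1);
    my @result;
    my $ok    = eval { @result = $code->($parser); 1 };
    my $error = $@;
    $parser->set_trace_scanning($previous);    # restore even if $code died
    die $error unless $ok;
    return wantarray ? @result : $result[0];
}

my $parser = My::StubParser->new();
my $res    = with_scan_trace( $parser, sub { $_[0]->parse() } );
print "scan trace after the call: ", ( $parser->get_trace_scanning() ? "on" : "off" ), "\n";
```

Saving the state with get_trace_scanning() first means the helper does not disturb a trace that the caller had
already enabled.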

Name

       RG::Blast::Parser - fast NCBI BLAST parser

See Also

       Zerg(3pm), Zerg::Report(3pm)

Synopsis

         use Carp qw(confess);
         use Data::Dumper;
         use RG::Blast::Parser;

         # Read from STDIN (the default):
         #   my $parser = RG::Blast::Parser->new();

         # Or read from the filehandle EXAMPLE and use the name "converged.ali" in error messages:
         open( EXAMPLE, '<', '/usr/share/doc/librg-blast-parser-perl/examples/converged.ali' ) || confess($!);
         my $parser = RG::Blast::Parser->new( \*EXAMPLE, "converged.ali" );

         # Parse results one at a time until the input is exhausted.
         while( my $res = $parser->parse() )
         {
           print Dumper( $res );
         }

         # parse() die()s on a parser error, so wrap the call in eval to recover.
         eval {
           my $res = $parser->parse();
           # ...
         };
         if( $@ && $@ =~ /^parser error/ ) { warn("failed to parse blast result - exception caught"); }
