
rabema_build_gold_standard - RABEMA Gold Standard Builder

Description

       This  program  allows  one  to  build  a RABEMA gold standard.  The input is a reference FASTA file and a
       perfect SAM/BAM map (e.g. created using RazerS 3 in full-sensitivity mode).

       The input SAM/BAM file must be sorted by coordinate. The program will create a FASTA index file
       REF.fa.fai for fast random access to the reference.
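
       The sort-order requirement above can be checked before a run. The following is a minimal sketch, assuming
       a plain-text SAM file whose header carries an @HD line with the SO tag defined by the SAM specification.
       The helper name is_coordinate_sorted is hypothetical and not part of RABEMA. It inspects only the header
       and does not verify the records themselves.

```shell
# Hypothetical helper (not part of RABEMA): succeed if the SAM header
# declares coordinate sorting via the SO tag on the @HD line.
# Only the header is inspected; the alignment records are not checked.
is_coordinate_sorted() {
    grep -m 1 '^@HD' "$1" | grep -q 'SO:coordinate'
}
```

       For example, "is_coordinate_sorted IN.sam || echo 'IN.sam is not marked as coordinate-sorted'" prints a
       warning when the header is missing or declares another sort order.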

Examples

       rabema_build_gold_standard -e 4 -o OUT.gsi -s IN.sam -r REF.fa
              Build gold standard from a SAM file IN.sam with all mapping locations and a FASTA reference REF.fa
              to GSI file OUT.gsi with a maximal error rate of 4%.

       rabema_build_gold_standard --distance-metric hamming -e 4 -o OUT.gsi -b IN.bam -r REF.fa
              Same as above, but using Hamming instead of edit distance and BAM as the input.

       rabema_build_gold_standard --oracle-mode -o OUT.gsi -s IN.sam -r REF.fa
              Build gold standard from a SAM file IN.sam with the original sample position, e.g.  as exported by
              read simulator Mason.

Memory Requirements

       Since version 1.1, great care has been taken to keep the memory requirements as low as possible. The
       memory required is two times the size of the largest chromosome plus some constant memory for each match.

       For example, the memory usage for 100bp human genome reads at 5% error rate was 1.7GB. Of this, roughly
       400MB came from the chromosome and 1.3GB from the matches.
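
       The reference part of this estimate can be approximated from the FASTA index before a run. The sketch
       below assumes the standard .fai layout (sequence name in column 1, length in bases in column 2) and one
       byte per base, which is an assumption of this example rather than a documented property of the program.
       The helper name estimate_reference_bytes is hypothetical. The per-match memory is not included.

```shell
# Hypothetical helper (not part of RABEMA): print twice the length of the
# longest sequence listed in a FASTA index file.
# Column 2 of a .fai file holds the sequence length in bases.
# Assumption: one byte per base; memory for matches is not counted.
estimate_reference_bytes() {
    awk -F '\t' '$2 + 0 > max { max = $2 + 0 } END { print 2 * max }' "$1"
}
```

       Calling "estimate_reference_bytes REF.fa.fai" prints the estimate in bytes for the largest chromosome in
       REF.fa.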

Name

       rabema_build_gold_standard - RABEMA Gold Standard Builder

Options

       -h, --help
              Display the help message.

       --version
              Display version information.

       -v, --verbose
              Enable verbose output.

       -vv, --very-verbose
              Enable even more verbose output.

   Input/Output:

       -o, --out-gsi OUTPUT_FILE
              Path  to  write  the  resulting  GSI  file  to. Valid filetype is: .gsi[.*], where * is any of the
              following extensions: gz for transparent (de)compression.

       -r, --reference INPUT_FILE
              Path to load reference FASTA from. Valid filetypes are: .sam[.*],  .raw[.*],  .gbk[.*],  .frn[.*],
              .fq[.*], .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*], .embl[.*], and .bam, where
              * is any of the following extensions: gz, bz2, and bgzf for transparent (de)compression.

       -b, --in-bam INPUT_FILE
              Path  to  load the "perfect" SAM/BAM file from. Valid filetypes are: .sam[.*] and .bam, where * is
              any of the following extensions: gz, bz2, and bgzf for transparent (de)compression.

   Gold Standard Parameters:

       --oracle-mode
              Enable oracle mode.  This is used for simulated data when the input SAM/BAM file gives exactly one
              position that is considered as the true sample position.

       --match-N
              When set, N matches all characters without penalty.

       --distance-metric STRING
              Set distance metric. Valid values: hamming, edit. Default: edit.

       -e, --max-error INTEGER
              Maximal  error  rate  to  build  gold  standard  for in percent.  This parameter is an integer and
              relative to the read length.  In case of oracle mode, the error rate for the read at the  sampling
              position is used and RATE is used as a cutoff threshold. Default: 0.
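
       Because the error rate is an integer percentage relative to the read length, the absolute number of
       errors allowed depends on each read. The arithmetic can be sketched as follows. The helper name
       max_errors_for_read is hypothetical, and rounding down to a whole number of errors is an assumption of
       this sketch, not something this page specifies.

```shell
# Hypothetical helper (not part of RABEMA): convert a percent error rate
# into an absolute error budget for a read of the given length.
# Assumption: the result is rounded down (shell integer division).
max_errors_for_read() {
    rate_percent=$1
    read_length=$2
    echo $(( rate_percent * read_length / 100 ))
}
```

       With -e 4 and 100bp reads, "max_errors_for_read 4 100" gives 4 errors per read under this assumption.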

References

       M.  Holtgrewe,  A.-K.  Emde,  D.  Weese and K. Reinert.  A Novel And Well-Defined Benchmarking Method For
       Second Generation Read Mapping, BMC Bioinformatics 2011, 12:210.

       http://www.seqan.de/rabema
              RABEMA Homepage

       http://www.seqan.de/mason
              Mason Homepage

rabema_build_gold_standard 1.2.10 [tarball]                                        RABEMA_BUILD_GOLD_STANDARD(1)

Return Values

       A return value of 0 indicates success, any other value indicates an error.
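
       In a script, the return value can be checked right after the call. The wrapper below is a minimal sketch.
       The name run_checked is hypothetical, and the wrapper works for any command, not only for this program.

```shell
# Hypothetical wrapper: run a command, report a non-zero exit status on
# standard error, and pass the status on to the caller.
run_checked() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed with exit status $status: $*" >&2
    fi
    return "$status"
}
```

       For example, "run_checked rabema_build_gold_standard -e 4 -o OUT.gsi -b IN.bam -r REF.fa || exit 1"
       stops a pipeline as soon as the gold standard build reports an error.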

Synopsis

       rabema_build_gold_standard [OPTIONS] --out-gsi OUT.gsi --reference REF.fa --in-bam PERFECT.{sam,bam}

See Also