
metabat1 - MetaBAT: Metagenome Binning based on Abundance and Tetranucleotide frequency (version 1)

Author

        This manpage was written by Andreas Tille for the Debian distribution and
        may also be used for the program outside Debian.

metabat1 2.15                                       May 2020                                         METABAT1(1)

Description

       MetaBAT:  Metagenome  Binning  based  on  Abundance and Tetranucleotide frequency (version 1) by Don Kang
       (ddkang@lbl.gov), Jeff Froula, Rob Egan, and Zhong Wang (zhongwang@lbl.gov)

Name

       metabat1 - MetaBAT: Metagenome Binning based on Abundance and Tetranucleotide frequency (version 1)

Options

       -h [ --help ]
              Produce help message

       -i [ --inFile ] arg
              Contigs in (gzipped) fasta file format [Mandatory]

       -o [ --outFile ] arg
              Base file name for each bin. The default output is fasta format. Use  -l  option  to  output  only
              contig names [Mandatory]

       -a [ --abdFile ] arg
              A file containing the mean and variance of base coverage depth (tab delimited; the first column
              should be contig names, and the first row is treated as a header and skipped) [Optional]
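As a rough illustration of the tab-delimited layout described above, the Python sketch below parses such a table. The layout after the contig name (alternating mean and variance columns per sample, optionally preceded by `skip_cols` columns to ignore) is an assumption made for illustration; `read_abundance` and `skip_cols` are hypothetical names, not part of MetaBAT.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_abundance(text: str, skip_cols: int = 0) -> Dict[str, List[Tuple[float, float]]]:
    """Parse a tab-delimited abundance table (hypothetical helper).

    Assumed layout: column 0 holds the contig name, the next `skip_cols`
    columns are ignored, and the remaining columns alternate between the
    mean and the variance of base coverage depth for each sample.
    The first row is treated as a header and skipped.
    """
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows, None)  # skip the header row
    table: Dict[str, List[Tuple[float, float]]] = {}
    for row in rows:
        if not row:
            continue  # ignore blank lines
        values = [float(v) for v in row[1 + skip_cols:]]
        if len(values) % 2:
            raise ValueError(f"unpaired mean/variance columns for {row[0]}")
        # Pair up (mean, variance) for each sample.
        table[row[0]] = list(zip(values[0::2], values[1::2]))
    return table


sample = "contig\ts1\ts1-var\ts2\ts2-var\nc1\t10.0\t2.0\t5.0\t1.0\n"
print(read_abundance(sample))  # {'c1': [(10.0, 2.0), (5.0, 1.0)]}
```

If a real coverage file carries extra per-contig columns before the per-sample values, `skip_cols` is the knob for stepping over them under this assumed layout.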

       --cvExt
              Use when a coverage file without variance (from third-party tools) is supplied instead of the
              abdFile produced by jgi_summarize_bam_contig_depths

       -p [ --pairFile ] arg
              A file with paired-read mapping information, used to increase sensitivity (tab delimited; should
              have 3 columns: contig index (the file should be ordered by this column), its mate contig index,
              and supporting mean read coverage. The first row is treated as a header and skipped) [Optional]
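A minimal Python sketch of reading the three-column pair file described above into a per-contig list of mate links, assuming integer contig indices and a float coverage column. It also checks the ordering requirement on the first column. `read_pairs` is a hypothetical helper, not part of MetaBAT.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def read_pairs(text: str) -> Dict[int, List[Tuple[int, float]]]:
    """Parse a paired-read link table (hypothetical helper).

    Assumed layout: three tab-delimited columns (contig index, mate contig
    index, supporting mean read coverage), ordered by the first column,
    with a header row that is skipped.
    """
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows, None)  # skip the header row
    links: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    previous = None
    for row in rows:
        if not row:
            continue  # ignore blank lines
        contig, mate, coverage = int(row[0]), int(row[1]), float(row[2])
        if previous is not None and contig < previous:
            raise ValueError("pair file is not ordered by the first column")
        previous = contig
        links[contig].append((mate, coverage))
    return dict(links)


pairs = "contig\tmate\tcoverage\n0\t3\t12.5\n0\t7\t4.0\n2\t3\t8.0\n"
print(read_pairs(pairs))  # {0: [(3, 12.5), (7, 4.0)], 2: [(3, 8.0)]}
```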

       --p1 arg (=0)
              Probability cutoff for bin seeding. It mainly controls the number of potential bins and their
              specificity: the higher the value, the more (and more specific) bins are produced. (Percentage;
              should be between 0 and 100)

       --p2 arg (=0)
              Probability cutoff for secondary neighbors. It complements p1 and should preferably be close to
              p1. (Percentage; should be between 0 and 100)

       --minProb arg (=0)
              Minimum  probability  for binning consideration. It controls sensitivity.  Usually it should be >=
              75. (Percentage; Should be between 0 and 100)

       --minBinned arg (=0)
              Minimum proportion of already binned neighbors for inferring a contig's membership. It controls
              specificity. Usually it would be <= 50. (Percentage; should be between 0 and 100)

       --verysensitive
              For  greater sensitivity, especially in a simple community. It is the shortcut for --p1 90 --p2 85
              --pB 20 --minProb 75 --minBinned 20 --minCorr 90

       --sensitive
              For better sensitivity [default]. It is the shortcut for --p1 90 --p2  90  --pB  20  --minProb  80
              --minBinned 40 --minCorr 92

       --specific
              For  better  specificity.  Different  from  --sensitive when using correlation binning or ensemble
              binning. It is the shortcut for --p1 90 --p2 90 --pB 30 --minProb 80 --minBinned 40 --minCorr 96

       --veryspecific
              For greater specificity. No correlation binning for short contig recruiting. It  is  the  shortcut
              for --p1 90 --p2 90 --pB 40 --minProb 80 --minBinned 40

       --superspecific
              For  the best specificity. It is the shortcut for --p1 95 --p2 90 --pB 50 --minProb 80 --minBinned
              20
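The preset flags above are shorthand for fixed option values. The Python sketch below transcribes those expansions into a table and prints the explicit options a preset stands for, which can help when comparing presets. `PRESETS` and `expand_preset` are hypothetical names; how MetaBAT combines a preset with explicitly given options is not covered here.

```python
from typing import Dict, List

# Preset expansions transcribed from the option descriptions above.
PRESETS: Dict[str, Dict[str, int]] = {
    "verysensitive": {"p1": 90, "p2": 85, "pB": 20, "minProb": 75, "minBinned": 20, "minCorr": 90},
    "sensitive": {"p1": 90, "p2": 90, "pB": 20, "minProb": 80, "minBinned": 40, "minCorr": 92},
    "specific": {"p1": 90, "p2": 90, "pB": 30, "minProb": 80, "minBinned": 40, "minCorr": 96},
    "veryspecific": {"p1": 90, "p2": 90, "pB": 40, "minProb": 80, "minBinned": 40},
    "superspecific": {"p1": 95, "p2": 90, "pB": 50, "minProb": 80, "minBinned": 20},
}


def expand_preset(name: str) -> List[str]:
    """Return the explicit options that a preset flag stands for."""
    args: List[str] = []
    for option, value in PRESETS[name].items():
        args += [f"--{option}", str(value)]
    return args


print(" ".join(expand_preset("veryspecific")))
# --p1 90 --p2 90 --pB 40 --minProb 80 --minBinned 40
```

Note that, per the descriptions above, --veryspecific and --superspecific do not set --minCorr.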

       --minCorr arg (=0)
              Minimum Pearson correlation coefficient for binning missed contigs to increase sensitivity
              (helpful when there are many samples). Should be very high (>= 90) to reduce contamination.
              (Percentage; should be between 0 and 100; 0 disables)

       --minSamples arg (=10)
              Minimum number of samples required to consider correlation-based recruiting
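To make the interplay of --minCorr (a percentage, 0 disables) and --minSamples concrete, here is a Python sketch of a threshold check. Comparing a contig's per-sample coverage profile against a bin's mean profile is an assumption for illustration; `pearson` and `recruit_by_corr` are hypothetical helpers and do not describe MetaBAT's internal recruiting procedure.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sx * sy)


def recruit_by_corr(contig: Sequence[float], bin_profile: Sequence[float],
                    min_corr: float, min_samples: int = 10) -> bool:
    """Check a contig against the --minCorr/--minSamples thresholds (hypothetical).

    min_corr is a percentage, as on the command line; 0 disables recruiting.
    """
    if min_corr == 0 or len(contig) < min_samples:
        return False  # recruiting disabled, or too few samples
    return pearson(contig, bin_profile) * 100 >= min_corr


bin_profile = [float(i) for i in range(1, 11)]   # bin coverage in 10 samples
contig = [2.0 * v + 0.1 for v in bin_profile]    # contig tracking the bin linearly
print(recruit_by_corr(contig, bin_profile, min_corr=92))          # True
print(recruit_by_corr(contig[:5], bin_profile[:5], min_corr=92))  # False: only 5 samples
```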

       -x [ --minCV ] arg (=1)
              Minimum mean coverage of a contig to consider for abundance distance calculation in each library

       --minCVSum arg (=2)
              Minimum total mean coverage of a contig (sum of all libraries) to consider for abundance  distance
              calculation
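One reading of -x/--minCV and --minCVSum, sketched in Python: a contig whose mean coverage summed over all libraries is below --minCVSum is left out of the abundance distance, and otherwise only libraries where its mean coverage reaches --minCV are used. The function name is hypothetical, and how MetaBAT handles excluded contigs internally is not described here.

```python
from typing import List, Optional, Sequence


def libraries_for_distance(means: Sequence[float], min_cv: float = 1.0,
                           min_cv_sum: float = 2.0) -> Optional[List[int]]:
    """Apply the --minCV and --minCVSum thresholds to one contig (hypothetical).

    Returns None when the contig's total mean coverage across all libraries
    is below min_cv_sum; otherwise returns the indices of the libraries whose
    mean coverage reaches min_cv.
    """
    if sum(means) < min_cv_sum:
        return None
    return [i for i, m in enumerate(means) if m >= min_cv]


print(libraries_for_distance([0.5, 3.0, 1.2]))  # [1, 2]
print(libraries_for_distance([0.4, 0.9]))       # None (total 1.3 < 2)
```

The defaults mirror the documented defaults (1 and 2).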

       -s [ --minClsSize ] arg (=200000)
              Minimum size of a bin to be reported in the output

       -m [ --minContig ] arg (=2500)
              Minimum size of a contig to be considered for binning (should be >= 1500; ideally >= 2500). If the
              number of samples >= minSamples, small contigs (>= 1000) are by default given a chance to be
              recruited to existing bins.

       --minContigByCorr arg (=1000)
              Minimum size of a contig to be considered for recruiting by Pearson correlation coefficients
              (activated only if the number of samples >= minSamples; disabled when minContigByCorr > minContig)
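The size rules for -m/--minContig and --minContigByCorr can be summarized as a small decision function. This Python sketch uses the documented defaults and conditions only; it ignores --minCorr, and `contig_role` with its return labels is hypothetical.

```python
def contig_role(length: int, n_samples: int, min_contig: int = 2500,
                min_contig_by_corr: int = 1000, min_samples: int = 10) -> str:
    """Classify a contig by the size thresholds described above (hypothetical)."""
    if length >= min_contig:
        return "binned"  # large enough for regular binning
    # Correlation recruiting needs enough samples and must not be disabled
    # by minContigByCorr > minContig.
    corr_enabled = n_samples >= min_samples and min_contig_by_corr <= min_contig
    if corr_enabled and length >= min_contig_by_corr:
        return "recruit-by-correlation"
    return "ignored"


print(contig_role(3000, 4))   # binned
print(contig_role(1800, 12))  # recruit-by-correlation
print(contig_role(1800, 4))   # ignored
```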

       -t [ --numThreads ] arg (=0)
              Number of threads to use (0: use all cores)

       --minShared arg (=50)
              Percentage cutoff for merging fuzzy contigs

       --fuzzy
              Binning with fuzziness which assigns multiple memberships of a contig to bins (activated only with
              --pairFile at the moment)

       -l [ --onlyLabel ]
              Output only sequence labels as a list in a column without sequences

       -S [ --sumLowCV ]
              If set, then every sample that falls below the minCV will be used in an aggregate sample
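A minimal Python sketch of what -S/--sumLowCV describes, under the assumption (suggested by the option name) that the coverages of all samples below minCV are summed into one aggregate sample while the remaining samples are kept unchanged. `aggregate_low_cv` is a hypothetical helper, not MetaBAT code.

```python
from typing import List, Sequence


def aggregate_low_cv(means: Sequence[float], min_cv: float = 1.0) -> List[float]:
    """Fold samples below min_cv into one aggregate sample (hypothetical).

    Samples at or above min_cv are kept as they are; the coverages of all
    samples below min_cv are summed into a single extra value appended last.
    """
    kept = [m for m in means if m >= min_cv]
    low = [m for m in means if m < min_cv]
    return kept + [sum(low)] if low else kept


print(aggregate_low_cv([5.0, 0.25, 2.0, 0.5]))  # [5.0, 2.0, 0.75]
```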

       -V [ --maxVarRatio ] arg (=0)
              Ignore any contigs where variance / mean exceeds this ratio (0 disables)
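The -V/--maxVarRatio filter can be illustrated as follows. This Python sketch assumes the ratio is checked per sample and that a contig is dropped if any sample exceeds it; skipping samples with zero mean (to avoid division by zero) is also an assumption. `passes_var_ratio` is hypothetical.

```python
from typing import Sequence, Tuple


def passes_var_ratio(stats: Sequence[Tuple[float, float]], max_var_ratio: float = 0.0) -> bool:
    """Return False if any sample's variance/mean exceeds max_var_ratio (hypothetical).

    stats holds (mean, variance) pairs, one per sample; 0 disables the check.
    """
    if max_var_ratio == 0:
        return True  # filter disabled, as with the documented default
    for mean, var in stats:
        if mean > 0 and var / mean > max_var_ratio:
            return False
    return True


print(passes_var_ratio([(10.0, 20.0), (5.0, 60.0)], 10))  # False: 60 / 5 = 12 > 10
print(passes_var_ratio([(10.0, 20.0)], 10))               # True: 20 / 10 = 2
```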

       --saveTNF arg
              File to save (or load if exists) TNF matrix for each contig in input
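For orientation, the Python sketch below computes a tetranucleotide frequency (TNF) vector for one sequence, merging each 4-mer with its reverse complement so that the 256 tetranucleotides collapse into 136 canonical groups. Using canonical 4-mers and simple normalized counts is an assumption for illustration; MetaBAT's exact features, normalization, and --saveTNF file format are not described here and may differ. All names are hypothetical.

```python
from itertools import product
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


# All 256 tetranucleotides collapse to 136 canonical groups.
CANONICAL_4MERS: List[str] = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tnf(seq: str) -> List[float]:
    """Normalized canonical tetranucleotide frequencies of one sequence."""
    counts: Dict[str, int] = dict.fromkeys(CANONICAL_4MERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other codes
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in CANONICAL_4MERS]


vec = tnf("ACGTACGTNNACGTTTGCA")
print(len(vec))  # 136
```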

       --saveDistance arg
              File to save (or load if exists) distance graph at lowest probability cutoff

       --saveCls
              Save cluster memberships in matrix format

       --unbinned
              Generate [outFile].unbinned.fa file for unbinned contigs

       --noBinOut
              No bin output. Usually combined with --saveCls to check only contig memberships

       -B [ --B ] arg (=20)
              Number of bootstrap replicates for ensemble binning (recommended to be >= 20)

       --pB arg (=50)
              Proportion of shared membership in bootstrapping. Major control for sensitivity/specificity: the
              higher the value, the more specific the result. (Percentage; should be between 0 and 100)
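One plausible reading of --pB is that two contigs stay together only if they share a bin in at least pB percent of the bootstrap runs. The Python sketch below illustrates that reading and shows why a higher value yields a more specific (smaller) set of retained pairings. It is a hypothetical interpretation for illustration, not a description of MetaBAT's ensemble algorithm; `shared_membership` and the run format are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def shared_membership(runs: Sequence[Dict[str, int]], p_b: float = 50.0) -> List[FrozenSet[str]]:
    """Keep contig pairs co-binned in at least p_b percent of runs (hypothetical).

    Each run maps contig name -> bin label from one bootstrap replicate.
    """
    contigs = sorted(set().union(*runs))
    kept = []
    for a, b in combinations(contigs, 2):
        together = sum(1 for run in runs if a in run and b in run and run[a] == run[b])
        if together * 100 >= p_b * len(runs):
            kept.append(frozenset((a, b)))
    return kept


runs = [
    {"c1": 0, "c2": 0, "c3": 1},
    {"c1": 0, "c2": 0, "c3": 0},
    {"c1": 1, "c2": 1, "c3": 1},
    {"c1": 0, "c2": 1, "c3": 1},
]
print(len(shared_membership(runs, 50)))  # 3 pairs kept
print(len(shared_membership(runs, 70)))  # 2 pairs kept
```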

       --seed arg (=0)
              Random seed for reproducibility in ensemble binning, although results may still differ slightly.
              (0: use a random seed)

       --keep
              Keep the intermediate files for later use

       -d [ --debug ]
              Debug output

       -v [ --verbose ]
              Verbose output

See Also