
andi - estimates evolutionary distances

Acknowledgments

       1) andi: Haubold, B., Klötzl, F. and Pfaffelhuber, P. (2015). andi: Fast and accurate estimation of
       evolutionary distances between closely related genomes, Bioinformatics 31.8.
       2) Algorithms: Ohlebusch, E. (2013). Bioinformatics Algorithms. Sequence Analysis, Genome Rearrangements,
       and Phylogenetic Reconstruction. pp 118f.
       3) SA construction: Mori, Y. (2005). libdivsufsort, unpublished.
       4) Bootstrapping: Klötzl, F. and Haubold, B. (2016). Support Values for Genome Phylogenies, Life 6.1.

Bugs

Reporting Bugs
       Please report bugs to <kloetzl@evolbio.mpg.de> or at <https://github.com/EvolBioInf/andi>.

0.14                                               2020-01-09                                            ANDI(1)

Description

andi estimates the evolutionary distance between closely related genomes. To do this, andi reads the input
       sequences from FASTA files and computes the pairwise anchor distance. The idea behind this is explained
       in Haubold et al. (2015).

Name

       andi - estimates evolutionary distances

Options

-bINT, --bootstrap=INT
              Compute INT distance matrices, with INT-1 of them bootstrapped from the first. See Klötzl &
              Haubold (2016) for a detailed explanation.

       --file-of-filenames=FILE
              Usually, andi is called with the filenames as command-line arguments. With this option the
              filenames may also be read from a file, with one name per line. Use a single dash ('-') to
              read from stdin.

       -j, --join
              Use this mode if each of your FASTA files represents one assembly with numerous contigs. andi will
              then treat all sequences contained in a file as a single genome. In this mode at least one
              filename must be provided via command-line arguments. In the output, the filename is used to
              identify each sequence.

       -l, --low-memory
              In multithreaded mode, andi requires memory linear in the number of threads. The low memory mode
              changes this to a constant demand independent of the number of threads used. Unfortunately, this
              comes at a significant runtime cost.

       -mMODEL, --model=MODEL
              Set  the  nucleotide  evolution model to one of 'Raw', 'JC', 'Kimura', or 'LogDet'. By default the
              Jukes-Cantor correction is used.

       -pFLOAT
              Significance of an anchor; default: 0.025.

       --progress[=WHEN]
              Print a progress bar. WHEN can be 'auto' (default if omitted), 'always', or 'never'.

       -tINT, --threads=INT
              The number of threads to be used; by default, all available processors are used.
              Multithreading is only available if andi was compiled with OpenMP support.

       --truncate-names
              By default andi outputs the full names of sequences, padding them with spaces if they are
              shorter than ten characters. Names longer than ten characters may lead to problems with downstream
              tools. With this switch, names are truncated to ten characters.

       -v, --verbose
              Prints additional information, including the amount of homology found. Apply multiple times for
              extra verbosity.

       -h, --help
              Prints the synopsis and an explanation of available options.

       --version
              Outputs version information and acknowledgments.
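
       As an illustration of how the options above combine, the following Python sketch assembles an andi
       command line and runs it. The helper names build_andi_command and run_andi are hypothetical and not
       part of andi; only flags documented above are used. The sketch assumes that andi is on the PATH and
       writes the distance matrix to standard output.

```python
import subprocess

# Model names as listed under --model above.
MODELS = {"Raw", "JC", "Kimura", "LogDet"}


def build_andi_command(fasta_files, threads=None, join=False, model=None):
    """Assemble an argv list for andi (hypothetical helper, not part of andi)."""
    if not fasta_files:
        raise ValueError("at least one FASTA file is required")
    if model is not None and model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    cmd = ["andi"]
    if join:
        cmd.append("--join")  # treat each file as one genome
    if threads is not None:
        cmd.append(f"--threads={int(threads)}")
    if model is not None:
        cmd.append(f"--model={model}")
    cmd.extend(str(f) for f in fasta_files)
    return cmd


def run_andi(fasta_files, **options):
    """Run andi and return what it prints (assumed to be the distance matrix)."""
    cmd = build_andi_command(fasta_files, **options)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

       For example, build_andi_command(["a.fasta", "b.fasta"], threads=4, join=True) yields the argument
       list for "andi --join --threads=4 a.fasta b.fasta".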

Output

       The output is a symmetrical distance matrix in PHYLIP format, with each entry representing divergence
       as a non-negative real number. A distance of zero means that two sequences are identical, whereas other
       values are estimates for the nucleotide substitution rate (Jukes-Cantor corrected by default). A
       comparison may fail for technical reasons, in which case no estimate can be computed and nan is
       printed. This means that the input sequences were either too short (<200bp) or too diverse (K>0.5)
       for the method to work properly.
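
       Downstream scripts may want to check for nan entries before building a tree. The following Python
       sketch reads one square PHYLIP distance matrix and lists the pairs without an estimate. The function
       names parse_phylip_matrix and failed_pairs are hypothetical and not part of andi; the sketch assumes
       a first line holding the number of sequences, followed by one row per sequence consisting of a name
       and that many distances. Only the first matrix in the text is read.

```python
import math


def parse_phylip_matrix(text):
    """Parse one square PHYLIP distance matrix into (names, rows)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0])
    names, rows = [], []
    for line in lines[1:n + 1]:
        fields = line.split()
        # The last n fields are distances; everything before them is the name.
        names.append(" ".join(fields[:-n]))
        rows.append([float(x) for x in fields[-n:]])  # float("nan") parses fine
    return names, rows


def failed_pairs(names, rows):
    """Return the name pairs whose distance is nan (no estimate available)."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if math.isnan(rows[i][j]):
                pairs.append((names[i], names[j]))
    return pairs
```

       Since the matrix is symmetrical, failed_pairs inspects only the upper triangle.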

Synopsis

andi [OPTIONS...] FILES...

See Also