Authors

       Kazutaka Katoh <kazutaka.katoh_at_aist.go.jp>
           Wrote Mafft.

       Charles Plessy <charles-debian-nospam_at_plessy.org>
           Wrote this manpage in DocBook XML for the Debian distribution, using Mafft's homepage as a template.

Description

MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of
       multiple alignment methods.

   Accuracy-oriented methods:
       •   L-INS-i (probably most accurate; recommended for <200 sequences; iterative refinement method
           incorporating local pairwise alignment information):

           mafft --localpair --maxiterate 1000 input [> output]

           linsi input [> output]

       •   G-INS-i (suitable for sequences of similar lengths; recommended for <200 sequences; iterative
           refinement method incorporating global pairwise alignment information):

           mafft --globalpair --maxiterate 1000 input [> output]

           ginsi input [> output]

       •   E-INS-i (suitable for sequences containing large unalignable regions; recommended for <200
           sequences):

           mafft --ep 0 --genafpair --maxiterate 1000 input [> output]

           einsi input [> output]

                 For E-INS-i, the --ep 0 option is recommended to allow large gaps.

   Speed-oriented methods:
       •   FFT-NS-i (iterative refinement method; two cycles only):

           mafft --retree 2 --maxiterate 2 input [> output]

           fftnsi input [> output]

       •   FFT-NS-i (iterative refinement method; max. 1000 iterations):

           mafft --retree 2 --maxiterate 1000 input [> output]

       •   FFT-NS-2 (fast; progressive method):

           mafft --retree 2 --maxiterate 0 input [> output]

           fftns input [> output]

       •   FFT-NS-1 (very fast; recommended for >2000 sequences; progressive method with a rough guide tree):

           mafft --retree 1 --maxiterate 0 input [> output]

       •   NW-NS-i (iterative refinement method without FFT approximation; two cycles only):

           mafft --retree 2 --maxiterate 2 --nofft input [> output]

           nwnsi input [> output]

       •   NW-NS-2 (fast; progressive method without the FFT approximation):

           mafft --retree 2 --maxiterate 0 --nofft input [> output]

           nwns input [> output]

       •   NW-NS-PartTree-1 (recommended for ~10,000 to ~50,000 sequences; progressive method with the PartTree
           algorithm):

           mafft --retree 1 --maxiterate 0 --nofft --parttree input [> output]

   Group-to-group alignments:
           mafft-profile group1 group2 [> output]

           or:

           mafft --maxiterate 1000 --seed group1 --seed group2 /dev/null [> output]
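For scripting, the strategy command lines above can be kept as argument lists. The following is a minimal Python sketch, not something shipped with mafft: the `STRATEGIES` table and the `mafft_argv` helper are hypothetical names, and every flag list is copied from the commands documented above (the 1000-iteration FFT-NS-i variant is left out because it has no separate name here).

```python
# Hypothetical helper (not part of mafft): maps the strategy names used in this
# manual to the flags shown in the example command lines above.
STRATEGIES = {
    "L-INS-i": ["--localpair", "--maxiterate", "1000"],
    "G-INS-i": ["--globalpair", "--maxiterate", "1000"],
    "E-INS-i": ["--ep", "0", "--genafpair", "--maxiterate", "1000"],
    "FFT-NS-i": ["--retree", "2", "--maxiterate", "2"],
    "FFT-NS-2": ["--retree", "2", "--maxiterate", "0"],
    "FFT-NS-1": ["--retree", "1", "--maxiterate", "0"],
    "NW-NS-i": ["--retree", "2", "--maxiterate", "2", "--nofft"],
    "NW-NS-2": ["--retree", "2", "--maxiterate", "0", "--nofft"],
    "NW-NS-PartTree-1": ["--retree", "1", "--maxiterate", "0", "--nofft", "--parttree"],
}


def mafft_argv(strategy, input_path):
    """Return an argument list like ["mafft", "--localpair", ..., input_path]."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # The input file always comes last, as in "mafft [options] input".
    return ["mafft", *STRATEGIES[strategy], input_path]
```

Because mafft writes the alignment to standard output (the `[> output]` part of each command), the list can be passed to Python's `subprocess.run` with `stdout` set to an open output file and `check=True`.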

Environment

       MAFFT_BINARIES
           Indicates the location of the binary files used by mafft. By default, they are searched in
           /usr/local/lib/mafft, but on Debian systems, they are searched in /usr/lib/mafft.

       FASTA_4_MAFFT
           This variable can be set to tell mafft the location of the fasta34 program if it is not in the
           PATH.

Files

       Mafft stores the input sequences and other files in a temporary directory, which by default is located in
       /tmp.

Name

       mafft - Multiple alignment program for amino acid or nucleotide sequences

Options

   Algorithm
       --auto
           Automatically selects an appropriate strategy from L-INS-i, FFT-NS-i and FFT-NS-2, according to data
           size.  Default: off (always FFT-NS-2)

       --6merpair
           Distance is calculated based on the number of shared 6mers.  Default: on
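To illustrate what a shared-6mer distance measures, here is a hedged Python sketch. It counts the overlapping 6-mers in each sequence and turns the number of shared 6-mers into a distance between 0 and 1. The normalisation (dividing by the number of 6-mers in the shorter sequence) is an assumption chosen for clarity; this page does not give mafft's exact formula.

```python
from collections import Counter


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer (6-mers by default) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmer_distance(a, b, k=6):
    """Illustrative distance: 1 - (shared k-mers / k-mers in the shorter sequence).

    Assumed normalisation for this sketch; not necessarily mafft's formula.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A k-mer occurring n times in a and m times in b contributes min(n, m).
    shared = sum(min(n, cb[kmer]) for kmer, n in ca.items())
    possible = min(sum(ca.values()), sum(cb.values()))
    if possible == 0:  # a sequence shorter than k has no k-mers at all
        return 1.0
    return 1.0 - shared / possible
```

No alignment is computed at all, which is why this kind of distance is much cheaper than the DP-based options that follow.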

       --globalpair
           All pairwise alignments are computed with the Needleman-Wunsch algorithm.  More accurate but slower
           than --6merpair.  Suitable for a set of globally alignable sequences.  Applicable to up to ~200
           sequences.  A combination with --maxiterate 1000 is recommended (G-INS-i).  Default: off (6mer
           distance is used)
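The Needleman-Wunsch algorithm behind --globalpair fills a dynamic-programming table over both complete sequences and traces back from the bottom-right corner, so every residue of both sequences is aligned. The sketch below is a textbook version with a linear gap penalty and made-up scores (match 1, mismatch -1, gap -2); mafft itself uses the scoring matrices and gap parameters listed under Parameter.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Textbook global alignment with illustrative linear gap costs.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the corner: a global alignment uses all of both sequences.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, aligning GATTACA with GATACA gives a score of 4 with these values: six matches and one gap.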

       --localpair
           All pairwise alignments are computed with the Smith-Waterman algorithm.  More accurate but slower
           than --6merpair.  Suitable for a set of locally alignable sequences.  Applicable to up to ~200
           sequences.  A combination with --maxiterate 1000 is recommended (L-INS-i).  Default: off (6mer
           distance is used)
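Smith-Waterman (--localpair) differs from Needleman-Wunsch in two ways: cell scores are floored at zero, so a local alignment may start anywhere, and the best score may end anywhere in the table. The sketch below keeps only two rows and returns the best local score and the (row, column) where it ends; the scoring values are illustrative assumptions, not mafft's.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, plus its end position (i, j).

    Illustrative linear gap costs; only two DP rows are kept in memory.
    """
    m = len(b)
    prev = [0] * (m + 1)
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 option lets a new local alignment start at this cell.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best, best_pos = cur[j], (i, j)
        prev = cur
    return best, best_pos
```

For instance, `smith_waterman("XXXXACGTYYYY", "ZZACGTZZ")` finds only the shared ACGT block and returns `(8, (8, 6))`, while unrelated flanks are ignored rather than penalised as they would be in a global alignment.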

       --genafpair
           All pairwise alignments are computed with a local algorithm with the generalized affine gap cost
           (Altschul 1998).  More accurate but slower than --6merpair.  Suitable when large internal gaps are
           expected.  Applicable to up to ~200 sequences.  A combination with --maxiterate 1000 is recommended
           (E-INS-i).  Default: off (6mer distance is used)

       --fastapair
           All pairwise alignments are computed with FASTA (Pearson and Lipman 1988).  FASTA is required.
           Default: off (6mer distance is used)

       --weighti number
           Weighting factor for the consistency term calculated from pairwise alignments.  Valid when any of
           --globalpair, --localpair, --genafpair, --fastapair or --blastpair is selected.  Default: 2.7

       --retree number
           Guide tree is built number times in the progressive stage.  Valid with 6mer distance.  Default: 2
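This page does not say how the guide tree is built from the distance matrix. Purely as an illustration, the sketch below uses UPGMA, a standard average-linkage method for guide trees: it repeatedly merges the closest pair of clusters and writes the result as a Newick string without branch lengths. Treat it as a generic example, not as mafft's tree-building code; `upgma` is a hypothetical name.

```python
def upgma(names, dist):
    """Plain UPGMA on a symmetric distance matrix; returns a Newick string."""
    if not names:
        raise ValueError("need at least one sequence")
    # Active clusters: id -> (newick subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining one is the
        # leaf-count-weighted average of the two old distances.
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        del d[(i, j)]
        clusters[next_id] = (f"({ti},{tj})", ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

With four sequences where A and B are closest and D is farthest from all, the tree comes out as `(D,(C,(A,B)));`, so A and B would be aligned first in the progressive stage.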

       --maxiterate number
           number cycles of iterative refinement are performed.  Default: 0

       --fft
           Use FFT approximation in group-to-group alignment.  Default: on

       --nofft
           Do not use FFT approximation in group-to-group alignment.  Default: off

       --noscore
           Alignment score is not checked in the iterative refinement stage.  Default: off (score is checked)

       --memsave
           Use the Myers-Miller (1988) algorithm.  Default: automatically turned on when the alignment length
           exceeds 10,000 (aa/nt).
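The point of --memsave is memory: a full dynamic-programming table for two sequences of length 10,000 has 10^8 cells. Hirschberg's divide-and-conquer idea, which Myers and Miller (1988) extended to affine gap costs, keeps only one row at a time. The sketch below shows its core step with linear gap costs and illustrative scores: score the first half of `a` forwards and the second half backwards, then choose the column of `b` where the two halves meet with the best total. The recursion that applies this split repeatedly to recover the full alignment is omitted, and the function names are hypothetical.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch scores of a against every prefix of b, in O(len(b)) memory."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur  # the older row is dropped, so memory stays linear
    return prev


def split_point(a, b, **scores):
    """Where an optimal global alignment crosses the middle row of a.

    Returns (mid, j, total): a[:mid] aligns with b[:j], a[mid:] with b[j:],
    and total equals the optimal global alignment score.
    """
    mid = len(a) // 2
    forward = nw_last_row(a[:mid], b, **scores)
    backward = nw_last_row(a[mid:][::-1], b[::-1], **scores)
    totals = [forward[j] + backward[len(b) - j] for j in range(len(b) + 1)]
    best_j = max(range(len(totals)), key=totals.__getitem__)
    return mid, best_j, totals[best_j]
```

Since each half is then solved independently, the same split can be applied recursively while never holding more than a couple of rows.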

       --parttree
           Use a fast tree-building method (PartTree, Katoh and Toh 2007) with the 6mer distance.  Recommended
           when a large number (> ~10,000) of sequences are input.  Default: off

       --dpparttree
           The PartTree algorithm is used with distances based on DP.  Slightly more accurate and slower than
           --parttree.  Recommended when a large number (> ~10,000) of sequences are input.  Default: off

       --fastaparttree
           The PartTree algorithm is used with distances based on FASTA.  Slightly more accurate and slower than
           --parttree.  Recommended when a large number (> ~10,000) of sequences are input.  FASTA is required.
           Default: off

       --partsize number
           The number of partitions in the PartTree algorithm.  Default: 50

       --groupsize number
           Do not make alignment larger than number sequences. Valid only with the --*parttree options.
           Default: the number of input sequences

   Parameter
       --op number
           Gap opening penalty at group-to-group alignment.  Default: 1.53

       --ep number
           Offset value, which works like gap extension penalty, for group-to-group alignment.  Default: 0.123

       --lop number
           Gap opening penalty at local pairwise alignment.  Valid when the --localpair or --genafpair option is
           selected.  Default: -2.00

       --lep number
           Offset value at local pairwise alignment.  Valid when the --localpair or --genafpair option is
           selected.  Default: 0.1

       --lexp number
           Gap extension penalty at local pairwise alignment.  Valid when the --localpair or --genafpair option
           is selected.  Default: -0.1

       --LOP number
           Gap opening penalty to skip the alignment.  Valid when the --genafpair option is selected.   Default:
           -6.00

       --LEXP number
           Gap extension penalty to skip the alignment.  Valid when the --genafpair option is selected.
           Default: 0.00

       --bl number
           BLOSUM number matrix (Henikoff and Henikoff 1992) is used.  number=30, 45, 62 or 80.  Default: 62

       --jtt number
           JTT PAM number (Jones et al. 1992) matrix is used.  number>0.  Default: BLOSUM62

       --tm number
           Transmembrane PAM number (Jones et al. 1994) matrix is used.  number>0.  Default: BLOSUM62

       --aamatrix matrixfile
           Use a user-defined AA scoring matrix.  The format of matrixfile is the same as that of BLAST.
           Ignored when nucleotide sequences are input.   Default: BLOSUM62
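Before passing a custom matrixfile to --aamatrix, it can help to check that it parses. The sketch below reads a matrix in the plain-text layout of NCBI BLAST matrix files, assuming that layout: lines starting with `#` are comments, the first remaining line lists the residue letters, and each following line starts with a residue letter and gives its scores. `parse_blast_matrix` is a hypothetical helper, not mafft's own parser, which may accept variations.

```python
def parse_blast_matrix(text):
    """Parse a BLAST-style scoring matrix into {(row_residue, col_residue): score}.

    Assumes the NCBI text layout described above; raises ValueError on ragged rows.
    """
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    if not rows:
        raise ValueError("no matrix found")
    header = rows[0]  # column residue letters
    matrix = {}
    for fields in rows[1:]:
        row_res, values = fields[0], fields[1:]
        if len(values) != len(header):
            raise ValueError(
                f"row {row_res!r} has {len(values)} scores, expected {len(header)}")
        for col_res, value in zip(header, values):
            matrix[(row_res, col_res)] = int(value)
    return matrix
```

With a toy two-residue input (not a real scoring matrix) such as `"   A  R\nA  4 -1\nR -1  5"`, the lookup `[("A", "R")]` gives -1.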

       --fmodel
           Incorporate the AA/nuc composition information into the scoring matrix.  Default: off

   Output
       --clustalout
           Output format: clustal format.  Default: off (fasta format)

       --inputorder
           Output order: same as input.  Default: on

       --reorder
           Output order: aligned.  Default: off (inputorder)

       --treeout
           Guide tree is output to the input.tree file.  Default: off

       --quiet
           Do not report progress.  Default: off

   Input
       --nuc
           Assume the sequences are nucleotide.  Default: auto

       --amino
           Assume the sequences are amino acid.  Default: auto

       --seed alignment1 [--seed alignment2 --seed alignment3 ...]
           Seed alignments given in alignment_n (fasta format) are aligned with sequences in input.  The
           alignment within every seed is preserved.

References

   In English
       •   Katoh and Toh (Bioinformatics 23:372-374, 2007) PartTree: an algorithm to build an approximate tree
           from a large number of unaligned sequences (describes the PartTree algorithm).

       •   Katoh, Kuma, Toh and Miyata (Nucleic Acids Res. 33:511-518, 2005) MAFFT version 5: improvement in
           accuracy of multiple sequence alignment (describes [ancestral versions of] the G-INS-i, L-INS-i and
           E-INS-i strategies)

       •   Katoh, Misawa, Kuma and Miyata (Nucleic Acids Res. 30:3059-3066, 2002) MAFFT: a novel method for
           rapid multiple sequence alignment based on fast Fourier transform (describes the FFT-NS-1, FFT-NS-2
           and FFT-NS-i strategies)

   In Japanese
       •   Katoh and Misawa (Seibutsubutsuri 46:312-317, 2006) Multiple Sequence Alignments: the Next Generation

       •   Katoh and Kuma (Kagaku to Seibutsu 44:102-108, 2006) Jissen-teki Multiple Alignment

See Also

mafft-homologs(1)

Synopsis

       mafft [options] input [> output]

       linsi input [> output]

       ginsi input [> output]

       einsi input [> output]

       fftnsi input [> output]

       fftns input [> output]

       nwns input [> output]

       nwnsi input [> output]

       mafft-profile group1 group2 [> output]

                     input, group1 and group2 must be in FASTA format.
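Because the recommended strategy depends on the number of sequences (for example, <200 for L-INS-i, G-INS-i and E-INS-i, and >2000 for FFT-NS-1), it can help to count the records in a FASTA input first. The minimal reader below was written for this page and assumes plain FASTA: each record starts with a `>` header line followed by one or more sequence lines. `read_fasta` is a hypothetical name.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if name is not None:  # close the previous record
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

`len(read_fasta(...))` then gives the sequence count to compare against the recommendations in the Description.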

This Manual Is for v6.2xx (2007)

       Recent versions (v7.1xx; 2013 Jan.) have more features than those described in this manual.  See also
       the tips page at http://mafft.cbrc.jp/alignment/software/tips0.html
