--amino
Force the sequence alignment to be interpreted as amino acid sequences. Normally HMMER autodetects
whether the alignment is protein or DNA, but sometimes alignments are so small that autodetection
is ambiguous. See --nucleic.
--archpri <x>
Set the "architecture prior" used by MAP architecture construction to <x>, where <x> is a
probability between 0 and 1. This parameter governs a geometric prior distribution over model
lengths. As <x> increases, longer models are favored a priori. As <x> decreases, it takes more
residue conservation in a column to make a column a "consensus" match column in the model
architecture. The 0.85 default has been chosen empirically as a reasonable setting.
--binary
Write the HMM to hmmfile in HMMER binary format instead of readable ASCII text.
--cfile <f>
Save the observed emission and transition counts to <f> after the architecture has been determined
(i.e. after residues/gaps have been assigned to match, delete, and insert states). This option is
used in HMMER development for generating data files useful for training new Dirichlet priors. The
format of count files is documented in the User's Guide.
--fast Quickly and heuristically determine the architecture of the model by assigning all columns with
more than a certain fraction of gap characters to insert states. By default this fraction is 0.5,
and it can be changed using the --gapmax option. The default construction algorithm is a maximum
a posteriori (MAP) algorithm, which is slower.
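The gap-fraction rule described for --fast can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not HMMER's code: the function name fast_architecture, the set of gap symbols, and the use of unweighted counts are all assumptions made for the example.

```python
# Hypothetical illustration of the --fast rule; not taken from HMMER itself.
GAP_CHARS = set("-._~")  # assumed gap symbols for this sketch


def fast_architecture(alignment, gapmax=0.5):
    """Return one boolean per column: True for match (consensus), False for insert.

    A column whose gap fraction is greater than gapmax becomes an insert column.
    Sequence weights are ignored here for simplicity.
    """
    if not alignment:
        return []
    nseq = len(alignment)
    ncols = len(alignment[0])
    is_match = []
    for col in range(ncols):
        gaps = sum(1 for seq in alignment if seq[col] in GAP_CHARS)
        is_match.append(gaps / nseq <= gapmax)
    return is_match


aln = ["AC-GT", "AC-GT", "A--GT", "ACTG-"]
print(fast_architecture(aln))              # [True, True, False, True, True]
print(fast_architecture(aln, gapmax=0.2))  # [True, False, False, True, False]
```

In this toy alignment, lowering the threshold from 0.5 to 0.2 turns two more columns into insert columns, so the resulting model would be shorter.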
--gapmax <x>
Controls the --fast model construction algorithm; has no effect unless --fast is also in use.
If a column has more than a fraction <x> of gap symbols in it, it gets assigned to an insert
column. <x> is a frequency from 0 to 1, and by default is set to 0.5. Higher values of <x> mean
more columns get assigned to consensus, and models get longer; smaller values of <x> mean fewer
columns get assigned to consensus, and models get smaller.
--hand Specify the architecture of the model by hand: the alignment file must be in SELEX or Stockholm
format, and the reference annotation line (#=RF in SELEX, #=GC RF in Stockholm) is used to specify
the architecture. Any column marked with a non-gap symbol (such as an 'x', for instance) is
assigned as a consensus (match) column in the model.
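As a rough illustration of how a reference annotation line maps to an architecture, the Python sketch below pulls the #=GC RF line out of a small Stockholm-format string and marks every non-gap position as a match column. The helper names read_stockholm_rf and hand_architecture, the set of characters treated as gaps, and the toy alignment are assumptions for the example; HMMER's own parser is more thorough.

```python
# Hypothetical helpers for illustration; not HMMER's parser.
def read_stockholm_rf(text):
    """Concatenate all '#=GC RF' annotation segments found in a Stockholm string."""
    parts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "#=GC" and fields[1] == "RF":
            parts.append(fields[2])
    return "".join(parts)


def hand_architecture(rf_line, gap_chars="-._~"):
    """True where the RF line holds a non-gap symbol, i.e. a match column."""
    return [ch not in gap_chars for ch in rf_line]


stk = """# STOCKHOLM 1.0
seq1 AC-GT
seq2 ACTGT
#=GC RF xx.xx
//
"""

rf = read_stockholm_rf(stk)
arch = hand_architecture(rf)
print(rf)         # xx.xx
print(arch)       # [True, True, False, True, True]
print(sum(arch))  # 4 match columns, so the model would have length 4
```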
--idlevel <x>
Controls both the determination of effective sequence number and the behavior of the --wblosum
weighting option. The sequence alignment is clustered by percent identity, and the number of
clusters at a cutoff threshold of <x> is used to determine the effective sequence number. Higher
values of <x> give more clusters and higher effective sequence numbers; lower values of <x> give
fewer clusters and lower effective sequence numbers. <x> is a fraction from 0 to 1, and by
default is set to 0.62 (corresponding to the clustering level used in constructing the BLOSUM62
substitution matrix).
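The relationship between <x> and the number of clusters can be seen in a small Python sketch. It assumes single-linkage clustering and defines pairwise identity over columns where both sequences have a residue; both choices, and the function names, are assumptions made for illustration and may differ from what HMMER does internally.

```python
# Hypothetical sketch of identity clustering; details may differ from HMMER.
def pairwise_identity(a, b, gap_chars="-."):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    aligned = same = 0
    for x, y in zip(a, b):
        if x in gap_chars or y in gap_chars:
            continue
        aligned += 1
        if x.upper() == y.upper():
            same += 1
    return same / aligned if aligned else 0.0


def cluster_by_identity(alignment, idlevel=0.62):
    """Single-linkage clustering; returns one cluster label per sequence."""
    n = len(alignment)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if pairwise_identity(alignment[i], alignment[j]) >= idlevel:
                parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


aln = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHLMV", "WWWWWYYYYY"]
for level in (0.62, 0.85, 0.95):
    print(level, len(set(cluster_by_identity(aln, level))))
# 0.62 gives 2 clusters, 0.85 gives 3, 0.95 gives 4
```

The toy run shows the trend stated above: raising the cutoff splits the alignment into more clusters, which in turn raises the effective sequence number.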
--informat <s>
Assert that the input seqfile is in format <s>; do not run Babelfish format autodetection. This
increases the reliability of the program somewhat, because the Babelfish can make mistakes;
particularly recommended for unattended, high-throughput runs of HMMER. Valid format strings
include FASTA, GENBANK, EMBL, GCG, PIR, STOCKHOLM, SELEX, MSF, CLUSTAL, and PHYLIP. See the User's
Guide for a complete list.
--noeff
Turn off the effective sequence number calculation, and use the true number of sequences instead.
This will usually reduce the sensitivity of the final model (so don't do it without good reason!)
--nucleic
Force the alignment to be interpreted as nucleic acid sequence, either RNA or DNA. Normally HMMER
autodetects whether the alignment is protein or DNA, but sometimes alignments are so small that
autodetection is ambiguous. See --amino.
--null <f>
Read a null model from <f>. The default for protein is to use average amino acid frequencies from
Swissprot 34 and p1 = 350/351; for nucleic acid, the default is to use 0.25 for each base and p1 =
1000/1001. For documentation of the format of the null model file and further explanation of how
the null model is used, see the User's Guide.
--pam <f>
Apply a heuristic PAM- (substitution matrix-) based prior on match emission probabilities instead
of the default mixture Dirichlet. The substitution matrix is read from <f>. See --pamwgt.
The default Dirichlet state transition prior and insert emission prior are unaffected. Therefore
in principle you could combine --prior with --pam, but this isn't recommended, as it hasn't been
tested. (--pam itself hasn't been tested much!)
--pamwgt <x>
Controls the weight on a PAM-based prior. Only has an effect if the --pam option is also in use. <x> is
a positive real number, 20.0 by default. <x> is the number of "pseudocounts" contributed by the
heuristic prior. Very high values of <x> can force a scoring system that is entirely driven by the
substitution matrix, making HMMER somewhat approximate Gribskov profiles.
--pbswitch <n>
For alignments with a very large number of sequences, the GSC, BLOSUM, and Voronoi weighting
schemes are slow; they're O(N^2) for N sequences. Henikoff position-based weights (PB weights) are
more efficient. At or above a certain threshold sequence number <n>, hmm2build will switch from
GSC, BLOSUM, or Voronoi weights to PB weights. To disable this switching behavior (at the cost of
compute time), set <n> to be something larger than the number of sequences in your alignment. <n>
is a positive integer; the default is 1000.
--prior <f>
Read a Dirichlet prior from <f>, replacing the default mixture Dirichlet. The format of prior
files is documented in the User's Guide, and an example is given in the Demos directory of the
HMMER distribution.
--swentry <x>
Controls the total probability that is distributed to local entries into the model, versus
starting at the beginning of the model as in a global alignment. <x> is a probability from 0 to
1, and by default is set to 0.5. Higher values of <x> mean that hits that are fragments on their
left (N or 5'-terminal) side will be penalized less, but complete global alignments will be
penalized more. Lower values of <x> mean that fragments on the left will be penalized more, and
global alignments on this side will be favored. This option only affects the configurations that
allow local alignments, e.g. -s and -f; unless one of these options is also activated, this
option has no effect. You have independent control over local/global alignment behavior for the
N/C (5'/3') termini of your target sequences using --swentry and --swexit.
--swexit <x>
Controls the total probability that is distributed to local exits from the model, versus ending an
alignment at the end of the model as in a global alignment. <x> is a probability from 0 to 1, and
by default is set to 0.5. Higher values of <x> mean that hits that are fragments on their right
(C or 3'-terminal) side will be penalized less, but complete global alignments will be penalized
more. Lower values of <x> mean that fragments on the right will be penalized more, and global
alignments on this side will be favored. This option only affects the configurations that allow
local alignments, e.g. -s and -f; unless one of these options is also activated, this option has
no effect. You have independent control over local/global alignment behavior for the N/C (5'/3')
termini of your target sequences using --swentry and --swexit.
--verbose
Print more possibly useful stuff, such as the individual scores for each sequence in the
alignment.
--wblosum
Use the BLOSUM filtering algorithm to weight the sequences, instead of the default. Cluster the
sequences at a given percentage identity (see --idlevel); assign each cluster a total weight of
1.0, distributed equally amongst the members of that cluster.
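The weight assignment described for --wblosum is easy to state in code once cluster membership is known. The Python sketch below takes a list of cluster labels (one per sequence) and gives each sequence an equal share of its cluster's total weight of 1.0. The function name blosum_weights and the example labels are hypothetical, and the clustering step itself is assumed to have been done already.

```python
# Hypothetical sketch of the per-cluster weight split; labels are made up.
from collections import Counter


def blosum_weights(labels):
    """Each cluster gets total weight 1.0, split equally among its members."""
    sizes = Counter(labels)
    return [1.0 / sizes[lab] for lab in labels]


labels = [0, 0, 0, 1]   # three similar sequences plus one outlier
weights = blosum_weights(labels)
print(weights)          # the first three get 1/3 each, the outlier gets 1.0
print(sum(weights))     # approximately 2.0, the number of clusters
```

One consequence worth noting is that the weights sum to the number of clusters, so a large family of near-identical sequences counts for no more than a single divergent sequence.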
--wgsc Use the Gerstein/Sonnhammer/Chothia ad hoc sequence weighting algorithm. This is already the
default, so this option has no effect (unless it follows another option in the --w family, in
which case it overrides it).
--wme Use the Krogh/Mitchison maximum entropy algorithm to "weight" the sequences. This supersedes the
Eddy/Mitchison/Durbin maximum discrimination algorithm, which gives almost identical weights but
is less robust. ME weighting seems to give a marginal increase in sensitivity over the default GSC
weights, but takes a fair amount of time.
--wnone
Turn off all sequence weighting.
--wpb Use the Henikoff position-based weighting scheme.
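For readers unfamiliar with position-based weights, the Python sketch below follows the general Henikoff idea: in each column, a sequence earns 1/(k*n), where k is the number of distinct residues in that column and n is how many sequences share this sequence's residue there. Skipping gaps and rescaling the weights to sum to the number of sequences are assumptions of this example, as is the function name; HMMER's implementation may treat these details differently.

```python
# Hypothetical sketch of position-based weighting; not HMMER's implementation.
from collections import Counter


def position_based_weights(alignment, gap_chars="-."):
    """Henikoff-style position-based weights, rescaled to sum to the sequence count."""
    nseq = len(alignment)
    if nseq == 0:
        return []
    weights = [0.0] * nseq
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        counts = Counter(ch for ch in column if ch not in gap_chars)
        k = len(counts)  # number of distinct residues in this column
        if k == 0:
            continue     # all-gap column contributes nothing
        for i, ch in enumerate(column):
            if ch not in gap_chars:
                weights[i] += 1.0 / (k * counts[ch])
    total = sum(weights)
    if total == 0.0:
        return [1.0] * nseq
    return [w * nseq / total for w in weights]


aln = ["AAA", "AAA", "CCC"]
print(position_based_weights(aln))  # the two identical sequences share weight; the third gets more
```

Because each column is handled in a single pass over the sequences, the cost grows linearly with the number of sequences, which is consistent with the efficiency remark under --pbswitch.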
--wvoronoi
Use the Sibbald/Argos Voronoi sequence weighting algorithm in place of the default GSC weighting.