Treemodels--rho, -R <rho>
Set the *scale* (overall evolutionary rate) of the model for the conserved state to be <rho> times
that of the model for the non-conserved state (0 < <rho> < 1; default 0.3). If used with
--estimate-trees or --estimate-rho, the specified value will be used for initialization only (rho
will be estimated). This option is ignored if two tree models are given.
--estimate-trees, -T <fname_root> Estimate free parameters of tree models and write new models to
<fname_root>.cons.mod and <fname_root>.noncons.mod.
--estimate-rho, -O <fname_root>
Like --estimate-trees, but estimate only the parameter rho.
--gc, -G <val> (Optionally use with --estimate-trees or --estimate-rho) Assume a background nucleotide
distribution consistent with the given average G+C content (0 < <val> < 1) when estimating tree
models. (The frequencies of G and C will be set to <val>/2 and the frequencies of A and T will be
set to (1-<val>)/2.) This option overrides the default behavior of estimating the background
distribution from the data (if --estimate-trees) or obtaining them from the input model (if
--estimate-rho).
--nrates, -k <nrates> | <nrates_conserved,nrates_nonconserved> (Optionally use with a discrete-gamma
model and --estimate-trees) Assume the specified number of rate categories, instead of the number
given in the *.mod file. The shape parameter 'alpha' will be as given in the *.mod file. In the
case of the default two-state HMM, two values can be specified, for the numbers of rates for the
conserved and the nonconserved states, resp.
State-transitionparameters--transitions, -t [~]<mu>,<nu>
Fix the transition probabilities of the two-state HMM as specified, rather than estimating them by
maximum likelihood. Alternatively, if first character of argument is '~', estimate parameters,
but initialize to specified values. The argument <mu> is the probability of transitioning from
the conserved to the non-conserved state, and <nu> is the probability of the reverse transition.
The probabilities of self transitions are thus 1-<mu> and 1-<nu> and the expected lengths of
conserved and nonconserved elements are 1/<mu> and 1/<nu>, respectively.
--target-coverage, -C <gamma>
(Alternative to --transitions) Constrain transition parameters such that the expected fraction of
sites in conserved elements is <gamma> (0 < <gamma> < 1). This is a *prior* rather than
*posterior* expectation and assumes stationarity of the state-transition process. Adding this
constraint causes the ratio mu/nu to be fixed at (1-<gamma>)/<gamma>. If used with
--expected-length, the transition probabilities will be completely fixed; otherwise the
expected-length parameter <omega> will be estimated by maximum likelihood. --expected-length, -E
[~]<omega> {--expected-lengths also allowed, for backward compatibility}
(For use with --target-coverage, alternative to --transitions) Set transition probabilities such
that the expected length of a conserved element is <omega>. Specifically, the parameter mu is set
to 1/<omega>. If preceded by '~', <omega> will be estimated, but will be initialized to the
specified value.
Input/output--msa-format, -i PHYLIP|FASTA|MPM|SS|MAF
Alignment file format.
Default is to guess format based on
file contents.
Note that the msa_view program can be used to
convert between formats.
--viterbi [alternatively --most-conserved], -V <fname> Predict discrete elements using the Viterbi
algorithm and write to specified file. Output is in BED format, unless <fname> has suffix ".gff",
in which case output is in GFF.
--score, -s (Optionally use with --viterbi) Assign a log-odds score to each prediction.
--lnl, -L <fname>
Compute total log likelihood using the forward algorithm and write to specified file.
--no-post-probs, -n Suppress output of posterior probabilities. Useful if only discrete elements or
likelihood is of interest.
--log, -g <log_fname>
(Optionally use when estimating free parameters) Write log of optimization procedure to specified
file.
--refidx, -r <refseq_idx> Use coordinate frame of specified sequence in output. Default
value is 1, first sequence in alignment; 0 indicates coordinate frame of entire multiple
alignment.
--seqname, -N <name> (Optionally use with --viterbi) Use specified string for 'seqname' (GFF) or 'chrom'
field in output file. Default is obtained from input file name (double filename root, e.g.,
"chr22" if input file is "chr22.35.ss").
--idpref, -P <name>
(Optionally use with --viterbi) Use specified string as prefix of generated ids in output file.
Can be used to ensure ids are unique. Default is obtained from input file name (single filename
root, e.g., "chr22.35" if input file is "chr22.35.ss").
--quiet, -q Proceed quietly (without updates to stderr).
--help, -h
Print this help message. (Indels) [experimental]
--indels, -I
Expand HMM state space to model indels as described in Siepel & Haussler (2004).
--max-micro-indel, -Y <length> (Optionally use with --indels) Maximum length of an alignment gap to be
considered a "micro-indel" and therefore addressed by the indel model. Gaps longer than this
threshold will be treated as missing data. Default value is 20.
--indel-params, -D [~]<alpha_0,beta_0,tau_0,alpha_1,beta_1,tau_1>
(Optionally use with --indels and default two-state HMM) Fix the indel parameters at (alpha_0,
beta_0, tau_0) for the conserved state and at (alpha_1, beta_1, tau_1) for the non-conserved
state, rather than estimating them by maximum likelihood. Alternatively, if first character of
argument is '~', estimate parameters, but initialize with specified values. Alpha_j is the rate
of insertion events per substitution per site in state j (typically ~0.05), beta_j is the rate of
deletion events per substitution per site in state j (typically ~0.05), and tau_j is approximately
the inverse of the expected indel length in state j (typically 0.2-0.5).
--indels-only, -J Like --indels but force the use of a single-state HMM. This option allows the effect
of the indel model in isolation to be observed. Implies --no-post-probs. Use with --lnl.
(Felsenstein/Churchill model) [rarely used]
--FC, -X
(Alternative to --hmm; specify only one *.mod file with this option) Use an HMM with a state for
every rate category in the given phylogenetic model, and transition probabilities defined by an
autocorrelation parameter lambda (as described by Felsenstein and Churchill, 1996). A rate
constant for each state (rate category) will be multiplied by the branch lengths of the
phylogenetic model, to create a "scaled" version of the model for that state. If the phylogenetic
model was estimated using Yang's discrete gamma method (-k option to phyloFit), then the rate
constants will be defined according to the estimated shape parameter 'alpha', as described by Yang
(1994). Otherwise, a nonparameteric model of rate variation must have been used (-K option to
phyloFit), and the rate constants will be as defined (explicitly) in the *.mod file. By default,
the parameter lambda will be estimated by maximum likelihood (see --lambda).
--lambda, -l [~]<lambda>
(Optionally use with --FC) Fix lambda at the specified value rather than estimating it by maximum
likelihood. Alternatively, if first character is '~', estimate but initialize at specified value.
Allowable range is 0-1. With k rate categories, the transition probability between state i and
state j will be lambda * I(i == j) + (1-lambda)/k, where I is the indicator function. Thus,
lambda = 0 implies no autocorrelation and lambda = 1 implies perfect autocorrelation. (Coding
potential) [experimental]
--coding-potential, -p
Use parameter settings that cause output to be interpretable as a coding potential score. By
default, a simplified version of exoniphy's phylo-HMM is used, with a noncoding (background)
state, a conserved non-coding (CNS) state, and states for the three codon positions. This option
implies --catmap "NCATS=4; CNS 1; CDS 2-4" --hmm <default-HMM-file> --states CDS --reflect-strand
background,CNS and a set of default *.mod files (all of which can be overridden). This option can
be used with or without --indels.
--extrapolate, -e <phylog.nh> | default
Extrapolate to a larger set of species based on the given phylogeny (Newick-format). The trees in
the given tree models (*.mod files) must be subtrees of the larger phylogeny. For each tree model
M, a copy will be created of the larger phylogeny, then scaled such that the total branch length
of the subtree corresponding to M's tree equals the total branch length of M's tree; this new
version will then be used in place of M's tree. (Any species name present in this tree but not in
the data will be ignored.) If the string "default" is given instead of a filename, then a
phylogeny for 25 vertebrate species, estimated from sequence data for Target 1 (CFTR) of the NISC
Comparative Sequencing Program (Thomas et al., 2003), will be assumed.
--alias, -A <alias_def>
Alias names in input alignment according to given definition, e.g., "hg17=human; mm5=mouse;
rn3=rat". Useful with default *.mod files, e.g., with --coding-potential. (Default models use
generic common names such as "human", "mouse", and "rat". This option allows a mapping to be
established between the leaves of trees in these files and the sequences of an alignment that uses
an alternative naming convention.)
CustomHMMs[rarelyused]--hmm, -H <hmm_fname>
Name of HMM file explicitly defining the probabilities of all state transitions. States in the
file must correspond in number and order to phylogenetic models in <mod_fname_list>. Expected
file format is as produced by 'hmm_train.'
--catmap, -c <fname>|<string> (Optionally use with --hmm) Mapping of feature types to category numbers.
Can give either a filename or an "inline" description of a simple category map, e.g., --catmap
"NCATS = 3 ; CDS 1-3".
--states, -S <state_list>
States of interest in the phylo-HMM, specified by number (indexing starts with 0), or if --catmap,
by category name. Default value is 1. Choosing --states "0,1,2" will cause output of the sum of
the posterior probabilities for states 0, 1, and 2, and/or of regions in which the Viterbi path
coincides with (any of) states 0, 1, or 2 (see --viterbi).
--reflect-strand, -U <pivot_states>
(Optionally use with --hmm) Given an HMM describing the forward strand, create a larger HMM that
allows for features on both strands by "reflecting" the original HMM about the specified "pivot"
states. The new HMM will be used for prediction on both strands. States can be specified by
number (indexing starts with 0), or if --catmap, by category name.
Missingdata[rarelyused]--require-informative, -M <states> Require "informative" columns (i.e., columns with more than two
non-missing-data characters, excluding sequences specified by --not-informative) in specified HMM
states, to help eliminate false positive predictions. States can be specified by number (indexing
starts with 0) or, if --catmap is used, by category name. Non-informative columns will be given
emission probabilities of zero. By default, this option is active, with <states> equal to the set
of states of interest for prediction (as specified by --states). Use "none" to disable
completely.
--not-informative, -F <list>
Do not consider the specified sequences (listed by name) when deciding whether a column is
informative. This option may be useful when sequences are present that are very close to the
reference sequence and thus do not contribute much in the way of phylogenetic information. E.g.,
one might use "--not-informative chimp" with a human-referenced multiple alignment including chimp
sequence, to avoid false-positive predictions based only on human/chimp alignments (can be a
problem, e.g., with --coding-potential).
--ignore-missing, -z
(For use when estimating transition probabilities) Ignore regions of missing data in all sequences
but the reference sequence (excluding sequences specified by --not-informative) when estimating
transition probabilities. Can help avoid too-low estimates of <mu> and <nu> or too-high estimates
of <lambda>. Warning: this option should not be used with --viterbi because coordinates in output
will be unrecognizable.