RNAalifold 2.6.4
calculate secondary structures for a set of aligned RNAs
Read aligned RNA sequences from stdin or file.aln and calculate their minimum free energy (mfe)
structure, partition function (pf) and base pairing probability matrix. Currently, input alignments have
to be in CLUSTAL, Stockholm, FASTA, or MAF format. The input format must be set manually in interactive
mode (default is Clustal), but will be determined automatically from the input file if not explicitly
set. It returns the mfe structure in bracket notation, its energy, the free energy of the thermodynamic
ensemble and the frequency of the mfe structure in the ensemble to stdout. It also produces Postscript
files with plots of the resulting secondary structure graph ("alirna.ps") and a "dot plot" of the base
pairing matrix ("alidot.ps"). The file "alifold.out" will contain a list of likely pairs sorted by
credibility, suitable for viewing with "AliDot.pl". Be warned that output files will overwrite any
existing files of the same name.
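Because output files such as "alirna.ps" and "alidot.ps" overwrite existing files of the same name, one
option is to run each job in its own directory. The following Python sketch shows that idea; the helper
names build_alifold_command and run_alifold are hypothetical, and only options listed in this help text
are used. It assumes the RNAalifold executable is on PATH.

```python
import os
import shutil
import subprocess


def build_alifold_command(alignment_path, input_format=None, partfunc=False):
    """Assemble an argument list from options documented in this help text."""
    cmd = ["RNAalifold"]
    if input_format is not None:
        if input_format not in ("C", "S", "F", "M"):
            raise ValueError("input_format must be one of C, S, F, M")
        cmd.append(f"--input-format={input_format}")
    if partfunc:
        cmd.append("-p")  # partition function and pair probabilities
    cmd.append(alignment_path)
    return cmd


def run_alifold(alignment_path, workdir, **options):
    """Run RNAalifold inside workdir so its output files are written there."""
    if shutil.which("RNAalifold") is None:
        raise FileNotFoundError("RNAalifold executable not found on PATH")
    # Use an absolute path because the working directory changes.
    cmd = build_alifold_command(os.path.abspath(alignment_path), **options)
    done = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True, check=True)
    return done.stdout  # structure in bracket notation and energies
```

Calling run_alifold("family.aln", "results/family", input_format="C") would then leave the PostScript
files in "results/family" instead of the current directory.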
-h, --help
Print help and exit
--detailed-help
Print help, including all details and hidden options, and exit
--full-help
Print help, including hidden options, and exit
-V, --version
Print version and exit
-v, --verbose
Be verbose.
(default=off)
-q, --quiet
Be quiet. (default=off)
This option can be used to minimize the output of additional information and non-severe warnings
which otherwise might spam stdout/stderr.
I/O Options:
Command line options for input and output (pre-)processing
-f, --input-format=C|S|F|M
File format of the input multiple sequence alignment (MSA).
If this parameter is set, the input is considered to be in a particular file format. Otherwise,
the program tries to determine the file format automatically, if an input file was provided in the
set of parameters. In case the input MSA is provided in interactive mode, or from a terminal
(TTY), the program's default is to assume CLUSTALW format. Currently, the following formats are
available: ClustalW ('C'), Stockholm 1.0 ('S'), FASTA/Pearson ('F'), and MAF ('M').
--mis Output "most informative sequence" instead of simple consensus: For each column of the alignment
output the set of nucleotides with frequency greater than average in IUPAC notation.
(default=off)
-j, --jobs[=number]
Split batch input into jobs and start processing in parallel using multiple threads. A value of 0
indicates to use as many parallel threads as computation cores are available.
(default=`0')
Default processing of input data is performed in a serial fashion, i.e. one alignment at a time.
Using this switch, a user can instead start the computation for many alignments in the input in
parallel. RNAalifold will create as many parallel computation slots as specified and assign input
alignments of the input file(s) to the available slots. Note that this increases memory
consumption since input alignments have to be kept in memory until an empty compute slot is
available and each running job requires its own dynamic programming matrices.
--unordered
Do not try to keep output in order with input while parallel processing is in place.
(default=off)
When parallel input processing (--jobs flag) is enabled, the order in which input is processed
depends on the host machine's job scheduler. Therefore, any output to stdout or files generated by
this program will most likely not follow the order of the corresponding input data set. The
default of RNAalifold is to use a specialized data structure to still keep the results output in
order with the input data. However, this comes with a trade-off in terms of memory consumption,
since all output must be kept in memory for as long as no chunks of consecutive, ordered output
are available. By setting this flag, RNAalifold will not buffer individual results but print them
as soon as they have been computed.
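The memory trade-off of ordered output can be illustrated with a small Python sketch. This is not
RNAalifold's actual data structure; the class name OrderedEmitter is hypothetical. Results that finish
early are held back until all of their predecessors have been released.

```python
class OrderedEmitter:
    """Release results in input order, buffering those that arrive early."""

    def __init__(self, emit):
        self._emit = emit      # callback that receives results in input order
        self._pending = {}     # input index -> result waiting for predecessors
        self._next = 0         # index of the next result to release

    def add(self, index, result):
        self._pending[index] = result
        # Flush the longest run of consecutive results that is now available.
        while self._next in self._pending:
            self._emit(self._pending.pop(self._next))
            self._next += 1

    def buffered(self):
        """Number of results currently held in memory."""
        return len(self._pending)
```

If the first alignment is slow, every later result accumulates in the buffer until it finishes. Printing
each result immediately, as --unordered does, avoids this buffer at the cost of output order.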
--noconv
Do not automatically substitute nucleotide "T" with "U".
(default=off)
-n, --continuous-ids
Use continuous alignment ID numbering when no alignment ID can be retrieved from input data.
(default=off)
Due to its past, RNAalifold produces a specific set of output file names for the first input
alignment, "alirna.ps", "alidot.ps", etc. But for all further alignments in the input, it usually
adopts a naming scheme based on IDs, which may be retrieved from the input alignment's meta-data,
or generated by a prefix followed by an increasing counter. Setting this flag instructs RNAalifold
to use the ID naming scheme also for the first alignment.
--auto-id
Automatically generate an ID for each alignment.
(default=off)
The default mode of RNAalifold is to automatically determine an ID from the input alignment if the
input file format provides one. Alignment IDs are, for instance, usually given in Stockholm
1.0 formatted input. If this flag is active, RNAalifold ignores any IDs retrieved from the input
and automatically generates an ID for each alignment.
--id-prefix=STRING
Prefix for automatically generated IDs (as used in output file names).
(default=`alignment')
If this parameter is set, each alignment will be prefixed with the provided string. Hence, the
output files will obey the following naming scheme: "prefix_xxxx_ss.ps" (secondary structure
plot), "prefix_xxxx_dp.ps" (dot-plot), "prefix_xxxx_aln.ps" (annotated alignment), etc. where xxxx
is the alignment number beginning with the second alignment in the input. Use this setting in
conjunction with the --continuous-ids flag to assign IDs beginning with the first input alignment.
--id-delim=CHAR
Change the delimiter between prefix and increasing number for automatically generated IDs (as used
in output file names).
(default=`_')
This parameter can be used to change the default delimiter '_' between the prefix string and the
increasing number for automatically generated IDs.
--id-digits=INT
Specify the number of digits of the counter in automatically generated alignment IDs.
(default=`4')
When alignment IDs are automatically generated, they receive an increasing number, starting with
1. This number will always be left-padded with leading zeros, such that the number takes up a
certain width. Using this parameter, the width can be adjusted to the user's needs. We allow
numbers in the range [1:18].
--id-start=LONG
Specify the first number in automatically generated alignment IDs.
(default=`1')
When alignment IDs are automatically generated, they receive an increasing number, usually
starting with 1. Using this parameter, the first number can be specified to the user's
requirements. Note: negative numbers are not allowed. Note: Setting this parameter implies
continuous alignment IDs, i.e. it activates the --continuous-ids flag.
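The interplay of --id-prefix, --id-delim, --id-digits and --id-start can be sketched in Python as
follows. The function names are hypothetical, and the "_" before the file type suffix is taken from the
naming scheme shown under --id-prefix rather than from the program's source.

```python
def make_alignment_id(counter, prefix="alignment", delim="_", digits=4):
    """Build an ID such as 'alignment_0002' from the documented defaults."""
    if not 1 <= digits <= 18:
        raise ValueError("digits must be in the range [1:18]")
    if counter < 0:
        raise ValueError("negative numbers are not allowed")
    # Left-pad the counter with zeros to the requested width.
    return f"{prefix}{delim}{counter:0{digits}d}"


def output_names(alignment_id):
    """Derive the PostScript file names listed under --id-prefix."""
    # Assumption: '_' separates the ID from the suffix, as in 'prefix_xxxx_ss.ps'.
    return {kind: f"{alignment_id}_{kind}.ps" for kind in ("ss", "dp", "aln")}
```

For example, make_alignment_id(2) gives "alignment_0002", and output_names("alignment_0002")["dp"] gives
"alignment_0002_dp.ps".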
--filename-delim=CHAR
Change the delimiting character used in sanitized filenames.
(default=`ID-delimiter')
This parameter can be used to change the delimiting character used while sanitizing filenames,
i.e. replacing invalid characters. Note that the default delimiter is ALWAYS the first character
of the "ID delimiter" as supplied through the --id-delim option. If the delimiter is a whitespace
character or empty, invalid characters will be simply removed rather than substituted. Currently,
we regard the following characters as illegal for use in filenames: backslash '\', slash '/',
question mark '?', percent sign '%', asterisk '*', colon ':', pipe symbol '|', double quote '"',
angle brackets '<' and '>'.
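A minimal Python sketch of the sanitizing rule described above. The function name sanitize_filename is
hypothetical, and the sketch only mirrors the documented behavior, not the program's implementation.

```python
# Characters this help text lists as illegal in filenames.
ILLEGAL = set('\\/?%*:|"<>')


def sanitize_filename(name, delim="_"):
    """Replace illegal characters by the first character of delim."""
    replacement = delim[:1]
    # A whitespace or empty delimiter means: remove instead of substitute.
    if replacement.isspace():
        replacement = ""
    return "".join(replacement if ch in ILLEGAL else ch for ch in name)
```

With the default delimiter, sanitize_filename("a/b:c") yields "a_b_c"; with a blank delimiter the same
input yields "abc".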
Algorithms:
Select additional algorithms which should be included in the calculations.
-p, --partfunc[=INT]
Calculate the partition function and base pairing probability matrix in addition to the mfe
structure. Default is calculation of mfe structure only.
(default=`1')
In addition to the MFE structure we print a coarse representation of the pair probabilities in
form of a pseudo bracket notation, followed by the ensemble free energy, as well as the centroid
structure derived from the pair probabilities together with its free energy and distance to the
ensemble. Finally it prints the frequency of the mfe structure.
An additionally passed value to this option changes the behavior of partition function
calculation: -p0 deactivates the calculation of the pair probabilities, saving about 50% in
runtime. This prints the ensemble free energy 'dG=-kT ln(Z)'.
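The relation 'dG=-kT ln(Z)' and the frequency of the mfe structure can be illustrated on a toy ensemble
in Python. This is a sketch only: RNAalifold obtains Z by dynamic programming rather than by enumerating
states, and the function name and the gas constant value used here are assumptions.

```python
import math

GAS_CONSTANT = 1.98717e-3  # kcal/(mol*K); assumed value


def ensemble_summary(energies, temperature_c=37.0):
    """Return (ensemble free energy, frequency of the lowest-energy state)."""
    kT = GAS_CONSTANT * (temperature_c + 273.15)
    e_min = min(energies)
    # Factor out the minimum energy so the exponentials cannot overflow.
    z_shifted = sum(math.exp(-(e - e_min) / kT) for e in energies)
    ensemble_dg = e_min - kT * math.log(z_shifted)  # equals -kT ln(Z)
    mfe_frequency = math.exp(-(e_min - ensemble_dg) / kT)
    return ensemble_dg, mfe_frequency
```

The ensemble free energy is never higher than the mfe, and the two coincide only when a single state
carries all of the probability.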
--betaScale=DOUBLE
Set the scaling of the Boltzmann factors. (default=`1.')
The argument provided with this option is used to scale the thermodynamic temperature in the
Boltzmann factors independently from the temperature of the individual loop energy contributions.
The Boltzmann factors then become 'exp(- dG/(kTn*betaScale))' where 'k' is the Boltzmann constant,
'dG' the free energy contribution of the state, 'T' the absolute temperature and 'n' the number of
sequences.
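As a worked example of the expression above, the following Python sketch evaluates
'exp(- dG/(kTn*betaScale))' directly. The function name and the gas constant value are assumptions; the
formula itself is the one stated in this help text.

```python
import math

GAS_CONSTANT = 1.98717e-3  # kcal/(mol*K); assumed value


def boltzmann_factor(dG, n_seq, beta_scale=1.0, temperature_c=37.0):
    """Evaluate exp(-dG / (k * T * n * betaScale)) for one state."""
    kT = GAS_CONSTANT * (temperature_c + 273.15)
    return math.exp(-dG / (kT * n_seq * beta_scale))
```

A betaScale above 1 acts like a higher temperature in the Boltzmann factors only, so the weights of
different states move closer together while the loop energies themselves stay unchanged.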
-S, --pfScale=DOUBLE
In the calculation of the pf use scale*mfe as an estimate for the ensemble free energy (used to
avoid overflows).
(default=`1.07')
The default is 1.07; useful values are 1.0 to 1.2. Occasionally needed for long sequences.
--MEA[=gamma]
Compute MEA (maximum expected accuracy) structure.
(default=`1.')
The expected accuracy is computed from the pair probabilities: each base pair '(i,j)' receives a
score '2*gamma*p_ij' and the score of an unpaired base is given by the probability of not forming
a pair. The parameter gamma tunes the importance of correctly predicted pairs versus unpaired
bases. Thus, for small values of gamma the MEA structure will contain only pairs with very high
probability. Using --MEA implies -p for computing the pair probabilities.
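The scoring rule can be made concrete with a Python sketch that evaluates the expected accuracy of one
given structure. The function name expected_accuracy and the dictionary layout for the pair
probabilities are assumptions; finding the structure that maximizes this score requires a separate
optimization that is not shown here.

```python
def expected_accuracy(structure, pair_probs, gamma=1.0):
    """Score a dot-bracket structure; pair_probs maps 0-based (i, j), i < j, to p_ij."""
    n = len(structure)
    # Probability that each position pairs with any partner.
    paired = [0.0] * n
    for (i, j), p in pair_probs.items():
        paired[i] += p
        paired[j] += p
    score, stack = 0.0, []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            score += 2.0 * gamma * pair_probs.get((i, j), 0.0)
        else:
            score += 1.0 - paired[j]  # probability of not forming a pair
    return score
```

With a single pair of probability 0.9 on three positions, the paired structure "(.)" outscores "..." at
gamma=1, but loses at gamma=0.1, which is why small gamma keeps only very probable pairs.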
--sci Compute the structure conservation index (SCI) for the MFE consensus structure of the alignment.
(default=off)
-c, --circ
Assume a circular (instead of linear) RNA molecule.
(default=off)
--bppmThreshold=cutoff
Set the threshold/cutoff for base pair probabilities included in the postscript output.
(default=`1e-6')
By setting the threshold the base pair probabilities that are included in the output can be
varied. By default only those exceeding '1e-6' in probability will be shown as squares in the dot
plot. Raising the threshold reduces the amount of data in the output; lowering it increases it.
-g, --gquad
Incorporate G-Quadruplex formation into the structure prediction algorithm.
(default=off)
-s, --stochBT=INT
Stochastic backtrack. Compute a certain number of random structures with a probability dependent
on the partition function. See -p option in RNAsubopt.
--stochBT_en=INT
Same as the -s option, but also print the energies and probabilities of the backtraced structures.
-N, --nonRedundant
Enable non-redundant sampling strategy.
(default=off)
Structure Constraints:
Command line options to interact with the structure constraints feature of this program
--maxBPspan=INT
Set the maximum base pair span.
(default=`-1')
-C, --constraint[=filename]
Calculate structures subject to constraints. The constraining structure will be read from
'stdin', the alignment has to be given as a file name on the command line.
(default=`')
The program reads first the sequence, then a string containing constraints on the structure
encoded with the symbols:
'.' (no constraint for this base)
'|' (the corresponding base has to be paired)
'x' (the base is unpaired)
'<' (base i is paired with a base j>i)
'>' (base i is paired with a base j<i)
and matching brackets '(' ')' (base i pairs base j)
With the exception of '|', constraints will disallow all pairs conflicting with the constraint.
This is usually sufficient to enforce the constraint, but occasionally a base may stay unpaired in
spite of constraints. PF folding ignores constraints of type '|'.
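Before passing a constraint string to the program, it can be checked against the symbols listed above.
The Python sketch below does this; parse_constraint is a hypothetical helper that only validates and
decodes the string and does not reproduce how RNAalifold applies constraints.

```python
ALLOWED = set(".|x<>()")


def parse_constraint(constraint):
    """Return (pairs, per_base); all positions are 0-based."""
    bad = set(constraint) - ALLOWED
    if bad:
        raise ValueError(f"unsupported constraint symbols: {sorted(bad)}")
    pairs, stack, per_base = [], [], {}
    for pos, ch in enumerate(constraint):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # base i pairs base j
        elif ch != ".":
            per_base[pos] = ch  # one of '|', 'x', '<', '>'
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs, per_base
```

The returned pairs correspond to matching brackets, while per_base records the single-position symbols
so that a caller can report which bases are forced to pair or to stay unpaired.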
--batch
Use constraints for all alignment records. (default=off)
Usually, constraints provided from an input file are only applied to a single sequence alignment.
Therefore, RNAalifold will stop its computation and quit after the first input alignment was
processed. Using this switch, RNAalifold processes all sequence alignments in the input and
applies the same provided constraints to each of them.
--enforceConstraint
Enforce base pairs given by round brackets '(' ')' in structure constraint.
(default=off)
--SS_cons
Use consensus structures from Stockholm file ('#=GC SS_cons') as constraint.
(default=off)
Stockholm formatted alignment files have the possibility to store a secondary structure string in
one of its ('#=GC') column annotation meta tags. The corresponding tag name is usually 'SS_cons', a
consensus secondary structure. Activating this flag allows one to use this consensus secondary
structure from the input file as structure constraint. Currently, only the following characters
are interpreted:
'(' ')' [matching parentheses: column i pairs with column j]
'<' '>' [matching angular brackets: column i pairs with column j]
All other characters are not interpreted (yet). Note: Activating this flag implies --constraint.
--shape=file1,file2
Use SHAPE reactivity data to guide structure predictions.
Multiple shapefiles for the individual sequences in the alignment may be specified as a comma
separated list. An optional association of particular shape files to a specific sequence in the
alignment can be expressed by prepending the sequence number to the filename, e.g.
"5=seq5.shape,3=seq3.shape" will assign the reactivity values from file seq5.shape to the fifth
sequence in the alignment, and the values from file seq3.shape to sequence 3. If no assignment is
specified, the reactivity values are assigned to corresponding sequences in the order they are
given.
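How such a list maps files to sequences can be sketched in Python. The function name
assign_shape_files is hypothetical, and treating unnumbered entries as sequences 1, 2, ... in the order
given is an assumption about lists that mix both forms.

```python
def assign_shape_files(spec, n_seq):
    """Map 1-based sequence numbers to file names from a --shape style list."""
    assignment = {}
    next_unnumbered = 1
    for item in spec.split(","):
        number, sep, filename = item.partition("=")
        if sep and number.isdigit():
            seq_no = int(number)  # explicit association, e.g. '5=seq5.shape'
        else:
            # No sequence number given: assign files in the order listed.
            seq_no, filename = next_unnumbered, item
            next_unnumbered += 1
        if not 1 <= seq_no <= n_seq:
            raise ValueError(f"sequence number {seq_no} outside 1..{n_seq}")
        assignment[seq_no] = filename
    return assignment
```

For the example above, assign_shape_files("5=seq5.shape,3=seq3.shape", 6) returns
{5: "seq5.shape", 3: "seq3.shape"}.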
--shapeMethod=D[mX][bY]
Specify the method how to convert SHAPE reactivity data to pseudo energy contributions.
(default=`D')
Currently, the only data conversion method available is that of Deigan et al. 2009. This method
is the default and is recognized by a capital 'D' in the provided parameter, i.e.:
--shapeMethod="D" is the default setting. The slope 'm' and the intercept 'b' can be set to a
non-default value if necessary. Otherwise m=1.8 and b=-0.6 as stated in the paper mentioned
before. To alter these parameters, e.g. m=1.9 and b=-0.7, use a parameter string like this:
--shapeMethod="Dm1.9b-0.7". You may also provide only one of the two parameters like:
--shapeMethod="Dm1.9" or --shapeMethod="Db-0.7".
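The parameter string can be decoded as in the following Python sketch. The helper names are
hypothetical. The pseudo energy formula m*ln(reactivity+1)+b is the linear-log model of Deigan et al.
and is not spelled out in this help text; treatment of missing or negative reactivities is not covered.

```python
import math
import re

_NUMBER = r"[-+]?\d*\.?\d+"
_PATTERN = re.compile(rf"^D(?:m(?P<m>{_NUMBER}))?(?:b(?P<b>{_NUMBER}))?$")


def parse_deigan(method="D"):
    """Return (m, b) from strings such as 'D', 'Dm1.9', 'Db-0.7', 'Dm1.9b-0.7'."""
    match = _PATTERN.match(method)
    if match is None:
        raise ValueError(f"unrecognized method string: {method!r}")
    # Fall back to the documented defaults when a parameter is omitted.
    m = float(match.group("m")) if match.group("m") else 1.8
    b = float(match.group("b")) if match.group("b") else -0.6
    return m, b


def pseudo_energy(reactivity, m=1.8, b=-0.6):
    """Pseudo energy in kcal/mol for one nucleotide (assumed Deigan model)."""
    return m * math.log(reactivity + 1.0) + b
```

A reactivity of 0 gives the intercept b, so unreactive positions receive a small stabilizing bonus for
pairing, while highly reactive positions are penalized.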
Energy Parameters:
Energy parameter sets can be adapted or loaded from user-provided input files
-T, --temp=DOUBLE
Rescale energy parameters to a temperature of temp C. Default is 37C.
(default=`37.0')
-P, --paramFile=paramfile
Read energy parameters from paramfile, instead of using the default parameter set.
Different sets of energy parameters for RNA and DNA should accompany your distribution. See the
RNAlib documentation for details on the file format. The placeholder file name 'DNA' can be used
to load DNA parameters without the need to actually specify any input file.
-4, --noTetra
Do not include special tabulated stabilizing energies for tri-, tetra- and hexaloop hairpins.
(default=off)
Mostly for testing.
--salt=DOUBLE
Set salt concentration in molar (M). Default is 1.021M.
Model Details:
Tweak the energy model and pairing rules additionally using the following parameters
-d, --dangles=INT
How to treat "dangling end" energies for bases adjacent to helices in free ends and multi-loops.
(default=`2')
With -d2 dangling energies will be added for the bases adjacent to a helix on both sides
in any case.
The option -d0 ignores dangling ends altogether (mostly for debugging).
--noLP Produce structures without lonely pairs (helices of length 1).
(default=off)
For partition function folding this only disallows pairs that can only occur isolated. Other pairs
may still occasionally occur as helices of length 1.
--noGU Do not allow GU pairs.
(default=off)
--noClosingGU
Do not allow GU pairs at the end of helices.
(default=off)
--cfactor=DOUBLE
Set the weight of the covariance term in the energy function
(default=`1.0')
--nfactor=DOUBLE
Set the penalty for non-compatible sequences in the covariance term of the energy function
(default=`1.0')
-E, --endgaps
Score pairs with endgaps same as gap-gap pairs.
(default=off)
-R, --ribosum_file=ribosumfile
Use the specified Ribosum matrix instead of the normal energy model.
The matrix should be a 6x6 matrix; the order of the terms is 'AU', 'CG', 'GC', 'GU', 'UA',
'UG'.
-r, --ribosum_scoring
Use ribosum scoring matrix. (default=off)
The matrix is chosen according to the minimal and maximal pairwise identities of the sequences in
the file.
--old Use old energy evaluation, treating gaps as characters.
(default=off)
--nsp=STRING
Allow other pairs in addition to the usual AU, GC, and GU pairs.
Its argument is a comma separated list of additionally allowed pairs. If the first character is a
"-" then AB will imply that AB and BA are allowed pairs, e.g. --nsp="-GA" will allow GA and AG
pairs. Nonstandard pairs are given 0 stacking energy.
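The argument syntax can be illustrated with a short Python sketch. The function name parse_nsp is
hypothetical, and reading the leading "-" as applying to every pair in the list is an assumption based
on the wording above.

```python
def parse_nsp(spec):
    """Return the set of additionally allowed pairs, e.g. {'GA', 'AG'}."""
    symmetric = spec.startswith("-")
    body = spec[1:] if symmetric else spec
    allowed = set()
    for pair in body.split(","):
        if len(pair) != 2:
            raise ValueError(f"expected two-letter pair, got {pair!r}")
        allowed.add(pair)
        if symmetric:
            allowed.add(pair[::-1])  # AB also implies BA
    return allowed
```

Here parse_nsp("-GA") returns {"GA", "AG"}, whereas parse_nsp("GA") returns only {"GA"}.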
-e, --energyModel=INT
Set energy model.
Rarely used option to fold sequences from the artificial ABCD... alphabet, where A pairs B, C-D
etc. Use the energy parameters for GC (-e 1) or AU (-e 2) pairs.
--helical-rise=FLOAT
Set the helical rise of the helix in units of Angstrom.
(default=`2.8')
Use with caution! This value will be re-set automatically to 3.4 in case DNA parameters are loaded
via -P DNA and no further value is provided.
--backbone-length=FLOAT
Set the average backbone length for looped regions in units of Angstrom.
(default=`6.0')
Use with caution! This value will be re-set automatically to 6.76 in case DNA parameters are
loaded via -P DNA and no further value is provided.
Plotting:
Command line options for changing the default behavior of structure layout and pairing probability
plots
--color
Produce a colored version of the consensus structure plot "alirna.ps" (default b&w only)
(default=off)
--aln Produce a colored and structure annotated alignment in PostScript format in the file "aln.ps" in
the current directory.
(default=off)
--aln-EPS-cols=INT
Number of columns in colored EPS alignment output.
(default=`60')
A value less than 1 indicates that the output should not be wrapped at all.
--aln-stk[=prefix]
Create a multi-Stockholm formatted output file. (default=`RNAalifold_results')
The default file name used for the output is "RNAalifold_results.stk". Users may change the
filename to "prefix.stk" by specifying the prefix as optional argument. The file will be created in
the current directory if it does not already exist. In case the file already exists, output will
be appended to it. Note: Any special characters in the filename will be replaced by the filename
delimiter, hence there is no way to pass an entire directory path through this option yet. (See
also the "--filename-delim" parameter)
--noPS Do not produce postscript drawing of the mfe structure.
(default=off)
--noDP Do not produce dot-plot postscript file containing base pair or stack probabilities.
(default=off)
In combination with the -p option, this flag turns off creation of individual dot-plot files.
Consequently, computed base pair probability output is omitted but centroid and MEA structure
prediction is still performed.
-t, --layout-type=INT
Choose the layout algorithm. (default=`1')
Select the layout algorithm that computes the nucleotide coordinates. Currently, the following
algorithms are available:
'0': simple radial layout
'1': Naview layout (Bruccoleri et al. 1988)
'2': circular layout
'3': RNAturtle (Wiegreffe et al. 2018)
'4': RNApuzzler (Wiegreffe et al. 2018)
Caveats:
Sequences are not weighted. If possible, do not mix very similar and dissimilar sequences. Duplicate
sequences, for example, can distort the prediction.