seqstat - show statistics and format for a sequence file
Description
seqstat reads a sequence file seqfile and shows a number of simple statistics about it.
The printed statistics include the name of the format, the residue type of the first sequence (protein,
RNA, or DNA), the number of sequences, the total number of residues, and the average and range of the
sequence lengths.
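The statistics listed above can be illustrated with a minimal Python sketch. This is not seqstat's implementation: it handles only plain FASTA text, does no format or residue-type detection, and the function names (`fasta_lengths`, `summarize`) and the toy input are assumptions made for illustration.

```python
from io import StringIO


def fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA-formatted text stream."""
    length = None  # None until the first '>' header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequence data may span several lines
    if length is not None:
        yield length  # finish the last record


def summarize(handle):
    """Return sequence count, total residues, and length average and range."""
    lengths = list(fasta_lengths(handle))
    if not lengths:
        raise ValueError("no sequences found")
    total = sum(lengths)
    return {
        "sequences": len(lengths),
        "residues": total,
        "shortest": min(lengths),
        "longest": max(lengths),
        "average": total / len(lengths),
    }


# Toy input (hypothetical): two short records, the first split over two lines.
example = StringIO(">seq1 toy record\nACGT\nAC\n>seq2\nACGTACGTAC\n")
stats = summarize(example)
print(stats)
```

For a real file, `summarize(open(path))` would be the equivalent call; seqstat itself additionally reports the format name and residue type, which this sketch leaves out.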
Expert Options
--informat <s>
Specify that the sequence file is in format <s>, rather than the default FASTA format. Common
examples include Genbank, EMBL, GCG, PIR, Stockholm, Clustal, MSF, and PHYLIP; see the printed
documentation for a complete list of accepted format names. This option overrides the default
expected format (FASTA) and the -B Babelfish autodetection option.
--quiet
Suppress the verbose header (program name, release number and date, the parameters and options in
effect).
Name
seqstat - show statistics and format for a sequence file
Options
-a Show additional verbose information: a table with one line per sequence showing name, length, and
description line. These lines are prefixed with a * character so that they can easily be extracted
with grep and sorted.
-h Print brief help; includes version number and summary of all options, including expert options.
-B (Babelfish). Autodetect and read a sequence file format other than the default (FASTA). Almost any
common sequence file format is recognized (including Genbank, EMBL, SWISS-PROT, PIR, and GCG
unaligned sequence formats, and Stockholm, GCG MSF, and Clustal alignment formats). See the
printed documentation for a complete list of supported formats.
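The * prefix on the per-sequence lines printed by -a makes them easy to pick out of the rest of the output. The Python sketch below shows one way to do that and sort the records by length. It assumes that each table line is a * followed by whitespace-separated name, length, and description fields; the exact column layout is not specified here, so treat that as an assumption. The helper `parse_table` and the sample text are hypothetical and not real seqstat output.

```python
def parse_table(output_text):
    """Collect (name, length, description) from lines starting with '*'.

    Assumes whitespace-separated fields after the '*' prefix; lines that do
    not fit that shape are skipped rather than guessed at.
    """
    rows = []
    for line in output_text.splitlines():
        if not line.startswith("*"):
            continue  # not part of the per-sequence table
        fields = line[1:].split(None, 2)
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # layout differs from the assumed one
        name, length = fields[0], int(fields[1])
        description = fields[2] if len(fields) > 2 else ""
        rows.append((name, length, description))
    # Longest sequences first
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample text, made up for illustration only.
sample = """\
a summary line that is not part of the table
* seqA 120 hypothetical protein
* seqB 45
* seqC 300 toy example record
"""
rows = parse_table(sample)
for name, length, description in rows:
    print(name, length, description)
```

If the actual layout differs, only the field-splitting step in `parse_table` needs to change; the prefix filter stays the same.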
See Also
afetch(1), alistat(1), compalign(1), compstruct(1), revcomp(1), seqsplit(1), sfetch(1), shuffle(1), sindex(1), sreformat(1), stranslate(1), weight(1).
Synopsis
seqstat [options] seqfile
