seqstat - show statistics and format for a sequence file

Author

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL). See COPYING in the source code distribution
       for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu

Biosquid 1.9g                                     January 2003                                        seqstat(1)

Description

       seqstat reads a sequence file seqfile and shows a number of simple statistics about it.

       The printed statistics include the name of the format, the residue type of the first sequence (protein,
       RNA, or DNA), the number of sequences, the total number of residues, and the average and range of the
       sequence lengths.
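       To illustrate what these statistics are, the Python sketch below computes them for FASTA text. It is
       not seqstat's implementation: it reads FASTA only, and the residue-type test (at least 90% A/C/G/T/U/N
       means nucleic acid, and more U than T means RNA) is an assumed heuristic. The helper names parse_fasta,
       guess_residue_type and seq_stats are hypothetical.

```python
from statistics import mean


def parse_fasta(text):
    """Yield (name, description, sequence) for each FASTA record in text."""
    name, desc, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, desc, "".join(chunks)
            header = line[1:].split(None, 1)  # name, then optional description
            name = header[0] if header else ""
            desc = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence line; data before the first '>' is skipped
    if name is not None:
        yield name, desc, "".join(chunks)


def guess_residue_type(seq):
    """Crude guess of 'DNA', 'RNA' or 'protein' (an assumed heuristic, not squid's rule)."""
    s = seq.upper()
    if not s:
        return "unknown"
    nucleotides = sum(s.count(c) for c in "ACGTUN")
    if nucleotides / len(s) < 0.9:
        return "protein"
    return "RNA" if s.count("U") > s.count("T") else "DNA"


def seq_stats(text):
    """Summary statistics in the spirit of seqstat's report (FASTA input only)."""
    records = list(parse_fasta(text))
    lengths = [len(seq) for _, _, seq in records]
    return {
        "format": "FASTA",
        "type": guess_residue_type(records[0][2]) if records else "unknown",  # first sequence only
        "nseq": len(records),
        "total": sum(lengths),
        "avg": mean(lengths) if lengths else 0.0,
        "min": min(lengths, default=0),
        "max": max(lengths, default=0),
    }


if __name__ == "__main__":
    example = ">seq1 first sequence\nACGTACGT\nACGT\n>seq2\nACGUU\n"
    print(seq_stats(example))
```

       As in the description above, the residue type is judged from the first sequence only, so a file that
       mixes DNA and protein is reported according to whichever comes first.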

Expert Options

       --informat <s>
              Specify that the sequence file is in format <s>, rather than the default FASTA format. Common
              examples include Genbank, EMBL, GCG, PIR, Stockholm, Clustal, MSF, or PHYLIP; see the printed
              documentation for a complete list of accepted format names. This option overrides the default
              expected format (FASTA) and the -B Babelfish autodetection option.

       --quiet
              Suppress the verbose header (program name, release number and date, and the parameters and options
              in effect).

Name

       seqstat - show statistics and format for a sequence file

Options

       -a     Show additional verbose information: a table with one line per sequence showing its name, length,
              and description line. Each of these lines is prefixed with a * character so that they can easily be
              extracted with grep and sorted.
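       The * prefix described under -a makes the per-sequence table easy to pull out of the full report. The
       Python sketch below does that and sorts the rows by length, longest first. It assumes each starred line
       holds whitespace-separated name and length fields followed by an optional free-text description; that
       column layout, the function name starred_rows, and the sample report are assumptions for illustration,
       not verified seqstat output.

```python
def starred_rows(report):
    """Return (name, length, description) for each '*'-prefixed line, longest first."""
    rows = []
    for line in report.splitlines():
        if not line.startswith("*"):
            continue  # not part of the per-sequence table
        # Assumed layout: "* <name> <length> [description...]"
        fields = line[1:].split(None, 2)
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip lines that do not match the assumed layout
        name, length = fields[0], int(fields[1])
        desc = fields[2] if len(fields) > 2 else ""
        rows.append((name, length, desc))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical report text; real seqstat -a output may be laid out differently.
    sample = (
        "Format:              FASTA\n"
        "* seq1     12 first sequence\n"
        "* seq2      5\n"
        "* seq3     40 third\n"
        "Number of sequences: 3\n"
    )
    for name, length, desc in starred_rows(sample):
        print(name, length, desc)
```

       From a shell, the same selection is typically done by piping the report through grep for lines that
       begin with * and then through sort on the length column.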

       -h     Print brief help; includes version number and summary of all options, including expert options.

       -B     (Babelfish). Autodetect and read a sequence file format other than the default (FASTA). Almost any
              common sequence file format is recognized (including Genbank, EMBL, SWISS-PROT, PIR, and GCG
              unaligned sequence formats, and Stockholm, GCG MSF, and Clustal alignment formats). See the
              printed documentation for a complete list of supported formats.
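       Format autodetection of the kind -B performs generally relies on the distinctive opening lines that many
       formats have. The Python sketch below is a toy version of that idea, looking only at the first non-blank
       line; it is not squid's detection logic, and the function name sniff_format and the set of rules are
       assumptions. It does not attempt GCG or MSF, whose headers need more than one line to recognize.

```python
import re


def sniff_format(text):
    """Guess a sequence file format from its first non-blank line (toy heuristic)."""
    first = next((line.strip() for line in text.splitlines() if line.strip()), None)
    if first is None:
        return "unknown"
    if first.startswith("# STOCKHOLM"):
        return "Stockholm"
    if first.startswith("CLUSTAL"):
        return "Clustal"
    if first.startswith("LOCUS"):
        return "GenBank"
    if first.startswith("ID   "):
        return "EMBL/SWISS-PROT"  # both formats open with an ID line
    if re.match(r">[A-Z][A-Z0-9];", first):
        return "PIR"  # PIR/NBRF headers such as ">P1;"
    if first.startswith(">"):
        return "FASTA"
    parts = first.split()
    if len(parts) == 2 and all(p.isdigit() for p in parts):
        return "PHYLIP"  # header line: number of sequences and alignment length
    return "unknown"


if __name__ == "__main__":
    print(sniff_format("# STOCKHOLM 1.0\nseq1 ACGU\n//\n"))
    print(sniff_format(">seq1 example\nACGT\n"))
```

       Checking the PIR rule before the plain FASTA rule matters, because both start with a > character.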