sfetch - get a sequence from a flatfile database.
Description
sfetch retrieves the sequence named seqname from a sequence database.
Which database is used is controlled by the -d and -D options, for "little databases" and "big databases" respectively.
The directory location of "big databases" can be specified by environment variables, such as $SWDIR for
Swissprot, and $GBDIR for Genbank (see -D for complete list). A complete file path must be specified for
"little databases". By default, if neither option is specified and the name looks like a Swissprot
identifier (e.g. it has a _ character), the $SWDIR environment variable is used to attempt to retrieve
the sequence seqname from Swissprot.
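The selection rules above can be sketched roughly as follows. This is an illustration only, not sfetch's actual code: the function name choose_source, the order in which -d and -D are checked, and the None result when no rule applies are all assumptions, since the text does not describe those cases. The code-to-variable table is copied from the -D option.

```python
import os

# Codes accepted by -D and the environment variable that names each
# database directory, as listed under the -D option.
DB_ENV_VARS = {
    "sw": "SWDIR",
    "pir": "PIRDIR",
    "em": "EMBLDIR",
    "gb": "GBDIR",
    "wp": "WORMDIR",
    "owl": "OWLDIR",
}


def choose_source(seqname, seqfile=None, dbcode=None, env=None):
    """Return a (kind, location) pair saying where seqname would be looked up.

    Hypothetical helper: checking -d before -D is an assumption.
    """
    env = os.environ if env is None else env
    if seqfile is not None:
        # -d: a "little database"; the caller supplies a complete file path.
        return ("file", seqfile)
    if dbcode is not None:
        # -D: a "big database"; its directory comes from an environment variable.
        return ("directory", env.get(DB_ENV_VARS[dbcode]))
    if "_" in seqname:
        # Default: the name looks like a Swissprot identifier, so use $SWDIR.
        return ("directory", env.get("SWDIR"))
    # The documentation does not say what happens otherwise (assumption).
    return None
```

For instance, choose_source("CALM_HUMAN") with neither option set falls through to the $SWDIR branch because the name contains an underscore.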
A variety of other options are available which allow retrieval of subsequences (-f,-t); retrieval by
accession number instead of by name (-a); reformatting the extracted sequence into a variety of other
formats (-F); etc.
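The coordinate behaviour of -f and -t, including the reverse complement case described under Options, might look like the sketch below. It assumes 1-based, inclusive coordinates and a plain DNA alphabet (no IUPAC ambiguity codes, no RNA); the function name extract is hypothetical and is not part of sfetch.

```python
# Complement table for plain DNA only (assumption: no ambiguity codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(seq, start=None, end=None):
    """Return the subsequence start..end, using 1-based inclusive coordinates.

    start defaults to 1 and end defaults to the sequence length, mirroring
    the defaults described for -f and -t. If start is greater than end, the
    region is returned as its reverse complement.
    """
    start = 1 if start is None else start
    end = len(seq) if end is None else end
    if start <= end:
        return seq[start - 1:end]
    # start > end: take the same region, complement it, then reverse it.
    return seq[end - 1:start].translate(_COMPLEMENT)[::-1]
```

With this convention, swapping the two coordinates selects the same region on the opposite strand.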
If the database has been SSI indexed, sequence retrieval is extremely efficient; otherwise, retrieval may
be painfully slow (the entire database may have to be read into memory to find seqname). SSI indexing is
recommended for all large or permanent databases. The program sindex creates SSI indexes for any sequence
file.
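To see why an index helps, consider the toy example below. It is not the SSI format and not what sindex produces; it only records the byte offset of each FASTA header so that a later lookup can seek directly to the record instead of scanning the whole file. The function names are hypothetical.

```python
import io


def build_offset_index(handle):
    """Map each FASTA record name to the byte offset of its header line."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                # The name is the first word after '>'.
                index[fields[0].decode()] = offset
    return index


def fetch_record(handle, index, name):
    """Seek straight to the named record and return its header plus sequence lines."""
    handle.seek(index[name])
    lines = [handle.readline()]
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # reached the next record
        lines.append(line)
    return b"".join(lines).decode()


# A small in-memory "database" standing in for a sequence file.
demo = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2\nGGCC\n")
demo_index = build_offset_index(demo)
```

Building the index costs one pass over the file; each later lookup is a single seek plus a read of that record alone.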
sfetch was originally named getseq, and was renamed because it clashed with a GCG program of the same
name.
Expert Options
--informat <s>
Specify that the sequence file is in format <s>, rather than the default FASTA format. Common
examples include Genbank, EMBL, GCG, PIR, Stockholm, Clustal, MSF, or PHYLIP; see the printed
documentation for a complete list of accepted format names. This option overrides the default
format (FASTA) and the -B Babelfish autodetection option.
Name
sfetch - get a sequence from a flatfile database.
Options
-a Interpret seqname as an accession number, not an identifier.
-d <seqfile>
Retrieve the sequence from a sequence file named <seqfile>. If a GSI index <seqfile>.gsi exists,
it is used to speed up the retrieval.
-f <from>
Extract a subsequence starting from position <from>, rather than from 1. See -t. If <from> is
greater than <to> (as specified by the -t option), then the sequence is extracted as its reverse
complement (it is assumed to be nucleic acid sequence).
-h Print brief help; includes version number and summary of all options, including expert options.
-o <outfile>
Direct the output to a file named <outfile>. By default, output goes to stdout.
-r <newname>
Rename the sequence to <newname> in the output after extraction. By default, the original sequence
identifier is retained. Useful, for instance, when retrieving a sequence fragment; the
coordinates of the fragment might be added to the name (this is what Pfam does).
-t <to>
Extract a subsequence that ends at position <to>, rather than at the end of the sequence. See -f.
If <to> is less than <from> (as specified by the -f option), then the sequence is extracted as its
reverse complement (it is assumed to be nucleic acid sequence).
-D <database>
Retrieve the sequence from the main sequence database coded <database>. For each code, there is an
environment variable that specifies the directory path to that database. Recognized codes and
their corresponding environment variables are -Dsw (Swissprot, $SWDIR); -Dpir (PIR, $PIRDIR); -Dem
(EMBL, $EMBLDIR); -Dgb (Genbank, $GBDIR); -Dwp (Wormpep, $WORMDIR); and -Dowl (OWL, $OWLDIR).
Each database is read in its native flatfile format.
-F <format>
Reformat the extracted sequence into a different format. (By default, the sequence is extracted
from the database in the same format as the database.) Available formats are embl, fasta, genbank,
gcg, strider, zuker, ig, pir, squid, and raw.
See Also
afetch(1), alistat(1), compalign(1), compstruct(1), revcomp(1), seqsplit(1), seqstat(1), shuffle(1), sindex(1), sreformat(1), stranslate(1), weight(1).
Synopsis
sfetch [options] seqname
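As an illustration of the synopsis, the sketch below assembles an argument list for fetching a fragment from a sequence file and renaming it with its coordinates. It uses only the -d, -f, -t and -r options documented on this page. The helper name fragment_command, the choice to pass each option and its value as separate arguments, and the name/from-to naming pattern (the convention Pfam uses for fragment names) are assumptions made for the example.

```python
def fragment_command(seqfile, seqname, start, end):
    """Build an sfetch argument list for one fragment (hypothetical helper).

    The fragment is renamed to "seqname/start-end" so that its coordinates
    travel with it in the output.
    """
    newname = f"{seqname}/{start}-{end}"
    return [
        "sfetch",
        "-d", seqfile,     # sequence file to read from
        "-f", str(start),  # first position of the fragment
        "-t", str(end),    # last position of the fragment
        "-r", newname,     # new name carrying the coordinates
        seqname,           # the sequence to retrieve
    ]
```

The resulting list could be handed to subprocess.run on a system where sfetch is installed; nothing is executed here.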
