Program extract takes a FASTA format sequence file and a file with a list of start/stop positions in that
file (e.g., as produced by the long-orfs program) and extracts and outputs the specified sequences.
The first command-line argument is the name of the sequence file, which must be in FASTA format.
The second command-line argument is the name of the coordinate file. It must contain a list of pairs of
positions in the first file, one per line. The format of each entry is:
<IDstring>> <start position> <stop position>
This file should contain no other information, so if you're using the output of glimmer or long-orfs ,
you'll have to cut off header lines.
The output of the program goes to the standard output and has one line for each line in the coordinate
file. Each line contains the IDstring , followed by white space, followed by the substring of the
sequence file specified by the coordinate pair. Specifically, the substring starts at the first position
of the pair and ends at the second position (inclusive). If the first position is bigger than the
second, then the DNA reverse complement of each position is generated. Start/stop pairs that "wrap
around" the end of the genome are allowed.