This manual page briefly documents the script theseus_align, a quick-and-dirty way to perform a
maximum likelihood (ML) superposition of proteins with different sequences. It should work very well
when the protein sequences are relatively similar, although even when the sequences are moderately
divergent the ML method will still give much better results than least squares. Technically, this
procedure gives a structure-based superposition of a sequence-based alignment. It does not perform a
structure-based alignment.
First, the script uses theseus to create FASTA-formatted sequence files corresponding to the exact
protein sequences found in the PDB files that you supply.
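To make the first step concrete, here is a minimal Python sketch of how a FASTA record can be derived from the coordinates in a PDB file. It is an illustration only, not the code theseus uses: the function name pdb_to_fasta is hypothetical, it reads one residue per alpha-carbon ATOM record, it ignores chain breaks and multiple models, and it writes X for any residue name outside the 20 standard amino acids.

```python
# Illustrative only: theseus performs this step itself; this is not its code.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_fasta(pdb_text, name):
    """Return a FASTA record built from the CA atoms of ATOM records."""
    residues = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: atom name 13-16, altLoc 17, residue name 18-20.
        if line[12:16].strip() != "CA":
            continue
        # Keep only the first alternate location to avoid duplicate residues.
        if line[16:17] not in (" ", "A"):
            continue
        residues.append(THREE_TO_ONE.get(line[17:20], "X"))
    return ">" + name + "\n" + "".join(residues) + "\n"
```

Because the sequence comes from the coordinates rather than from SEQRES records, residues that are missing from the structure are also missing from the sequence, which is what "exact" means in the step above.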
Second, these sequences are aligned using the multiple sequence alignment program of your choice. The
script can easily be modified for CLUSTALW, T_COFFEE, KALIGN, DIALIGN2, or MAFFT. Any multiple sequence
alignment program can be used, as long as it can generate CLUSTAL-formatted files. However, I highly
recommend Bob Edgar's MUSCLE program for both its speed and accuracy. (For more info see
http://www.drive5.com/muscle/ .)
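If you swap in a different aligner, it can help to confirm that its output really is CLUSTAL-formatted before handing it to theseus. The following Python sketch parses such a file into a dictionary of aligned rows. The function name parse_clustal is hypothetical, and the header check (first line starting with "CLUSTAL") is an assumption about what counts as acceptable; theseus may be more or less strict.

```python
def parse_clustal(text):
    """Parse CLUSTAL-formatted alignment text into {name: aligned_sequence}."""
    lines = text.splitlines()
    # Assumption: a valid file announces itself with a CLUSTAL header line.
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("not a CLUSTAL-formatted alignment")
    seqs = {}
    for line in lines[1:]:
        # Skip blank lines and the conservation line, which starts with a space.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        # Alignments are written in blocks; concatenate the chunks per sequence.
        seqs[name] = seqs.get(name, "") + chunk
    return seqs
```

A quick sanity check is that every row in the returned dictionary has the same length; if not, the file is truncated or is not in the expected format.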
Third, theseus performs a superposition of the structures using the sequence alignment as a guide.
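One simple way to picture "using the sequence alignment as a guide" is that each alignment column in which two sequences both have a residue defines a pair of equivalent residues. The Python sketch below computes those pairs for two aligned rows. The function name aligned_pairs is hypothetical, and this is only an illustration of the idea; how theseus itself treats gapped columns may differ.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return (i, j) residue-index pairs for columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = []
    i = j = 0  # running residue indices into the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.append((i, j))
        # Advance each index only when its row contributes a residue.
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs
```

For example, aligned_pairs("ACD-F", "AC-EF") pairs residues 0, 1, and 3 of each structure and leaves the residues opposite a gap unpaired. This is why the result is a superposition of a sequence-based alignment: the residue equivalences are fixed by the aligner, not inferred from the structures.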
The installed version of theseus_align uses muscle(1) for the multiple sequence alignment. If you
wish to use one of the other programs mentioned above, you'll have to copy the script to your own
directory and edit it.