
gt-fingerprint - Compute MD5 fingerprints for each sequence given in a set of sequence files.

Description

       -check [filename]
           compare all fingerprints contained in the given checklist file with the checksums of the given
           sequence_file(s). The comparison is successful if all fingerprints given in the checklist file can
           be found in the sequence_file(s) in the exact same quantity and vice versa. (default: undefined)

       -duplicates [yes|no]
           show duplicate fingerprints from given sequence_file(s). (default: no)

       -extract [string]
           extract the sequence(s) with the given fingerprint from sequence file(s) and show them on stdout.
           (default: undefined)

       -width [value]
           set output width for FASTA sequence printing (0 disables formatting) (default: 0)

       -o [filename]
           redirect output to specified file (default: undefined)

       -gzip [yes|no]
           write gzip compressed output file (default: no)

       -bzip2 [yes|no]
           write bzip2 compressed output file (default: no)

       -force [yes|no]
           force writing to output file (default: no)

       -help
           display help and exit

       -version
           display version information and exit

       If neither option -check nor option -duplicates is used, the fingerprints for all sequences are shown on
       stdout.

       The fingerprint of a sequence is case-insensitive. Thus the MD5 fingerprints of two identical sequences
       will be the same even if one of them is soft-masked.
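
       The case-insensitivity described above can be illustrated with a minimal Python sketch. It assumes
       that case is normalized by upper-casing the sequence before hashing; the function name fingerprint
       and the sample sequences are hypothetical, and the digests it prints are not guaranteed to match
       those produced by gt fingerprint.

```python
import hashlib


def fingerprint(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence, ignoring letter case."""
    # Assumption: case is normalized by upper-casing before hashing.
    normalized = sequence.upper().encode("ascii")
    return hashlib.md5(normalized).hexdigest()


unmasked = "ACGTACGTTTGA"
soft_masked = "ACGTacgtTTGA"  # same sequence; lower case marks the masked region

# Both spellings hash to the same fingerprint.
print(fingerprint(unmasked) == fingerprint(soft_masked))  # True
```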

Examples

       Compute a list of unique fingerprints:

           $ gt fingerprint U89959_ests.fas | sort | uniq > U89959_ests.checklist_uniq

       Compare fingerprints:

           $ gt fingerprint -check U89959_ests.checklist_uniq U89959_ests.fas
           950b7715ab6cc030a8c810a0dba2dd33 only in sequence_file(s)
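
       The check fails here because the checklist was passed through uniq, while the sequence file contains
       one fingerprint twice. The "exact same quantity and vice versa" rule amounts to a multiset
       comparison, sketched below in Python. This is an illustration of the rule only, not the tool's
       implementation; the function name compare_fingerprints is hypothetical.

```python
from collections import Counter


def compare_fingerprints(checklist, observed):
    """Compare two fingerprint collections as multisets.

    Returns (only_in_checklist, only_in_sequences). Both are empty when
    every fingerprint occurs equally often on both sides.
    """
    expected = Counter(checklist)
    found = Counter(observed)
    # Counter subtraction keeps only the positive surplus on each side.
    return expected - found, found - expected


# A deduplicated checklist versus a sequence set with one repeated fingerprint.
checklist = [
    "6d3b4b9db4531cda588528f2c69c0a57",
    "950b7715ab6cc030a8c810a0dba2dd33",
]
observed = checklist + ["950b7715ab6cc030a8c810a0dba2dd33"]

only_in_checklist, only_in_sequences = compare_fingerprints(checklist, observed)
for fp, surplus in only_in_sequences.items():
    print(fp, surplus)
```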

       Make sure a sequence file contains no duplicates (not the case here):

           $ gt fingerprint -duplicates U89959_ests.fas
           950b7715ab6cc030a8c810a0dba2dd33        2
           gt fingerprint: error: duplicates found: 1 out of 200 (0.500%)
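
       The duplicate report can be reproduced on a plain list of fingerprints with the Python sketch below.
       It assumes that the summary line counts extra copies (occurrences beyond the first) relative to the
       total number of sequences; the example output above is consistent with that reading but does not
       confirm it. The function name find_duplicates and the placeholder fingerprints are hypothetical.

```python
from collections import Counter


def find_duplicates(fingerprints):
    """Return (duplicated, extra_copies, total, percent) for a list of fingerprints."""
    counts = Counter(fingerprints)
    # Fingerprints that occur more than once, with their occurrence counts.
    duplicated = {fp: n for fp, n in counts.items() if n > 1}
    # Assumption: a "duplicate" is every occurrence beyond the first.
    extra_copies = sum(n - 1 for n in duplicated.values())
    total = sum(counts.values())
    percent = 100.0 * extra_copies / total if total else 0.0
    return duplicated, extra_copies, total, percent


# 200 placeholder fingerprints, one of which occurs twice.
fingerprints = [f"fp{i}" for i in range(199)] + ["fp0"]
duplicated, extra_copies, total, percent = find_duplicates(fingerprints)

for fp, n in duplicated.items():
    print(f"{fp}\t{n}")
print(f"duplicates found: {extra_copies} out of {total} ({percent:.3f}%)")
```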

       Extract sequence with given fingerprint:

           $ gt fingerprint -extract 6d3b4b9db4531cda588528f2c69c0a57 U89959_ests.fas
           >SQ;8720010
           TTTTTTTTTTTTTTTTTCCTGACAAAACCCCAAGACTCAATTTAATCAATCCTCAAATTTACATGATAC
           CAACGTAATGGGAGCTTAAAAATA

Name

       gt-fingerprint - Compute MD5 fingerprints for each sequence given in a set of sequence files.

Reporting Bugs

       Report bugs to https://github.com/genometools/genometools/issues.

GenomeTools 1.6.5                                  04/27/2024                                  GT-FINGERPRINT(1)

Return Values

       •   0 everything went fine (-check: the comparison was successful; -duplicates: no duplicates found)

       •   1 an error occurred (-check: the comparison was not successful; -duplicates: duplicates found)
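
       Because success and failure are reported through the exit status, a script can branch on it without
       parsing the output. The Python sketch below assumes the gt binary is on PATH; the helper names
       build_command and run_fingerprint are hypothetical, and only the -check and -duplicates options
       documented above are used.

```python
import subprocess


def build_command(sequence_files, checklist=None, duplicates=False):
    """Assemble the argument list for a gt fingerprint call."""
    cmd = ["gt", "fingerprint"]
    if checklist is not None:
        cmd += ["-check", checklist]
    if duplicates:
        cmd.append("-duplicates")
    return cmd + list(sequence_files)


def run_fingerprint(sequence_files, checklist=None, duplicates=False):
    """Run gt fingerprint and return (ok, stdout, stderr).

    ok is True for exit status 0 (comparison successful, or no duplicates
    found) and False otherwise.
    """
    result = subprocess.run(
        build_command(sequence_files, checklist, duplicates),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout, result.stderr
```

       For example, run_fingerprint(["U89959_ests.fas"], duplicates=True) would return False as its first
       element for the file used in the examples, since that file contains a duplicate.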

Synopsis

       gt fingerprint [option ...] sequence_file [...]

See Also