hmmpgmd - daemon for database search web services

Author

http://eddylab.org

HMMER 3.4                                           Aug 2023                                          hmmpgmd(1)

Copyright

       Copyright (C) 2023 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

       For additional information on copyright and licensing, see the file called COPYRIGHT in your HMMER source
       distribution, or see the HMMER web page (http://hmmer.org/).

Description

The hmmpgmd program is the daemon that we use internally for the hmmer.org web server. It essentially
stands in front of the search programs phmmer, hmmsearch, and hmmscan.

To use hmmpgmd, first start an instance as the master server and give it a sequence database (using the
--seqdb flag), an HMM database (using the --hmmdb flag), or both. A sequence database must be in hmmpgmd
format, which may be produced using esl-reformat. An HMM database is of the form produced by hmmbuild.
The master loads the input database(s) into memory. When it has finished loading, it prints the line:
"Data loaded into memory. Master is ready."

After the master is ready, one or more instances of hmmpgmd may be started as workers. These workers may
be (and typically are) on different machines from the master, but must have access to the same database
file(s) provided to the master, with the same path. As with the master, each worker loads the database(s)
into memory, and indicates completion by printing: "Data loaded into memory. Worker is ready."
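
The startup sequence above can be sketched in Python. This is only an illustration, not part of HMMER: the
helper names (master_argv, worker_argv, wait_until_ready) are hypothetical, the flags are those listed
under Options, and the sketch assumes that hmmpgmd is on the PATH and that the "ready" line can be read
line by line from the daemon's output.

```python
from typing import Iterable, List, Optional

# Ready messages quoted in the Description section.
MASTER_READY = "Data loaded into memory. Master is ready."
WORKER_READY = "Data loaded into memory. Worker is ready."


def master_argv(seqdb: Optional[str] = None, hmmdb: Optional[str] = None) -> List[str]:
    """Build a master command line; at least one database must be given."""
    if seqdb is None and hmmdb is None:
        raise ValueError("provide a sequence database, an HMM database, or both")
    argv = ["hmmpgmd", "--master"]
    if seqdb is not None:
        argv += ["--seqdb", seqdb]
    if hmmdb is not None:
        argv += ["--hmmdb", hmmdb]
    return argv


def worker_argv(master_ip: str, cpus: Optional[int] = None) -> List[str]:
    """Build a worker command line that connects to the master at master_ip."""
    argv = ["hmmpgmd", "--worker", master_ip]
    if cpus is not None:
        argv += ["--cpu", str(cpus)]
    return argv


def wait_until_ready(lines: Iterable[str], ready_message: str) -> bool:
    """Return True once ready_message appears in the stream of output lines."""
    for line in lines:
        if ready_message in line:
            return True
    return False  # the stream ended before the daemon reported readiness
```

In practice, lines could be the output stream of a process started with subprocess.Popen from one of
these argument lists; whether the ready line arrives promptly through a pipe depends on output buffering,
which this sketch does not address.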

The master process and workers are expected to remain running. One or more clients then connect to the
master and submit possibly many queries. The master distributes the work of a query among the workers,
collects results, and merges them before responding to the client. Two example client programs are
included in the HMMER src directory: the C program hmmc2 and the Perl script hmmpgmd_client_example.pl.
These are intended as examples only, and should be extended as necessary to meet your needs.
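
For orientation, the client side of this exchange might look like the following Python sketch. It is not
a substitute for hmmc2 or hmmpgmd_client_example.pl. The function name send_query is hypothetical, the
sketch assumes the query is sent as ASCII text over a TCP connection to the client port (51371 by
default), and it returns the first bytes of the reply without interpreting them, because the reply
format is not described in this page.

```python
import socket

DEFAULT_CLIENT_PORT = 51371  # default value of --cport


def send_query(query: str, host: str = "127.0.0.1", port: int = DEFAULT_CLIENT_PORT,
               timeout: float = 30.0, max_bytes: int = 4096) -> bytes:
    """Send one query string to a master and return the first raw bytes of its reply.

    Assumption: the master accepts the query as plain text on a TCP socket.
    The reply is returned uninterpreted; decoding it is left to a real client.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii"))
        return sock.recv(max_bytes)
```

A real client would keep the connection open, read the complete reply, and submit further queries over
the same connection rather than reconnecting each time.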

A query is submitted to the master from the client as a character string. Queries may be the sort that
would normally be handled by phmmer (protein sequence vs protein database), hmmsearch (protein HMM query
vs protein database), or hmmscan (protein sequence vs protein HMM database).

The general form of a client query is to start with a single line of the form @[options], followed by
multiple lines of text representing either the query HMM or fasta-formatted sequence. The final line of
each query is the separator //.

For example, to perform a phmmer type search of a sequence against a sequence database file, the first
line is of the form @--seqdb 1, then the fasta-formatted query sequence starting with the header line
>sequence-name, followed by one or more lines of sequence, and finally the closing //.

To perform an hmmsearch type search, the query sequence is replaced by the full text of a HMMER-format
query HMM.

To perform an hmmscan type search, the text matches that of the phmmer type search, except that the first
line changes to @--hmmdb 1.
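
The three query types can be assembled as plain strings. The Python sketch below is illustrative only:
the function names are hypothetical, the sequence is wrapped at 60 characters by arbitrary choice, and
the handling of an HMM file's own trailing // (treated here as the query separator rather than followed
by a second one) is an assumption, not something this page specifies.

```python
def fasta_record(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as FASTA: a >name header, then wrapped sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def build_query(options: str, body: str) -> str:
    """Frame a query: an @[options] line, the query text, and the closing //."""
    if not body.endswith("\n"):
        body += "\n"
    return "@" + options + "\n" + body + "//\n"


def phmmer_query(name: str, sequence: str, seqdb: int = 1) -> str:
    """Sequence query against sub-database number seqdb of the sequence database."""
    return build_query(f"--seqdb {seqdb}", fasta_record(name, sequence))


def hmmscan_query(name: str, sequence: str, hmmdb: int = 1) -> str:
    """Sequence query against the HMM database."""
    return build_query(f"--hmmdb {hmmdb}", fasta_record(name, sequence))


def hmmsearch_query(hmm_text: str, seqdb: int = 1) -> str:
    """HMM query against the sequence database.

    Assumption: if the HMM text already ends with a // line, that line
    serves as the query separator, so only one // is emitted.
    """
    body = hmm_text.rstrip()
    if body.splitlines()[-1:] == ["//"]:
        body = body[:-2].rstrip()  # drop the terminator; build_query re-adds it
    return build_query(f"--seqdb {seqdb}", body)
```

For example, phmmer_query("seq1", "ACDEFGHIKL") produces the four lines @--seqdb 1, >seq1, ACDEFGHIKL,
and //.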

In the hmmpgmd-formatted sequence database file, each sequence can be associated with one or more
sub-databases. The --seqdb flag indicates which of these sub-databases will be queried. The HMM database
format does not support sub-databases.

Name

       hmmpgmd - daemon for database search web services

Options

       -h     Help; print a brief reminder of command line usage and all available options.

       --master
              Run as the master server.

       --worker <s>
              Run as a worker, connecting to the master server that is running on IP address <s>.

       --cport <n>
              Port to use for communication between clients and the master server.  The default is 51371.

       --wport <n>
              Port to use for communication between workers and the master server.  The default is 51372.

       --ccncts <n>
              Maximum number of client connections to accept. The default is 16.

       --wcncts <n>
              Maximum number of worker connections to accept. The default is 32.

       --pid <f>
              Name of file into which the process id will be written.

       --seqdb <f>
              Name of the file (in hmmpgmd format) containing protein sequences.  The contents of this file will
              be cached for searches.

       --hmmdb <f>
              Name of the file containing protein HMMs. The contents of this file will be cached for searches.

       --cpu <n>
              Number of parallel threads to use (for --worker).