pfscale - fit parameters of an extreme-value distribution to a profile score list

Author

       The pftools package was developed by Philipp Bucher.
       Any comments or suggestions should be addressed to <pftools@sib.swiss>.

pftools 2.3                                        August 2003                                        PFSCALE(1)

Description

       pfscale fits the two parameters of an extreme-value distribution to a sorted score distribution obtained
       by searching a sequence database with a profile. The file 'score_list' is a sorted list of profile match
       scores generated by pfsearch. If '-' is specified instead of a filename, the score list is read from the
       standard input. The result is written to the standard output.
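       Reading the score list can be sketched in Python as below. This is an illustration, not pftools code:
       the helper names open_score_list and read_scores are hypothetical. Skipping blank lines and requiring a
       non-increasing order (as produced by sort -nr in the Examples section) are assumptions.

```python
import sys
from typing import Iterable, List, TextIO


def open_score_list(name: str) -> TextIO:
    """Return standard input for '-', otherwise open the named file."""
    return sys.stdin if name == "-" else open(name)


def read_scores(lines: Iterable[str]) -> List[float]:
    """Keep the first whitespace-delimited field of each line as the score."""
    scores: List[float] = []
    for line in lines:
        fields = line.split()  # no argument: split on any run of whitespace
        if not fields:         # assumption: blank lines are skipped
            continue
        scores.append(float(fields[0]))
    # Assumption: "sorted" means non-increasing, as produced by `sort -nr`.
    if any(a < b for a, b in zip(scores, scores[1:])):
        raise ValueError("score list is not sorted in decreasing order")
    return scores
```

       A caller would pass open_score_list(name) to read_scores; the remaining fields on each line, such as
       sequence identifiers, are discarded as described above.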

       If the original profile is given as the second argument, the normalization function with the lowest mode
       number or the lowest priority number specified within the profile will be updated so as to produce
       -Log10 per-residue E-values. If the second argument is omitted, the output consists of a header line
       containing the normalization parameters, followed by a modified score list showing score rank, original
       raw scores, log cumulative frequencies and the corresponding normalized scores side by side.
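       The profile update can be pictured with the following Python sketch. The classes are hypothetical
       stand-ins for a profile's normalization and cut-off entries, not pfscale's parser. Two points are
       assumptions made for illustration: the tie-break (lowest priority first, then lowest mode number) and
       the linear form normalized = r1 + r2 * raw. The optional requested argument plays the role of option -M.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Normalization:
    """Hypothetical stand-in for a profile normalization entry."""
    mode: int
    priority: int
    r1: float
    r2: float


@dataclass
class CutOff:
    """Hypothetical stand-in for a profile cut-off entry."""
    level: int
    score: float    # raw-score threshold
    n_score: float  # normalized-score threshold
    mode: int


def pick_mode(norms: List[Normalization],
              requested: Optional[int] = None) -> Normalization:
    """Use the requested (-M) mode if given, else the lowest priority entry."""
    if requested is not None:
        for norm in norms:
            if norm.mode == requested:
                return norm
        raise ValueError(f"mode {requested} is not defined in the profile")
    # Assumed tie-break: lowest priority first, then lowest mode number.
    return min(norms, key=lambda n: (n.priority, n.mode))


def rescale(norms: List[Normalization], cutoffs: List[CutOff],
            r1: float, r2: float,
            requested: Optional[int] = None) -> Normalization:
    """Install new parameters and refresh the cut-offs that use that mode."""
    target = pick_mode(norms, requested)
    target.r1, target.r2 = r1, r2
    for cut in cutoffs:
        if cut.mode == target.mode:
            # Assumption: linear normalization, normalized = r1 + r2 * raw.
            cut.n_score = r1 + r2 * cut.score
    return target
```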

       Note  that  this  program  implements  the  significance  estimation  procedure  for profile match scores
       described in Hofmann & Bucher (1995).  It  has  been  used  for  the  calculation  of  the  normalization
       parameters of all profiles in the PROSITE database.
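       The idea behind the fit can be illustrated with a short Python sketch. It is not pfscale's own code,
       and ordinary least squares is a stand-in chosen for illustration, not necessarily the method of
       Hofmann & Bucher (1995). It assumes that in the low-probability tail the extreme-value survival
       function 1 - exp(-exp(-lambda(x - u))) is close to exp(-lambda(x - u)), so the logarithm of the
       cumulative frequency rank/N is roughly linear in the raw score. The line is fitted over the ranks
       selected by -P and -Q, and log_base mirrors option -L. The names fit_log_frequency and
       normalized_score are hypothetical.

```python
import math
from typing import List, Tuple


def fit_log_frequency(scores: List[float], n_db: int,
                      p_upper: float = 1e-4, q_lower: float = 1e-6,
                      log_base: float = 10.0) -> Tuple[float, float]:
    """Least-squares fit of log_base(rank / n_db) ~ intercept + slope * score.

    Assumes `scores` is sorted in decreasing order (as produced by `sort -nr`),
    so the score at 0-based index i has rank i + 1. Only ranks whose
    cumulative frequency rank / n_db lies in [q_lower, p_upper] are used.
    """
    xs: List[float] = []
    ys: List[float] = []
    for i, score in enumerate(scores):
        freq = (i + 1) / n_db
        if q_lower <= freq <= p_upper:
            xs.append(float(score))
            ys.append(math.log(freq, log_base))
    if len(xs) < 2:
        raise ValueError("fewer than two scores inside the probability window")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all scores inside the window are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def normalized_score(raw: float, intercept: float, slope: float) -> float:
    """Negated fitted log frequency, i.e. an estimated -log per-residue E-value."""
    return -(intercept + slope * raw)
```

       On scores from a shuffled database the fitted slope is negative. Under the tail approximation above,
       lambda is about -slope * ln(log_base).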

Examples

       (1)    pfsearch -fr -C 200 sh3.prf shuffle20.seq | sort -nr | pfscale -P 0.0001 -Q 0.000001 -

              derives score-normalization parameters for the SH3 domain profile in file 'sh3.prf'. The file
              'shuffle20.seq' contains a window-shuffled derivative of SWISS-PROT release 30 in Pearson/FASTA
              format (window size 20). Note that the implicit default of N corresponds to the size of this
              database and thus need not be specified on the command line. The cut-off value 200 for the
              pfsearch(1) option -C will produce about 2000 matches, completely covering the range defined by
              the pfscale command-line parameters -P and -Q. A suitable cut-off value has to be guessed in
              advance by computing a few optimal alignment scores for random sequences.
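       The claim that about 2000 matches suffice can be checked with a little arithmetic. With the default N
       of 14147368, P = 0.0001 and Q = 0.000001 correspond to ranks 15 to 1414 in the sorted list. The
       Python sketch below uses hypothetical function names. It computes that rank range and tests whether a
       given number of matches reaches its deepest rank.

```python
import math
from typing import Tuple


def rank_window(n_db: int, p_upper: float, q_lower: float) -> Tuple[int, int]:
    """Ranks r with q_lower <= r / n_db <= p_upper.

    Floating-point rounding may shift a bound by one when n_db * p is an
    exact integer; that case does not arise in the example below.
    """
    first = math.ceil(n_db * q_lower)
    last = math.floor(n_db * p_upper)
    return first, last


def covers_window(n_matches: int, n_db: int,
                  p_upper: float, q_lower: float) -> bool:
    """True if a sorted list of n_matches scores reaches the deepest rank needed."""
    _, last = rank_window(n_db, p_upper, q_lower)
    return n_matches >= last


print(rank_window(14147368, 1e-4, 1e-6))          # (15, 1414)
print(covers_window(2000, 14147368, 1e-4, 1e-6))  # True
```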

Exit Code

       On successful completion of its task, pfscale returns an exit code of 0. If an error occurs, a
       diagnostic message is written to standard error and the exit code is non-zero. When conflicting
       options were passed to the program but the task could nevertheless be completed, warnings are
       issued on standard error.

Name

       pfscale - fit parameters of an extreme-value distribution to a profile score list

Notes

       (1)    The current version of pfscale does  not  yet  support  the  xpsa(5)  output  format  produced  by
              pfscan(1)  or pfsearch(1).  The score list should therefore be generated without the pfscan(1) and
              pfsearch(1) option -k.

Options

       score_list
              Input score list.
              The file must contain a sorted list of scores. The first field of each line is taken as the
              score; all other fields on the same line are ignored. Fields should be delimited by whitespace.
              If the filename is replaced by '-', pfscale reads the score list from standard input.

       profile
              Optional profile file.
              If  a filename is specified, the profile will be parsed and either the lowest priority mode or the
              mode number specified with option -M will be scaled. All cut-off levels which  use  the  specified
              mode number will also be updated.

       -h     Display usage help text.

       -l     Remove output line length limit. Individual lines of the output profile can exceed a length of 132
              characters, removing the need to wrap them over several lines.

       -Llog_base
              Logarithmic  base  of  the parameters of the estimated extreme-value distribution.  The parameters
              reported by pfscale are expressed as logarithms and thus can be inserted directly  into  a  linear
              normalization function defined in a generalized profile.
              Default: 10

       -Mmode_nb
              Mode number to scale.
              Defines which mode number (and implicitly which cut-off level) of the input PROSITE profile should
              be  scaled.  This  overrides the default behaviour of scaling only the normalization mode with the
              lowest priority (or lowest mode number).  All cut-off levels defined in the profile as using  this
              mode number (via the MODE keyword) will be updated as well.

       -Ndb_size
              Size  of  the  database  from  which  the  input score list was derived.  The searched database is
              typically a shuffled version of a real protein or nucleotide sequence database.
              Default: 14147368 (size of SWISS-PROT release 30 and shuffled derivatives of it).

       -Pupper_limit
              Upper threshold of the probability range to which the extreme-value distribution will  be  fitted.
              For instance: if N=10'000'000 and P=0.0001 then profile match scores below rank 1000 in the sorted
              input list (corresponding to occurrence probabilities > 0.0001) will be ignored.
              Default: 0.0001

       -Qlower_limit
              Lower  threshold  of the probability range to which the extreme-value distribution will be fitted.
              For instance: if N=10'000'000 and Q=0.000001 then profile match scores above rank 10 in the sorted
              input list (corresponding to occurrence probabilities < 0.000001) will be ignored.
              Default: 0.000001
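       The selection described for -P and -Q keeps only ranks whose cumulative frequency rank/N lies between
       Q and P. A minimal Python sketch (the function name is hypothetical, and treating ranks exactly at a
       threshold as kept is an assumption) reproduces the N = 10'000'000 example, retaining ranks 10 to 1000.

```python
from typing import List, Tuple


def select_window(scores: List[float], n_db: int,
                  p_upper: float, q_lower: float) -> List[Tuple[int, float]]:
    """Return (rank, score) pairs with q_lower <= rank / n_db <= p_upper.

    Assumes `scores` is sorted in decreasing order, so rank 1 is the best score.
    """
    kept: List[Tuple[int, float]] = []
    for rank, score in enumerate(scores, start=1):
        freq = rank / n_db
        if freq > p_upper:
            break  # every later rank has an even higher frequency
        if freq >= q_lower:
            kept.append((rank, score))
    return kept


scores = [float(s) for s in range(2000, 0, -1)]  # synthetic, already sorted
window = select_window(scores, 10_000_000, 1e-4, 1e-6)
print(window[0][0], window[-1][0], len(window))  # 10 1000 991
```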

Parameters

       Note:  for backwards compatibility, release 2.3 of the pftools package will parse the version  2.2  style
              parameters,  but  these are deprecated and the corresponding option (refer to the options section)
              should be used instead.

       L=#    Logarithmic base.
              Use option -L instead.

       M=#    Mode number.
              Use option -M instead.

       N=#    Database size.
              Use option -N instead.

       P=#    Upper probability threshold.
              Use option -P instead.

       Q=#    Lower probability threshold.
              Use option -Q instead.
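       When updating old scripts, the deprecated parameters map one-to-one onto options. The helper below is
       hypothetical and not part of pftools. It assumes the option value is attached directly to the option
       letter, as written in the Synopsis (for example -N14147368).

```python
import re
from typing import List

# Deprecated 2.2-style parameters: a single letter, '=', then the value.
_LEGACY = re.compile(r"^([LMNPQ])=(.+)$")


def modernize_args(args: List[str]) -> List[str]:
    """Rewrite X=value parameters as -Xvalue options; leave other arguments alone."""
    out: List[str] = []
    for arg in args:
        match = _LEGACY.match(arg)
        out.append(f"-{match.group(1)}{match.group(2)}" if match else arg)
    return out


print(modernize_args(["N=14147368", "P=0.0001", "-", "sh3.prf"]))
# ['-N14147368', '-P0.0001', '-', 'sh3.prf']
```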

References

       Hofmann K & Bucher P. (1995). The FHA-domain: a nuclear signalling domain found in protein kinases and
       transcription factors. Trends Biochem. Sci. 20:347-349.

See Also

       pfsearch(1), pfscan(1), xpsa(5)

Synopsis

       pfscale [ -hl ] [ -Llog_base ] [ -Mmode_nb ] [ -Ndb_size ] [ -Pupper_limit ] [ -Qlower_limit ]
               [ score_list | - ] [ profile ] [ parameters ]
