The available methods are listed below.
new
The object constructor takes one mandatory argument: a reference to a list of gene lengths, in no
particular order, comprising the biological group (e.g. a pathway) whose mutation significance is to be
analyzed using the PathScan paradigm.
my $obj = PopulationPathScan->new ([474, 1038, 285, ...]);
The method checks to make sure that all elements are legitimate lengths, i.e. integers exceeding 3.
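The check described above can be illustrated with a minimal sketch. The helper name
all_legitimate_lengths is hypothetical and is not part of the module; it only shows one way to test
that every element is an integer exceeding 3.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): returns 1 when the argument
# is an array reference whose elements are all integers greater than 3,
# and 0 otherwise.
sub all_legitimate_lengths {
    my ($lengths) = @_;
    return 0 unless ref $lengths eq 'ARRAY';
    for my $len (@$lengths) {
        # Accept plain non-negative integer strings only, then apply the bound.
        return 0 unless defined $len && $len =~ /\A[0-9]+\z/ && $len > 3;
    }
    return 1;
}

print all_legitimate_lengths([474, 1038, 285]) ? "accepted\n" : "rejected\n";
print all_legitimate_lengths([474, 3, 285])    ? "accepted\n" : "rejected\n";
```

How the constructor itself reports a failed check (for example by dying) is not stated here, so the
sketch simply returns a true or false value.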
assign
This method assigns the manner in which genes will be internally organized for passing to the PathScan
calculation component. The main consideration here is how the list may be compartmentalized for greater
computational efficiency, though at some loss of accuracy, in the PathScan calculation. If the gene
list is long, exact calculation is generally infeasible. The method takes a single argument giving the
number of compartments (or sub-lists) into which the lengths will be divided: 1 means a single list
(i.e. exact computation), 2 means two lists, 3 means three lists, etc.
$obj->assign (3);
The values are then organized internally such that the smallest genes are grouped together, then the
slightly larger ones, and so forth. Generally, 3 or 4 lists give a reasonable balance between accuracy
and computational cost (Wendl et al., in progress).
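The grouping by size can be sketched as follows. This is an illustration only: the helper name
compartmentalize is hypothetical, and splitting the sorted lengths into contiguous groups of near-equal
size is an assumption, since the module's actual splitting rule is not described here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): sort the lengths in ascending
# order and cut them into $n contiguous groups of near-equal size.
# The near-equal split is an assumption made for illustration.
sub compartmentalize {
    my ($lengths, $n) = @_;
    my @sorted = sort { $a <=> $b } @$lengths;
    my $total  = scalar @sorted;
    my @groups;
    my $start = 0;
    for my $i (0 .. $n - 1) {
        # Spread any remainder over the first groups.
        my $size = int($total / $n) + ($i < $total % $n ? 1 : 0);
        push @groups, [ @sorted[$start .. $start + $size - 1] ];
        $start += $size;
    }
    return \@groups;
}

my $groups = compartmentalize([474, 1038, 285, 2210, 960, 150, 3300], 3);
print join(' ', @$_), "\n" for @$groups;
# prints:
# 150 285 474
# 960 1038
# 2210 3300
```

With a single compartment the helper returns one sorted list, which corresponds to the exact
computation case.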
preprocess
This method pre-processes the population-level calculation. Specifically, it sets up and executes the
PathScan module to obtain the CDF associated with the given gene set and background mutation rate. It
takes the background mutation rate as its argument.
$obj->preprocess (0.0000027);
The CPU time this method takes depends on the level of accuracy and the number of genes in the
calculation.
If the number of mutated genes in the group for each sample is already known at this point, the method
optionally takes that list as a second argument:
$obj->preprocess (0.0000027, [4, 5, 7, 3, 0, ...]);
It is usually better to use this form because the internals will then compute only a truncated CDF that
is just sufficient to process this list, rather than the full CDF. This improves speed and also helps
avoid overflow errors for large pathways.
population_pval_exact
This method performs the population-level calculation using exact enumeration. It takes the list of the
number of mutated genes in the group for each sample (e.g. each patient's whole genome sequence), for
example
patient 1: 4 genes in the pathway are mutated
patient 2: 5 genes in the pathway are mutated
patient 3: 7 genes in the pathway are mutated
patient 4: 3 genes in the pathway are mutated
patient 5: 0 genes in the pathway are mutated
: : : : : : : : :
which is invoked as
$pval = $obj->population_pval_exact ([4, 5, 7, 3, 0, ...]);
Most scenarios will not actually be able to make use of this method because enumeration of all possible
cases is rarely computationally feasible. This method will mostly be useful for examining small test
cases.
population_pval_approx
This method performs the population-level calculation using Lancaster's approximate transform correction.
It takes, as a mandatory argument, the list of the number of mutated genes in the group for each sample,
e.g. each patient's whole genome sequence.
$pval = $obj->population_pval_approx ([4, 5, 7, 3, 0, ...]);
You must pass the list of hits, even if you already passed this list earlier to the pre-processing
method. Most cases will use this method because exact combination of individual probability values is
rarely computationally feasible. Note that Lancaster's method typically gives much better (more
accurate) results than Fisher's "standard" chi-square transform.
• Fisher, R. A. (1958) Statistical Methods for Research Workers, 13th Ed. Revised, Hafner Publishing
Co., New York.
• Lancaster, H. O. (1949) The Combination of Probabilities Arising from Data in Discrete Distributions,
Biometrika 36(3/4), 370-382.
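For comparison, Fisher's "standard" chi-square transform mentioned above can be sketched in a few
lines. It combines k independent p-values as X = -2 * sum(ln p_i) and refers X to a chi-square
distribution with 2k degrees of freedom. The function name fisher_combined_pval is hypothetical, and
this is not the module's code. Lancaster's correction for discrete distributions is not shown.

```perl
use strict;
use warnings;

# Hypothetical illustration of Fisher's transform (not the module's code).
# All p-values must be strictly positive, because log(0) is undefined.
sub fisher_combined_pval {
    my (@pvals) = @_;
    my $half_x = 0;
    $half_x -= log($_) for @pvals;    # X/2 = -sum(ln p_i)

    # For even degrees of freedom 2k the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    my ($term, $sum) = (1, 0);
    for my $j (0 .. $#pvals) {
        $sum  += $term;                 # add (x/2)^j / j!
        $term *= $half_x / ($j + 1);    # advance to the next term
    }
    return exp(-$half_x) * $sum;
}

printf "%.4f\n", fisher_combined_pval(0.05, 0.10, 0.30);
```

A single p-value is returned unchanged by this function, which is a convenient sanity check. The
transform assumes continuous p-values, which is why a discrete-data correction such as Lancaster's is
relevant for mutation counts.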
perl v5.30.3 2020-11-06 Genome::Model:...ulationPathScan(3pm)