Statistics::Normality - test whether an empirical distribution can be taken as being drawn from a normally-distributed population

Author

       Mike Wendl, "<mwendl at genome.wustl.edu>"

Bugs

       Please report any bugs or feature requests to "bug-statistics-normality at rt.cpan.org", or through the
       web interface at <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Normality>.  I will be
       notified, and then you'll automatically be notified of progress on your bug as I make changes.

Description

       Various situations call for testing whether an empirical sample can be presumed to have been drawn from a
       normally (Gaussian <http://en.wikipedia.org/wiki/Normal_distribution>) distributed population, especially
       because many downstream significance tests depend upon the assumption of normality.  This package
       implements some of the more well-known tests <http://en.wikipedia.org/wiki/Normality_test> from the
       mathematical statistics literature, though there are also others that are not included.  The tests here
       are all so-called omnibus tests that find departures from normality on the basis of skewness and/or
       kurtosis [Dagostino71].  Note that, although the Kolmogorov-Smirnov test
       <http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test> can also be used in this capacity, it is a
       distance test and therefore not advisable [Dagostino71].  This and other distance tests (e.g. chi-square)
       are not implemented here.

Export

       Nothing is exported by default.  The functions below can be imported by name, or all together with the
       ':all' tag, as shown in the Synopsis.

   Shapiro-Wilk Test
       The Shapiro-Wilk W-Statistic test <http://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test> [Shapiro65] is
       considered to be among the most objective tests of normality [Royston92] and also one of the most
       powerful ones for detecting non-normality [Chen71].  Its statistic is essentially the ratio of the square
       of a roughly best unbiased estimator of the population standard deviation to the sample variance
       [Dagostino71].  The test is
       mathematically complex and most implementations use several conventional approximations (as we do here),
       including Blom's formula for the expected value of the order statistics [Harter61] and transformation to
       standard normal distribution for evaluation, especially for large samples [Royston92].

               $pval = shapiro_wilk_test ([0.34, -0.2, 0.8, ...]);
               ($pval, $w_statistic) = shapiro_wilk_test ([0.34, -0.2, 0.8, ...]);

       This test may not be the best if there are many repeated values in the test distribution or when the
       number of points in the test distribution is very large, e.g. more than 5000.  The routine will carp
       about the latter, but not the former.  This particular implementation of the test also requires at least
       6 data points in the sample distribution and will croak otherwise.
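       Because only the large-sample case triggers a warning, it can help to inspect a sample before testing
       it.  The following is a minimal sketch of such a check; sample_diagnostics is a hypothetical helper
       and is not part of this module.  The limits of 6 and 5000 points are taken from the paragraph above,
       and the fraction of tied values is reported so that the caller can judge the repeated-values caveat.

```perl
use strict;
use warnings;

# Hypothetical pre-flight helper (NOT part of Statistics::Normality).
# Summarizes the sample properties that the caveats above refer to.
sub sample_diagnostics {
    my ($data) = @_;
    my $n = scalar @$data;

    # Count how many points share their value with at least one other point.
    my %count;
    $count{$_}++ for @$data;
    my $tied = 0;
    $tied += $_ for grep { $_ > 1 } values %count;

    return {
        n             => $n,
        too_small     => ($n < 6)    ? 1 : 0,    # documented: croaks below 6 points
        very_large    => ($n > 5000) ? 1 : 0,    # documented: carps above 5000 points
        tied_fraction => $n ? $tied / $n : 0,    # documented: not checked by the routine
    };
}

my $diag = sample_diagnostics([0.34, -0.2, 0.8, 0.8, 1.1, -0.5, 0.8]);
printf "n=%d too_small=%d very_large=%d tied_fraction=%.2f\n",
    @{$diag}{qw(n too_small very_large tied_fraction)};
```

       What counts as "many" repeated values is left to the caller; this documentation gives no threshold.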

   D'Agostino K-Squared Test
       The D'Agostino K-Squared test <http://en.wikipedia.org/wiki/D%27Agostino%27s_K-squared_test> is a good
       test against non-normality arising from kurtosis <http://en.wikipedia.org/wiki/Kurtosis> and/or skewness
       <http://en.wikipedia.org/wiki/Skewness> [Dagostino90].

               $pval = dagostino_k_square_test ([0.34, -0.2, ...]);
               ($pval, $ksq_statistic) = dagostino_k_square_test ([0.34, -0.2, ...]);

       The test statistic depends upon both the sample kurtosis and skewness, as well as the moments of these
       parameters from a normal population, as quantified by Pearson's coefficients [Pearson31].  These are
       transformed [Dagostino70,Anscombe83] to expressions that sum to the K-squared statistic, which is
       essentially chi-square-distributed with 2 degrees of freedom [Dagostino90].  The kurtosis transform, and
       thus the overall test, generally works best when the sample distribution has at least 20 data points
       [Anscombe83] and the routine will carp otherwise.
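       The two ends of that calculation are simple enough to sketch.  The code below is an illustration only
       and is not taken from this module: it computes the sample skewness and kurtosis from central moments
       (dividing by n), and converts a K-squared value to a p-value using the fact that the upper tail of a
       chi-square distribution with 2 degrees of freedom is exp(-x/2).  The normalizing transforms
       [Dagostino70,Anscombe83] that sit between these two steps are omitted.  Both function names are
       hypothetical.

```perl
use strict;
use warnings;

# Sample skewness sqrt(b1) = m3 / m2^(3/2) and kurtosis b2 = m4 / m2^2,
# where m_k is the k-th central moment computed with divisor n.
sub skewness_kurtosis {
    my ($data) = @_;
    my $n = scalar @$data;
    die "need at least 2 points\n" if $n < 2;

    my $mean = 0;
    $mean += $_ / $n for @$data;

    my ($m2, $m3, $m4) = (0, 0, 0);
    for my $x (@$data) {
        my $d = $x - $mean;
        $m2 += $d**2 / $n;
        $m3 += $d**3 / $n;
        $m4 += $d**4 / $n;
    }
    die "sample has zero variance\n" if $m2 == 0;

    return ($m3 / $m2**1.5, $m4 / $m2**2);
}

# Upper-tail probability of a chi-square variate with 2 degrees of freedom.
sub chi_square_2df_pvalue {
    my ($k2) = @_;
    return exp(-$k2 / 2);
}

my ($skew, $kurt) = skewness_kurtosis([1, 2, 3, 4, 5]);
printf "skewness=%.3f kurtosis=%.3f\n", $skew, $kurt;
printf "p-value for K^2 = 5.991: %.3f\n", chi_square_2df_pvalue(5.991);
```

       For a normal population the skewness is 0 and the kurtosis is 3, so the evenly spaced sample above is
       symmetric but flatter than normal.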

Name

       Statistics::Normality - test whether an empirical distribution can be taken as being drawn from a
       normally-distributed population

References

       •   [Anscombe83] Anscombe, F. J. and Glynn, W. J. (1983) Distribution of the Kurtosis Statistic B2 for
           Normal Samples, Biometrika 70(1), 227-234.

       •   [Chen71] Chen, E. H. (1971) The Power of the Shapiro-Wilk W Test for Normality in Samples from
           Contaminated Normal Distributions, Journal of the American Statistical Association 66(336), 760-762.

       •   [Dagostino70] D'Agostino, R. B. (1970) Transformation to Normality of the Null Distribution of G1,
           Biometrika 57(3), 679-681.

       •   [Dagostino71] D'Agostino, R. B. (1971) An Omnibus Test of Normality for Moderate and Large Size
           Samples, Biometrika 58(2), 341-348.

       •   [Dagostino90] D'Agostino, R. B. et al. (1990) A Suggestion for Using Powerful and Informative Tests
           of Normality, American Statistician 44(4), 316-321.

       •   [Harter61] Harter, H. L. (1961) Expected values of normal order statistics, Biometrika 48(1/2),
           151-165.

       •   [Pearson31] Pearson, E. S. (1931) Notes on Tests for Normality, Biometrika 22(3/4), 423-424.

       •   [Royston92] Royston, J. P. (1992) Approximating the Shapiro-Wilk W-test for non-normality, Statistics
           and Computing 2(3), 117-119.

       •   [Shapiro65] Shapiro, S. S. and Wilk, M. B. (1965) An analysis of variance test for normality -
           complete samples, Biometrika 52(3/4), 591-611.

Support

       You can find documentation for this module with the perldoc command.

           perldoc Statistics::Normality

       You can also look for information at:

       •   RT: CPAN's request tracker

           <http://rt.cpan.org/NoAuth/Bugs.html?Dist=Statistics-Normality>

       •   AnnoCPAN: Annotated CPAN documentation

           <http://annocpan.org/dist/Statistics-Normality>

       •   CPAN Ratings

           <http://cpanratings.perl.org/d/Statistics-Normality>

       •   Search CPAN

           <http://search.cpan.org/dist/Statistics-Normality/>

Synopsis

           use Statistics::Normality ':all';
           use Statistics::Normality 'shapiro_wilk_test';
           use Statistics::Normality 'dagostino_k_square_test';

Tests

       The subtleties and esoterica of various statistical tests for normality require some familiarity with the
       mathematical statistics literature.  We give rules-of-thumb for specific tests, where they exist, but it
       may be advisable to try several different tests to check the consistency of the conclusion.  It is
       probably also a good idea to check results graphically, either by direct plotting or by a Q-Q plot
       <http://en.wikipedia.org/wiki/Q-Q_plot>.  In general, small samples will often pass a normality test
       simply because they contain too little information to reveal a departure from normality, should one
       exist.

       Each of the methods here is a frequentist test, i.e. one that tests against the null-hypothesis
       <http://en.wikipedia.org/wiki/Null_hypothesis> that the sample is normal.  In other words, a low p-value
       recommends rejecting the null.
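       One way to act on the advice about trying several tests is to compare their p-values against a common
       significance level.  The sketch below assumes the p-values have already been obtained, for example
       from the two functions documented here; normality_verdict and the default level of 0.05 are
       illustrative choices and are not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Statistics::Normality): compare p-values
# from several normality tests against one significance level.
sub normality_verdict {
    my ($pvals, $alpha) = @_;
    $alpha = 0.05 unless defined $alpha;    # assumed default, not from the module

    my $total = scalar keys %$pvals;
    die "no p-values supplied\n" unless $total;

    # A low p-value recommends rejecting the null hypothesis of normality.
    my @rejecting = grep { $pvals->{$_} < $alpha } sort keys %$pvals;

    return 'normality not rejected' if !@rejecting;
    return 'normality rejected'     if @rejecting == $total;
    return 'tests disagree: ' . join(', ', @rejecting) . ' reject';
}

print normality_verdict({ shapiro_wilk => 0.41, dagostino_k_square => 0.27 }), "\n";
```

       A verdict of "not rejected" is not evidence of normality, particularly for small samples, so a
       graphical check remains worthwhile.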

Version

       Version 0.01

See Also