
Catmandu::Stat - Catmandu modules for working with statistical data

Author

       Patrick Hochstenbach, "<patrick.hochstenbach at ugent.be>"

Examples

       The Catmandu::Stat distribution includes a CSV file on the Sacramento crime rate in January 2006,
       "t/SacramentocrimeJanuary2006.csv", also available at
       http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv

       To view statistics on the fields available in this file, type:

           $ catmandu convert CSV to Stat < t/SacramentocrimeJanuary2006.csv

           | name          | count | zeros | zeros% | min | max | mean | variance | stdev | uniq~ | uniq% | entropy   |
           |---------------|-------|-------|--------|-----|-----|------|----------|-------|-------|-------|-----------|
           | #             | 7584  |       |        |     |     |      |          |       |       |       |           |
           | address       | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5425  | 71.5  | 12.4/12.4 |
           | beat          | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 20    | 0.3   | 4.3/12.9  |
           | cdatetime     | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5071  | 66.9  | 12.3/12.3 |
           | crimedescr    | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 305   | 4.0   | 5.6/12.6  |
           | district      | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 6     | 0.1   | 2.6/12.9  |
           | grid          | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 537   | 7.1   | 7.8/9.9   |
           | latitude      | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5288  | 69.7  | 12.4/12.4 |
           | longitude     | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 5295  | 69.8  | 12.4/12.4 |
           | ucr_ncic_code | 7584  | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 88    | 1.2   | 4.1/12.9  |

       The file has 7584 rows, and all the fields from "address" to "ucr_ncic_code" contain values. Each field
       has only one value per row (no arrays are available in the CSV file). There are 5492 unique addresses in
       the CSV file. The "district" field has the lowest entropy: most of its values are shared among many rows.
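
       To make the "uniq%" and "entropy" columns more concrete, here is a minimal Python sketch, not the
       Catmandu implementation. It assumes that "uniq%" is the share of distinct values among all values
       (which matches the table, e.g. 5425 / 7584 = 71.5%) and that "entropy" is Shannon entropy in bits.
       The helper names "shannon_entropy" and "column_summary" are hypothetical and used only for
       illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the value distribution in one column."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    # Sum p * log2(1/p) over every distinct value.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def column_summary(values):
    """Count, distinct values, percentage of distinct values and entropy."""
    uniq = len(set(values))
    return {
        "count": len(values),
        "uniq": uniq,
        "uniq%": round(100 * uniq / len(values), 1),
        "entropy": round(shannon_entropy(values), 1),
    }


# A column where every row has its own value (like "address") ...
print(column_summary(["a", "b", "c", "d"]))
# ... versus a column where most rows share a value (like "district").
print(column_summary(["1", "1", "1", "2"]))
```

       A column whose values are spread evenly over many distinct values has high entropy, while a column
       dominated by a few shared values has low entropy, which is the pattern the "district" row shows.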

Modules

       •   Catmandu::Exporter::Stat

       •   Catmandu::Fix::stat_mean

       •   Catmandu::Fix::stat_median

       •   Catmandu::Fix::stat_stddev

       •   Catmandu::Fix::stat_variance

Name

       Catmandu::Stat - Catmandu modules for working with statistical data

See Also

       Catmandu, Catmandu::Breaker

Synopsis

           # Calculate statistics on the availability of the ISBN fields in the dataset
           cat data.json | catmandu convert JSON to Stat --fields isbn

           # Preprocess data and calculate statistics
           catmandu convert MARC to Stat --fix 'marc_map(020a,isbn)' --fields isbn < data.mrc

           # Or in fix files

           # Calculate the mean of foo. E.g. foo => [1,2,3,4]
           stat_mean(foo)  # foo => '2.5'

           # Calculate the median of foo. E.g. foo => [1,2,3,4]
           stat_median(foo)  # foo => '2.5'

           # Calculate the standard deviation of foo. E.g. foo => [1,2,3,4]
           stat_stddev(foo)  # foo => '1.12'

           # Calculate the variance of foo. E.g. foo => [1,2,3,4]
           stat_variance(foo)  # foo => '1.25'
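
       The values in the comments above can be checked by hand. The following Python sketch, independent of
       Catmandu, reproduces them with the standard "statistics" module. It assumes the population formulas
       (dividing by n), because those are the ones that yield 1.25 and 1.12 for [1,2,3,4]; the sample
       formulas (dividing by n - 1) would give roughly 1.67 and 1.29. The function name "summarize" is
       hypothetical.

```python
import statistics


def summarize(values):
    """Population statistics for a list of numbers, rounded to two decimals."""
    return {
        "mean": round(statistics.mean(values), 2),
        "median": round(statistics.median(values), 2),
        "stddev": round(statistics.pstdev(values), 2),      # population standard deviation
        "variance": round(statistics.pvariance(values), 2),  # population variance
    }


print(summarize([1, 2, 3, 4]))
```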
