Statistics::TopK - Implementation of the top-k streaming algorithm

Author

       gray, <gray at cpan.org>

perl v5.36.0                                       2023-02-03                              Statistics::TopK(3pm)

Description

       The "Statistics::TopK" module implements the top-k streaming algorithm, also known as the "heavy hitters"
       algorithm. It is designed to process data streams and probabilistically calculate the "k" most frequent
       items while using limited memory.

       A typical example would be to determine the top 10 IP addresses listed in an access log. A simple
       solution would be to hash each IP address to a counter and then sort the resulting hash by the counter
       size. But for IPv4 addresses the hash could theoretically require over 4 billion keys, one per possible
       address.
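
       The simple solution described above can be sketched in core Perl as follows. This is only an
       illustration of exact counting, not part of the module; the "exact_top" helper and the sample
       addresses are hypothetical. Memory use grows with the number of distinct keys, which is the cost
       the top-k algorithm avoids.

```perl
use strict;
use warnings;

# Exact counting: one hash entry per distinct item seen.
# exact_top() is a hypothetical helper, not provided by Statistics::TopK.
sub exact_top {
    my ($n, @items) = @_;
    my %count;
    $count{$_}++ for @items;

    # Sort by descending count, breaking ties by key for stable output.
    my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    $n = @sorted if $n > @sorted;
    return map { [ $_, $count{$_} ] } @sorted[ 0 .. $n - 1 ];
}

# Hypothetical sample data standing in for addresses parsed from a log.
my @ips = qw(10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1 10.0.0.2);
printf "%-10s %d\n", @$_ for exact_top(2, @ips);
```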

       The top-k algorithm only requires storage space proportional to the number of items of interest. It
       accomplishes this by sacrificing precision, as it is only a probabilistic counter.

Methods

   new
           $counter = Statistics::TopK->new($k)

       Creates a new "Statistics::TopK" object which is prepared to count the top $k elements.

   add
           $count = $counter->add($element)

       Counts the given $element and returns its approximate count (if any) in the "Statistics::TopK" object.

       Note that adding an element does not guarantee it will be counted yet, as the algorithm is probabilistic,
       and the occurrence of the current element might only be used to decrease the count of one of the current
       top elements.
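
       To illustrate why an added element may not be counted, here is a minimal sketch of a bounded
       counter in the style of the Misra-Gries "frequent items" algorithm. It is an assumption made for
       illustration and is not taken from the module's source: this variant decrements every tracked
       counter when an untracked element arrives and no slot is free, which may differ from what
       "Statistics::TopK" does internally. The "mg_new" and "mg_add" names are hypothetical.

```perl
use strict;
use warnings;

# Misra-Gries style counter: at most $k keys are tracked at any time.
# mg_new() and mg_add() are hypothetical names, not the module's API.
sub mg_new {
    my ($k) = @_;
    return { k => $k, counts => {} };
}

sub mg_add {
    my ($mg, $elem) = @_;
    my $c = $mg->{counts};

    # Already tracked: increment and return the approximate count.
    return ++$c->{$elem} if exists $c->{$elem};

    # Free slot available: start tracking the new element.
    return $c->{$elem} = 1 if scalar(keys %$c) < $mg->{k};

    # No free slot: the new element is not stored. Its occurrence only
    # decrements the tracked counters, evicting any that reach zero.
    for my $key (keys %$c) {
        delete $c->{$key} if --$c->{$key} == 0;
    }
    return;    # not counted (yet)
}

my $mg = mg_new(2);
mg_add($mg, $_) for qw(a b a c a a b a);
printf "%s => %d\n", $_, $mg->{counts}{$_} for sort keys %{ $mg->{counts} };
```

       In this run the element "c" is never stored; its single occurrence only lowers the counts of "a"
       and "b", so the reported counts are lower bounds on the true frequencies.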

   top
           @top = $counter->top()

       Returns a list of the top-k counted elements.

   counts
           %counts = $counter->counts()

       Returns a hash of the top-k counted elements and their counts.
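
       The text above does not state an ordering for the returned elements, so a caller that wants a
       ranked report can sort the hash by value. The sketch below uses a hypothetical hash literal in
       place of the value returned by "counts" so that it runs without the module installed.

```perl
use strict;
use warnings;

# Stand-in for: my %counts = $counter->counts;
# The keys and values here are hypothetical sample data.
my %counts = ('10.0.0.1' => 42, '10.0.0.7' => 17, '10.0.0.3' => 29);

# Order elements by descending approximate count, then by key.
my @ranked = sort { $counts{$b} <=> $counts{$a} || $a cmp $b } keys %counts;

printf "%-10s %d\n", $_, $counts{$_} for @ranked;
```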

Name

       Statistics::TopK - Implementation of the top-k streaming algorithm

Requests And Bugs

       Please report any bugs or feature requests to
       <http://rt.cpan.org/Public/Bug/Report.html?Queue=Statistics-TopK>. I will be notified, and then you'll
       automatically be notified of progress on your bug as I make changes.

See Also

Support

       You can find documentation for this module with the perldoc command.

           perldoc Statistics::TopK

       You can also look for information at:

       •   GitHub Source Repository

           <http://github.com/gray/statistics-topk>

       •   AnnoCPAN: Annotated CPAN documentation

           <http://annocpan.org/dist/Statistics-TopK>

       •   CPAN Ratings

           <http://cpanratings.perl.org/d/Statistics-TopK>

       •   RT: CPAN's request tracker

           <http://rt.cpan.org/Public/Dist/Display.html?Name=Statistics-TopK>

       •   Search CPAN

           <http://search.cpan.org/dist/Statistics-TopK/>

Synopsis

           use Statistics::TopK;

           my $counter = Statistics::TopK->new(10);
           while (my $val = <STDIN>) {
               chomp $val;
               $counter->add($val);
           }
           my @top = $counter->top;
           my %counts = $counter->counts;
