PDL::CCS::MatrixOps - Low-level matrix operations for compressed storage sparse PDLs

Acknowledgements

       Perl by Larry Wall.

       PDL by Karl Glazebrook, Tuomas J. Lukka, Christian Soeller, and others.

Author

       Bryan Jurish <moocow@cpan.org>

   Copyright Policy
       All other parts Copyright (C) 2009-2024, Bryan Jurish. All rights reserved.

       This package is free software, and entirely without warranty.  You may redistribute it and/or modify it
       under the same terms as Perl itself.

Functions

   ccs_matmult2d_sdd
         Signature: (
           indx ixa(Two=2,NnzA); nza(NnzA); missinga();
           b(O,M);
           zc(O);
           [o]c(O,N)
           ; PDL_Indx sizeN)

       Two-dimensional matrix multiplication of a sparse index-encoded PDL $a() with a dense pdl $b(), with
       output to a dense pdl $c().

       The sparse input PDL $a() should be passed here with 0th dimension "M" and 1st dimension "N", just as for
       the built-in PDL::Primitive::matmult().

       "Missing" values in $a() are treated as $missinga(), which shouldn't be BAD or infinite, but otherwise
       ought to be handled correctly.  The input pdl $zc() is used to pass the cached contribution of a
       $missinga()-row ("M") to an output column ("O"), i.e.

        $zc = ((zeroes($M,1)+$missinga) x $b)->flat;

       $SIZE(Two) must be 2.

       ccs_matmult2d_sdd processes bad values.  It will set the bad-value flag of all output ndarrays if the
       flag is set for any of the input ndarrays.
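
       The role of $missinga() and $zc() can be illustrated with a plain-Perl sketch that uses nested arrays
       instead of PDLs.  It is not the module's implementation: the helper name matmult2d_sdd_ref and the
       array layout ($bmat->[$m][$o], index pairs stored as [$m,$n]) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical reference helper (not part of PDL::CCS):
#   c(o,n) = zc(o) + sum over stored entries k in row n of (nza[k] - missinga) * b(o,m_k)
sub matmult2d_sdd_ref {
    my ($ixa, $nza, $missinga, $bmat, $sizeN) = @_;
    my $M = scalar @$bmat;            # number of rows of $b, i.e. dimension "M"
    my $O = scalar @{ $bmat->[0] };   # number of columns of $b, i.e. dimension "O"

    # Cached contribution of an all-missing row of $a to each output column.
    my @zc = (0) x $O;
    for my $m (0 .. $M - 1) {
        $zc[$_] += $missinga * $bmat->[$m][$_] for 0 .. $O - 1;
    }

    # Start every output row from the cached value, then correct for stored entries.
    my @c = map { [@zc] } 1 .. $sizeN;
    for my $k (0 .. $#$nza) {
        my ($m, $n) = @{ $ixa->[$k] };
        my $delta = $nza->[$k] - $missinga;
        $c[$n][$_] += $delta * $bmat->[$m][$_] for 0 .. $O - 1;
    }
    return \@c;
}

my $c = matmult2d_sdd_ref(
    [ [0, 0], [1, 1] ],   # (m,n) index pairs of the stored values
    [ 1, 2 ],             # stored values
    0.5,                  # value assumed for every missing cell
    [ [1, 2], [3, 4] ],   # dense $b, one array ref per "M" index
    2,                    # sizeN
);
print join(' ', @$_), "\n" for @$c;   # prints "2.5 4" then "6.5 9"
```

       Because each output row starts from the cached $zc values, the work per row depends on the number of
       stored entries rather than on M.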

   ccs_matmult2d_zdd
         Signature: (
           indx ixa(Two=2,NnzA); nza(NnzA);
           b(O,M);
           [o]c(O,N)
           ; PDL_Indx sizeN)

       Two-dimensional matrix multiplication of a sparse index-encoded PDL $a() with a dense pdl $b(), with
       output to a dense pdl $c().

       The sparse input PDL $a() should be passed here with 0th dimension "M" and 1st dimension "N", just as for
       the built-in PDL::Primitive::matmult().

       "Missing" values in $a() are treated as zero.  $SIZE(Two) must be 2.

       ccs_matmult2d_zdd processes bad values.  It will set the bad-value flag of all output ndarrays if the
       flag is set for any of the input ndarrays.
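
       As a rough illustration of what "index-encoded" means for the $ixa() and $nza() arguments, the
       following plain-Perl sketch collects the non-zero cells of a small dense matrix.  The helper name
       index_encode and the nested-array layout ($dense->[$n][$m]) are assumptions; the index order (M first,
       then N) follows the signature above.

```perl
use strict;
use warnings;

# Hypothetical helper: index-encode a dense matrix given as rows ($dense->[$n][$m]),
# keeping only the cells that differ from zero.
sub index_encode {
    my ($dense) = @_;
    my (@ixa, @nza);
    for my $n (0 .. $#$dense) {
        for my $m (0 .. $#{ $dense->[$n] }) {
            my $v = $dense->[$n][$m];
            next if $v == 0;        # zero cells count as "missing" here
            push @ixa, [$m, $n];    # 0th index is "M", 1st index is "N"
            push @nza, $v;
        }
    }
    return (\@ixa, \@nza);
}

my ($ixa, $nza) = index_encode([ [1, 0, 0], [0, 0, 2] ]);
print join(' ', map { "($_->[0],$_->[1])" } @$ixa), " -> @$nza\n";
# prints "(0,0) (2,1) -> 1 2"
```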

   ccs_vnorm
         Signature: (
           indx acols(NnzA); avals(NnzA);
           float+ [o]vnorm(M);
           ; PDL_Indx sizeM=>M)

       Computes the Euclidean length of each column-vector $a(i,*) of a sparse index-encoded pdl $a() of
       logical dimensions (M,N), with output to a dense piddle $vnorm().  "Missing" values in $a() are treated
       as zero, and $acols() specifies the (unsorted) indices along the logical dimension M of the corresponding
       non-missing values in $avals().  This is basically the same thing as:

        $vnorm = ($a**2)->xchg(0,1)->sumover->sqrt;

       ... but should be much faster to compute for sparse index-encoded piddles.

       ccs_vnorm() always clears the bad-status flag on $vnorm().
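
       The computation can be sketched in plain Perl as a single pass over the stored values.  The helper
       name vnorm_ref is hypothetical and the sketch only mirrors the documented semantics, not the
       module's code.

```perl
use strict;
use warnings;

# Hypothetical plain-Perl counterpart of ccs_vnorm: accumulate squared values per
# "M" index, then take square roots. Indices without stored values stay at 0.
sub vnorm_ref {
    my ($acols, $avals, $sizeM) = @_;
    my @sumsq = (0) x $sizeM;
    $sumsq[ $acols->[$_] ] += $avals->[$_] ** 2 for 0 .. $#$avals;
    return [ map { sqrt($_) } @sumsq ];
}

my $vnorm = vnorm_ref([0, 2, 0], [3, 1, 4], 3);   # indices need not be sorted
print "@$vnorm\n";   # prints "5 0 1"
```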

   ccs_vcos_zdd
         Signature: (
           indx ixa(2,NnzA); nza(NnzA);
           b(N);
           float+ [o]vcos(M);
           float+ [t]anorm(M);
           PDL_Indx sizeM=>M;
         )

       Computes the vector cosine similarity of a dense row-vector $b(N) with respect to each column $a(i,*) of
       a sparse index-encoded PDL $a() of logical dimensions (M,N), with output to a dense piddle $vcos(M).
       "Missing" values in $a() are treated as zero, and magnitudes for $a() are passed in the optional
       parameter $anorm(), which will be implicitly computed using ccs_vnorm if the $anorm() parameter is
       omitted or empty.  This is basically the same thing as:

        $anorm //= ($a**2)->xchg(0,1)->sumover->sqrt;
        $vcos    = ($a * $b->slice("*1,"))->xchg(0,1)->sumover / ($anorm * ($b**2)->sumover->sqrt);

       ... but should be much faster to compute.

       Output values in $vcos() are cosine similarities in the range [-1,1], except for zero-magnitude vectors
       which will result in NaN values in $vcos().  If you need non-negative distances, follow this up with:

        $vcos->minus(1,$vcos,1);
        $vcos->inplace->setnantobad->inplace->setbadtoval(0); ##-- minimum distance for NaN values

       to get distance values in the range [0,2].  You can use PDL threading to batch-compute distances for
       multiple $b() vectors simultaneously:

         $bx   = random($N, $NB);                   ##-- get $NB random vectors of size $N
         $vcos = ccs_vcos_zdd($ixa,$nza, $bx, $M);  ##-- $vcos is now ($M,$NB)

       ccs_vcos_zdd() always clears the bad status flag on the output piddle $vcos.
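
       The following plain-Perl sketch walks through the same arithmetic for a single dense vector,
       including the conversion from similarities to distances.  The helper names vcos_zdd_ref and
       vcos_to_distance are hypothetical, and the norms of $a() are computed inline here rather than
       passed in.

```perl
use strict;
use warnings;

# Hypothetical plain-Perl counterpart of ccs_vcos_zdd for one dense vector $bvec.
sub vcos_zdd_ref {
    my ($ixa, $nza, $bvec, $sizeM) = @_;
    my $nan   = 9**9**9 - 9**9**9;     # inf - inf yields NaN
    my @dot   = (0) x $sizeM;
    my @sumsq = (0) x $sizeM;
    for my $k (0 .. $#$nza) {
        my ($m, $n) = @{ $ixa->[$k] };
        $dot[$m]   += $nza->[$k] * $bvec->[$n];
        $sumsq[$m] += $nza->[$k] ** 2;
    }
    my $bsumsq = 0;
    $bsumsq += $_ ** 2 for @$bvec;
    my $bnorm = sqrt($bsumsq);
    return [ map {
        my $denom = sqrt($sumsq[$_]) * $bnorm;
        $denom == 0 ? $nan : $dot[$_] / $denom;   # zero-magnitude vectors give NaN
    } 0 .. $sizeM - 1 ];
}

# Turn similarities into distances in [0,2]; NaN entries become distance 0.
sub vcos_to_distance {
    my ($vcos) = @_;
    return [ map { ($_ != $_) ? 0 : 1 - $_ } @$vcos ];
}

# Column m=0 of $a holds (3,4); column m=1 has no stored values.
my $vcos = vcos_zdd_ref([ [0, 0], [0, 1] ], [ 3, 4 ], [ 4, 3 ], 2);
my $dist = vcos_to_distance($vcos);
printf "%.2f %.2f\n", @$dist;   # prints "0.04 0.00"
```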

   _ccs_vcos_zdd
         Signature: (
           indx ixa(Two=2,NnzA); nza(NnzA);
           b(N);
           float+ anorm(M);
           float+ [o]vcos(M);)

       Guts for ccs_vcos_zdd(), with slightly different calling conventions.

       Always clears the bad status flag on the output piddle $vcos.

   ccs_vcos_pzd
         Signature: (
           indx aptr(Nplus1); indx acols(NnzA); avals(NnzA);
           indx brows(NnzB);                     bvals(NnzB);
           anorm(M);
           float+ [o]vcos(M);)

       Computes the vector cosine similarity of a sparse index-encoded row-vector $b() of logical dimension (N)
       with respect to each column $a(i,*) of a sparse Harwell-Boeing row-encoded PDL $a() of logical dimensions
       (M,N), with output to a dense piddle $vcos(M).  "Missing" values in $a() are treated as zero, and
       magnitudes for $a() are passed in the obligatory parameter $anorm().  Usually much faster than
       ccs_vcos_zdd() if a CRS pointer over logical dimension (N) is available for $a().

       ccs_vcos_pzd() always clears the bad status flag on the output piddle $vcos.
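
       A plain-Perl sketch of how a row pointer can limit the work to the rows where $b() has stored
       values.  The layout is an assumption: the stored values of row $n of $a() are taken to occupy
       positions $aptr->[$n] .. $aptr->[$n+1]-1 of $acols/$avals.  The helper name vcos_pzd_ref and the NaN
       result for zero magnitudes are likewise assumptions, not documented behaviour of ccs_vcos_pzd().

```perl
use strict;
use warnings;

# Hypothetical plain-Perl counterpart of ccs_vcos_pzd under the layout assumed above.
sub vcos_pzd_ref {
    my ($aptr, $acols, $avals, $brows, $bvals, $anorm) = @_;
    my $nan    = 9**9**9 - 9**9**9;            # inf - inf yields NaN
    my @dot    = (0) x scalar(@$anorm);
    my $bsumsq = 0;
    for my $j (0 .. $#$bvals) {
        my $n = $brows->[$j];
        $bsumsq += $bvals->[$j] ** 2;
        # Only rows of $a where $b has a stored value are visited.
        for my $k ($aptr->[$n] .. $aptr->[$n + 1] - 1) {
            $dot[ $acols->[$k] ] += $avals->[$k] * $bvals->[$j];
        }
    }
    my $bnorm = sqrt($bsumsq);
    return [ map {
        my $denom = $anorm->[$_] * $bnorm;
        $denom == 0 ? $nan : $dot[$_] / $denom;
    } 0 .. $#$anorm ];
}

# $a has rows n=0: (1,0), n=1: (0,2), n=2: (3,0); $b has a single stored value 5 at n=1.
my $vcos = vcos_pzd_ref(
    [ 0, 1, 2, 3 ],       # aptr, one more entry than there are rows
    [ 0, 1, 0 ],          # acols: "M" index of each stored value
    [ 1, 2, 3 ],          # avals
    [ 1 ], [ 5 ],         # brows, bvals
    [ sqrt(10), 2 ],      # anorm, precomputed magnitudes of the columns of $a
);
print "@$vcos\n";   # prints "0 1"
```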

Known Bugs

       We should really implement matrix multiplication in terms of inner product, and have a good
       sparse-matrix-only implementation of the former.

Name

       PDL::CCS::MatrixOps - Low-level matrix operations for compressed storage sparse PDLs

See Also

perl(1), PDL(3perl)

perl v5.40.0                                       2025-01-04                                     MatrixOps(3pm)

Synopsis

        use PDL;
        use PDL::CCS::MatrixOps;

        ##---------------------------------------------------------------------
        ## ... stuff happens
