PDL::BadValues - Discussion of bad value support in PDL

Author

       Copyright (C) Doug Burke (djburke@cpan.org), 2000, 2006.

       The per-ndarray bad value support is by Heiko Klein (2006).

perl v5.40.1                                       2025-03-27                                PDL::BadValues(3pm)

Description

WhatarebadvaluesandwhyshouldIbotherwiththem?
       Sometimes it's useful to be able to specify a certain value is 'bad' or 'missing'; for example CCDs used
       in astronomy produce 2D images which are not perfect since certain areas contain invalid data due to
       imperfections in the detector.  Whilst PDL's powerful index routines and all the complicated business
       with dataflow, slices, etc etc mean that these regions can be ignored in processing, it's awkward to do.
       It would be much easier to be able to say "$c = $x + $y" and leave all the hassle to the computer.

       If you're not interested in this, then you may (rightly) be concerned with how this affects the speed of
       PDL, since the overhead of checking for a bad value at each operation can be large.  Because of this, the
       code has been written to be as fast as possible - particularly when operating on ndarrays which do not
       contain bad values.  In fact, you should notice essentially no speed difference when working with
       ndarrays which do not contain bad values.

       You may also ask 'well, my computer supports IEEE NaN, so I already have this'.  They are different
       things; a bad value signifies "leave this out of processing", whereas NaN is the result of a
       mathematically-invalid operation.

       Many routines, such as "y=sin(x)", will propagate NaN's without the user having to code differently, but
       routines such as "qsort", or finding the median of an array, need to be re-coded to handle bad values.
       For floating-point datatypes, "NaN" and "Inf" can be used to flag bad values, but by default special
       values are used (Default bad values).

       There is one default bad value for each datatype, but as of PDL 2.040, you can have different bad values
       for separate ndarrays of the same type.

       You can use "NaN" as the bad value for any floating-point type, including complex.

   Aquickoverview
        pdl> $x = sequence(4,3);
        pdl> p $x
        [
         [ 0  1  2  3]
         [ 4  5  6  7]
         [ 8  9 10 11]
        ]
        pdl> $x = $x->setbadif( $x % 3 == 2 )
        pdl> p $x
        [
         [  0   1 BAD   3]
         [  4 BAD   6   7]
         [BAD   9  10 BAD]
        ]
        pdl> $x *= 3
        pdl> p $x
        [
         [  0   3 BAD   9]
         [ 12 BAD  18  21]
         [BAD  27  30 BAD]
        ]
        pdl> p $x->sum
        120

       "demo bad" within perldl gives a demonstration of some of the things possible with bad values.  These are
       also available on PDL's web-site, at http://pdl.perl.org/demos/.  See PDL::Bad for useful routines for
       working with bad values and t/bad.t to see them in action.

       To find out if a routine supports bad values, use the "badinfo" command in perldl or the "-b" option to
       pdldoc.

       Each ndarray contains a flag - accessible via "$pdl->badflag" - to say whether there's any bad data
       present:

       •   If  false/0, which means there's no bad data here, the code supplied by the "Code" option to pp_def()
           is executed.

       •   If true/1, then this says there MAY be bad data in the ndarray, so use  the  code  in  the  "BadCode"
           option (assuming that the pp_def() for this routine has been updated to have a BadCode key).  You get
           all  the  advantages of broadcasting, as with the "Code" option, but it will run slower since you are
           going to have to handle the presence of bad values.

       If  you  create  an  ndarray,  it  will  have  its  bad-value  flag  set  to  0.  To  change  this,   use
       "$pdl->badflag($new_bad_status)",  where  $new_bad_status  can  be  0  or  1.   When a routine creates an
       ndarray, its bad-value flag will depend on the input ndarrays: the bad-value flag will be set true if any
       of the input ndarrays have the bad-value flag.  To check that an ndarray really contains  bad  data,  use
       the "check_badflag" method.

       NOTE: propagation of the badflag

       If you change the badflag of an ndarray, this change is propagated to all the children of an ndarray, so

          pdl> $x = zeroes(20,30);
          pdl> $y = $x->slice('0:10,0:10');
          pdl> $c = $y->slice(',(2)');
          pdl> print ">>c: ", $c->badflag, "\n";
          >>c: 0
          pdl> $x->badflag(1);
          pdl> print ">>c: ", $c->badflag, "\n";
          >>c: 1

       This is also propagated to the parents of an ndarray, so

          pdl> print ">>a: ", $x->badflag, "\n";
          >>a: 1
          pdl> $c->badflag(0);
          pdl> print ">>a: ", $x->badflag, "\n";
          >>a: 0

       There's  also the issue of what happens if you change the badvalue of an ndarray - should these propagate
       to children/parents (yes) or whether you should only be able to change the badvalue at the 'top' level  -
       i.e. those ndarrays which do not have parents.

       The  orig_badvalue()  method  returns  the compile-time value for a given datatype. It works on ndarrays,
       PDL::Type objects, and numbers - eg

         $pdl->orig_badvalue(), byte->orig_badvalue(), and orig_badvalue(4).

       To get the current bad value, use the badvalue() method - it has the same syntax as orig_badvalue().

       To change the current bad value, supply the new number to badvalue - eg

         $pdl->badvalue(2.3), byte->badvalue(2), badvalue(5,-3e34).

       Note: the value is silently converted to the correct C type, and returned  -  i.e.  "byte->badvalue(-26)"
       returns 230 on my Linux machine.

       Note  that  changes  to the bad value are NOT propagated to previously-created ndarrays - they will still
       have the bad flag set, but suddenly the elements that were bad will become 'good', but containing the old
       bad value.  See discussion below.

   Badvaluesandbooleanoperators
       For those boolean operators in PDL::Ops, evaluation on a bad value returns the bad value. This:

        $mask = $img > $thresh;

       correctly propagates bad values. This will omit any bad values, but return a bad value if  there  are  no
       good ones:

        $bool = any( $img > $thresh );

       As of 2.077, a bad value used as a boolean will throw an exception.

       When using one of the 'projection' functions in PDL::Ufunc - such as orover - bad values are skipped over
       (see  the  documentation  of  these  functions for the current handling of the case when all elements are
       bad).

Implementation Details

       A new flag has been added to the state of an ndarray - "PDL_BADVAL". If unset, then the ndarray does  not
       contain  bad  values,  and so all the support code can be ignored. If set, it does not guarantee that bad
       values are present, just that they should be checked for.

       The "pdl_trans" structure has been extended to include an integer value,  "bvalflag",  which  acts  as  a
       switch  to  tell  the  code  whether  to  handle bad values or not. This value is set if any of the input
       ndarrays  have  their  "PDL_BADVAL"  flag  set  (although  this  code  can   be   replaced   by   setting
       "FindBadStateCode" in pp_def).

       The per-ndarray badvalue is a "PDL_Anyval", a struct with an integer-valued type (like an ndarray), and a
       union-value  that can hold one of any value that PDL can handle. As of 2.086, this value must always have
       the same type as the ndarray, and this is enforced in the PDL core.

   Defaultbadvalues
       The default bad values are now stored in a structure within the Core  PDL  structure  -  "PDL.bvals"  (eg
       lib/PDL/Core/pdlcore.h);  see  also  "typedef  pdl_badvals" in lib/PDL/Core/pdl.h.PL and the BOOT code of
       lib/PDL/Core.xs where the values are initialised to  (hopefully)  sensible  values.   See  "badvalue"  in
       PDL::Bad and "orig_badvalue" in PDL::Bad for read/write routines to the values.

       The  default/original  bad  values  are  set  to  the C type's maximum (unsigned integers) or the minimum
       (floating-point and signed integers).

   HowdoIchangearoutinetohandlebadvalues?
       See "BadCode" in PDL::PP and "HandleBad" in PDL::PP.

       If you have a routine that you want to be able to use as in-place, look at the  routines  in  bad.pd  (or
       ops.pd)  which  use  the  "in-place"  option  to see how the bad flag is propagated to children using the
       "xxxBadStatusCode" options.  I decided not to automate this as rules would be a little complex, since not
       every in-place op will need to propagate the badflag (eg unary functions).

       This all means that you can change (as of 2.073)

          Code => '$a() = $b() + $c();'

       to

          Code => 'PDL_IF_BAD(if ( $ISBAD(b()) || $ISBAD(c()) ) {
                        $SETBAD(a());
                      } else,) {
                        $a() = $b() + $c();
                      }'

       which means you only need to maintain the logic in one place, rather than having  a  separate  "BadCode".
       "PDL_IF_BAD"  evaluates  to  its  first  argument  if  the "badvalue" flag is true, or its second if not.
       PDL::PP::PDLCode will then create code something like

          if ( __trans->bvalflag ) {
               broadcastloop over Code with PDL_IF_BAD set appropriately
          } else {
               broadcastloop over Code with PDL_IF_BAD set appropriately
          }

Name

       PDL::BadValues - Discussion of bad value support in PDL

What About Documentation?

       One of the strengths of PDL is its on-line documentation. The aim  is  to  use  this  system  to  provide
       information  on how/if a routine supports bad values: in many cases pp_def() contains all the information
       anyway, so the function-writer doesn't need to do anything at  all!  For  the  cases  when  this  is  not
       sufficient,  there's  the  "BadDoc" option. For code written at the Perl level - i.e. in a .pm file - use
       the "=for bad" pod directive.

       This information will be available via man/pod2man/html documentation.  It's  also  accessible  from  the
       "perldl" shell - using the "badinfo" command - and the "pdldoc" shell command - using the "-b" option.

PDL::BadValues - Discussion of bad value support in PDL

Contents

Author

Description

Implementation Details

Name

What About Documentation?

See Also