What are bad values and why should I bother with them?
Sometimes it's useful to be able to specify a certain value is 'bad' or 'missing'; for example CCDs used
in astronomy produce 2D images which are not perfect since certain areas contain invalid data due to
imperfections in the detector. Whilst PDL's powerful index routines and all the complicated business
with dataflow, slices, and so on mean that these regions can be ignored in processing, doing so is awkward.
It would be much easier to be able to say "$c = $x + $y" and leave all the hassle to the computer.
If you're not interested in this, then you may (rightly) be concerned with how this affects the speed of
PDL, since the overhead of checking for a bad value at each operation can be large. Because of this, the
code has been written to be as fast as possible, particularly when operating on ndarrays which do not
contain bad values: in that case you should notice essentially no speed difference.
You may also ask 'well, my computer supports IEEE NaN, so I already have this'. They are different
things; a bad value signifies "leave this out of processing", whereas NaN is the result of a
mathematically-invalid operation.
Many routines, such as "y=sin(x)", will propagate NaNs without the user having to code differently, but
routines such as "qsort", or finding the median of an array, need to be re-coded to handle bad values.
For floating-point datatypes, "NaN" and "Inf" can be used to flag bad values, but by default special
values are used (see "Default bad values").
There is one default bad value for each datatype, but as of PDL 2.040, you can have different bad values
for separate ndarrays of the same type.
You can use "NaN" as the bad value for any floating-point type, including complex.
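As a rough illustration of the difference between NaN and a bad value, the sketch below produces a NaN from 0/0 and then converts it to a bad value with setnantobad from PDL::Bad. The variable names are made up for the example, and it assumes a reasonably recent PDL installation.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical data: the middle element is 0/0, which gives NaN.
my $num   = pdl(1, 0, 4);
my $den   = pdl(1, 0, 2);
my $ratio = $num / $den;

# setnantobad returns a copy in which every NaN is flagged as bad.
my $flagged = $ratio->setnantobad;

print "ratio:   $ratio\n";
print "flagged: $flagged\n";
print "number of bad elements: ", $flagged->nbad, "\n";
print "sum of good elements:   ", $flagged->sum,  "\n";
```

Summing $ratio directly would give NaN, whereas the sum of $flagged only covers the good elements.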
A quick overview
pdl> $x = sequence(4,3);
pdl> p $x
[
[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
]
pdl> $x = $x->setbadif( $x % 3 == 2 )
pdl> p $x
[
[ 0 1 BAD 3]
[ 4 BAD 6 7]
[BAD 9 10 BAD]
]
pdl> $x *= 3
pdl> p $x
[
[ 0 3 BAD 9]
[ 12 BAD 18 21]
[BAD 27 30 BAD]
]
pdl> p $x->sum
120
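The session above can also be written as a standalone script. The sketch below repeats the same steps and additionally counts the bad elements with nbad and ngood, and replaces them with setbadtoval (all from PDL::Bad); the replacement value 0 is only an illustrative choice.

```perl
use strict;
use warnings;
use PDL;

# Same data as in the session above.
my $x = sequence(4, 3);
$x = $x->setbadif($x % 3 == 2);   # mark 2, 5, 8 and 11 as bad
$x *= 3;

print "bad elements:  ", $x->nbad,  "\n";
print "good elements: ", $x->ngood, "\n";

# sum skips the bad elements ...
my $sum_good = $x->sum;
# ... so replacing them by 0 first gives the same total.
my $sum_filled = $x->setbadtoval(0)->sum;

print "sum (bad skipped):   $sum_good\n";
print "sum (bad set to 0):  $sum_filled\n";
```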
"demo bad" within perldl gives a demonstration of some of the things possible with bad values. These are
also available on PDL's web-site, at http://pdl.perl.org/demos/. See PDL::Bad for useful routines for
working with bad values and t/bad.t to see them in action.
To find out if a routine supports bad values, use the "badinfo" command in perldl or the "-b" option to
pdldoc.
Each ndarray contains a flag - accessible via "$pdl->badflag" - to say whether there's any bad data
present:
• If false/0, which means there's no bad data here, the code supplied by the "Code" option to pp_def()
is executed.
• If true/1, then this says there MAY be bad data in the ndarray, so use the code in the "BadCode"
option (assuming that the pp_def() for this routine has been updated to have a BadCode key). You get
all the advantages of broadcasting, as with the "Code" option, but it will run slower since you are
going to have to handle the presence of bad values.
If you create an ndarray, it will have its bad-value flag set to 0. To change this, use
"$pdl->badflag($new_bad_status)", where $new_bad_status can be 0 or 1. When a routine creates an
ndarray, its bad-value flag will depend on the input ndarrays: the bad-value flag will be set true if any
of the input ndarrays have the bad-value flag. To check that an ndarray really contains bad data, use
the "check_badflag" method.
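A minimal sketch of the flag handling described above, using only badflag and check_badflag; the ndarray contents are arbitrary and chosen so that no element is actually bad.

```perl
use strict;
use warnings;
use PDL;

my $x = sequence(5);
my $flag_new = $x->badflag;        # a freshly created ndarray starts at 0

$x->badflag(1);                    # say there MAY be bad data
my $flag_set = $x->badflag;

my $y = $x + 1;                    # output inherits the flag from its input
my $flag_derived = $y->badflag;

# check_badflag inspects the data and clears the flag if nothing is bad.
$x->check_badflag;
my $flag_checked = $x->badflag;

print "new: $flag_new, set: $flag_set, ",
      "derived: $flag_derived, after check: $flag_checked\n";
```

Note that the flag only says that bad data may be present, which is why the explicit check can turn it back off.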
NOTE: propagation of the badflag
If you change the badflag of an ndarray, this change is propagated to all the children of an ndarray, so
pdl> $x = zeroes(20,30);
pdl> $y = $x->slice('0:10,0:10');
pdl> $c = $y->slice(',(2)');
pdl> print ">>c: ", $c->badflag, "\n";
>>c: 0
pdl> $x->badflag(1);
pdl> print ">>c: ", $c->badflag, "\n";
>>c: 1
This is also propagated to the parents of an ndarray, so
pdl> print ">>x: ", $x->badflag, "\n";
>>x: 1
pdl> $c->badflag(0);
pdl> print ">>x: ", $x->badflag, "\n";
>>x: 0
There's also the issue of what happens if you change the badvalue of an ndarray: should the change
propagate to children/parents (yes), or should you only be able to change the badvalue at the 'top'
level, i.e. for those ndarrays which do not have parents?
The orig_badvalue() method returns the compile-time value for a given datatype. It works on ndarrays,
PDL::Type objects, and numbers, e.g.
$pdl->orig_badvalue(), byte->orig_badvalue(), and orig_badvalue(4).
To get the current bad value, use the badvalue() method - it has the same syntax as orig_badvalue().
To change the current bad value, supply the new number to badvalue, e.g.
$pdl->badvalue(2.3), byte->badvalue(2), badvalue(5,-3e34).
Note: the value is silently converted to the correct C type, and returned - i.e. "byte->badvalue(-26)"
returns 230 on my Linux machine.
Note that changes to the bad value are NOT propagated to previously-created ndarrays: they will still
have the bad flag set, but the elements that were bad suddenly become 'good' while still containing the
old bad value. See discussion below.
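The sketch below queries the default bad value for a type and then sets a bad value on a single ndarray, as permitted since PDL 2.040. The data and the choice of -1 as the bad value are assumptions made purely for illustration.

```perl
use strict;
use warnings;
use PDL;

# Compile-time default for the 'short' type.
print "default bad value for short: ", short->orig_badvalue, "\n";

# Hypothetical data in which -1 is meant to signal a missing reading.
my $x = pdl(short, [1, -1, 3]);
print "current bad value of \$x: ", $x->badvalue, "\n";

$x->badvalue(-1);   # per-ndarray bad value
$x->badflag(1);     # tell PDL that bad data may be present

print "x:     $x\n";
print "isbad: ", $x->isbad, "\n";
```

Only $x is affected here; the type-wide value is left alone, which avoids the problem of previously created ndarrays changing their meaning.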
Bad values and boolean operators
For those boolean operators in PDL::Ops, evaluation on a bad value returns the bad value. This:
$mask = $img > $thresh;
correctly propagates bad values. This will omit any bad values, but return a bad value if there are no
good ones:
$bool = any( $img > $thresh );
As of 2.077, a bad value used as a boolean will throw an exception.
When using one of the 'projection' functions in PDL::Ufunc - such as orover - bad values are skipped over
(see the documentation of these functions for the current handling of the case when all elements are
bad).
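To make the comparison behaviour concrete, here is a small sketch with made-up image data and threshold. It assumes the semantics described above: the comparison yields a bad element wherever the input is bad, and projection-style functions skip it.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical pixel values; the 9 is treated as an invalid reading.
my $img = pdl(1, 5, 9, 3);
$img = $img->setbadif($img == 9);
my $thresh = 2;

# The comparison propagates the bad element into the mask.
my $mask = $img > $thresh;
print "mask: $mask\n";

# Counting the elements above threshold skips the bad one.
my $n_above = $mask->sum;
print "good elements above threshold: $n_above\n";

# A mask without bad values, for code that needs plain 0/1 entries.
my $clean = $mask->setbadtoval(0);
print "clean mask: $clean\n";
```

Converting the mask with setbadtoval before testing individual elements avoids ever using a bad value as a boolean.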