Precision issue with pad values (in pycdf)? #791

ericthewizard · 2024-10-31T19:29:16Z

There seems to be a precision issue when printing pad values. I created a simple CDF with a CDF_REAL4 variable called "rle" with a standard pad value of -1.0e30:

>>> from spacepy import pycdf
>>> c = pycdf.CDF("rle.cdf")
>>> rle = c['rle']
>>> rle.pad()
-1.0000000150474662e+30

This only seems to affect the pad values; the variable attributes FILLVAL, VALIDMIN, and VALIDMAX appear to be correct, e.g.:

>>> rle.meta
<zAttrList:
DISPLAY_TYPE: Time Series [CDF_CHAR]
FILLVAL: -1e+31 [CDF_REAL4]
SCALETYP: linear [CDF_CHAR]
VALIDMAX: -1e+31 [CDF_REAL4]
VALIDMIN: -1e+31 [CDF_REAL4]
VAR_TYPE: data [CDF_CHAR]
>>> rle.meta['FILLVAL']
np.float32(-1e+31)

Version of SpacePy

This was with SpacePy 0.6.0, installed from PyPI.

jtniehof · 2024-11-01T15:13:29Z

The tl;dr is that the value is the same; it's just being rendered differently.

The difference (in addition to the fact that your pad is -1e30 and the attributes are -1e31) is that the pad value is being returned as a Python float and the attributes are being returned as numpy array scalars. A numpy float32 scalar prints only as many significant figures as are needed to identify the 32-bit value, while a Python float is a 64-bit double and prints enough digits to identify that. The extra digits you're seeing are the 32-bit value shown at double precision:

#!/usr/bin/env python

import spacepy.pycdf
import spacepy.pycdf.const

with spacepy.pycdf.CDF("rle.cdf", create=True) as cdf:
    rle = cdf.new("rle", type=spacepy.pycdf.const.CDF_REAL4,
                  data=[-1e31], pad=-1e31)
    rle.attrs = {
        "FILLVAL": -1e31,
        "SCALETYP": "linear",
        "VALIDMAX": -1e31,
        "VALIDMIN": -1e31,
        }
    print('pad')
    print(rle.pad())
    print(repr(rle.pad()))
    print(type(rle.pad()))
    print('FILLVAL')
    print(rle.attrs['FILLVAL'])
    print(repr(rle.attrs['FILLVAL']))
    print(type(rle.attrs['FILLVAL']))
    print(float(rle.attrs['FILLVAL']))
    print('Value')
    print(rle[0])
    print(repr(rle[0]))
    print(type(rle[0]))

If you change REAL4 to REAL8 above, the attributes, pad, and value all display as -1e31.
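
The same effect can be reproduced without a CDF file. This is a minimal sketch using only numpy; the variable names are just for illustration, and it assumes the pad value is stored as the float32 nearest to -1e30:

```python
import numpy as np

# The float32 closest to -1e30, as a CDF_REAL4 pad value would store it.
pad32 = np.float32(-1e30)

# Widening to a Python float (a 64-bit double) keeps the value exactly...
pad_py = float(pad32)
assert np.float32(pad_py) == pad32

# ...but repr() now prints enough digits to identify a double.
print(repr(pad_py))  # -1.0000000150474662e+30

# The numpy scalar prints only the digits needed to identify the float32.
print(str(pad32))
```

Both objects hold the same number; only the number of digits printed differs.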

We special-case the return of data to strip the numpy-ness of scalars, and that was extended to the pad values. But that isn't the case for attributes. In #195 (fixing #123) I probably just hadn't run into issues with attributes so didn't think of it. I am thinking we probably should do this for attributes, which would make the problem look worse. I'd like to leave this open for the time being.

ericthewizard · 2024-11-04T19:05:38Z

Thanks!

The reason I brought this up is because I'm looking into possibly using pycdf to make a CDF <-> JSON tool, as an alternative to our Java version (https://cdf.gsfc.nasa.gov/html/cdfjson.html), and using that in the backend for our new metadata editor. So for this case, it's important that -1e30 doesn't turn into something like -1.0000000150474662e+30 - but I can always check the data type and convert to the appropriate numpy type on my side.
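
A rough sketch of that client-side conversion, assuming the CDF type name of the variable is already known as a string. The lookup table and the helper `to_native_scalar` are hypothetical names for illustration, not part of pycdf, and only the floating-point types are covered:

```python
import numpy as np

# Hypothetical lookup from CDF type names to numpy types (floats only).
CDF_FLOAT_DTYPES = {
    "CDF_REAL4": np.float32,
    "CDF_FLOAT": np.float32,
    "CDF_REAL8": np.float64,
    "CDF_DOUBLE": np.float64,
}


def to_native_scalar(value, cdf_type_name):
    """Cast value to the numpy type matching its CDF type, if known."""
    dtype = CDF_FLOAT_DTYPES.get(cdf_type_name)
    # Unknown (e.g. integer or character) types are passed through as-is.
    return value if dtype is None else dtype(value)


# The Python float returned for the pad value, cast back to 32 bits.
pad = to_native_scalar(-1.0000000150474662e+30, "CDF_REAL4")
print(pad)
```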

jtniehof · 2024-11-08T19:05:59Z

#766 switched dmarray to array scalars. Part of the justification for switching to Python objects in #195 was that this is how we did things in dmarray. Honestly, I think array scalars have gotten a lot less annoying since #195, so that should be revisited, although I hate this back and forth. Will have to think: explicitly casting the return of pad() to an array scalar seems like excessively nitpicky consistency, but not doing so might be surprising.

jtniehof · 2024-12-03T18:45:10Z

For that application, it probably makes the most sense to format the output explicitly, so that it depends neither on the data type nor on the numpy print options.
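
One possible way to do that explicit formatting, as a sketch only: `format_real` is a hypothetical helper, and it assumes the caller knows which numpy type matches the CDF type. It uses `numpy.format_float_scientific` with `unique=True`, which asks for the shortest digit string that identifies the value at the given precision:

```python
import numpy as np


def format_real(value, dtype=np.float32):
    """Return a short decimal string that round-trips at dtype precision."""
    # Casting first means the digits are chosen for dtype, not for a double.
    return np.format_float_scientific(dtype(value), unique=True, trim='-')


# A CDF_REAL4 pad value that arrived as a Python float.
text = format_real(-1.0000000150474662e+30, np.float32)
print(text)

# Reading the text back gives the same 32-bit value.
assert np.float32(float(text)) == np.float32(-1e30)
```

Because the digits are requested explicitly for each call, the result does not rely on how either Python or numpy chooses to print the scalar by default.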

I'll look at the existing JSON output. It might make sense to include this directly in the SpacePy datamodel. One of the things I've been thinking about is how to include CDF-specific information (record variance, types, etc.) when CDFs are moved into other representations. It would be good to come up with a common approach.

ericthewizard changed the title ~~Precision issue with pad values?~~ Precision issue with pad values (in pycdf)? Oct 31, 2024
