Precision issue with pad values (in pycdf)? #791

Open
ericthewizard opened this issue Oct 31, 2024 · 4 comments

Comments

@ericthewizard

There seems to be a precision issue with printing out pad values; I created a simple CDF with a CDF_REAL4 variable called "rle" with a standard pad value of -1.0e30

>>> from spacepy import pycdf
>>> c = pycdf.CDF("rle.cdf")
>>> rle = c['rle']
>>> rle.pad()
-1.0000000150474662e+30

This only seems to be affecting the pad values; the variable attributes FILLVAL, VALIDMIN, VALIDMAX, appear to be correct, e.g.,

>>> rle.meta
<zAttrList:
DISPLAY_TYPE: Time Series [CDF_CHAR]
FILLVAL: -1e+31 [CDF_REAL4]
SCALETYP: linear [CDF_CHAR]
VALIDMAX: -1e+31 [CDF_REAL4]
VALIDMIN: -1e+31 [CDF_REAL4]
VAR_TYPE: data [CDF_CHAR]
>>> rle.meta['FILLVAL']
np.float32(-1e+31)

Version of SpacePy

This was with SpacePy 0.6.0, installed from PyPI.

ericthewizard changed the title from "Precision issue with pad values?" to "Precision issue with pad values (in pycdf)?" on Oct 31, 2024
@jtniehof
Member

jtniehof commented Nov 1, 2024

The tl;dr is that the value is the same, it's being rendered differently.

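The difference (in addition to the fact that your pad is -1e30 and the attributes are -1e31) is that the pad value is being returned as a Python float and the attributes are being returned as numpy array scalars. A numpy float32 scalar prints only as many significant figures as 32-bit precision needs, while a Python float is 64-bit and prints enough digits to pin down the value at that precision. The extra digits you're seeing are the 32-bit value written out at 64-bit precision: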

#!/usr/bin/env python

import spacepy.pycdf
import spacepy.pycdf.const

with spacepy.pycdf.CDF("rle.cdf", create=True) as cdf:
    rle = cdf.new("rle", type=spacepy.pycdf.const.CDF_REAL4,
                  data=[-1e31], pad=-1e31)
    rle.attrs = {
        "FILLVAL": -1e31,
        "SCALETYP": "linear",
        "VALIDMAX": -1e31,
        "VALIDMIN": -1e31,
        }
    print('pad')
    print(rle.pad())
    print(repr(rle.pad()))
    print(type(rle.pad()))
    print('FILLVAL')
    print(rle.attrs['FILLVAL'])
    print(repr(rle.attrs['FILLVAL']))
    print(type(rle.attrs['FILLVAL']))
    print(float(rle.attrs['FILLVAL']))
    print('Value')
    print(rle[0])
    print(repr(rle[0]))
    print(type(rle[0]))

If you change REAL4 to REAL8 above, the attributes, pad, and value all display as -1e31.
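
The same effect can be seen without a CDF file. This is a minimal numpy-only sketch, not what pycdf does internally; it assumes only that the pad comes back as a Python float and the attributes as numpy scalars, and the variable names are illustrative:

```python
import numpy as np

# What a CDF_REAL4 pad of -1e30 holds: the nearest 32-bit float.
stored = np.float32(-1e30)
# The same value widened to a 64-bit Python float.
widened = float(stored)

print(stored)         # -1e+30
print(repr(widened))  # -1.0000000150474662e+30

# Widening does not change the value, only how many digits get printed.
assert np.float32(widened) == stored

# With 64-bit storage (REAL8) there is no widening step, so nothing changes.
print(float(np.float64(-1e31)))  # -1e+31
```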

We special-case the return of data to strip the numpy-ness of scalars, and that was extended to the pad values. But that isn't the case for attributes. In #195 (fixing #123) I probably just hadn't run into issues with attributes so didn't think of it. I am thinking we probably should do this for attributes, which would make the problem look worse. I'd like to leave this open for the time being.

@ericthewizard
Author

Thanks!

I brought this up because I'm looking into possibly using pycdf to make a CDF <-> JSON tool, as an alternative to our Java version (https://cdf.gsfc.nasa.gov/html/cdfjson.html), and using that in the backend for our new metadata editor. So for this case, it's important that -1e30 doesn't turn into something like -1.0000000150474662e+30 - but I can always check the data type and convert to the appropriate numpy type on my side.
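
A rough sketch of that conversion, assuming the caller already knows the CDF type name of the variable. The `NUMPY_TYPE` table and the `json_number` helper are hypothetical names made up for illustration and are not part of pycdf:

```python
import json

import numpy as np

# Hypothetical lookup from CDF floating-point type names to numpy types.
NUMPY_TYPE = {
    "CDF_REAL4": np.float32,
    "CDF_FLOAT": np.float32,
    "CDF_REAL8": np.float64,
    "CDF_DOUBLE": np.float64,
}


def json_number(value, cdf_type):
    """Return a Python float that prints with the fewest digits needed
    to identify `value` at the precision of `cdf_type`."""
    np_type = NUMPY_TYPE[cdf_type]
    # unique=True asks for the shortest digits that round-trip at that precision.
    text = np.format_float_scientific(np_type(value), unique=True)
    return float(text)


pad = -1.0000000150474662e+30  # what pad() returned for the REAL4 variable
print(json.dumps({"pad": json_number(pad, "CDF_REAL4")}))  # {"pad": -1e+30}
```

The number written to JSON is then a 64-bit -1e30, which is not bit-identical to the stored 32-bit value, but it converts back to the same float32 when the file is rebuilt.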

@jtniehof
Member

jtniehof commented Nov 8, 2024

#766 changed to array scalars for dmarray. Part of the justification for switching to Python objects in #195 was that's how we did things in dmarray. I think honestly array scalars have gotten a lot less annoying since #195 so that should be revisited, although I hate this back and forth. Will have to think--explicitly casting the return of pad() to an array scalar seems like excessively nitpicky consistency, but not doing so might be surprising.

@jtniehof
Member

jtniehof commented Dec 3, 2024

It probably makes the most sense for that application to explicitly format the output, so not only does it not depend on the data type but also not on the numpy print options.
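
One way that explicit formatting could look, as a sketch using only the standard library so that numpy print options never come into it. The helper name `shortest_float32_text` is made up for illustration, and it assumes a finite value within 32-bit float range:

```python
import struct


def shortest_float32_text(value):
    """Return the shortest decimal text that maps back to the same
    32-bit float as `value` (assumed finite and within float32 range)."""
    target = struct.pack("<f", value)
    # Nine significant digits always suffice for a 32-bit float.
    for digits in range(1, 10):
        text = f"{value:.{digits}g}"
        if struct.pack("<f", float(text)) == target:
            return text
    return repr(value)  # not expected to be reached for in-range values


print(shortest_float32_text(-1.0000000150474662e+30))  # -1e+30
print(shortest_float32_text(0.10000000149011612))      # 0.1
```

For 64-bit variables no such search is needed, since `repr()` of a Python float already gives the shortest round-tripping text.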

I'll look at the existing JSON output. It might make sense to include this directly in the SpacePy datamodel. One of the things I've been thinking about is how to include CDF-specific information (record variance, types, etc.) when CDFs are moved into other representations. It would be good to come up with a common approach.
