Have a way to have Time objects in pandas Series and Dataframe objects #17495

Open
Cadair opened this issue Dec 3, 2024 · 3 comments

@Cadair
Member

Cadair commented Dec 3, 2024

What is the problem this feature will solve?

Some heliophysics data includes leap seconds, and some of it has high-precision time requirements. However, many people are familiar with pandas and xarray and want to use those tools to work with their data.

Currently, we cannot use astropy.time.Time objects as the index for a pandas.DataFrame or an xarray.Dataset. Being able to do so would make leap seconds and other Time features available in pandas and xarray.
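To make the leap-second limitation concrete, here is a minimal sketch using only the standard library. It assumes that the stdlib datetime is a fair stand-in for the datetime64 values behind DatetimeIndex, since neither has a slot for 23:59:60. The helper name make_utc_datetime is hypothetical and exists only for this illustration.

```python
from datetime import datetime


def make_utc_datetime(year, month, day, hour, minute, second):
    """Return a naive datetime, or None if the stdlib cannot represent it."""
    try:
        return datetime(year, month, day, hour, minute, second)
    except ValueError:
        # datetime only accepts seconds in the range 0..59
        return None


# A leap second was inserted at the end of 2016-12-31 (UTC).
before_leap = make_utc_datetime(2016, 12, 31, 23, 59, 59)
leap_second = make_utc_datetime(2016, 12, 31, 23, 59, 60)
after_leap = make_utc_datetime(2017, 1, 1, 0, 0, 0)

# The leap second itself cannot be constructed at all.
print(leap_second)

# The difference across the boundary ignores the inserted second.
elapsed = (after_leap - before_leap).total_seconds()
print(elapsed)
```

The gap across the boundary is reported as one second even though two SI seconds passed, which is the kind of error a Time-backed index would avoid.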

I don't have a good technical understanding of how to solve this problem, so this is a pretty speculative issue. I am hoping we can get some input from the pandas side.

Describe the desired outcome

An equivalent of DatetimeIndex that is backed by Time instead.

This would allow us to have timeseries as DataFrame objects using Time as the index, and also allow us to use Time objects as indices in xarray (I am unsure of the exact details of how xarray uses pandas Index; anyone with more understanding, please chime in).

Additional context

xarray has a CFTimeIndex, which I think is what I am proposing but for cftime instead of astropy Time. I am unsure of the details, though.

This sunpy issue is one of the main motivations for this: sunpy/sunpy#5422

Bigger picture: with support for this, the Quantity 2 work, and some other bits, we should be able to make use of large chunks of the astropy ecosystem within xarray / pandas. This would make a massive difference to being able to use astropy with heliophysics data.

@Cadair
Member Author

Cadair commented Dec 3, 2024

As a concrete code example of something which doesn't work:

import astropy.time
import pandas as pd

pd.Series(index=[astropy.time.Time("2024-01-01")]).resample("5Min")

File ~/micromamba/envs/sunpy-dev/lib/python3.12/site-packages/pandas/core/generic.py:9771, in NDFrame.resample(self, rule, axis, closed, label, convention, kind, on, level, origin, offset, group_keys)
   9768 else:
   9769     convention = "start"
-> 9771 return get_resampler(
   9772     cast("Series | DataFrame", self),
   9773     freq=rule,
   9774     label=label,
   9775     closed=closed,
   9776     axis=axis,
   9777     kind=kind,
   9778     convention=convention,
   9779     key=on,
   9780     level=level,
   9781     origin=origin,
   9782     offset=offset,
   9783     group_keys=group_keys,
   9784 )

File ~/micromamba/envs/sunpy-dev/lib/python3.12/site-packages/pandas/core/resample.py:2050, in get_resampler(obj, kind, **kwds)
   2046 """
   2047 Create a TimeGrouper and return our resampler.
   2048 """
   2049 tg = TimeGrouper(obj, **kwds)  # type: ignore[arg-type]
-> 2050 return tg._get_resampler(obj, kind=kind)

File ~/micromamba/envs/sunpy-dev/lib/python3.12/site-packages/pandas/core/resample.py:2272, in TimeGrouper._get_resampler(self, obj, kind)
   2263 elif isinstance(ax, TimedeltaIndex):
   2264     return TimedeltaIndexResampler(
   2265         obj,
   2266         timegrouper=self,
   (...)
   2269         gpr_index=ax,
   2270     )
-> 2272 raise TypeError(
   2273     "Only valid with DatetimeIndex, "
   2274     "TimedeltaIndex or PeriodIndex, "
   2275     f"but got an instance of '{type(ax).__name__}'"
   2276 )

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

@taldcroft
Member

To my mind, the right answer is for those data container packages to allow non-native column types (à la astropy mixin columns) if they satisfy the array protocol. On the astropy end, we can certainly make that happen.

@Cadair
Member Author

Cadair commented Dec 3, 2024

I think that is one part of it (and maybe pandas does a bit of it already), but to use Time as an index specifically, I think we will need to do more work to support things like time-based resampling.
