PERF, BUG: speed up conversion of Python datetime/date to datetime64, handle duck-type datetime objects #31573
Open
eendebakpt wants to merge 2 commits into
`NpyDatetime_ConvertPyDateTimeToDatetimeStruct` is the per-element worker
behind casting object arrays of Python datetimes to datetime64 (e.g.
`np.asarray(list_of_datetimes).astype("M8[us]")`, the performance floor of
matplotlib's date2num). Previously it extracted the seven date/time fields
(year, month, day, hour, minute, second, microsecond) with a
`PyObject_HasAttrString` + `PyObject_GetAttrString` pair per field, each a
generic `__getattribute__` dispatch that builds a temporary key string and
a temporary int.
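The old lookup pattern can be modeled in pure Python (a sketch under stated assumptions, not NumPy's code). The hypothetical helper `extract_fields_has_get` mirrors one `PyObject_HasAttrString` plus one `PyObject_GetAttrString` per field, and the hypothetical `CountingDuck` wrapper counts how often attribute lookup is dispatched. Because `hasattr` performs a full lookup and discards the result, every field costs two dispatches.

```python
import datetime

# The seven fields read from each Python datetime.
FIELDS = ("year", "month", "day", "hour", "minute", "second", "microsecond")


class CountingDuck:
    """Hypothetical duck-typed datetime that counts attribute dispatches."""

    def __init__(self, dt):
        self._dt = dt
        self.lookups = 0

    def __getattribute__(self, name):
        if name in FIELDS:
            # Bump the counter through object.__getattribute__ to avoid recursion.
            count = object.__getattribute__(self, "lookups")
            object.__setattr__(self, "lookups", count + 1)
            return getattr(object.__getattribute__(self, "_dt"), name)
        return object.__getattribute__(self, name)


def extract_fields_has_get(obj):
    """Model of the old pattern: a presence check, then a separate fetch."""
    values = []
    for name in FIELDS:
        if not hasattr(obj, name):         # dispatch 1 (like PyObject_HasAttrString)
            return None
        values.append(getattr(obj, name))  # dispatch 2 (like PyObject_GetAttrString)
    return tuple(values)


duck = CountingDuck(datetime.datetime(2024, 5, 17, 12, 30, 45, 123456))
print(extract_fields_has_get(duck))  # (2024, 5, 17, 12, 30, 45, 123456)
print(duck.lookups)                  # 14: two dispatches for each of 7 fields
```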
Two complementary changes:
* Fast path: for genuine datetime.date / datetime.datetime objects
(including subclasses such as pandas Timestamp), read the fields straight
from the CPython C struct via the datetime C-API macros, with no
attribute lookups at all.
* For the remaining attribute accesses (the duck-typed fallback and the
per-element ``tzinfo`` lookup), use a single ``PyObject_GetOptionalAttr``
on pre-interned attribute names (added to the ``npy_interned_str``
registry), fusing the presence check and the lookup and avoiding the
temporary key object. The ``tzinfo`` lookup runs for every datetime, so
this matters even when the fast path supplies the fields.
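The control flow of the two changes can be sketched in pure Python, with loudly stated assumptions: `extract_fields` and `_duck_fields` are hypothetical names, the `isinstance` checks stand in for the datetime C-API type checks (e.g. `PyDateTime_Check`/`PyDate_Check`), and since Python code cannot read the C struct, the "fast path" here still uses attribute access. `getattr(obj, name, _MISSING)` plays the role of `PyObject_GetOptionalAttr`: one lookup that both tests presence and fetches the value.

```python
import datetime

# Sentinel meaning "attribute absent"; plays the role of the 0 return
# from PyObject_GetOptionalAttr.
_MISSING = object()

DATE_FIELDS = ("year", "month", "day")
TIME_FIELDS = ("hour", "minute", "second", "microsecond")


def _duck_fields(obj):
    """Duck-typed fallback: one fused presence-check-and-fetch per field."""
    fields = []
    for name in DATE_FIELDS:
        value = getattr(obj, name, _MISSING)
        if value is _MISSING:
            return None  # not date-like at all
        fields.append(value)
    for name in TIME_FIELDS:
        value = getattr(obj, name, _MISSING)
        if value is _MISSING:
            return tuple(fields[:3]), False  # any time field missing: a date
        fields.append(value)
    return tuple(fields), True


def extract_fields(obj):
    """Return (fields, has_time, tzinfo), or None if obj is not date-like."""
    # Fast path for genuine datetime objects, subclasses included. Test
    # datetime.datetime first, because it is a subclass of datetime.date.
    if isinstance(obj, datetime.datetime):
        fields = (obj.year, obj.month, obj.day, obj.hour, obj.minute,
                  obj.second, obj.microsecond)
        has_time = True
    elif isinstance(obj, datetime.date):
        fields, has_time = (obj.year, obj.month, obj.day), False
    else:
        result = _duck_fields(obj)
        if result is None:
            return None
        fields, has_time = result
    # The tzinfo lookup runs for every datetime, whichever path read the fields.
    tzinfo = getattr(obj, "tzinfo", None) if has_time else None
    return fields, has_time, tzinfo


class DuckDate:
    """Hypothetical duck-typed object exposing only date fields."""
    year, month, day = 2024, 2, 29


print(extract_fields(datetime.datetime(2024, 5, 17, 8, 0)))
# ((2024, 5, 17, 8, 0, 0, 0), True, None)
print(extract_fields(DuckDate()))    # ((2024, 2, 29), False, None)
print(extract_fields("2024-05-17"))  # None
```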
Both paths converge on a single shared validation tail (a ``has_time``
flag records whether time fields were read), so the date/time range checks
and tzinfo handling exist in one place. Output is bit-identical to before
for naive, tz-aware, and date inputs.
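A shared validation tail of this kind can be modeled as one function that both extraction paths call. In this sketch `validate_fields` and `_days_in_month` are hypothetical names, the limits are ordinary calendar ranges (leap years via `calendar.isleap`), the `has_time` flag decides whether the time checks run, and the error messages only imitate NumPy's wording.

```python
import calendar

# Days per month in a non-leap year; February gains a day in leap years.
_DAYS_PER_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def _days_in_month(year, month):
    return _DAYS_PER_MONTH[month - 1] + (month == 2 and calendar.isleap(year))


def validate_fields(fields, has_time):
    """Range-check extracted fields; has_time is False for date inputs.

    Hypothetical model of the shared validation tail, not NumPy's code.
    """
    year, month, day = fields[:3]
    # `and` short-circuits, so _days_in_month never sees an invalid month.
    if not (1 <= month <= 12 and 1 <= day <= _days_in_month(year, month)):
        raise ValueError(
            f"Invalid date ({year},{month},{day}) when converting to NumPy datetime")
    if has_time:
        hour, minute, second, microsecond = fields[3:7]
        if not (0 <= hour < 24 and 0 <= minute < 60
                and 0 <= second < 60 and 0 <= microsecond < 1_000_000):
            raise ValueError(
                f"Invalid time ({hour},{minute},{second},{microsecond}) "
                "when converting to NumPy datetime")


validate_fields((2024, 2, 29), has_time=False)  # 2024 is a leap year: accepted
try:
    validate_fields((2023, 2, 29), has_time=False)
except ValueError as exc:
    print(exc)  # Invalid date (2023,2,29) when converting to NumPy datetime
```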
Regression tests cover the datetime/date subclass fast paths, the
duck-typed fallback (which real datetimes no longer exercise), invalid
dates, and the tz-aware fast path.
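Invalid dates are a case where the duck-typed fallback matters for testing. The normal `datetime.date` constructor validates its fields, so a genuine date built the usual way cannot carry, say, 30 February, and the straightforward way to drive an invalid date into the converter is a duck-typed object. The stdlib-only sketch below illustrates this; `FakeDate` and `constructor_accepts` are hypothetical names, and nothing here calls NumPy or reflects the PR's actual test code.

```python
import datetime


class FakeDate:
    """Hypothetical duck-typed date whose fields are never validated."""

    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day


def constructor_accepts(year, month, day):
    """True if the normal datetime.date constructor accepts these fields."""
    try:
        datetime.date(year, month, day)
    except ValueError:
        return False
    return True


print(constructor_accepts(2021, 2, 30))   # False: rejected at construction
fake = FakeDate(2021, 2, 30)              # a duck type carries it unchecked
print((fake.year, fake.month, fake.day))  # (2021, 2, 30)
```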
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
The invalid_date/invalid_time error paths passed the C-runtime format
macros `NPY_INT64_FMT`/`NPY_INT32_FMT` to `PyErr_Format`, which formats via
`PyUnicode_FromFormatV` rather than the C runtime. On Windows
`NPY_INT64_FMT` expands to `"I64d"`, which `PyUnicode_FromFormatV` does not
accept, so an invalid date/time raised `SystemError: invalid format string`
instead of `ValueError`. Cast the fields to `long long`/`int` and use the
portable `%lld`/`%d` codes.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
eendebakpt commented Jun 8, 2026

```c
invalid_date:
    PyErr_Format(PyExc_ValueError,
            "Invalid date (%" NPY_INT64_FMT ",%" NPY_INT32_FMT ",%" NPY_INT32_FMT ") when converting to NumPy datetime",
```
This was already incorrect on main and is apparently an unused path. We could consider dropping support for duck-typed datetimes.
`NpyDatetime_ConvertPyDateTimeToDatetimeStruct` is the per-element worker behind casting object arrays of Python datetimes to `datetime64`, e.g. `np.asarray(list_of_datetimes).astype("M8[us]")` (the performance floor of matplotlib's `date2num`, and relevant to pandas/xarray too). This PR makes two complementary changes:

* Fast path: for genuine `datetime.date`/`datetime.datetime` objects (including subclasses such as pandas `Timestamp`), read the fields straight from the CPython C struct via the datetime C-API macros.
* The duck-typed fallback and the per-element `tzinfo` lookup now use a single `PyObject_GetOptionalAttr` on pre-interned attribute names (added to `npy_interned_str`). The `tzinfo` lookup runs for every datetime.
Benchmark

Cases: `datetime.datetime` → `M8[us]` and `datetime.date` → `M8[D]`.

Benchmark script and notes
AI Disclosure

Claude Code was used for profiling and for implementing the selected performance improvements.