
PERF, BUG: speed up conversion of Python datetime/date to datetime64, handle duck-type datetime objects #31573

Open
eendebakpt wants to merge 2 commits into numpy:main from eendebakpt:perf/datetime-cast-interned-attr

@eendebakpt (Contributor)

`NpyDatetime_ConvertPyDateTimeToDatetimeStruct` is the per-element worker behind casting object arrays of Python datetimes to datetime64, e.g. `np.asarray(list_of_datetimes).astype("M8[us]")`. This cast sets the cost floor of matplotlib's `date2num`, and it is relevant to pandas and xarray as well.
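A minimal sketch of the user-facing cast that reaches this worker, assuming a recent NumPy; the element-by-element `np.datetime64` comparison is only a sanity check and is not part of the PR.

```python
import datetime

import numpy as np

# Object array of Python datetimes: casting it to a datetime64 dtype goes
# through the Python-datetime -> datetime64 conversion for each element.
dts = [datetime.datetime(2000, 1, 1) + datetime.timedelta(seconds=i * 1234567)
       for i in range(5)]
obj = np.asarray(dts)        # NumPy keeps these as dtype=object
m8 = obj.astype("M8[us]")    # object -> datetime64[us] cast

# Sanity check against converting each element on its own.
expected = np.array([np.datetime64(d, "us") for d in dts])
assert (m8 == expected).all()
print(m8.dtype)  # datetime64[us]
```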

This PR makes two complementary changes:

  • Fast path: for genuine `datetime.date` / `datetime.datetime` objects
    (including subclasses such as pandas `Timestamp`), read the fields straight from
    the CPython C struct via the datetime C-API macros, with no attribute lookups.
  • Cheaper attribute access for the remaining cases: the duck-typed fallback and
    the per-element `tzinfo` lookup now use a single `PyObject_GetOptionalAttr` on
    pre-interned attribute names (added to `npy_interned_str`). The `tzinfo`
    lookup runs for every datetime, so this matters even when the fast path supplies the fields.
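The two tiers can be modelled in pure Python as below. This is a hypothetical sketch, not NumPy's code: `extract_fields` and `DuckDate` are illustrative names, the `isinstance` checks stand in for the C-level type checks and the datetime C-API field macros (such as `PyDateTime_GET_YEAR`), and `getattr(obj, name, default)` stands in for `PyObject_GetOptionalAttr`. Treating a duck-typed object that lacks any of the four time attributes as a plain date is an assumption about the fallback's behaviour.

```python
import datetime

# Hypothetical pure-Python model of the two-tier field extraction; the real
# implementation is C inside NumPy and these names are illustrative only.
DATE_FIELDS = ("year", "month", "day")
TIME_FIELDS = ("hour", "minute", "second", "microsecond")
_MISSING = object()  # sentinel standing in for "attribute not present"


def extract_fields(obj):
    """Return ((y, mo, d, h, mi, s, us), has_time), or None if not date-like."""
    # Fast path: genuine datetime/date objects, subclasses included. datetime
    # is itself a subclass of date, so it must be tested first.
    if isinstance(obj, datetime.datetime):
        return (obj.year, obj.month, obj.day, obj.hour, obj.minute,
                obj.second, obj.microsecond), True
    if isinstance(obj, datetime.date):
        return (obj.year, obj.month, obj.day, 0, 0, 0, 0), False

    # Duck-typed fallback: one getattr-with-default per attribute, fusing
    # the presence check and the fetch.
    date_vals = []
    for name in DATE_FIELDS:
        value = getattr(obj, name, _MISSING)
        if value is _MISSING:
            return None  # not date-like at all
        date_vals.append(value)
    time_vals = []
    for name in TIME_FIELDS:
        value = getattr(obj, name, _MISSING)
        if value is _MISSING:
            return tuple(date_vals) + (0, 0, 0, 0), False  # date only
        time_vals.append(value)
    return tuple(date_vals + time_vals), True


class DuckDate:  # hypothetical duck-typed object, not a date subclass
    year, month, day = 2020, 2, 29


print(extract_fields(datetime.datetime(2020, 2, 29, 12, 30)))
# ((2020, 2, 29, 12, 30, 0, 0), True)
print(extract_fields(DuckDate()))  # ((2020, 2, 29, 0, 0, 0, 0), False)
print(extract_fields(42))          # None
```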

Benchmark

Per-element cast time:

| cast | before | after | speedup |
| --- | --- | --- | --- |
| `datetime.datetime` → `M8[us]` | ~1130 ns | ~52 ns | ~21x |
| `datetime.date` → `M8[D]` | ~490 ns | ~38 ns | ~13x |
Benchmark script and notes
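import datetime
import time

import numpy as np

def best(fn, R=20000, trials=9):
    """Best-of-`trials` time in seconds for one call of fn (R calls per trial)."""
    fn()  # warm-up call, not timed
    ts = []
    for _ in range(trials):
        s = time.perf_counter()
        for _ in range(R):
            fn()
        ts.append(time.perf_counter() - s)
    return min(ts) / R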

N = 100
dts = [datetime.datetime(2000, 1, 1) + datetime.timedelta(seconds=i * 1234567)
       for i in range(N)]
a = np.asarray(dts)
dates = [datetime.date(2000, 1, 1) + datetime.timedelta(days=i) for i in range(N)]
ad = np.asarray(dates)
base = np.arange(1000.0)  # unrelated op, sanity-check for binary-layout noise

print("dt   ->M8[us]: %6.1f ns/elem" % (best(lambda: a.astype('M8[us]')) / N * 1e9))
print("date ->M8[D] : %6.1f ns/elem" % (best(lambda: ad.astype('M8[D]')) / N * 1e9))
print("baseline sum : %6.1f ns"      % (best(lambda: base.sum()) * 1e9))

AI Disclosure

Claude Code was used for profiling and for implementing the selected performance improvements.

eendebakpt and others added 2 commits June 5, 2026 22:33
`NpyDatetime_ConvertPyDateTimeToDatetimeStruct` is the per-element worker
behind casting object arrays of Python datetimes to datetime64 (e.g.
`np.asarray(list_of_datetimes).astype("M8[us]")`, which sets the cost
floor of matplotlib's date2num).  It extracted the seven date/time fields
with a `PyObject_HasAttrString` + `PyObject_GetAttrString` pair per field,
each a generic `__getattribute__` dispatch that builds a temporary key
string and a temporary int.

Two complementary changes:

* Fast path: for genuine datetime.date / datetime.datetime objects
  (including subclasses such as pandas Timestamp), read the fields straight
  from the CPython C struct via the datetime C-API macros, with no
  attribute lookups at all.

* For the remaining attribute accesses (the duck-typed fallback and the
  per-element ``tzinfo`` lookup), use a single ``PyObject_GetOptionalAttr``
  on pre-interned attribute names (added to the ``npy_interned_str``
  registry), fusing the presence check and the lookup and avoiding the
  temporary key object.  The ``tzinfo`` lookup runs for every datetime, so
  this matters even when the fast path supplies the fields.
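To see why fusing the presence check and the lookup helps, the sketch below counts `__getattribute__` calls on a hypothetical duck-typed object (`CountingDuck`) for the old hasattr-then-getattr pattern and for a single `getattr` with a sentinel default, the Python-level analogue of `PyObject_GetOptionalAttr`. It models only the number of generic lookups, not the C-level cost of building temporary key strings that the interned names avoid.

```python
FIELDS = ("year", "month", "day", "hour", "minute", "second", "microsecond")
_MISSING = object()


class CountingDuck:
    """Hypothetical duck-typed datetime that counts lookups of FIELDS."""

    year, month, day = 2000, 1, 2
    hour, minute, second, microsecond = 3, 4, 5, 6

    def __init__(self):
        self.count = 0  # plain assignment does not go through __getattribute__

    def __getattribute__(self, name):
        if name in FIELDS:
            # Bump the counter via object.* to avoid recursing into this method.
            n = object.__getattribute__(self, "count")
            object.__setattr__(self, "count", n + 1)
        return object.__getattribute__(self, name)


def old_pattern(obj):
    """Presence check and fetch as two separate lookups per field."""
    values = []
    for name in FIELDS:
        if not hasattr(obj, name):
            return None
        values.append(getattr(obj, name))
    return values


def fused_pattern(obj):
    """One lookup per field; a sentinel default signals 'missing'."""
    values = []
    for name in FIELDS:
        value = getattr(obj, name, _MISSING)
        if value is _MISSING:
            return None
        values.append(value)
    return values


a, b = CountingDuck(), CountingDuck()
assert old_pattern(a) == fused_pattern(b)
print(a.count, b.count)  # 14 7
```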

Both paths converge on a single shared validation tail (``has_time`` flag),
so the date/time range checks and tzinfo handling exist in one place.
Output is bit-identical to before for naive, tz-aware, and date inputs.
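A hypothetical sketch of a single validation tail driven by a `has_time` flag, in the spirit of the description above. `validate` and its messages are illustrative, the range rules are the usual calendar and clock limits, and the tzinfo handling that the real tail also performs is omitted.

```python
import calendar


def validate(fields, has_time):
    """Check a (year, month, day, hour, minute, second, microsecond) tuple.

    Hypothetical shared tail: both the fast path and the duck-typed
    fallback would hand their fields here, with has_time saying whether
    the time-of-day fields were actually present.
    """
    year, month, day, hour, minute, second, us = fields
    # Date checks always run (month first, so monthrange never sees a bad month).
    if not 1 <= month <= 12 or not 1 <= day <= calendar.monthrange(year, month)[1]:
        raise ValueError(f"Invalid date ({year},{month},{day})")
    # Time checks only run when the input carried time fields.
    if has_time and not (0 <= hour < 24 and 0 <= minute < 60
                         and 0 <= second < 60 and 0 <= us < 1_000_000):
        raise ValueError(f"Invalid time ({hour},{minute},{second},{us})")
    return fields


print(validate((2020, 2, 29, 0, 0, 0, 0), has_time=False))  # leap day: accepted
```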

Regression tests cover the datetime/date subclass fast paths, the
duck-typed fallback (which real datetimes no longer exercise), invalid
dates, and the tz-aware fast path.
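Regression tests along these lines could exercise the subclass fast paths through the public cast. This is a sketch with hypothetical test names, not the PR's actual tests, and it covers only the subclass cases (a real `datetime` object cannot carry an invalid date).

```python
import datetime

import numpy as np


class MyDateTime(datetime.datetime):  # hypothetical subclass, like pandas Timestamp
    pass


class MyDate(datetime.date):
    pass


def test_datetime_subclass_cast():
    obj = np.array([MyDateTime(2020, 1, 2, 3, 4, 5, 6)], dtype=object)
    assert obj.astype("M8[us]")[0] == np.datetime64("2020-01-02T03:04:05.000006")


def test_date_subclass_cast():
    obj = np.array([MyDate(2020, 1, 2)], dtype=object)
    assert obj.astype("M8[D]")[0] == np.datetime64("2020-01-02")


if __name__ == "__main__":
    test_datetime_subclass_cast()
    test_date_subclass_cast()
```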

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

The invalid_date/invalid_time error paths passed the NPY_INT64_FMT/
NPY_INT32_FMT format macros, whose length modifiers are C-runtime
specific, to PyErr_Format, which formats via PyUnicode_FromFormatV rather
than the C runtime's printf. On Windows NPY_INT64_FMT expands to "I64d",
so an invalid date/time raised SystemError: invalid format string instead
of ValueError.

Cast the fields to long long/int and use the portable %lld/%d codes.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

invalid_date:
PyErr_Format(PyExc_ValueError,
"Invalid date (%" NPY_INT64_FMT ",%" NPY_INT32_FMT ",%" NPY_INT32_FMT ") when converting to NumPy datetime",

@eendebakpt (Contributor Author)

This was already incorrect on main, and the path is apparently unused. We could consider dropping support for duck-typed datetimes.

@eendebakpt changed the title from "PERF: speed up conversion of Python datetime/date to datetime64" to "PERF, BUG: speed up conversion of Python datetime/date to datetime64, handle duck-type datetime objects" on Jun 8, 2026