@charliermarsh
Member

Summary

Some benchmarks with Claude's help:

| File | Size | Baseline | Optimized | Speedup |
|------|------|----------|-----------|---------|
| numpy/globals.py | 3 KB | 1.48 µs (1.95 GiB/s) | 740 ns (3.89 GiB/s) | 2.0x |
| unicode/pypinyin.py | 4 KB | 2.04 µs (2.01 GiB/s) | 1.18 µs (3.49 GiB/s) | 1.7x |
| pydantic/types.py | 26 KB | 13.1 µs (1.90 GiB/s) | 5.88 µs (4.23 GiB/s) | 2.2x |
| numpy/ctypeslib.py | 17 KB | 8.45 µs (1.92 GiB/s) | 3.94 µs (4.13 GiB/s) | 2.1x |
| large/dataset.py | 41 KB | 21.6 µs (1.84 GiB/s) | 11.2 µs (3.55 GiB/s) | 1.9x |

I originally thought we had to iterate character-by-character here because we needed to do the ASCII check, but the ASCII check can be vectorized by LLVM (and the "search for newlines" can be done with memchr).
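For readers following along, here is a rough sketch of the two-pass shape described above. It is not the code from this PR: `line_starts` is a hypothetical name, and to stay std-only the newline search uses `Iterator::position` as a stand-in for the `memchr` crate (its `memchr2` function would be the natural fit for searching `\n` and `\r`). The ASCII check uses the standard library's `<[u8]>::is_ascii`.

```rust
/// Hypothetical helper (not the PR's code): returns the byte offset of every
/// line start, plus whether the whole text is ASCII.
fn line_starts(text: &str) -> (Vec<usize>, bool) {
    let bytes = text.as_bytes();

    // Pass 1: ASCII check over the whole buffer in one call.
    let is_ascii = bytes.is_ascii();

    // Pass 2: jump from terminator to terminator instead of visiting each char.
    // `position` is a std-only stand-in for a memchr-style search.
    let mut starts = vec![0];
    let mut pos = 0;
    while let Some(rel) = bytes[pos..]
        .iter()
        .position(|&b| b == b'\n' || b == b'\r')
    {
        let i = pos + rel;
        // Treat "\r\n" as a single line terminator.
        let end = if bytes[i] == b'\r' && bytes.get(i + 1) == Some(&b'\n') {
            i + 2
        } else {
            i + 1
        };
        starts.push(end);
        pos = end;
    }

    (starts, is_ascii)
}
```

Calling `line_starts("a\nb\r\nc\rd")` yields line starts at byte offsets 0, 2, 5 and 7. Neither pass decodes UTF-8 characters one at a time, which is the property the change relies on.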

@charliermarsh charliermarsh added the performance Potential performance improvement label Dec 8, 2025
@charliermarsh charliermarsh marked this pull request as ready for review December 8, 2025 02:16
@astral-sh-bot

astral-sh-bot bot commented Dec 8, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@alex
Contributor

alex commented Dec 8, 2025

I doubt it's worth the effort (maybe some day, when std::simd is stable...) but I imagine in principle you could merge the two operations pretty easily:

  • load chunk
  • is_ascii |= chunk & splat(0x80)
  • chunk == splat('\n') | chunk == splat('\r')
    • find the idx if it is

the win would be memory locality, since you'd avoid iterating over the buffer twice
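A scalar sketch of that fused loop, for illustration only. It uses fixed 16-byte chunks in place of real SIMD vectors (since `std::simd` is not stable), so whether the compiler vectorizes the inner sweep is not guaranteed. The names `line_starts_fused` and `CHUNK` are hypothetical, and the accumulator is called `high_bits` because OR-ing `b & 0x80` collects the non-ASCII bits rather than an "is ASCII" flag.

```rust
/// Hypothetical chunk width, standing in for one SIMD register.
const CHUNK: usize = 16;

/// Hypothetical fused pass: collects line starts and the ASCII flag while
/// touching each chunk once.
fn line_starts_fused(text: &str) -> (Vec<usize>, bool) {
    let bytes = text.as_bytes();
    let mut starts = vec![0];
    let mut high_bits = 0u8; // accumulates the 0x80 bit of every byte seen
    let mut base = 0; // offset of the current chunk within `bytes`

    for chunk in bytes.chunks(CHUNK) {
        // "load chunk": fold the non-ASCII bit and a newline flag in one sweep.
        let mut has_newline = false;
        for &b in chunk {
            high_bits |= b & 0x80;
            has_newline |= b == b'\n' || b == b'\r';
        }

        // "find the idx if it is": only chunks with a terminator pay for this.
        if has_newline {
            for (j, &b) in chunk.iter().enumerate() {
                let i = base + j;
                match b {
                    b'\n' => starts.push(i + 1),
                    // A lone '\r' ends a line; "\r\n" is counted at the '\n',
                    // even when the pair straddles a chunk boundary.
                    b'\r' if bytes.get(i + 1) != Some(&b'\n') => starts.push(i + 1),
                    _ => {}
                }
            }
        }
        base += chunk.len();
    }

    (starts, high_bits == 0)
}
```

Each chunk is still hot in cache when it is scanned for terminators, which is the locality benefit being described. Whether that beats two separately optimized passes would need benchmarking.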

@charliermarsh charliermarsh merged commit b845e81 into main Dec 8, 2025
39 checks passed
@charliermarsh charliermarsh deleted the charlie/line-index branch December 8, 2025 13:50