Use `memchr` for computing line indexes #21838

charliermarsh · 2025-12-08T02:16:20Z

Summary

Some benchmarks with Claude's help:

| File | Size | Baseline | Optimized | Speedup |
| --- | --- | --- | --- | --- |
| numpy/globals.py | 3 KB | 1.48 µs (1.95 GiB/s) | 740 ns (3.89 GiB/s) | 2.0x |
| unicode/pypinyin.py | 4 KB | 2.04 µs (2.01 GiB/s) | 1.18 µs (3.49 GiB/s) | 1.7x |
| pydantic/types.py | 26 KB | 13.1 µs (1.90 GiB/s) | 5.88 µs (4.23 GiB/s) | 2.2x |
| numpy/ctypeslib.py | 17 KB | 8.45 µs (1.92 GiB/s) | 3.94 µs (4.13 GiB/s) | 2.1x |
| large/dataset.py | 41 KB | 21.6 µs (1.84 GiB/s) | 11.2 µs (3.55 GiB/s) | 1.9x |

I originally thought we had to iterate character-by-character here because we needed to do the ASCII check, but the ASCII check can be vectorized by LLVM on its own, and the "search for newlines" step can be done with `memchr`.
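For readers who want to see the shape of the two-pass approach, here is a minimal sketch, not the code from this PR. The names `line_starts` and `find_newline` are hypothetical, and `find_newline` is a standard-library stand-in for `memchr::memchr2(b'\n', b'\r', haystack)` so that the example has no external dependencies. It assumes `\n`, `\r`, and `\r\n` each terminate a line.

```rust
/// Returns the byte offset at which each line starts, plus whether the
/// text is pure ASCII. Hypothetical helper for illustration only.
pub fn line_starts(text: &str) -> (Vec<usize>, bool) {
    let bytes = text.as_bytes();

    // Pass 1: a plain ASCII check over the whole buffer. This is a simple
    // loop with no data-dependent control flow, which the compiler can
    // typically vectorize.
    let is_ascii = bytes.is_ascii();

    // Pass 2: jump from newline to newline instead of visiting every char.
    let mut starts = vec![0];
    let mut pos = 0;
    while let Some(offset) = find_newline(&bytes[pos..]) {
        let i = pos + offset;
        // Treat "\r\n" as a single line terminator.
        let terminator_len = if bytes[i] == b'\r' && bytes.get(i + 1) == Some(&b'\n') {
            2
        } else {
            1
        };
        pos = i + terminator_len;
        starts.push(pos);
    }

    (starts, is_ascii)
}

/// Assumption: stand-in for `memchr::memchr2(b'\n', b'\r', haystack)`,
/// which has the same signature shape but uses SIMD where available.
fn find_newline(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'\n' || b == b'\r')
}
```

Swapping `find_newline` for the real `memchr2` call is where the speedup would come from; the surrounding loop stays the same.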

astral-sh-bot · 2025-12-08T02:35:17Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

alex · 2025-12-08T04:34:24Z

I doubt it's worth the effort (maybe some day, when std::simd is stable...) but I imagine in principle you could merge the two operations pretty easily:

load chunk
non_ascii |= chunk & splat(0x80)
is_newline = (chunk == splat('\n')) | (chunk == splat('\r'))
if any(is_newline): find the index of each match

The win would be memory locality, since it avoids iterating over the input twice.
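A rough scalar sketch of that fused loop, for illustration only: since `std::simd` is not stable, this uses fixed-size chunks and relies on the compiler to vectorize the inner scan. The function name `line_starts_fused` and the 16-byte `CHUNK` size are assumptions, not anything from this PR, and it again assumes `\n`, `\r`, and `\r\n` each terminate a line.

```rust
/// Assumed chunk width; a real SIMD version would use the vector width.
const CHUNK: usize = 16;

/// Single pass over the input that both accumulates the non-ASCII bit and
/// records line starts. Hypothetical helper for illustration only.
pub fn line_starts_fused(text: &str) -> (Vec<usize>, bool) {
    let bytes = text.as_bytes();
    let mut starts = vec![0];
    let mut high_bits = 0u8;

    for (chunk_idx, chunk) in bytes.chunks(CHUNK).enumerate() {
        let base = chunk_idx * CHUNK;

        // Branch-free scan of the chunk: OR together the high bits (set only
        // for non-ASCII bytes) and note whether any newline byte is present.
        let mut has_newline = false;
        for &b in chunk {
            high_bits |= b & 0x80;
            has_newline |= b == b'\n' || b == b'\r';
        }

        // Slow path, taken only for chunks that contain a line terminator.
        if has_newline {
            for (j, &b) in chunk.iter().enumerate() {
                let i = base + j;
                match b {
                    b'\n' => starts.push(i + 1),
                    // A lone "\r" ends a line; in "\r\n" the "\n" does.
                    b'\r' if bytes.get(i + 1) != Some(&b'\n') => starts.push(i + 1),
                    _ => {}
                }
            }
        }
    }

    (starts, high_bits == 0)
}
```

Each chunk is loaded once and used for both checks, which is the locality benefit described above. Whether this beats two separately optimized passes would need benchmarking.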

Use memchr for computing line indexes

05e627a

charliermarsh added the performance Potential performance improvement label Dec 8, 2025

charliermarsh marked this pull request as ready for review December 8, 2025 02:16

charliermarsh merged commit b845e81 into main Dec 8, 2025
39 checks passed

charliermarsh deleted the charlie/line-index branch December 8, 2025 13:50

BrewTestBot mentioned this pull request Dec 11, 2025

ruff 0.14.9 Homebrew/homebrew-core#258289

Merged
