css_lexer: Rewrite to use Bytes, memchr #907

Merged: keithamus merged 1 commit into main from lex-perf on Feb 27, 2026

Conversation

@keithamus
Member

This change significantly rewrites the Lexer to use a ByteCursor struct instead
of Chars. This is useful for two reasons:

1. Iterating Chars can be a little slower than comparing bytes when we already
   know the input is UTF-8.
2. More importantly, with a slice of bytes we can start using memchr, which
   uses SIMD operations so we don't have to write them ourselves. Switching
   between Bytes and Chars hampers the performance benefits of using memchr.
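
To illustrate the idea, here is a minimal sketch of a byte cursor that skips a CSS comment by searching for a single byte. It is not the code from this PR: the `ByteCursor` fields, the method names, and `skip_comment` are assumptions made for the example, and `iter().position()` from the standard library stands in for `memchr::memchr` so the sketch has no dependencies.

```rust
/// Hypothetical cursor over the raw bytes of a UTF-8 source string.
/// The field layout is an assumption, not the struct from the PR.
pub struct ByteCursor<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> ByteCursor<'a> {
    pub fn new(source: &'a str) -> Self {
        Self { bytes: source.as_bytes(), pos: 0 }
    }

    /// Look at the current byte without consuming it.
    #[inline]
    pub fn peek(&self) -> Option<u8> {
        self.bytes.get(self.pos).copied()
    }

    /// Consume `n` bytes, clamped to the end of the input.
    #[inline]
    pub fn advance(&mut self, n: usize) {
        self.pos = (self.pos + n).min(self.bytes.len());
    }

    /// Current byte offset into the source.
    pub fn pos(&self) -> usize {
        self.pos
    }

    /// Skip a `/* ... */` comment starting at the cursor.
    /// Returns true only if a complete comment was consumed. If the cursor
    /// is not at `/*` it is left unchanged; if the comment is unterminated
    /// the cursor moves to the end of the input.
    pub fn skip_comment(&mut self) -> bool {
        if !self.bytes[self.pos..].starts_with(b"/*") {
            return false;
        }
        self.advance(2);
        loop {
            let rest = &self.bytes[self.pos..];
            // Stand-in for `memchr::memchr(b'*', rest)`: jump to the next
            // candidate byte instead of inspecting every character.
            match rest.iter().position(|&b| b == b'*') {
                Some(i) if rest.get(i + 1) == Some(&b'/') => {
                    self.advance(i + 2);
                    return true;
                }
                // A lone `*` inside the comment: keep searching after it.
                Some(i) => self.advance(i + 1),
                None => {
                    self.pos = self.bytes.len();
                    return false;
                }
            }
        }
    }
}
```

Because the cursor never converts back to `char`, the search result can be used directly as a byte offset, which is the property the description says is lost when switching between Bytes and Chars.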

During this rewrite I also picked up a couple of other chances for optimisation,
such as fast-path integer parsing, and improving the scanning of
strings/URLs/comments, plus a bunch of small changes like inline hints.
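
As a rough sketch of what a fast path for integer parsing can look like (an illustration, not the implementation in this PR), the function below accumulates ASCII digits directly from bytes and bails out on anything else. The name `parse_int_fast` and the choice of `i32` are assumptions for the example; a caller would fall back to a general number parser when it returns `None`.

```rust
/// Hypothetical fast path: parse an optionally signed run of ASCII digits.
/// Returns `None` for anything that is not a plain integer, or on overflow,
/// so the caller can fall back to a slower general-purpose parser.
#[inline]
pub fn parse_int_fast(bytes: &[u8]) -> Option<i32> {
    // Split off an optional leading sign.
    let (negative, digits) = match bytes.split_first() {
        Some((&b'-', rest)) => (true, rest),
        Some((&b'+', rest)) => (false, rest),
        _ => (false, bytes),
    };
    if digits.is_empty() {
        return None;
    }
    let mut value: i32 = 0;
    for &b in digits {
        if !b.is_ascii_digit() {
            // Decimal point, exponent, unit, etc.: not handled here.
            return None;
        }
        let digit = (b - b'0') as i32;
        value = value.checked_mul(10)?;
        // Accumulate in the direction of the sign so i32::MIN is reachable.
        value = if negative {
            value.checked_sub(digit)?
        } else {
            value.checked_add(digit)?
        };
    }
    Some(value)
}
```

Working on bytes avoids decoding each character and avoids the float parsing path for the common case of small whole numbers.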

This change is purely performance oriented, and functionality should be
identical.

@keithamus keithamus enabled auto-merge (squash) February 27, 2026 22:56

@github-actions github-actions bot left a comment

All CI jobs have passed. Approving.

@keithamus keithamus merged commit 30b187c into main Feb 27, 2026
17 checks passed
@keithamus keithamus deleted the lex-perf branch February 27, 2026 23:03
github-actions bot pushed a commit that referenced this pull request Feb 27, 2026
## [0.0.18] - 2026-02-27

### Other Changes
- Chore(deps): update rust crate toml to v1 (#875) ([#875](#875))
- Update css-tokenizer-tests (#887) ([#887](#887))
- Chore(deps): update dependencies (patch) (#900) ([#900](#900))

### Chromashift
- chromashift: Update to support wide-gamut color spaces, out of gamut colors (#877) ([#877](#877))
- chromashift: Implement css-color-4 raytrace gamut mapping (#879) ([#879](#879))
- chromashift: Increase precision of display-p3 to sRGB conversion (#897) ([#897](#897))

### Css_ast
- css_ast: Implement gap rule list (#878) ([#878](#878))
- Regenerate css_ast/src/values from csswg drafts (#881) ([#881](#881))
- css_ast: Implement ColorMix (#889) ([#889](#889))
- css_ast: Boxup more types (#890) ([#890](#890))
- Implement ColorFunctionColor::to_chromashift() and extract shortest_color helper (#898) ([#898](#898))
- Regenerate css_ast/src/values from csswg drafts (#899) ([#899](#899))

### Css_lexer
- css_lexer: Introduce UnicodeRange token kind. (#891) ([#891](#891))
- css_lexer: Add "Bad" tokens. (#892) ([#892](#892))
- css_lexer: Re-encode escapes in strings if they're invalid for unescaping (#894) ([#894](#894))
- css_lexer: Re-encode escapes in idents if they're invalid for unescaping (#896) ([#896](#896))
- css_lexer: Rewrite to use Bytes, memchr (#907) ([#907](#907))

### Css_parse
- css_parse: Add BumpBox type and blanket trait impls for Box types (#888) ([#888](#888))

### Csskit
- chore(deps): update dependencies (patch) (#882) ([#882](#882))

### Csskit_transform
- csskit_transform: introduce css-minify-tests (#886) ([#886](#886))
- csskit_transform: snapshot css-minify-tests failures instead of asserting count (#893) ([#893](#893))

### Csskit_vscode
- chore(deps): update dependencies (minor) (#884) ([#884](#884))
- chore(deps): update dependencies to v1.49.0 (minor) (#901) ([#901](#901))

[forcebuild]
@keithamus keithamus mentioned this pull request Feb 27, 2026
github-actions bot pushed a commit that referenced this pull request Mar 1, 2026
## [0.0.18] - 2026-03-01

### Other Changes
- Chore(deps): update rust crate toml to v1 (#875) ([#875](#875))
- Update css-tokenizer-tests (#887) ([#887](#887))
- Chore(deps): update dependencies (patch) (#900) ([#900](#900))

### Chromashift
- chromashift: Update to support wide-gamut color spaces, out of gamut colors (#877) ([#877](#877))
- chromashift: Implement css-color-4 raytrace gamut mapping (#879) ([#879](#879))
- chromashift: Increase precision of display-p3 to sRGB conversion (#897) ([#897](#897))

### Css_ast
- css_ast: Implement gap rule list (#878) ([#878](#878))
- Regenerate css_ast/src/values from csswg drafts (#881) ([#881](#881))
- css_ast: Implement ColorMix (#889) ([#889](#889))
- css_ast: Boxup more types (#890) ([#890](#890))
- Implement ColorFunctionColor::to_chromashift() and extract shortest_color helper (#898) ([#898](#898))
- Regenerate css_ast/src/values from csswg drafts (#899) ([#899](#899))

### Css_lexer
- css_lexer: Introduce UnicodeRange token kind. (#891) ([#891](#891))
- css_lexer: Add "Bad" tokens. (#892) ([#892](#892))
- css_lexer: Re-encode escapes in strings if they're invalid for unescaping (#894) ([#894](#894))
- css_lexer: Re-encode escapes in idents if they're invalid for unescaping (#896) ([#896](#896))
- css_lexer: Rewrite to use Bytes, memchr (#907) ([#907](#907))

### Css_parse
- css_parse: Add BumpBox type and blanket trait impls for Box types (#888) ([#888](#888))
- css_parse: flush trailing semicolons in CursorCompactWriteSink on drop (#908) ([#908](#908))

### Csskit
- chore(deps): update dependencies (patch) (#882) ([#882](#882))

### Csskit_transform
- csskit_transform: introduce css-minify-tests (#886) ([#886](#886))
- csskit_transform: snapshot css-minify-tests failures instead of asserting count (#893) ([#893](#893))

### Csskit_vscode
- chore(deps): update dependencies (minor) (#884) ([#884](#884))
- chore(deps): update dependencies to v1.49.0 (minor) (#901) ([#901](#901))

[forcebuild]
github-actions bot pushed a commit that referenced this pull request Mar 1, 2026
## [0.0.18] - 2026-03-01

### Other Changes
- Chore(deps): update rust crate toml to v1 (#875) ([#875](#875))
- Update css-tokenizer-tests (#887) ([#887](#887))
- Chore(deps): update dependencies (patch) (#900) ([#900](#900))

### Chromashift
- chromashift: Update to support wide-gamut color spaces, out of gamut colors (#877) ([#877](#877))
- chromashift: Implement css-color-4 raytrace gamut mapping (#879) ([#879](#879))
- chromashift: Increase precision of display-p3 to sRGB conversion (#897) ([#897](#897))

### Css_ast
- css_ast: Implement gap rule list (#878) ([#878](#878))
- Regenerate css_ast/src/values from csswg drafts (#881) ([#881](#881))
- css_ast: Implement ColorMix (#889) ([#889](#889))
- css_ast: Boxup more types (#890) ([#890](#890))
- Implement ColorFunctionColor::to_chromashift() and extract shortest_color helper (#898) ([#898](#898))
- Regenerate css_ast/src/values from csswg drafts (#899) ([#899](#899))
- css_ast: Remove StringOrUrl type (#909) ([#909](#909))

### Css_lexer
- css_lexer: Introduce UnicodeRange token kind. (#891) ([#891](#891))
- css_lexer: Add "Bad" tokens. (#892) ([#892](#892))
- css_lexer: Re-encode escapes in strings if they're invalid for unescaping (#894) ([#894](#894))
- css_lexer: Re-encode escapes in idents if they're invalid for unescaping (#896) ([#896](#896))
- css_lexer: Rewrite to use Bytes, memchr (#907) ([#907](#907))

### Css_parse
- css_parse: Add BumpBox type and blanket trait impls for Box types (#888) ([#888](#888))
- css_parse: flush trailing semicolons in CursorCompactWriteSink on drop (#908) ([#908](#908))

### Csskit
- chore(deps): update dependencies (patch) (#882) ([#882](#882))

### Csskit_transform
- csskit_transform: introduce css-minify-tests (#886) ([#886](#886))
- csskit_transform: snapshot css-minify-tests failures instead of asserting count (#893) ([#893](#893))

### Csskit_vscode
- chore(deps): update dependencies (minor) (#884) ([#884](#884))
- chore(deps): update dependencies to v1.49.0 (minor) (#901) ([#901](#901))

[forcebuild]
github-actions bot pushed a commit that referenced this pull request Mar 1, 2026
## [0.0.18] - 2026-03-01

### Other Changes
- Chore(deps): update rust crate toml to v1 (#875) ([#875](#875))
- Update css-tokenizer-tests (#887) ([#887](#887))
- Chore(deps): update dependencies (patch) (#900) ([#900](#900))

### Chromashift
- chromashift: Update to support wide-gamut color spaces, out of gamut colors (#877) ([#877](#877))
- chromashift: Implement css-color-4 raytrace gamut mapping (#879) ([#879](#879))
- chromashift: Increase precision of display-p3 to sRGB conversion (#897) ([#897](#897))

### Css_ast
- css_ast: Implement gap rule list (#878) ([#878](#878))
- Regenerate css_ast/src/values from csswg drafts (#881) ([#881](#881))
- css_ast: Implement ColorMix (#889) ([#889](#889))
- css_ast: Boxup more types (#890) ([#890](#890))
- Implement ColorFunctionColor::to_chromashift() and extract shortest_color helper (#898) ([#898](#898))
- Regenerate css_ast/src/values from csswg drafts (#899) ([#899](#899))
- css_ast: Remove StringOrUrl type (#909) ([#909](#909))

### Css_lexer
- css_lexer: Introduce UnicodeRange token kind. (#891) ([#891](#891))
- css_lexer: Add "Bad" tokens. (#892) ([#892](#892))
- css_lexer: Re-encode escapes in strings if they're invalid for unescaping (#894) ([#894](#894))
- css_lexer: Re-encode escapes in idents if they're invalid for unescaping (#896) ([#896](#896))
- css_lexer: Rewrite to use Bytes, memchr (#907) ([#907](#907))

### Css_parse
- css_parse: Add BumpBox type and blanket trait impls for Box types (#888) ([#888](#888))
- css_parse: flush trailing semicolons in CursorCompactWriteSink on drop (#908) ([#908](#908))

### Csskit
- chore(deps): update dependencies (patch) (#882) ([#882](#882))

### Csskit_transform
- csskit_transform: introduce css-minify-tests (#886) ([#886](#886))
- csskit_transform: snapshot css-minify-tests failures instead of asserting count (#893) ([#893](#893))
- csskit_transform: reduce Urls to Strings where possible (#910) ([#910](#910))

### Csskit_vscode
- chore(deps): update dependencies (minor) (#884) ([#884](#884))
- chore(deps): update dependencies to v1.49.0 (minor) (#901) ([#901](#901))

[forcebuild]