Speed up multipart header parsing and callback dispatch by Kludex · Pull Request #295 · Kludex/python-multipart

Kludex · 2026-06-04T07:45:42Z

Summary

Two hot-path optimizations to MultipartParser, both behaviour-preserving for valid input:

- find-based header parsing. HEADER_FIELD and HEADER_VALUE now jump straight to the delimiter with data.find() and validate the whole field-name span at once via bytes.translate(None, TOKEN_CHARS), instead of scanning byte by byte. This mirrors how PART_DATA and the querystring parser already use data.find.
- Dropped per-callback logger.debug. BaseParser.callback was issuing two logger.debug(...) calls on every single callback (per part-data chunk, per header, per field), which dominated dispatch cost even when logging was disabled.
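To illustrate the second point: a disabled `logger.debug(...)` is still a method call plus a level check on every dispatch. The sketch below uses two hypothetical dispatcher classes (they are not the library's `BaseParser`) that differ only in the logging calls, plus a small timing helper.

```python
import logging
import timeit

logger = logging.getLogger("dispatch_demo")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.WARNING)  # DEBUG is off, as in a typical deployment


class LoggingDispatcher:
    """Hypothetical dispatcher that logs before every callback."""

    def __init__(self, callbacks):
        self.callbacks = callbacks

    def callback(self, name, data=None, start=None, end=None):
        func = self.callbacks.get("on_" + name)
        if func is None:
            return
        if data is not None:
            # Still a method call plus an isEnabledFor() check when DEBUG is off.
            logger.debug("Calling %s with data[%d:%d]", name, start, end)
            func(data, start, end)
        else:
            logger.debug("Calling %s with no data", name)
            func()


class PlainDispatcher(LoggingDispatcher):
    """Same dispatch logic with the logging calls removed."""

    def callback(self, name, data=None, start=None, end=None):
        func = self.callbacks.get("on_" + name)
        if func is None:
            return
        if data is not None:
            func(data, start, end)
        else:
            func()


def seconds_for(dispatcher_cls, number=1_000_000):
    """Time `number` data callbacks through the given dispatcher class."""
    payload = b"x" * 16
    d = dispatcher_cls({"on_part_data": lambda data, start, end: None})
    return timeit.timeit(lambda: d.callback("part_data", payload, 0, 16), number=number)


if __name__ == "__main__":
    print("with logger.debug:", seconds_for(LoggingDispatcher))
    print("without:          ", seconds_for(PlainDispatcher))
```

The absolute numbers depend on the interpreter and machine, so none are quoted here; the point is only that both classes deliver identical callbacks while one does strictly less work per call.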

A couple of incidental cleanups: boundary_length = len(boundary) is hoisted out of the per-iteration PART_DATA path, and TOKEN_CHARS is exposed as bytes (with TOKEN_CHARS_SET derived from it) for the bulk validation.
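A minimal sketch of the find-plus-translate technique, next to the byte-by-byte scan it replaces. The function names are hypothetical, and the token alphabet is assumed to be the RFC 7230 `tchar` set; the library's actual `TOKEN_CHARS` and state-machine code may differ.

```python
# Assumed token alphabet (RFC 7230 "tchar"); the real TOKEN_CHARS may differ.
TOKEN_CHARS = (
    b"!#$%&'*+-.^_`|~"
    b"0123456789"
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
)
TOKEN_CHARS_SET = frozenset(TOKEN_CHARS)


def parse_field_name_slow(data: bytes, i: int = 0):
    """Byte-by-byte scan: validate each byte until ':' is reached."""
    for j in range(i, len(data)):
        c = data[j]
        if c == 0x3A:  # ord(":")
            return data[i:j], j + 1
        if c not in TOKEN_CHARS_SET:
            raise ValueError(f"invalid header character at offset {j}")
    return None  # ':' not seen yet, more data needed


def parse_field_name_fast(data: bytes, i: int = 0):
    """Jump to ':' with find(), then validate the whole span in one call."""
    colon = data.find(b":", i)
    end = len(data) if colon == -1 else colon
    field = data[i:end]
    # translate(None, delete) drops every byte listed in `delete`;
    # any byte that survives is not a token character.
    if field.translate(None, TOKEN_CHARS):
        raise ValueError("invalid character in header field name")
    if colon == -1:
        return None  # ':' not seen yet, more data needed
    return field, colon + 1
```

Both functions return `(field_name, index_after_colon)`, return `None` when the delimiter has not arrived yet, and raise on an invalid byte before the delimiter. The fast path moves the per-byte loop into C-level `bytes` methods.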

Benchmarks

Measured against main on CPython 3.13 and 3.14 (best-of-N, ns/op), using the existing tests/test_benchmarks.py workloads:

| Benchmark | Python 3.13 | Python 3.14 |
|---|---|---|
| `large_form` | ~3.1x faster | ~2.6x faster |
| `simple_form` | ~2.2x faster | ~1.95x faster |
| `querystring` | ~2.4x faster | ~1.4x faster |
| `file_upload` | ~1.1x faster | ~1.04x faster |
| `worstcase` | ~1.1x faster | ~1.03x faster |

Header-heavy and small forms benefit most; body-dominated cases (file_upload, worstcase) already used find for boundary scanning, so they only pick up the callback-dispatch savings.
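For readers who want to reproduce a "best-of-N, ns/op" figure, the following is one possible way to compute it with the standard library. It is a sketch only: the helper names are hypothetical and this is not the harness in tests/test_benchmarks.py.

```python
import timeit


def best_of_n_ns_per_op(func, repeat=5, number=1000):
    """Run `func` `number` times per round, `repeat` rounds; return the best ns/op."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number * 1e9


def speedup(base_ns, head_ns):
    """Express two ns/op figures as an 'Nx faster' ratio (base over head)."""
    return base_ns / head_ns
```

Taking the minimum over rounds, rather than the mean, reduces the influence of scheduler noise on short workloads.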

Correctness

- All existing tests pass.
- Differentially fuzzed the new parser against main's implementation: identical callback event streams and identical errors across randomized bodies × every chunk-split strategy (including byte-by-byte), at both small and default header limits.
- Verified downstream: ran Starlette 1.2.1's full tests/test_formparsers.py (40 tests) against this branch; all pass.
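The differential-fuzzing approach can be sketched as follows. This is not the harness used for the PR: the two toy line parsers stand in for main's and this branch's `MultipartParser`, and every name here is hypothetical. What it shows is the shape of the check: same input, same chunking, compare event streams and errors.

```python
import random


def split_chunks(data, strategy, rng):
    """Split `data` into a list of chunks according to a named strategy."""
    if strategy == "whole":
        return [data]
    if strategy == "bytewise":
        return [data[i:i + 1] for i in range(len(data))]
    chunks, i = [], 0  # "random": chunks of 1 to 8 bytes
    while i < len(data):
        step = rng.randint(1, 8)
        chunks.append(data[i:i + step])
        i += step
    return chunks


class ReferenceLineParser:
    """Toy reference: scans byte by byte, emits a line at each LF, rejects NUL."""

    def __init__(self, emit):
        self.emit = emit
        self.buf = bytearray()

    def write(self, data):
        for b in data:
            if b == 0:
                raise ValueError("NUL byte")
            if b == 0x0A:
                self.emit(("line", bytes(self.buf)))
                self.buf.clear()
            else:
                self.buf.append(b)

    def finalize(self):
        self.emit(("end", bytes(self.buf)))


class FindLineParser(ReferenceLineParser):
    """Toy candidate: same behaviour, but jumps to each LF with find()."""

    def write(self, data):
        i = 0
        while True:
            nl = data.find(b"\n", i)
            segment = data[i:len(data) if nl == -1 else nl]
            if b"\x00" in segment:
                raise ValueError("NUL byte")
            self.buf += segment
            if nl == -1:
                return
            self.emit(("line", bytes(self.buf)))
            self.buf.clear()
            i = nl + 1


def run(parser_cls, chunks):
    """Feed chunks to a fresh parser; return its events, with any error appended."""
    events = []
    parser = parser_cls(events.append)
    try:
        for chunk in chunks:
            parser.write(chunk)
        parser.finalize()
    except ValueError as exc:
        events.append(("error", str(exc)))
    return events


def differential_fuzz(iterations=200, seed=0):
    """Compare both parsers on random bodies under every chunking strategy."""
    rng = random.Random(seed)
    alphabet = b"abcdefghij\n\n\n\x00"
    for _ in range(iterations):
        body = bytes(rng.choice(alphabet) for _ in range(rng.randint(0, 40)))
        for strategy in ("whole", "bytewise", "random"):
            chunks = split_chunks(body, strategy, rng)
            assert run(ReferenceLineParser, chunks) == run(FindLineParser, chunks), (body, strategy)
    return iterations
```

Calling `differential_fuzz()` raises `AssertionError` with the offending body and strategy if the two implementations ever diverge.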

The one observable difference is on a malformed-input error path: a header name that is both over max_header_size and contains an invalid character may report the invalid-character error rather than the size-limit error (both still raise MultipartParseError). This was deemed acceptable.

AI Disclaimer

This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.

Parse header field names and values with bytes.find/translate to jump to the delimiter instead of scanning byte by byte, and drop the per-callback logger.debug calls from the hot path. This roughly halves parse time for header-heavy and small forms (large_form ~3x, simple_form ~2x, querystring ~2.4x on CPython 3.13/3.14), with no behaviour change for valid input.

codspeed-hq · 2026-06-04T07:46:27Z

Merging this PR will improve performance by 34.1%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

⚡ 1 improved benchmark
✅ 4 untouched benchmarks

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | `test_querystring_large_form` | 1,229.4 µs | 916.8 µs | +34.1% |

Comparing perf/hot-path-optimizations (0a64a18) with main (9d3ead5)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a64a18255

chatgpt-codex-connector · 2026-06-04T07:48:39Z

+                field = data[i:end]
+                if field.translate(None, TOKEN_CHARS):


Enforce header size before bulk-validating field names

For a malformed multipart part whose header field never contains : and is delivered in a large write() chunk, this slices and translates the entire remaining chunk before advance_header_size(end - i) runs below. That means the default max_header_size no longer bounds the work or temporary allocation for oversized header names; a request with megabytes of token characters in a header line will be scanned/copied in full instead of failing once the 4 KiB limit is crossed.
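One way to address this concern is to cap the search window by the remaining header budget before slicing and validating. The sketch below is an assumption about how such a bound could look, not the change that was later merged; `scan_field_name`, the exception classes, and the 4096-byte default are hypothetical names and values chosen for illustration, and the token alphabet is assumed to be the RFC 7230 `tchar` set.

```python
# Assumed token alphabet (RFC 7230 "tchar"); the real TOKEN_CHARS may differ.
TOKEN_CHARS = (
    b"!#$%&'*+-.^_`|~"
    b"0123456789"
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
)
MAX_HEADER_SIZE = 4096  # assumed to match the 4 KiB default cited in the review


class HeaderTooLargeError(ValueError):
    """Hypothetical error for a header that exceeds the size budget."""


class InvalidHeaderError(ValueError):
    """Hypothetical error for a non-token byte in a field name."""


def scan_field_name(data, i, header_bytes_used, max_header_size=MAX_HEADER_SIZE):
    """Return (field, next_index), or None if ':' has not arrived yet."""
    budget = max_header_size - header_bytes_used
    # Look at no more than budget + 1 bytes, so a ':' sitting exactly at the
    # limit is still found but an oversized name is never scanned in full.
    limit = min(len(data), i + budget + 1)
    colon = data.find(b":", i, limit)
    end = limit if colon == -1 else colon
    if end - i > budget:
        raise HeaderTooLargeError("header exceeds max_header_size")
    field = data[i:end]  # at most `budget` bytes are copied and validated
    if field.translate(None, TOKEN_CHARS):
        raise InvalidHeaderError("invalid character in header field name")
    if colon == -1:
        return None  # caller would add end - i to its running header size
    return field, colon + 1
```

With this ordering, a multi-megabyte header line fails after inspecting roughly `max_header_size` bytes, and the size-limit error takes precedence over the invalid-character error for names that are both oversized and malformed.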



Kludex merged commit 6732164 into main Jun 4, 2026
15 checks passed

Kludex deleted the perf/hot-path-optimizations branch June 4, 2026 07:48

This was referenced Jun 4, 2026

Bound header field name size before validating #296

Merged

Version 0.0.31 #298

Merged

tfoutrein mentioned this pull request Jun 4, 2026

Perf: a few small streaming-parser micro-opts that stack on top of 0.0.32 (#295/#296/#300) — would a PR be welcome? #305

Closed
