Skip to content

gh-146385: Switch back to re to detect shlex.quote slow path#146408

Open
bonzini wants to merge 1 commit intopython:mainfrom
bonzini:fix146385
Open

gh-146385: Switch back to re to detect shlex.quote slow path#146408
bonzini wants to merge 1 commit intopython:mainfrom
bonzini:fix146385

Conversation

@bonzini
Copy link

@bonzini bonzini commented Mar 25, 2026

Commit 06a26fd ("gh-118761: Optimise import time for shlex (#132036)") when the input has to be quoted. This is because the regular expression search was able to short-circuit at the first unsafe character.

Go back to the same algorithm as 3.13, but make the "import re" and compilation of the regular expression lazy.

Testing s.isascii() makes shlex.quote() twice as fast in the non-ASCII case, but costs up to 25% of the full run time (because it necessitates an earlier isinstance check) if the string is ASCII. The latter is probably the common case, so drop the check.

@bedevere-app
Copy link

bedevere-app bot commented Mar 25, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@python-cla-bot
Copy link

python-cla-bot bot commented Mar 25, 2026

All commit authors signed the Contributor License Agreement.

CLA signed

@picnixz
Copy link
Member

picnixz commented Mar 25, 2026

@AA-Turner Can you check if the results are consistent on Windows as well please?

@picnixz
Copy link
Member

picnixz commented Mar 25, 2026

Here is what I suggest:

  • For 3.14, we can fix the regression (I will consider it a bug) if this reproduces on multiple systems. We might have other things at play here depending on where Python is built.
  • For 3.15, we can use lazy imports to make it cleaner if necessary.

I will also try to check what happens on older systems (openSUSE + older GCC/clang).

@picnixz picnixz requested review from AA-Turner and picnixz March 25, 2026 09:05
Commit 06a26fd ("pythongh-118761: Optimise import time for ``shlex`` (python#132036)")
when the input has to be quoted. This is because the regular expression
search was able to short-circuit at the first unsafe character.

Go back to the same algorithm as 3.13, but make the "import re" and compilation
of the regular expression lazy.

Testing s.isascii() makes shlex.quote() twice as fast in the non-ASCII
case, but costs up to 25% of the full run time (because it necessitates
an earlier isinstance check) if the string *is* ASCII.  The latter is
probably the common case, so drop the check.
@bedevere-app
Copy link

bedevere-app bot commented Mar 25, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bonzini
Copy link
Author

bonzini commented Mar 25, 2026

Yeah I didn't use lazy import to cater for backports. Would you prefer the lazy import version to be in this fix PR, or in a follow-up?

@picnixz
Copy link
Member

picnixz commented Mar 25, 2026

For now, let's check whether the regression is consistent across platforms.

@bonzini
Copy link
Author

bonzini commented Mar 25, 2026

For now, let's check whether the regression is consistent across platforms.

It is, and also it's really a complexity change. shlex.quote('a ' + 'b' * N) will always be slower in 3.14+ if you make N high enough. (And in practice, 10 is enough to have a substantial effect).

@picnixz
Copy link
Member

picnixz commented Mar 25, 2026

What I want to know is why Adam reported numbers that went in the other direction. Ideally I also want to know the stdev of the numbers using pyperf (which I will do this w-e)

@picnixz
Copy link
Member

picnixz commented Mar 25, 2026

I suggested in #132036 (comment) to replace the re call if we had something more efficient, and Adam found out that it was faster using bytes.translate so I'd like to know which data it was tested on:

With this, we go from ~26 microseconds for quote to ~2 (with only safe characters) and ~6 (with characters needing quotation).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants