gh-127022: Simplify PyStackRef_FromPyObjectSteal #127024

colesbury merged 5 commits into python:main
This gets rid of the immortal check in `PyStackRef_FromPyObjectSteal()`. Overall, this improves performance by about 2% in the free threading build. This also renames `PyStackRef_Is()` to `PyStackRef_IsExactly()` because the macro requires that the tag bits of the arguments match, which is only true in certain special cases.
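As a rough illustration of the shape of this change (all names, types, and tag layouts below are invented stand-ins, not CPython's actual definitions): stealing a reference no longer needs to test for immortality and set a tag bit, and the renamed raw-bits comparison is only correct when the tag bits of both operands already agree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of a tagged stack reference, loosely modeled on the
 * free-threading build's _PyStackRef. Tag values are illustrative only. */
typedef struct { uintptr_t bits; } StackRef;

#define TAG_DEFERRED ((uintptr_t)1)  /* low bit marks a deferred reference */
#define TAG_MASK     ((uintptr_t)1)

/* Before the change, stealing a reference branched on immortality to decide
 * whether to tag it. After the change, the steal is a plain bit copy. */
static inline StackRef stackref_from_obj_steal(void *obj) {
    return (StackRef){ (uintptr_t)obj };   /* no immortal check, no tagging */
}

/* "Exactly": raw bit equality, so the tag bits of both sides must match. */
static inline int stackref_is_exactly(StackRef a, StackRef b) {
    return a.bits == b.bits;
}

/* Safe variant: masks out the deferred bit before comparing to an object. */
static inline int stackref_refers_to(StackRef a, void *obj) {
    return (a.bits & ~TAG_MASK) == (uintptr_t)obj;
}
```

The masked comparison is always safe; the raw-bits comparison is only valid when the producer of both values guarantees matching tags, which is the point of the rename.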
Force-pushed from 2c43ad0 to 5583ac0.
Said benchmark: https://github.com/facebookexperimental/free-threading-benchmarking/tree/main/results/bm-20241118-3.14.0a1+-ed7085a-NOGIL I was thinking of how this breaks the nice encapsulation we have :(, but a 2% speedup is too good to give up.
Co-authored-by: Pieter Eendebak <[email protected]>
Python/bytecodes.c (outdated)

```diff
 replaced op(_POP_JUMP_IF_TRUE, (cond -- )) {
     assert(PyStackRef_BoolCheck(cond));
-    int flag = PyStackRef_Is(cond, PyStackRef_True);
+    int flag = PyStackRef_IsExactly(cond, PyStackRef_True);
```
Why do we use PyStackRef_IsExactly here (which doesn't mask out the deferred bit) but use PyStackRef_IsFalse (which does mask out the deferred bit) in _POP_JUMP_IF_FALSE above? Is this the rare case where it's safe?
Our codegen ensures that these ops only see True or False. That's often by adding a TO_BOOL immediately before, which may be folded into COMPARE_OP. The preceding TO_BOOL, including in COMPARE_OP, ensures the canonical representation of PyStackRef_False or PyStackRef_True with the deferred bit set.
However, there are two places in codegen.c that omit the TO_BOOL because they have other reasons to know that the result is exactly a boolean:
- codegen.c, lines 678 to 682 at 09c240f
- codegen.c, lines 5746 to 5749 at 09c240f
The COMPARE_OPs here still generate bools, but not always in the canonical representation. So we can either:

- Modify `COMPARE_OP` to ensure the canonical representation like https://github.com/colesbury/cpython/blob/5583ac0c311132e36ef458842e087945898ffdec/Python/bytecodes.c#L2409-L2416
- Use `PyStackRef_IsFalse` (instead of `PyStackRef_IsExactly`) in the `JUMP_IF_FALSE`
- Modify the codegen by inserting `TO_BOOL` in those two spots.
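The canonical-representation issue above can be sketched as follows (names invented for illustration; this is not CPython's actual layout): `TO_BOOL` yields `Py_True`/`Py_False` with the deferred bit set, while a `COMPARE_OP` that skips `TO_BOOL` may yield the same singleton without the bit, so a raw-bits comparison misses the match where a masked comparison does not.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch with invented names. */
typedef struct { uintptr_t bits; } BoolRef;

#define BOOL_TAG_DEFERRED ((uintptr_t)1)

static int py_true_singleton;  /* stand-in for the Py_True singleton */

/* Canonical form: pointer with the deferred bit set (what TO_BOOL yields). */
static BoolRef make_canonical_true(void) {
    return (BoolRef){ (uintptr_t)&py_true_singleton | BOOL_TAG_DEFERRED };
}

/* Non-canonical form: same object, deferred bit clear (what a COMPARE_OP
 * that is not followed by TO_BOOL may yield). */
static BoolRef make_untagged_true(void) {
    return (BoolRef){ (uintptr_t)&py_true_singleton };
}

/* PyStackRef_IsExactly analogue: raw bit equality, tag bits must match. */
static int boolref_is_exactly(BoolRef a, BoolRef b) {
    return a.bits == b.bits;
}

/* PyStackRef_IsTrue analogue: masks the deferred bit, always safe. */
static int boolref_is_true(BoolRef a) {
    return (a.bits & ~BOOL_TAG_DEFERRED) == (uintptr_t)&py_true_singleton;
}
```

Both forms refer to the same object, so the masked check answers correctly for either, while the exact check silently distinguishes them.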
That makes sense, thanks for the explanation. Since using PyStackRef_IsExactly safely is sensitive to code generation changes, I might suggest using it only when we're sure it actually matters for performance, and defaulting everywhere else to the variants that mask out the deferred bits, since those are always safe. I'd guess that this wouldn't affect the performance improvement of this change much, since that should come from avoiding the tagging in _PyStackRef_FromPyObjectSteal. I don't feel super strongly, though.
I'll switch to using PyStackRef_IsFalse and PyStackRef_IsTrue.
I'm no longer convinced that PyStackRef_IsExactly is actually a performance win (and I didn't see it in measurements). I think we have issues with code generation quality that we'll need to address later. Things like POP_JUMP_IF_NONE are composed of _IS_NONE and _POP_JUMP_IF_TRUE and we pack the intermediate result in a tagged _PyStackRef. Clang does a pretty good job of optimizing through it. GCC less so: https://gcc.godbolt.org/z/Ejs8c78qd.
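The pack/unpack pattern described above can be sketched like this (all names are illustrative stand-ins, not the actual uop implementations): `_IS_NONE` boxes its boolean answer into a tagged reference, and `_POP_JUMP_IF_TRUE` unboxes it one uop later, leaving the compiler to optimize the round trip away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of how a superinstruction like POP_JUMP_IF_NONE decomposes into
 * _IS_NONE + _POP_JUMP_IF_TRUE. Names and tag layout are invented. */
typedef struct { uintptr_t bits; } TaggedRef;

#define TR_DEFERRED ((uintptr_t)1)

static int none_singleton, true_singleton, false_singleton;  /* stand-ins */

/* _IS_NONE analogue: packs the comparison result as a canonical tagged bool. */
static TaggedRef uop_is_none(TaggedRef v) {
    int is_none = (v.bits & ~TR_DEFERRED) == (uintptr_t)&none_singleton;
    void *obj = is_none ? (void *)&true_singleton : (void *)&false_singleton;
    return (TaggedRef){ (uintptr_t)obj | TR_DEFERRED };
}

/* _POP_JUMP_IF_TRUE analogue: unpacks the tagged bool one uop later. */
static bool uop_pop_jump_if_true(TaggedRef cond) {
    return (cond.bits & ~TR_DEFERRED) == (uintptr_t)&true_singleton;
}
```

Whether the boxing disappears in the generated machine code depends on how well the C compiler sees through the intermediate `TaggedRef`, which matches the Clang-vs-GCC observation above.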
No, the previous checks were okay when
Benchmark on most recent changes: https://github.com/facebookexperimental/free-threading-benchmarking/tree/main/results/bm-20241122-3.14.0a1+-a9e4872-NOGIL#vs-base