bpo-46406: Faster single digit int division.#30626
Conversation
This expresses the algorithm in a more basic manner resulting in better instruction generation by todays compilers. See https://mail.python.org/archives/list/[email protected]/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5
mdickinson
left a comment
There was a problem hiding this comment.
LGTM, modulo a cast needed to silence the compiler on Windows.
Co-authored-by: Mark Dickinson <[email protected]>
Co-authored-by: Mark Dickinson <[email protected]>
|
Still LGTM. Thank you! |
|
@mdickinson: Please replace |
| PGO/FDO builds doing value specialization such as a fast path for //10. :) | ||
|
|
||
| Verify that 17 isn't specialized and this works as a quick test: | ||
| python -m timeit -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17' |
There was a problem hiding this comment.
Asserting that division by 10 "works" is irrelevant when the test is actually dividing by 17. But it's a timing statement, so asserting that the division works is beside the point anyway. The point is to get the timing, so
python -m timeit -s 'x = 10**1000' 'x//17'
would be best. And 17 isn't of real interest anyway. Use 10 instead. That's the value we've actually seen special-cased in PGO builds. 17 only came up when Mark temporarily fudged a PGO-build input test to pick on 17 instead of 10, just to see what would happen.
There was a problem hiding this comment.
The command is just a paste of what i was using in my terminal as a quick unittest and microbenchmark. The assert in the setup code is a unittest. It's just easy to write a concise test using 10 and what value is used there isn't really relevant to what is used in the x//17 microbenchmark expression itself.
| remainder = dividend % n; | ||
| pout[size] = quotient; | ||
| } | ||
| return remainder; |
There was a problem hiding this comment.
Patterns like *--pin and *--pout are ubiquitous in this file. so converting that style to pin[size] (etc) is jarring in context. The patch would be better if it stuck to changing what needed to be changed.
It would also be good to add a comment explaining why computing both "/" and "%" is faster on most boxes now than doing what the code originally did. "/" and "%" are both very expensive if they're done in isolation, which is presumably why the original code did a "*" and "-" instead of "%".
There was a problem hiding this comment.
Feel free to add the missing comment to this effect while you're in this file in your #30856.
[] vs pointer arithmetic would make me want to rerun benchmarks and examine generated code out of curiosity so i'll leave that alone for now - it might be worth investigating and would be consistent. I just took what Mark had written on python-dev and ran with it.
This expresses the algorithm in a more basic manner resulting in better
instruction generation by todays compilers.
See https://mail.python.org/archives/list/[email protected]/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5
https://bugs.python.org/issue46406