Nearest neighbors with trees perf decreased by debugging stats #13330
Comments
The performance scaling without stats does seem almost perfect (as … Do you think it might be possible to optionally allow computing them with some parameter that defaults to …
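The opt-in idea in the comment above can be sketched as a query object that only touches its counter when a flag is set. This is a hypothetical illustration: the class `OptInStatsIndex`, its `query` method and the `compute_stats` parameter are made-up names, and defaulting it to False is an assumption (the comment's intended default is cut off). With the flag off, the hot loop never writes to shared state.

```python
class OptInStatsIndex:
    """Hypothetical 1-D index whose debugging stats are opt-in (off by default)."""

    def __init__(self, data, compute_stats=False):
        self.data = list(data)
        self.compute_stats = compute_stats
        self.n_calls = 0  # distance evaluations; only updated when stats are enabled

    def query(self, q):
        """Return (index, distance) of the point nearest to q (brute force)."""
        best_i, best_d = -1, float("inf")
        for i, p in enumerate(self.data):
            if self.compute_stats:
                # The only write to shared state, skipped entirely by default.
                self.n_calls += 1
            d = abs(q - p)
            if d < best_d:
                best_i, best_d = i, d
        return best_i, best_d
```

For example, `OptInStatsIndex([0.0, 5.0, 9.0]).query(4.0)` returns `(1, 1.0)` and leaves `n_calls` at 0. A runtime flag still costs a branch per distance in a compiled hot loop; the alternative raised in the next comments is a compile-time switch that removes the counter code altogether.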
From what I saw:
Could you elaborate on that? So you are saying something like http://docs.cython.org/en/latest/src/userguide/language_basics.html#conditional-statements won't work?
Whoops... I did not know about that. I will look into it further and test something based on it.
One possible explanation is a typical case of false sharing: CPU cache-line invalidation caused by concurrent writes to fields of a contiguously allocated data structure that live in the same cache line.
One way to check this hypothesis would be to use Linux perf or cachegrind to collect cache invalidation statistics with and without #19884.
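If the shared counters are indeed the culprit, a common mitigation for false sharing is to give each thread its own counters, padded so that no two threads' counters share a cache line, and to sum them after the query. The NumPy sketch below only illustrates that memory layout: the 64-byte line size is an assumption (typical on x86-64), `padded_counters` and `cache_lines` are hypothetical helpers rather than scikit-learn code, and pure Python cannot reproduce the contention itself.

```python
import numpy as np

CACHE_LINE = 64  # assumed cache-line size in bytes
SLOTS = CACHE_LINE // np.dtype(np.int64).itemsize  # 8 int64 slots per line


def padded_counters(n_threads):
    """Return an (n_threads, SLOTS) int64 view whose rows each start on a line.

    Thread i would only write to row i (e.g. slots 0-3 for n_trims,
    n_leaves, n_splits, n_calls), so no two threads share a cache line.
    """
    # Over-allocate by one line so the start can be shifted to a boundary.
    buf = np.zeros((n_threads + 1) * SLOTS, dtype=np.int64)
    misalign = buf.ctypes.data % CACHE_LINE  # bytes past the previous boundary
    offset = ((CACHE_LINE - misalign) % CACHE_LINE) // buf.itemsize
    return buf[offset:offset + n_threads * SLOTS].reshape(n_threads, SLOTS)


def cache_lines(row):
    """Set of cache-line indices covered by a 1-D int64 row."""
    return {(row.ctypes.data + j * row.itemsize) // CACHE_LINE for j in range(row.size)}
```

After a parallel query the totals would be `counters.sum(axis=0)`. By contrast, a packed `np.zeros((n_threads, 4), dtype=np.int64)` puts 32-byte rows side by side, so neighbouring threads' rows share cache lines, which is the pattern the false-sharing hypothesis describes.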
Description
For the ball_tree and kd_tree algorithms, some statistics collected during tree queries greatly reduce the speedup obtained from parallelization. Those stats are:
- n_trims: queried points outside the node radius
- n_leaves: leaves reached while querying
- n_splits: non-leaf nodes queried
- n_calls: number of computed distances
Those stats only seem useful for debugging and do not look like part of the official API (they are undocumented); only 2 (personal) git repos use the method that retrieves them (get_tree_stats). Deactivating them greatly improves the performance of the associated algorithms.
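To make the four counters concrete, here is a toy 1-D ball-tree nearest-neighbour query that updates them as described above. It is a hypothetical sketch, not scikit-learn's Cython implementation: `TreeStats`, `build` and `nearest` are illustrative names, and counting a trim whenever a node's ball cannot contain a point closer than the current best is an assumed reading of n_trims. The point is that every counter is a write inside the hot traversal loop; if, as the false-sharing comment suggests, all query threads update the same tree object, those writes become shared-memory traffic.

```python
import math


class TreeStats:
    """Counters named after the issue's debugging stats."""

    def __init__(self):
        self.n_trims = 0   # nodes pruned: their ball cannot beat the current best
        self.n_leaves = 0  # leaf nodes reached during the query
        self.n_splits = 0  # internal (non-leaf) nodes visited
        self.n_calls = 0   # point-to-point distance computations


def build(points, leaf_size=2):
    """Build a 1-D ball tree over a non-empty, sorted list of floats."""
    lo, hi = points[0], points[-1]
    node = {"center": (lo + hi) / 2, "radius": (hi - lo) / 2}
    if len(points) <= leaf_size:
        node["points"] = points
    else:
        mid = len(points) // 2
        node["children"] = (build(points[:mid], leaf_size),
                            build(points[mid:], leaf_size))
    return node


def nearest(node, q, stats, best=math.inf):
    """Return the distance from q to its nearest point, updating stats."""
    # Lower bound on the distance from q to any point inside this node's ball.
    if abs(q - node["center"]) - node["radius"] >= best:
        stats.n_trims += 1
        return best
    if "points" in node:
        stats.n_leaves += 1
        for p in node["points"]:
            stats.n_calls += 1
            best = min(best, abs(q - p))
        return best
    stats.n_splits += 1
    # Visit the child whose center is closer first, to tighten `best` early.
    first, second = sorted(node["children"], key=lambda c: abs(q - c["center"]))
    best = nearest(first, q, stats, best)
    return nearest(second, q, stats, best)
```

For instance, querying 1.2 against `build([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])` finds a distance of about 0.2 while recording 2 trims, 1 leaf, 2 splits and 2 distance calls.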
Benchmark
Test of the kneighbors function with default parameters and:
(OpenMP prange parallelism was also tested, but it does not improve performance.)