-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[WIP] ENH : Nearest-neighbors removal of unused stats computations on fit #13331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…into nn_stats_deprec
Thanks for the pull request. Please give your PR a readable title that can be used as a commit message. |
As an additional benchmark, those stats impact is still noticeable but less significant when distances are longer to compute (no stats = this PR):
(benchmark run on a dedicated server with timeit and 7 iterations) |
Following #13330 (comment) , I am working on updating this PR so that stats can be reactivated with compilation arguments (if anybody need them in the future) |
…to do that in setup.py)
Are you still working on this? |
I would even say: are you still working on this @rmenuet? 🙂 |
For your information, I ran some benchmarks to compare the runtime with and without those counters' increments. The raw results are there |
For this specific benchmarking, I think we should fix the size of
Current the There is no nice way to remove with deprecation. If we want to deprecate we would need to:
|
Reference Issues/PRs
Solve #13330
What does this implement/fix? Explain your changes.
By deactivating some undocumented debugging stats: improves perf gain with n_jobs > 1 for nearest neighbors based on tree algorithms (cf. benchmark in issue) and achieves almost linear perf increase with parallelism.
Any other comments?
I can notify the 2 repos that use those stats (https://github.com/fcrimins/py_idistance/blob/master/idist.py and https://github.com/wilseypa/dataAnalysis-scripts/blob/master/dataGeneration/GeneratorProject/ScikitSeqs.py) that those stats won't be computed anymore.