Generate dataset of select_k times #1497

Merged 2 commits into rapidsai:branch-23.06 on May 11, 2023

@benfred benfred commented May 8, 2023

This adds an optional flag (`--select_k_dataset`) to MATRIX_BENCH that turns on a grid search of benchmarks across the different select_k algorithms. Because this runs about 100x as many benchmarks as before (90k vs. 900), it is opt-in only for now. The results will be used to learn a heuristic function in #1455.

This also integrates the faiss block select top-k algorithm into the benchmarks, so that we can compare its performance against the other select_k algorithms.
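To illustrate why a grid search multiplies the benchmark count and is therefore gated behind a flag, here is a minimal sketch. The algorithm names, default shapes, and grid values are all assumptions made for illustration; they are not the values used by MATRIX_BENCH, and the Python keyword argument only stands in for the `--select_k_dataset` command-line flag.

```python
from itertools import product

# Hypothetical values chosen only for illustration; the real grid and
# algorithm list live in the RAFT benchmark sources.
ALGORITHMS = ["algo_a", "algo_b", "algo_c"]
DEFAULT_SHAPES = [(1000, 10000, 32), (100, 100000, 256)]


def benchmark_configs(select_k_dataset=False):
    """Return the (algorithm, rows, cols, k) tuples to benchmark."""
    if not select_k_dataset:
        # Default: a small, hand-picked set of input shapes.
        shapes = DEFAULT_SHAPES
    else:
        # Opt-in: the full Cartesian product of rows x cols x k.
        rows = [1, 10, 100, 1000]
        cols = [2**p for p in range(10, 20)]
        ks = [2**p for p in range(0, 11)]
        # Selecting more items than a row contains makes no sense, so skip k > cols.
        shapes = [(r, c, k) for r, c, k in product(rows, cols, ks) if k <= c]
    # Every shape is run once per algorithm.
    return [(algo, r, c, k) for algo in ALGORITHMS for (r, c, k) in shapes]


default_configs = benchmark_configs()
full_configs = benchmark_configs(select_k_dataset=True)
print(len(default_configs), len(full_configs))
```

Each row of the resulting dataset pairs one algorithm with one input shape, which is the form a later model needs to learn which algorithm is fastest for a given shape.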

@benfred benfred added the improvement and non-breaking labels May 8, 2023
@benfred benfred requested review from a team as code owners May 8, 2023 22:29
@cjnolet cjnolet assigned cjnolet and benfred and unassigned cjnolet May 10, 2023
@cjnolet cjnolet left a comment

This LGTM. I figure if we end up finding out this is a common pattern that we use quite a bit then we can focus on consolidating boilerplate. In the meantime, I think this is just fine.

benfred commented May 11, 2023

/merge

@rapids-bot rapids-bot bot merged commit 1d1c523 into rapidsai:branch-23.06 May 11, 2023
@benfred benfred deleted the generate_select_dataset branch May 11, 2023 04:43
benfred added a commit to benfred/raft that referenced this pull request May 17, 2023
rapids-bot bot pushed a commit that referenced this pull request May 17, 2023
This uses the select_k dataset from #1497 to learn a heuristic for the fastest select_k variant based on the rows, cols, and k of the input. The heuristic is modelled as a DecisionTree, which is automatically exported as C++ code that is compiled into RAFT. This lets us learn a function that picks the fastest select_k method and requires only a few if statements in C++, making it very cheap to evaluate.

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1523
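To make the idea of exporting a decision tree as C++ if statements concrete, here is a minimal sketch. The tree, its thresholds, the algorithm names, and the generated `Algo` enum and `choose_select_k_algorithm` function are all hypothetical; the actual heuristic in #1523 is learned from the benchmark data, and its export code may look quite different.

```python
# A tiny hand-written tree; thresholds and algorithm names are invented for
# illustration. Internal node: (feature, threshold, left, right). Leaf: a name.
TREE = ("k", 256,
        ("cols", 100000, "ALGO_A", "ALGO_B"),
        "ALGO_C")


def predict(tree, rows, cols, k):
    """Walk the tree for one input shape and return the chosen leaf."""
    features = {"rows": rows, "cols": cols, "k": k}
    while not isinstance(tree, str):
        feature, threshold, left, right = tree
        tree = left if features[feature] <= threshold else right
    return tree


def to_cpp(tree, indent=1):
    """Render the tree as nested C++ if/else statements."""
    pad = "  " * indent
    if isinstance(tree, str):
        return f"{pad}return Algo::{tree};\n"
    feature, threshold, left, right = tree
    return (f"{pad}if ({feature} <= {threshold}) {{\n"
            f"{to_cpp(left, indent + 1)}"
            f"{pad}}} else {{\n"
            f"{to_cpp(right, indent + 1)}"
            f"{pad}}}\n")


# Hypothetical C++ signature; `Algo` is assumed to be an enum defined elsewhere.
cpp_source = ("inline Algo choose_select_k_algorithm(size_t rows, size_t cols, int k)\n{\n"
              + to_cpp(TREE) + "}\n")
print(cpp_source)
```

Because the generated function is only a handful of comparisons, calling it at runtime costs essentially nothing compared with the selection itself, which is the property the commit message highlights.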
Labels: CMake, cpp, improvement, non-breaking