-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify bench/ann
scripts to Python based module
#1642
Simplify bench/ann
scripts to Python based module
#1642
Conversation
cpp/bench/ann/scripts/get_dataset.py
Outdated
def convert_hdf5_to_fbin(path, normalize): | ||
if normalize and "angular" in path: | ||
p = subprocess.Popen(["python", "scripts/hdf5_to_fbin.py", "-n", | ||
"%s" % path]) | ||
else: | ||
p = subprocess.Popen(["python", "scripts/hdf5_to_fbin.py", | ||
"%s" % path]) | ||
p.wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of invoking via a subprocess - can we just call the python function directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, that script doesn't have a callable function. Would you prefer I refactor the script to make it work or we can do it later?
Co-authored-by: Ben Frederickson <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Just one suggestion to avoid old style string formatting, but otherwise I think this is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes are looking good. I did a first-pass to provide some feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so close. I'm really really excited about this. The readme is looking great and the fact that we can now do these end-to-end are perfect. Most of my feedback now is about going just a little farther and easing everything possible for the new user.
# (1) prepare dataset | ||
# download manually "Ground Truth" file of "Yandex DEEP" | ||
# suppose the file name is deep_new_groundtruth.public.10K.bin | ||
../../scripts/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should wrap this in python for consistency. It's just confusing seeing a bunch of python scripts and then seeing pearl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should slip this task to a follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that can work as an immediate follow-up. I'd prefer for it to still be worked into 23.08. We don't necessarily need to support every unique parameter combination initially- so long as we support what's needed to run a basic benchmark end-to-end for billion-scale.
mamba env create --name raft_ann_benchmarks -f conda/environments/bench_ann_cuda-118_arch-x86_64.yaml | ||
conda activate raft_ann_benchmarks | ||
|
||
mamba install -c rapidsai libraft-ann-bench |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be realistic to setup a conda environment that does not depend on cuda conda packages and uses system cuda installation instead? I'd love to be able to use these scripts using docker containers with latest cuda drivers. In fact, I think this would be the main use case on devtech side: to test and adjust raft implementation for the upcoming hardware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, with some testing we can come up with a more minimal environment that does not include any cuda conda packages. Let me know if we can work on this together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks so much @divyegala!
/merge |
Closes #1633