Hi @skydes
I noticed a speed bottleneck when dealing with HDF5 files. When there is a large number of groups, reading from and writing to the HDF5 file becomes much slower. As a result, with more than 15,000 images, for example, writing to the file takes more time than computing the NetVLAD features themselves. A solution was suggested in h5py/h5py#1055 and https://stackoverflow.com/questions/45023488/inserting-many-hdf5-datasets-very-slow : you simply need to pass the libver='latest' option when opening HDF5 files. I found that it greatly increased writing speed when creating a large number of groups; a small sketch follows below.
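As a rough illustration (the file path, group names, and feature sizes here are made up, not the repository's actual layout), the only change needed is the extra keyword argument when opening the file with h5py:

```python
import h5py
import numpy as np

# Hypothetical output path, for illustration only.
path = 'features.h5'

# libver='latest' tells h5py to use the newest HDF5 file-format features,
# which makes creating and looking up many groups noticeably faster.
with h5py.File(path, 'w', libver='latest') as f:
    for i in range(20000):
        grp = f.create_group(f'image_{i:06d}')
        grp.create_dataset(
            'descriptor',
            data=np.random.rand(4096).astype(np.float32))
```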
Sorry for the little hiccup, I accidentally deleted the branch.