Hi @skydes
I noticed a speed bottleneck when dealing with HDF5 files. When there is a large number of groups, reading from and writing to the HDF5 file becomes much slower. As a result, with more than 15,000 images, for example, writing to the file takes more time than computing the NetVLAD features themselves. A solution was suggested in h5py/h5py#1055 and https://stackoverflow.com/questions/45023488/inserting-many-hdf5-datasets-very-slow : you simply need to pass the libver='latest' option when opening HDF5 files. I found that it greatly increased writing speed when creating a large number of groups; a small sketch follows below.
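As a rough illustration (the file path, group names, and feature sizes here are made up, not the repository's actual layout), the only change needed is the extra keyword argument when opening the file with h5py:

```python
import h5py
import numpy as np

# Hypothetical output path, for illustration only.
path = 'features.h5'

# libver='latest' tells h5py to use the newest HDF5 file-format features,
# which makes creating and looking up many groups noticeably faster.
with h5py.File(path, 'w', libver='latest') as f:
    for i in range(20000):
        grp = f.create_group(f'image_{i:06d}')
        grp.create_dataset(
            'descriptor',
            data=np.random.rand(4096).astype(np.float32))
```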
Sorry for the little hiccup, I accidentally deleted the branch.