We observed that high-dimensional datasets are much slower to read when they are virtual (versioned) datasets:
In [12]: shape = (19, 36, 26, 1)

In [14]: a = np.random.rand(*shape)
    ...: with TempDirCtx() as d:
    ...:     with h5py.File(d / 'foo.h5', 'w') as f:
    ...:         vf = VersionedHDF5File(f)
    ...:         with vf.stage_version('v0') as sv:
    ...:             sv.create_dataset('bar', data=a, chunks=a.shape)
    ...:     with h5py.File(d / 'foo.h5', 'r') as f:
    ...:         vf = VersionedHDF5File(f)
    ...:         cv = vf[vf.current_version]
    ...:         bar = cv['bar']
    ...:         %time _ = [bar[:] for _ in range(1000)]
    ...:
CPU times: user 2.95 s, sys: 61.8 ms, total: 3.01 s
Wall time: 3.01 s

In [15]: a = np.random.rand(*shape)
    ...: with TempDirCtx() as d:
    ...:     with h5py.File(d / 'foo.h5', 'w') as f:
    ...:         f.create_dataset('bar', data=a, chunks=a.shape)
    ...:     with h5py.File(d / 'foo.h5', 'r') as f:
    ...:         bar = f['bar']
    ...:         %time _ = [bar[:] for _ in range(1000)]
    ...:
CPU times: user 37.3 ms, sys: 60.2 ms, total: 97.5 ms
Wall time: 97.2 ms
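
For scale, here is a quick back-of-the-envelope comparison of the two runs. It uses only the shape and the wall times copied from the `%time` outputs above; the 8-byte element size assumes the float64 dtype that `np.random.rand` returns.

```python
# Numbers copied from the two IPython runs above (1000 full reads each).
N_READS = 1000
SHAPE = (19, 36, 26, 1)
versioned_wall_s = 3.01      # VersionedHDF5File (virtual dataset) run
plain_wall_s = 97.2e-3       # plain h5py dataset run

# Size of one read: the whole array, stored as a single chunk.
n_elements = 1
for dim in SHAPE:
    n_elements *= dim
n_bytes = n_elements * 8     # assumes float64 (np.random.rand default)

per_read_versioned_ms = versioned_wall_s / N_READS * 1e3
per_read_plain_ms = plain_wall_s / N_READS * 1e3
slowdown = versioned_wall_s / plain_wall_s

print(f"array: {n_elements} elements, {n_bytes} bytes")
print(f"versioned: {per_read_versioned_ms:.2f} ms per read")
print(f"plain:     {per_read_plain_ms:.3f} ms per read")
print(f"slowdown:  {slowdown:.1f}x")
```

So each read of this roughly 139 KiB array takes about 3 ms through the versioned dataset versus about 0.1 ms through the plain one, a slowdown of around 31x.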
A little bit of profiling points to H5S__hyper_project_intersection being an expensive function:
Function Stack                   CPU Time: Total  CPU Time: Self  Module                               Function (Full)                  Source File   Start Address
__pyx_f_4h5py_4defs_H5Dread      67.0%            0s              defs.cpython-39-x86_64-linux-gnu.so  __pyx_f_4h5py_4defs_H5Dread      defs.c        0x1b810
H5Dread                          67.0%            0s              libhdf5.so.103                       H5Dread                          H5Dio.c       0xd8934
H5D__read                        67.0%            0s              libhdf5.so.103                       H5D__read                        H5Dio.c       0xd7e57
H5D__virtual_read                65.2%            0s              libhdf5.so.103                       H5D__virtual_read                H5Dvirtual.c  0xe3e9b
H5D__virtual_read_one            34.3%            0s              libhdf5.so.103                       H5D__virtual_read_one            H5Dvirtual.c  0xe1c64
H5S_select_project_intersection  31.9%            0s              libhdf5.so.103                       H5S_select_project_intersection  H5Sselect.c   0x252426
H5S__hyper_project_intersection  31.9%            0.740s          libhdf5.so.103                       H5S__hyper_project_intersection  H5Shyper.c    0x248352
H5S__hyper_append_span           10.4%            0.090s          libhdf5.so.103                       H5S__hyper_append_span           H5Shyper.c    0x2421cd
H5S__hyper_new_span              3.0%             0.080s          libhdf5.so.103                       H5S__hyper_new_span              H5Shyper.c    0x2418d7
H5FL_reg_calloc                  2.8%             0.030s          libhdf5.so.103                       H5FL_reg_calloc                  H5FL.c        0x13d99a
H5S__hyper_cmp_spans             2.6%             0.070s          libhdf5.so.103                       H5S__hyper_cmp_spans             H5Shyper.c    0x23e61f
H5S__hyper_free_span_info        5.4%             0.080s          libhdf5.so.103                       H5S__hyper_free_span_info        H5Shyper.c    0x240d4f
H5S__hyper_free_span             2.4%             0.020s          libhdf5.so.103                       H5S__hyper_free_span             H5Shyper.c    0x240cb4
H5FL_reg_free                    1.3%             0.060s          libhdf5.so.103                       H5FL_reg_free                    H5FL.c        0x13cca1
H5D__read                        2.4%             0s              libhdf5.so.103                       H5D__read                        H5Dio.c       0xd7e57
H5D__virtual_pre_io              30.9%            0s              libhdf5.so.103                       H5D__virtual_pre_io              H5Dvirtual.c  0xe23d8
H5S_select_project_intersection  30.9%            0s              libhdf5.so.103                       H5S_select_project_intersection  H5Sselect.c   0x252426
H5S__hyper_project_intersection  30.9%            0.630s          libhdf5.so.103                       H5S__hyper_project_intersection  H5Shyper.c    0x248352
H5S__hyper_append_span           11.7%            0.270s          libhdf5.so.103                       H5S__hyper_append_span           H5Shyper.c    0x2421cd
H5FL_reg_calloc                  2.4%             0.060s          libhdf5.so.103                       H5FL_reg_calloc                  H5FL.c        0x13d99a
H5S__hyper_new_span              2.0%             0.050s          libhdf5.so.103                       H5S__hyper_new_span              H5Shyper.c    0x2418d7
H5S__hyper_cmp_spans             1.5%             0.070s          libhdf5.so.103                       H5S__hyper_cmp_spans             H5Shyper.c    0x23e61f
H5S__hyper_free_span_info        5.4%             0.141s          libhdf5.so.103                       H5S__hyper_free_span_info        H5Shyper.c    0x240d4f
H5S__hyper_free_span             1.3%             0s              libhdf5.so.103                       H5S__hyper_free_span             H5Shyper.c    0x240cb4
H5FL_reg_free                    0.7%             0.030s          libhdf5.so.103                       H5FL_reg_free                    H5FL.c        0x13cca1
func@0x44aa0                     0.4%             0.020s          libhdf5.so.103                       func@0x44aa0                     [Unknown]     0x44aa0
H5D__chunk_read                  1.7%             0s              libhdf5.so.103                       H5D__chunk_read                  H5Dchunk.c    0xb782a
Is it possible to speed up this function?