create_virtual_dataset is slow #226

@ArvidJB

Description

Writing virtual datasets seems to be pretty slow because of the calls to deepcopy in VirtualSource.__getitem__:
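For context, here is a minimal sketch of the pattern that hits this code path (the file and dataset names are placeholders, not versioned-hdf5 internals). Since h5py 3.3, every slice of a VirtualSource goes through VirtualSource.__getitem__, which deepcopies the whole source object, so building a layout chunk by chunk pays that cost once per chunk:

import h5py

nchunks, chunk = 1000, (36, 26, 19)
layout = h5py.VirtualLayout(shape=(nchunks,) + chunk, dtype='f8')
vs = h5py.VirtualSource('raw.h5', 'raw_data', shape=(nchunks,) + chunk)
for i in range(nchunks):
    # VirtualSource.__getitem__ deepcopies vs before applying the
    # selection, so this loop performs nchunks deepcopies.
    layout[i] = vs[i]

(The timing sessions below assume import h5py, import numpy as np, from versioned_hdf5 import VersionedHDF5File, and a TempDirCtx helper that presumably creates and cleans up a temporary directory.)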

In [26]: %%time
    ...: with TempDirCtx() as d:
    ...:     with h5py.File(d / 'foo.h5', 'w') as f:
    ...:         a = np.random.rand(1, 36, 26, 19)
    ...:         f.create_dataset('bar', data=a, chunks=a.shape, maxshape=(None, None, None, None))
    ...:     for i in range(1, 101):
    ...:         with h5py.File(d / 'foo.h5', 'r+') as f:
    ...:             f['bar'].resize((i + 1, 36, 26, 19))
    ...:             a = np.random.rand(1, 36, 26, 19)
    ...:             f['bar'][i, :, :, :] = a
    ...:
    ...:
CPU times: user 129 ms, sys: 8.01 ms, total: 137 ms
Wall time: 137 ms

In [27]: %%time
    ...: with TempDirCtx() as d:
    ...:     with h5py.File(d / 'foo.h5', 'w') as f:
    ...:         vf = VersionedHDF5File(f)
    ...:         with vf.stage_version('v0') as sv:
    ...:             a = np.random.rand(1, 36, 26, 19)
    ...:             sv.create_dataset('bar', data=a, chunks=a.shape, maxshape=(None, None, None, None))
    ...:     for i in range(1, 101):
    ...:         with h5py.File(d / 'foo.h5', 'r+') as f:
    ...:             vf = VersionedHDF5File(f)
    ...:             with vf.stage_version('v{i}'.format(i=i)) as sv:
    ...:                 sv['bar'].resize((i + 1, 36, 26, 19))
    ...:                 a = np.random.rand(1, 36, 26, 19)
    ...:                 sv['bar'][[i], ...] = a
    ...:
    ...:
    ...:
CPU times: user 2.65 s, sys: 49.3 ms, total: 2.7 s
Wall time: 2.7 s
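One way to confirm that the deepcopy calls dominate (a sketch, assuming the same setup as the session above; stage_one_version() is a hypothetical wrapper around one iteration of the stage_version loop):

import cProfile
import pstats

cProfile.run('stage_one_version()', 'stage.prof')
# Restrict the report to deepcopy and __getitem__ frames.
pstats.Stats('stage.prof').sort_stats('cumulative').print_stats('deepcopy|__getitem__')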

Looking at the code, it seems there was a performance optimization there that was broken by h5py 3.3:
h5py/h5py#1905
Is it possible to work around this performance regression?
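One possible workaround, sketched here only to show that the deepcopy lives in h5py's high-level convenience layer rather than in HDF5's VDS machinery (file and dataset names are placeholders, and this is not how versioned-hdf5 is currently implemented): build the mapping through the low-level API, where select_hyperslab mutates a dataspace in place and set_virtual copies the selection on the C side:

import h5py
import numpy as np

nchunks, chunk = 100, (36, 26, 19)
shape = (nchunks,) + chunk

with h5py.File('data.h5', 'w') as f:
    f.create_dataset('raw_data', data=np.random.rand(*shape))

    dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
    virt_space = h5py.h5s.create_simple(shape)
    src_space = h5py.h5s.create_simple(shape)
    for i in range(nchunks):
        # select_hyperslab replaces the selection on the existing SpaceID
        # in place; set_virtual copies it internally, so no per-chunk
        # Python-level deepcopy is involved.
        virt_space.select_hyperslab((i, 0, 0, 0), (1,) + chunk)
        src_space.select_hyperslab((i, 0, 0, 0), (1,) + chunk)
        dcpl.set_virtual(virt_space, b'data.h5', b'raw_data', src_space)
    h5py.h5d.create(f.id, b'virtual', h5py.h5t.NATIVE_DOUBLE,
                    h5py.h5s.create_simple(shape), dcpl=dcpl)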
