Generating references without kerchunk #78

@TomNicholas

VirtualiZarr + zarr chunk manifests re-implement so much of kerchunk that the only part left is kerchunk's backends, i.e. the part that actually generates the byte ranges from a given legacy file. It's interesting to imagine whether we could make VirtualiZarr work without using kerchunk or fsspec at all.

https://github.com/TomNicholas/VirtualiZarr/issues/61#issuecomment-2047826810 discusses how the Rust object-store crate might allow us to read actual bytes from zarr v3 stores with chunk manifests over S3, without using fsspec.

The other place we use fsspec (+ kerchunk) is to generate the references in the first place. But can we imagine alternative implementations for generating that byte range information?

Arguments for doing this without using kerchunk + fsspec are essentially:

  • increased reliability
  • clearer interfaces
  • not using an overly complex tool (i.e. fsspec, which can read from all sorts of systems) to read from just two places (local or S3)
  • possible performance increases during reference generation (though this is unlikely to be a major bottleneck)
  • separation of concerns - if we can find other libraries that already generate the byte range information for their own purposes (e.g. h5py using the ros3 driver, hidefix, or cog3pio), we might be able to avoid bearing that maintenance burden, which would be great.
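
To make "byte range information" concrete, here is a minimal stdlib-only sketch of what a reference generator has to produce and how little is needed to read a chunk back afterwards. Everything in it is an assumption for illustration: the toy file layout (a fixed header followed by equal-sized chunks), and the names `ChunkRef`, `build_manifest`, and `read_chunk` are hypothetical and are not VirtualiZarr's, kerchunk's, or zarr's API. A real backend would get the offsets from the file format's own index rather than from arithmetic.

```python
import os
import tempfile
from typing import Dict, NamedTuple


class ChunkRef(NamedTuple):
    """One manifest entry: where a chunk's bytes live (hypothetical structure)."""
    path: str
    offset: int
    length: int


def build_manifest(path: str, header_size: int, chunk_size: int,
                   n_chunks: int) -> Dict[str, ChunkRef]:
    """Generate byte ranges for a toy format: fixed header, then equal-sized chunks."""
    return {
        str(i): ChunkRef(path, header_size + i * chunk_size, chunk_size)
        for i in range(n_chunks)
    }


def read_chunk(ref: ChunkRef) -> bytes:
    """Read one chunk with a plain seek + read; no fsspec involved."""
    with open(ref.path, "rb") as f:
        f.seek(ref.offset)
        return f.read(ref.length)


# Write a toy "legacy" file: a 4-byte header followed by three 4-byte chunks.
header = b"HDR0"
chunks = [b"aaaa", b"bbbb", b"cccc"]
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(header + b"".join(chunks))
    toy_path = tmp.name

manifest = build_manifest(toy_path, len(header), 4, len(chunks))
second_chunk = read_chunk(manifest["1"])
os.remove(toy_path)
```

The point of the sketch is the separation: generating the `(path, offset, length)` entries and reading the bytes are independent steps, so either could in principle be delegated to a different library.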
