`uproot.dask` is turning TBranches of fixed-size C arrays into Dask arrays with shape `(num_entries,)`, rather than `(num_entries, fixed_size)`

The problem behind scikit-hep/uproot5#1116 is that @Jailbone's test case creates a TTree with a `double[fixed_size]` branch (one fixed-size array per entry). This should be read as a 2D NumPy array of shape `(num_entries, fixed_size)`, but `uproot.dask` presents it to Dask as having shape `(num_entries,)`. Dask then does the wrong things with it, because the shape it was told disagrees with the data it eventually computes.

Reproducer:

```python
import uproot
import numpy as np

with uproot.recreate("test.root") as file:
    file["test_tree"] = {"test_branch": np.random.random((100, 10))}
```

```python
>>> uproot.open("test.root:test_tree").show()
name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
test_branch          | double[10]               | AsDtype("('>f8', (10,))")
>>> uproot.open("test.root:test_tree/test_branch").array(library="np").shape
(100, 10)
```

(`fixed_size` is 10.)

But through `uproot.dask`, the lazy array reports a 1D shape, even though the computed result is 2D:

```python
>>> lazy = uproot.dask("test.root:test_tree", library="np")["test_branch"]
>>> lazy.shape
(100,)
>>> lazy.compute().shape
(100, 10)
```

There's only one place where Uproot creates a `dask.array`; it's here:

https://github.com/scikit-hep/uproot5/blob/724e3775959714274e03b57bd66e850a12508ad2/src/uproot/_dask.py#L459

Should we set the Dask array `shape` in `chunks`, or is that something else? If we know that the TBranch's Interpretation is `AsDtype` (the only type that can have more than one dimension), we can get the part of the shape beyond the number of entries with `inner_shape`:

```python
>>> uproot.open("test.root:test_tree/test_branch").interpretation
AsDtype("('>f8', (10,))")
>>> uproot.open("test.root:test_tree/test_branch").interpretation.inner_shape
(10,)
```
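
As a rough sketch of what that could look like (this is not the actual `_dask.py` code, and `chunks_with_inner_shape` and `shape_from_chunks` are hypothetical helper names): Dask's normalized `chunks` is a tuple with one tuple of block sizes per dimension, and the array's `shape` is the sum of the block sizes along each dimension. So if the entries are split into partitions, the inner dimensions from `inner_shape` would each be appended as a single unsplit block. The partition sizes of 60 and 40 entries below are made up for illustration.

```python
def chunks_with_inner_shape(entries_per_partition, inner_shape):
    """Build Dask-style normalized chunks for a fixed-size-array branch.

    entries_per_partition: number of entries in each partition (hypothetical input).
    inner_shape: the per-entry shape, e.g. interpretation.inner_shape == (10,).
    """
    # First dimension: one block per partition of entries.
    entry_chunks = tuple(int(n) for n in entries_per_partition)
    # Remaining dimensions: each inner dimension is one unsplit block.
    inner_chunks = tuple((int(n),) for n in inner_shape)
    return (entry_chunks,) + inner_chunks


def shape_from_chunks(chunks):
    """The overall array shape is the sum of block sizes along each dimension."""
    return tuple(sum(dim) for dim in chunks)


# 100 entries split into two hypothetical partitions, with inner_shape (10,)
chunks = chunks_with_inner_shape([60, 40], (10,))
print(chunks)                     # ((60, 40), (10,))
print(shape_from_chunks(chunks))  # (100, 10)
```

With an empty `inner_shape` (an ordinary 1D branch), this reduces to the current one-dimensional chunks, so the same code path could serve both cases.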
