
uproot.dask is turning TBranches of fixed-size C arrays into Dask arrays with shape (num_entries,), rather than (num_entries, fixed_size) #1173

Open
@jpivarski

Description

The issue raised in #1116 is that @Jailbone's test case creates a TTree of double[fixed_size] (one fixed-size array per entry). This should be read as a 2D NumPy array of shape (num_entries, fixed_size), but uproot.dask presents it to Dask as having shape (num_entries,). Naturally, Dask then mishandles it.

Reproducer:

```python
import uproot
import numpy as np

with uproot.recreate("test.root") as file:
    file["test_tree"] = {"test_branch": np.random.random((100, 10))}
```

```python
>>> uproot.open("test.root:test_tree").show()
name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
test_branch          | double[10]               | AsDtype("('>f8', (10,))")
>>> uproot.open("test.root:test_tree/test_branch").array(library="np").shape
(100, 10)
```

(fixed_size is 10.)
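The interpretation string AsDtype("('>f8', (10,))") appears to wrap a NumPy subarray dtype: a big-endian float64 with a fixed (10,) shape per entry. Assuming that reading, this NumPy-only sketch shows why the eager read comes back as (100, 10): NumPy folds a subarray dtype's shape into the array's shape. The raw byte buffer is hypothetical and merely stands in for the branch's data.

```python
import numpy as np

# Assumption: the AsDtype string is a NumPy dtype spec, here big-endian
# float64 with a fixed (10,) subarray per entry (double[10] in ROOT).
entry_dtype = np.dtype((">f8", (10,)))
print(entry_dtype.shape)     # (10,)
print(entry_dtype.itemsize)  # 80 (10 doubles of 8 bytes per entry)

# Hypothetical raw buffer standing in for 100 entries of branch data.
raw = np.arange(100 * 10, dtype=">f8").tobytes()

# NumPy absorbs the subarray shape into the array shape, so 100 entries
# of double[10] become a (100, 10) array of plain big-endian doubles.
arr = np.frombuffer(raw, dtype=entry_dtype)
print(arr.shape)  # (100, 10)
```

The full shape is therefore (num_entries,) + the dtype's inner shape, which is what a lazy array should report as well.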

But

```python
>>> lazy = uproot.dask("test.root:test_tree", library="np")["test_branch"]
>>> lazy.shape
(100,)
>>> lazy.compute().shape
(100, 10)
```

There's only one place where Uproot creates a dask.array; it's here:

```python
return da.core.Array(hlg, name, chunks, dtype=dtype)
```
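In dask.array, an Array's shape is computed from its chunks (each axis's length is the sum of its chunk sizes), so in a constructor call like the one above, the shape is whatever chunks says it is. A small illustrative sketch, using hypothetical names (make_block, sketch-1d, sketch-2d) and a plain dict graph in place of Uproot's HighLevelGraph, with the same constructor (da.Array is da.core.Array) with and without an inner chunk axis:

```python
import numpy as np
import dask.array as da


def make_block(n_entries):
    # Hypothetical stand-in for reading one partition of a double[10] branch.
    return np.zeros((n_entries, 10), dtype=np.float64)


# Chunks with only the entry axis, like the current behavior:
# two partitions of 50 entries each, so Dask believes the shape is (100,).
name_1d = "sketch-1d"
graph_1d = {(name_1d, i): (make_block, 50) for i in range(2)}
arr_1d = da.Array(graph_1d, name_1d, ((50, 50),), dtype=np.float64)

# Chunks with an extra axis holding the whole inner dimension in one chunk;
# block keys gain a trailing 0 index for that axis.
name_2d = "sketch-2d"
graph_2d = {(name_2d, i, 0): (make_block, 50) for i in range(2)}
arr_2d = da.Array(graph_2d, name_2d, ((50, 50), (10,)), dtype=np.float64)

print(arr_1d.shape)  # (100,)
print(arr_2d.shape)  # (100, 10)
print(arr_2d.compute().shape)  # (100, 10)
```

Only the second form has metadata that agrees with the blocks the graph actually produces.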

Should we set the Dask array's shape through chunks, or is it determined by something else? If we know that the TBranch's Interpretation is AsDtype (the only Interpretation that can have more than one dimension), we can get the part of the shape beyond the number of entries from its inner_shape:

```python
>>> uproot.open("test.root:test_tree/test_branch").interpretation
AsDtype("('>f8', (10,))")
>>> uproot.open("test.root:test_tree/test_branch").interpretation.inner_shape
(10,)
```
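A minimal sketch of that idea, assuming Uproot already knows the number of entries in each partition (entry_chunks below) and has inner_shape from an AsDtype interpretation. chunks_for_branch and block_key are hypothetical helpers, not Uproot API: they append one single-chunk axis per inner dimension and pad each block key with a matching 0 index. For a scalar branch, inner_shape would presumably be () and the result reduces to the current 1D chunks.

```python
import numpy as np
import dask.array as da


def chunks_for_branch(entry_chunks, inner_shape):
    """Hypothetical helper: entry-axis chunks followed by one chunk per
    inner dimension, so shape == (sum(entry_chunks),) + inner_shape."""
    return (tuple(entry_chunks),) + tuple((n,) for n in inner_shape)


def block_key(name, partition, inner_shape):
    """Hypothetical helper: the Dask key for one partition, padded with a
    0 index for each inner dimension (each inner axis has a single chunk)."""
    return (name, partition) + (0,) * len(inner_shape)


# inner_shape as reported by interpretation.inner_shape for double[10].
inner_shape = (10,)
entry_chunks = (50, 50)  # hypothetical partitioning of 100 entries
chunks = chunks_for_branch(entry_chunks, inner_shape)

name = "sketch-branch"
graph = {
    block_key(name, i, inner_shape): (np.zeros, (n,) + inner_shape)
    for i, n in enumerate(entry_chunks)
}
lazy = da.Array(graph, name, chunks, dtype=np.float64)
print(lazy.shape)  # (100, 10)
```

Keeping every inner axis as a single chunk mirrors how each partition is read: one block per range of entries, each block already carrying the full fixed-size inner dimensions.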


Metadata

Assignees: No one assigned
Labels: bug (The problem described is something that must be fixed)
Type: No type
Projects: Status: Dask and high-level behavior
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
