
perf: improve performance for read coalescing #1320

Merged 1 commit into scikit-hep:main on Oct 23, 2024

Conversation

@nikoladze (Contributor) commented Oct 23, 2024

This fixes #1257: the current read coalescing algorithm loops over all ranges (to determine Cluster.stop) for every range added to a Cluster, leading to roughly quadratic scaling. This can be fixed by keeping a private _stop attribute that is updated in an append method. The downside is that .append now has to be used to add new ranges to a Cluster. Since this is not really a user-facing class, I think that is acceptable.
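As a rough illustration of the idea (a minimal sketch, not the actual uproot Cluster code; the class names below are hypothetical and simplified), the point is to maintain the running stop in append() instead of rescanning all ranges whenever stop is read:

# Quadratic pattern: stop rescans every range held so far, and it is consulted
# each time another range is added, so n additions cost O(n^2) overall.
class SlowCluster:
    def __init__(self):
        self.ranges = []

    @property
    def stop(self):
        # O(n) scan over all ranges
        return max((stop for _, stop in self.ranges), default=0)


# Linear pattern (the idea in this PR): append() keeps a private _stop
# up to date in O(1), so reading stop is a constant-time attribute lookup.
class FastCluster:
    def __init__(self):
        self.ranges = []
        self._stop = 0

    def append(self, byte_range):
        start, stop = byte_range
        self.ranges.append(byte_range)
        self._stop = max(self._stop, stop)

    @property
    def stop(self):
        return self._stop


cluster = FastCluster()
for start in range(0, 50, 10):
    cluster.append((start, start + 8))
print(cluster.stop)  # 48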

Using the example TTree from #1257

import uproot
import numpy as np

# write a TTree with 50 branches and 1000 baskets of 1000 entries each
num_baskets = 1_000
num_per_basket = 1_000
num_branches = 50
with uproot.recreate("test.root") as f:
    for i_chunk in range(num_baskets):
        chunk = {f"branch{i}": np.random.rand(num_per_basket) for i in range(num_branches)}
        if "tree" in f:
            f["tree"].extend(chunk)  # subsequent chunks add one basket per branch
        else:
            f["tree"] = chunk  # first assignment creates the TTree

To be read via

import uproot

def load(filename):
    # open the TTree named "tree" and read all branches into NumPy arrays
    with uproot.open({filename: "tree"}) as tree:
        return tree.arrays(library="np")

Before this PR, I get on my laptop:

In [6]: %timeit -n1 -r2 __ = load("test.root")
1min 19s ± 3.2 s per loop (mean ± std. dev. of 2 runs, 1 loop each)

After the changes, I get:

In [2]: %timeit __ = load("test.root")
4.62 s ± 90.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@nsmith- (Collaborator) left a comment:

LGTM

@nsmith- (Collaborator) commented Oct 23, 2024

After the change, is the read coalescing algorithm time negligible?

@nikoladze (Contributor, Author) commented Oct 23, 2024

> After the change, is the read coalescing algorithm time negligible?

Yes, the loop over _merge_adjacent for the 50k ranges in the example now takes on the order of 30 ms.

@jpivarski (Member) left a comment:

It looks good to me, too. As we discussed in person, if stop had previously been accessed every time, this replaces an O(n²) algorithm with an O(n) one, and we always want to do that.

It's also possible that the local-fsspec (default handler) versus uproot.MemmapSource performance difference that I showed today is affected by this. Before we use that as an argument to change the default handler to uproot.MemmapSource, I need to reproduce that plot to see whether the effect entirely goes away. (If it only partially goes away, and uproot.MemmapSource is still better in many-thread systems, then it would still make sense to change the default for local files.)
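For anyone wanting to reproduce that comparison, the source handler can be selected when opening the file. A minimal sketch, assuming the test.root file generated above and using uproot.open's handler= option to pick the local-file source (timings and the exact default handler depend on the uproot version):

import time
import uproot

def load(filename, **options):
    # same reader as in the PR description, but forwarding uproot.open options
    with uproot.open({filename: "tree"}, **options) as tree:
        return tree.arrays(library="np")

for label, options in [
    ("default handler", {}),                             # fsspec-based for local paths
    ("MemmapSource", {"handler": uproot.MemmapSource}),  # memory-mapped local reads
]:
    t0 = time.perf_counter()
    load("test.root", **options)
    print(f"{label}: {time.perf_counter() - t0:.2f} s")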

Since we're all happy with this change, I'll merge it. Thanks!

@jpivarski changed the title from "fix: improve performance for read coalescing" to "perf: improve performance for read coalescing" on Oct 23, 2024
@jpivarski (Member) commented:
Although I'd consider an O(n²) → O(n) fix to be a serious bug-fix, I'm labeling this commit as "perf" for clarity, to indicate that user-visible behavior doesn't change.

@jpivarski merged commit afca74c into scikit-hep:main on Oct 23, 2024
28 checks passed
Successfully merging this pull request may close these issues:

uproot4 to uproot5 TTree to DataFrame arrays slower (#1257)