
perf: improve performance for read coalescing #1320

Merged 1 commit into scikit-hep:main on Oct 23, 2024

Conversation

@nikoladze (Contributor) commented Oct 23, 2024

This fixes #1257: the current read coalescing algorithm loops over all ranges (to determine Cluster.stop) for every range added to a Cluster, leading to roughly quadratic scaling. This can be fixed by keeping a private _stop attribute that is updated in an append method. The downside is that .append now has to be used to add new ranges to a Cluster. Since this is not really a user-facing class, I think that is acceptable.
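As a rough illustration of the idea (a minimal sketch, not the actual uproot Cluster code; the class names below are hypothetical and simplified), the point is to maintain the running stop in append() instead of rescanning all ranges whenever stop is read:

# Quadratic pattern: stop rescans every range held so far, and it is consulted
# each time another range is added, so n additions cost O(n^2) overall.
class SlowCluster:
    def __init__(self):
        self.ranges = []

    @property
    def stop(self):
        # O(n) scan over all ranges
        return max((stop for _, stop in self.ranges), default=0)


# Linear pattern (the idea in this PR): append() keeps a private _stop
# up to date in O(1), so reading stop is a constant-time attribute lookup.
class FastCluster:
    def __init__(self):
        self.ranges = []
        self._stop = 0

    def append(self, byte_range):
        start, stop = byte_range
        self.ranges.append(byte_range)
        self._stop = max(self._stop, stop)

    @property
    def stop(self):
        return self._stop


cluster = FastCluster()
for start in range(0, 50, 10):
    cluster.append((start, start + 8))
print(cluster.stop)  # 48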

Using the example TTree from #1257

import uproot
import numpy as np

# write a TTree with 50 branches and 1000 baskets of 1000 entries each
num_baskets = 1_000
num_per_basket = 1_000
num_branches = 50
with uproot.recreate("test.root") as f:
    for i_chunk in range(num_baskets):
        chunk = {f"branch{i}": np.random.rand(num_per_basket) for i in range(num_branches)}
        if "tree" in f:
            f["tree"].extend(chunk)  # subsequent chunks add one basket per branch
        else:
            f["tree"] = chunk  # first assignment creates the TTree

To be read via

import uproot

def load(filename):
    # open the TTree named "tree" and read all branches into NumPy arrays
    with uproot.open({filename: "tree"}) as tree:
        return tree.arrays(library="np")

Before this PR, I get on my laptop:

In [6]: %timeit -n1 -r2 __ = load("test.root")
1min 19s ± 3.2 s per loop (mean ± std. dev. of 2 runs, 1 loop each)

After the changes, I get:

In [2]: %timeit __ = load("test.root")
4.62 s ± 90.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@nsmith- (Collaborator) left a comment:

LGTM

@nsmith- (Collaborator) commented Oct 23, 2024

After the change, is the read coalescing algorithm time negligible?

@nikoladze (Contributor, Author) commented Oct 23, 2024

> After the change, is the read coalescing algorithm time negligible?

Yes, the loop over _merge_adjacent for the 50k ranges in the example now takes on the order of 30 ms.

@jpivarski (Member) left a comment:

It looks good to me, too. As we discussed in person, if stop had previously been accessed every time, this replaces an O(n²) algorithm with an O(n) one, and we always want to do that.

It's also possible that the local-fsspec (default handler) versus uproot.MemmapSource performance difference that I showed today is affected by this. Before we use that as an argument to change the default handler to uproot.MemmapSource, I need to reproduce that plot to see whether the effect entirely goes away. (If it only partially goes away, and uproot.MemmapSource is still better in many-thread systems, then it would still make sense to change the default for local files.)
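For anyone wanting to reproduce that comparison, the source handler can be selected when opening the file. A minimal sketch, assuming the test.root file generated above and using uproot.open's handler= option to pick the local-file source (timings and the exact default handler depend on the uproot version):

import time
import uproot

def load(filename, **options):
    # same reader as in the PR description, but forwarding uproot.open options
    with uproot.open({filename: "tree"}, **options) as tree:
        return tree.arrays(library="np")

for label, options in [
    ("default handler", {}),                             # fsspec-based for local paths
    ("MemmapSource", {"handler": uproot.MemmapSource}),  # memory-mapped local reads
]:
    t0 = time.perf_counter()
    load("test.root", **options)
    print(f"{label}: {time.perf_counter() - t0:.2f} s")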

Since we're all happy with this change, I'll merge it. Thanks!

@jpivarski changed the title from "fix: improve performance for read coalescing" to "perf: improve performance for read coalescing" on Oct 23, 2024
@jpivarski (Member) commented:
Although I'd consider an O(n²) → O(n) fix to be a serious bug-fix, I'm labeling this commit as "perf" for clarity, to indicate that user-visible behavior doesn't change.

@jpivarski merged commit afca74c into scikit-hep:main on Oct 23, 2024
28 checks passed
Successfully merging this pull request may close these issues:

uproot4 to uproot5 TTree to DataFrame arrays slower (#1257)