-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add known forth for ATLAS #1282
feat: Add known forth for ATLAS #1282
Conversation
@nikoladze is this ready for review? Or were there additional things that you needed to add to it? |
I think i'm getting there. Before merging i also want to write a bit more info into the docstrings, e.g. how to add further stuff in the future. Current status:
Some problems that this PR should fix (for
|
src/uproot/interpretation/objects.py
Outdated
@@ -122,6 +130,13 @@ def awkward_form( | |||
tobject_header=False, | |||
breadcrumbs=(), | |||
): | |||
awkward = uproot.extras.awkward() | |||
if self._form is not None: # TODO: is this really fine? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpivarski i'm a bit uncomfortable with the piece of code i added here. The _form
attribute already existed before, but was never directly returned here. The tests seem to pass, but do you know if there are potential problems that may be caused by directly returning _form
if it is not None?
One thing that i had to do (and also don't quite feel comfortable with) was to convert the form from a dict representation which seems to happen sometimes (but i'm a bit unsure where and why not always).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still trying to understand your first paragraph, but about the second paragraph: there are three ways that a Form can be represented,
- as a
ak.forms.Form
object, which is immutable and therefore needs to be entirely known or at least built from the leaves up to the root - as a Python dict (JSON represented as dicts and lists), which isn't type-safe, but it can be modified in place
- as a JSON string, which doesn't even ensure that pairs of square brackets are closed.
Thus, they have decreasing levels of safety and we prefer the safer ones when possible. The problem is that the process of discovering the type involves starting at the root and walking down toward the leaves, filling in a ListOffsetArray's content if we see a non-empty example of it. So the ideal, ak.forms.Form
, isn't possible and the Form starts its life as a Python dict. When it gets converted into an object, that's final. The Forth is fully discovered at that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i now realized that the self._form
attribute will always be in the Python dict form (either from generated or known forth code). So at the place where it is returned in AsObjects.awkward_form(...)
it has to be converted to the final ak.forms.Form
object. In principle the form from the known forth path could have been provided directly as ak.forms.Form
, but to be consistent with the one created from the ForthGenerator it is now also a dict.
@nikoladze, can I "Update branch" to see if this passes tests? There are some good things in this PR and I'd like to merge it. When you marked it "Ready for Review," do you mean that there are no more features planned, only getting the tests to pass? |
Co-authored-by: Jim Pivarski <[email protected]>
…entLink and sort keys
13c7ffe
to
bfca8f6
Compare
i did that now - looks like they do.
I'm not planning to add further stuff right now. Many of the issues with awkward forth and ElementLinks have been fixed in other PRs, but this PR will bring the benefit of solving two more edge cases, the first can be seen in this example: import uproot
import skhep_testdata
tree = uproot.open({skhep_testdata.data_path("uproot-issue-123a.root"): "CollectionTree"})
tree["PrimaryVerticesAuxDyn.neutralParticleLinks"].arrays() Uproot can generate a model for this branch
But because of the "Unknown" it could, before this PR, only be read with Now i realize i summarized the 2 edge cases already in the comment above, so i'm just quoting myself for the second edge case that should be solved by this PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is great—I will merge it and it will be included in the next Uproot patch release. If you need to add more known Forth cases in the future, it can be a new PR.
Thank you @nikoladze and @jpivarski! |
Since edge cases still keep popping up for types in PHYSLITE that actually have always the same interpretation (most prominently
vector<vector<ElementLink<...>>>
we were discussing to include a mechanism to pre-define known forth code for special cases.This PR attempts to add such a mechanism and define a special treatment for
vector<vector<ElementLink<...>>>
in ATLAS PHYSLITE files. One needs to define 2 things:form_key
These 2 properties are picked up in the constructor of the generic
AsObjects
interpretation, ensuring that no forth code needs to be generated (and therefore also no python loop through any of the basket data) when the array is read withlibrary="ak"
(default).In the current state it seems to be working, but at the moment breaks a lot of tests, mainly because the hardcoded form now does not match anymore excactly what it was before. Looking into how to resolve this.
tagging @matthewfeickert, @alexander-held