Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #759 Serializing and Deserializing particles #836

Merged
merged 65 commits into from
Jul 21, 2020

Conversation

vrajashkr
Copy link
Contributor

@vrajashkr vrajashkr commented Jun 6, 2020

Related to Issue #759
Extension on to functionality introduced in PR #833

This introduces the from_json method for the Particle, CustomParticle, and DimensionlessParticle classes that converts from a JSON representation to the appropriate particle object.

  • I have added a changelog entry for this pull request (please see
    changelog/README.rst for instructions, and if you need help with picking the PR type, ask!)
  • If adding new functionality, I have added (passing) tests and
    docstrings. (Tests pop up at the bottom, in the checks box).
  • I have fixed any new failing tests (if you're unsure why
    they're failing, ask!).

Suggestions and advice is appreciated :)

Thank you!

vrajashkr added 7 commits June 5, 2020 02:40
Generates JSON representations of Particle, CustomParticle, and DimensionlessParticle
Added Pytest module to test JSON representation of particles
Added to_dict as abstract method in AbstractParticle class, changed to_json to a method of
AbstractParticle class. Changed the format for the JSONification of CustomParticle and
DimensionlessParticle for simplicity. Removed to_dict from classes deriving from AbstractParticle.
Changed to assert version of comparing in pytest and updated the expected output
Introduces a from_json method for each particle class that generates an object of the appropriate particle class
given a JSON string.
Introduces a dict_from_json method that converts a JSON string to a dict.
Introduces pytest modules to test the creation of particle objects from JSON representation
@codecov
Copy link

codecov bot commented Jun 6, 2020

Codecov Report

Merging #836 into master will increase coverage by 0.06%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #836      +/-   ##
==========================================
+ Coverage   96.07%   96.13%   +0.06%     
==========================================
  Files          59       60       +1     
  Lines        5217     5309      +92     
==========================================
+ Hits         5012     5104      +92     
  Misses        205      205              
Impacted Files Coverage Δ
plasmapy/particles/__init__.py 100.00% <100.00%> (ø)
plasmapy/particles/particle_class.py 96.57% <100.00%> (+0.17%) ⬆️
plasmapy/particles/serialization.py 100.00% <100.00%> (ø)
plasmapy/formulary/drifts.py 100.00% <0.00%> (ø)
plasmapy/formulary/__init__.py 100.00% <0.00%> (ø)
plasmapy/formulary/ionization.py 100.00% <0.00%> (ø)
plasmapy/formulary/dimensionless.py 100.00% <0.00%> (ø)
plasmapy/formulary/parameters.py 99.00% <0.00%> (+0.11%) ⬆️
plasmapy/formulary/quantum.py 67.05% <0.00%> (+1.20%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 511c25b...ae02d0a. Read the comment docs.

@vrajashkr
Copy link
Contributor Author

Codecov/patch test isn't looking too good :(
I'd love some advice on how I could improve that score :)

@StanczakDominik
Copy link
Member

Looks like it's mostly exceptions. Look into pytest.raises - it's a context manager that lets you check whether your code does actually raise your expected exception on invalid inputs, and if not, fails the test.

@StanczakDominik StanczakDominik self-requested a review June 6, 2020 09:39
Copy link
Member

@StanczakDominik StanczakDominik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have about this design is that I have to know in advance the type of Particle I'm going to be restoring from stringified JSON - or I have to manually implement a check for each type. Whereas, since we're storing the class name in the JSON, the dream here would be something like this:

>>> particle_dict = {"type": "Particle", "symbol": "e-"}
>>> plasmapy.particles.from_json(particle_dict)
Particle('e-')

I thought initially about having it be a class method on the abstract base class - an alternative constructor, if you will, but access through plasmapy.particles.AbstractParticle.from_json seems awkward. Having all the subclasses refer to a common method, on the other hand, could lead to plasmapy.particles.DimensionlessParticle.from_json returning CustomParticle, which may be confusing. I don't know what the best design here would be, to be frank.

@vrajashkr
Copy link
Contributor Author

One concern I have about this design is that I have to know in advance the type of Particle I'm going to be restoring from stringified JSON - or I have to manually implement a check for each type. Whereas, since we're storing the class name in the JSON, the dream here would be something like this:

>>> particle_dict = {"type": "Particle", "symbol": "e-"}
>>> plasmapy.particles.from_json(particle_dict)
Particle('e-')

I thought initially about having it be a class method on the abstract base class - an alternative constructor, if you will, but access through plasmapy.particles.AbstractParticle.from_json seems awkward. Having all the subclasses refer to a common method, on the other hand, could lead to plasmapy.particles.DimensionlessParticle.from_json returning CustomParticle, which may be confusing. I don't know what the best design here would be, to be frank.

Potentially, we could make the from_json a method inside plasmapy.particles as you've suggested. This method can check the particle type and call the appropriate particle class's to_json method internally and return the particle object.

This solves the fact that we may not know the particle beforehand as well as developers knowing that they have to implement the method since it is an abstractmethod for AbstractParticle.

What do you think?

@StanczakDominik
Copy link
Member

Sounds good!

@rocco8773
Copy link
Member

Disclaimer: Again, I'm no expert in json and have limited experience with it.

What's the advantage of the current approach over utilizing the JSONEncoder and JSONDecoder classes. As I understand the encoder/decoder framework, an implementation would look something like...

# this does all the translation form Particle to JSON object
class ParticleJSONEncoder(JSONEncoder):
    def default(self, obj):
        # this overrides the JSONEncoder.default method
        # this knows how to convert the Particle class into a JSON format
        return json_particle

# this does all the translation form JSON object to Particle
class ParticleJSONDecoder(JSONDecoder):
    def __init__(self, *args, **kwargs):
        json.JSONDecoder.__init__(self, object_hook=self.object_hook, **kwargs)

    def object_hook(self, dictionary):
        if 'Particle' in dictionary:
            # gather all the relevant info from the dictionary
            return Particle(*args, **kwargs)
        return dictionary

Then a use case would look like...

>>> # to save a Particle
>>> p = Particle("e-")
>>> with open('myparticle.json', 'w') as fp:
...     json.dump(p, fp, cls=ParticleJSONEncoder)
...
...     # or define a member that does the above line
...     p.dump(fp)

>>> # to read a particle
>>> with open('myparticle.json', 'r') as fp:
...     p = json.load(fp, cls=ParticleJSONDecoder)

Now, the loading is a little trickier than the saving. I don't think there should be a member on Particle that loads; just does not make sense to me. We could define a json_load() function in plasmapy/particles/__init__.py.


Now, minus all the effort it takes to figure out the translation (not trivial by any means!), this seems like a logical approach to me. It allows for expand-ability as the Particle class (and it's sisters) evolve while maintaining a constant UI.

Additionally, this encoder and decoder should be designed to work with any subclass of AbstractParticle (i.e. Particle, CustomParticle, etc.).

@vrajashkr
Copy link
Contributor Author

Thank you for your feedback and suggestions @rocco8773 :)

What's the advantage of the current approach over utilizing the JSONEncoder and JSONDecoder classes.

The reasoning I had in mind to use the current method was that in future, if another class inheriting AbstractParticle was to be created, the developer would have to implement the to_json method for it specifically (since the fields needed in JSON could be different for the new type of particle). In that case, they could implement it as method in the class.

For the conversion of JSON to particle, I followed a similar reasoning - the developer will implement the from_json method in the class and then add it to a dictionary in the implementation of plasmapy.particles.from_json which will internally call the appropriate class (using the dictionary mentioned earlier as a hash table of sorts) method to return the appropriate particle object. Something like shown below:

particles = {"Particle":Particle,
                  "CustomParticle": CustomParticle,
                  "DimensionlessParticle": DimensionlessParticle
                }
particles['Particle'].from_json(string)

While I did explain my reasoning, I'd like to say that I definitely agree that the code can be simplified (maintaining the interfaces) and made easier to maintain by using a custom JSONEncoder and JSONDecoder class instead of the current method. I'd definitely like to change the implementation accordingly.

There is one question I have though - is it possible to inform future developers that they need to go to the JSONEncoder and JSONDecoder classes and add the implementation for encode and decode for their new type of Particle? If I am not mistaken, the current approach will throw a NotImplementedError Exception if a developer forgets to add to_json and for_json into the class definition. I have basic OOPS knowledge and don't have much experience with it. Any advice here would be appreciated :)

I'm not sure if I've understood the requirements thoroughly. I've been assuming that every class that inherits from AbstractParticle (i.e every type of particle) will have to have implementations of to_json and from_json (i.e every particle can be represented as JSON and be created from a JSON). If the serialization of a particle can be "opt-in" so to speak, we can probably remove the class methods altogether, thereby keeping the UI consistent for the particles.

Furthermore, I just realised that any subclass of Particle, CustomParticle and DimensionlessParticle may not work with the current approach (the methods of to_json and from_json are already implemented in the super class). This defeats the purpose of the current approach, which will only work for subclasses of the AbstractParticle.

What do you think?

Thank you!

@rocco8773
Copy link
Member

rocco8773 commented Jun 7, 2020

The reasoning I had in mind to use the current method was that in future, if another class inheriting AbstractParticle was to be created, the developer would have to implement the to_json method for it specifically (since the fields needed in JSON could be different for the new type of particle). In that case, they could implement it as method in the class.

This is a good point. Unfortunately all the particle classes do not share the same __init__ so it's difficult to make something that is completely universal. It will also be significantly more difficult to generalize loading versus dumping.

I think there could be a hybrid approach to this. Something like...

class AbstraceParticle(ABC):

    @property
    def json_dict(self):
        # this has default values for all particle types but can be overwritten by subclasses
        json_particle = {
            "type": type(self).__name__,
            "module": self.__module__,
            "mass": p.mass.__str__(),
            "charge: p.charge.__str__(),
        }
        return json_particle

    def json_dump(self, fp):
        json.dump({'plasmapy_particle": self.json_particle})
        # this is where we could apply a JSONEncoder, but if we've already
        # generated a json friendly dictionary then I don't see the point

class Particle(AbstractParticle):
    @property
    def json_dict(self):
        json_particle = super().json_dict
        json_particle["symbol"] = self.particle
        json_particle["init_params"] = {
            "argument": self.particle,
        }
     return json_particle

So, that would handle the saving. Loading would be more difficult and the JSONDecoder might be handy.

class ParticleJSONDecoder(JSONDecoder):
    _particle_classes = {
        "CustomParticle": CustomParticle,
        "DimensionlessParticle": DimentionlessParticle,
        "Particle": Particle,
    }

    def object_hook(self, dictionary):
        if 'plasmapy_particle' in dictionary:
            json_particle = dictionary['plasmapy_particle']
            cls_name = json_particle["type"]
            cls = _particle_classes[cls_name]

            if 'init_params' in json_particle:
                # this would not work as cleanly if init parameters 
                # were custom objects (i.e. not a default decoding)
                params = json_particle["init_params"]
                return cls(**params)
            else:
                p = cls()
                p.mass = u.Quantity(json_particle['mass'])
                p.charge = u.Quantity(json_particle['charge'])
                return p

        return dictionary

# In plasmapy/particles/__init__.py define
def json_load_particle(fp):
    return json.load(fp, cls=ParticleJSONDecoder)

Sorry for throwing a bunch of code your way. It's just easier to write it out than try to explain it in a paragraph.

In this approach, any subclassed particle of AbstractParticle should be JSON serializable. If a programmer wants to change the serialized values, then he or she would only need to override the json_particle property.

@rocco8773
Copy link
Member

I'm not sure if I've understood the requirements thoroughly. I've been assuming that every class that inherits from AbstractParticle (i.e every type of particle) will have to have implementations of to_json and from_json (i.e every particle can be represented as JSON and be created from a JSON).

Yes, every subclass will inherit the members defined in its parent class.

@vrajashkr
Copy link
Contributor Author

I think there could be a hybrid approach to this. Something like...

class AbstraceParticle(ABC):

    @property
    def json_dict(self):
        # this has default values for all particle types but can be overwritten by subclasses
        json_particle = {
            "type": type(self).__name__,
            "module": self.__module__,
            "mass": p.mass.__str__(),
            "charge: p.charge.__str__(),
        }
        return json_particle

    def json_dump(self, fp):
        json.dump({'plasmapy_particle": self.json_particle})
        # this is where we could apply a JSONEncoder, but if we've already
        # generated a json friendly dictionary then I don't see the point

class Particle(AbstractParticle):
    @property
    def json_particle(self):
        json_particle = super().json_particle
        json_particle["symbol"] = self.particle
        json_particle["init_params"] = {
            "argument": self.particle,
        }
     return json_particle

Thank you for the suggestion! This looks like a really good approach. I shall adopt this technique accordingly.

Sorry for throwing a bunch of code your way. It's just easier to write it out than try to explain it in a paragraph.

No issues at all. Infact, it helps a lot. As developers, speaking in code is often more efficient that putting it out in a natural language xD

So, that would handle the saving. Loading would be more difficult and the JSONDecoder might be handy.

I agree, the JSONDecoder approach seems to be the best for converting from JSON to particle. It is elegant and easier to maintain.

Thank you for all the suggestions!

vrajashkr added 3 commits June 8, 2020 02:11
…and Updated JSON representation

Changes approach in particle_class.py to use a property called particle_json
in the AbstractParticle class, updated JSON representation to make it easier
for future conversion into particle objects.
Updates test_particle_class.py with new expected output for JSON
@rocco8773
Copy link
Member

I look forward to seeing how this turns out! Thank you for putting in the time to tackle this. I probably wouldn't have thought about it until someone brought it up and I'm learning the new stuff now. :)

I did make a typo in my example code, which I have fixed. The overriding property in Particle should have been json_dict not json_particle.

vrajashkr added 3 commits June 8, 2020 03:50
Added new class ParticleJSONDecoder, delete old implementation of from_json in
particle_class.py. Added from_json method in plasmapy/particles/__init__.py.
Updated test cases and module in test_particle_class.py
@pep8speaks
Copy link

pep8speaks commented Jun 7, 2020

Hello @darkaether! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 113:91: E501 line too long (179 > 90 characters)
Line 116:91: E501 line too long (205 > 90 characters)
Line 119:91: E501 line too long (206 > 90 characters)
Line 662:91: E501 line too long (154 > 90 characters)
Line 665:91: E501 line too long (154 > 90 characters)
Line 1821:91: E501 line too long (182 > 90 characters)
Line 1824:91: E501 line too long (181 > 90 characters)
Line 1948:91: E501 line too long (180 > 90 characters)
Line 1951:91: E501 line too long (183 > 90 characters)

Line 1007:91: E501 line too long (191 > 90 characters)
Line 1012:91: E501 line too long (191 > 90 characters)
Line 1017:91: E501 line too long (190 > 90 characters)
Line 1022:91: E501 line too long (190 > 90 characters)
Line 1051:91: E501 line too long (164 > 90 characters)
Line 1056:91: E501 line too long (164 > 90 characters)
Line 1077:91: E501 line too long (164 > 90 characters)
Line 1082:91: E501 line too long (190 > 90 characters)
Line 1087:91: E501 line too long (191 > 90 characters)

Comment last updated at 2020-06-09 20:28:48 UTC

@vrajashkr vrajashkr changed the title Issue #759 Converting a JSON representation of particles back to the appropriate particle object. Issue #759 Serializing and Deserializing particles Jun 8, 2020
Merging Particle serialization into branch containing deserialization
Changed `to_json` method to `json_dumps`
Introduced `json_dump` method to write to file
Added test for writing particle JSON to file and loading particle
from a file
@vrajashkr
Copy link
Contributor Author

vrajashkr commented Jul 12, 2020

Could I get help with the test-documentation check? Couldn't find any details on why it is failing. Thank you.

@rocco8773
Copy link
Member

The errors are mostly likely due to you using reST style docstrings, while we use numpy style docstrings. (styles of docstrings). You'll need to edit all your docstrings to match the numpydoc format.

@vrajashkr
Copy link
Contributor Author

The errors are mostly likely due to you using reST style docstrings, while we use numpy style docstrings. (styles of docstrings). You'll need to edit all your docstrings to match the numpydoc format.

Interesting. I wasn't aware that there were so many styles out there. I'll make the change to numpy style. Thank you for the information!

@vrajashkr
Copy link
Contributor Author

Any further comments and feedback are welcome. Thank you!

Copy link
Member

@StanczakDominik StanczakDominik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I'm sorry I've been away from this PR for a while. I'm not quite sure what I missed, but it's looking great now! I do have a couple of optional suggestions that we could think about a bit, but overall, LGTM!

Comment on lines 163 to 189
@staticmethod
def particle_hook(json_dict):
"""
An `object_hook` utilized by the `json` deserialization processes to decode
json strings into a `plasmapy` particle class (`AbstractParticle`,
`CustomParticle`, `DimensionlessParticle`, `Particle`).
"""
particle_types = {
"AbstractParticle": AbstractParticle,
"CustomParticle": CustomParticle,
"DimensionlessParticle": DimensionlessParticle,
"Particle": Particle,
}
if "plasmapy_particle" in json_dict:
try:
pardict = json_dict["plasmapy_particle"]
partype = pardict["type"]
args = pardict["__init__"]["args"]
kwargs = pardict["__init__"]["kwargs"]
particle = particle_types[partype](*args, **kwargs)
return particle
except KeyError:
raise InvalidElementError(
f"json file does not define a valid plasmapy particle"
)
else:
return json_dict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could extract this whole staticmethod to be a function, and then we get a programmatic way of converting a dictionary into a Particle for free! Then we could simply have this method be a thin wrapper on top of that function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debated this myself. I had if left like this because it's consistent with the JSONDecoder framework. We could drop the JOSNDeocder for something more like...

def particle_hook(....):
    ...

with open('particle.json', 'r') as fp:
    par = json.load(fp, object_hook=particle_hook, ...)

I could go either way on this, but I think I slightly prefer the JSONDecoder framework. In the end, I say we leave the PR as it is and, if the need arises to convert dictionaries into particles, then we can modify with a separate PR.

plasmapy/particles/__init__.py Outdated Show resolved Hide resolved
plasmapy/particles/__init__.py Outdated Show resolved Hide resolved
plasmapy/particles/__init__.py Outdated Show resolved Hide resolved
plasmapy/particles/__init__.py Outdated Show resolved Hide resolved
plasmapy/particles/__init__.py Outdated Show resolved Hide resolved
__init__.py Removed function definitions and moved to new file
particle_json_utils.py New file with json utility functions
@vrajashkr
Copy link
Contributor Author

Could I ask for feedback regarding the docstrings for the ParticleJSONDecoder class and the json_load_particle and json_loads_particle functions?

Thank you!

Copy link
Member

@rocco8773 rocco8773 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a review of the new serialization.py module and any relevant parts.

plasmapy/particles/serialization.py Outdated Show resolved Hide resolved
plasmapy/particles/serialization.py Show resolved Hide resolved
plasmapy/particles/serialization.py Outdated Show resolved Hide resolved
plasmapy/particles/__init__.py Outdated Show resolved Hide resolved
plasmapy/particles/__init__.py Outdated Show resolved Hide resolved
plasmapy/particles/particle_class.py Outdated Show resolved Hide resolved
Copy link
Member

@rocco8773 rocco8773 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is 99% there!

You should also add the :inherited-members: option to the the automodapi directive for plasmapy.particles. This would be done here in the ./docs/particles/index.rst file. Adding this option allows the methods defined in AbstractParticles (i.e. json_dump and json_dups) to show up on the Particle, CustomParticle, and DimensionlessParticle documentation.

plasmapy/particles/__init__.py Outdated Show resolved Hide resolved
plasmapy/particles/particle_class.py Outdated Show resolved Hide resolved
plasmapy/particles/particle_class.py Outdated Show resolved Hide resolved
plasmapy/particles/particle_class.py Outdated Show resolved Hide resolved
changelog/836.feature.rst Outdated Show resolved Hide resolved
Copy link
Member

@rocco8773 rocco8773 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that does it for me then. @darkaether Nice work! Thank you for hanging in there. This turned out being longer than expected.

@StanczakDominik Do you want to give this another look over before merging?

@vrajashkr
Copy link
Contributor Author

I think that does it for me then. @darkaether Nice work! Thank you for hanging in there. This turned out being longer than expected.

@rocco8773 Thank you for taking the time to make suggestions! :)

@StanczakDominik Thank you for all your help getting me started with the PR and guiding me through the CI checks. :)

I'm looking forward to making more contributions to this project wherever I can. :)

@StanczakDominik
Copy link
Member

Thanks for doing this @darkaether :)

If you believe it's fine @rocco8773 it's fine for me as well! :)

@StanczakDominik StanczakDominik merged commit 6d20137 into PlasmaPy:master Jul 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plasmapy.particles Related to the plasmapy.particles subpackage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants