Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Easier Output Saving for Proton Radiography #1134

Merged
merged 78 commits into from
Oct 4, 2021

Conversation

pheuer
Copy link
Member

@pheuer pheuer commented May 8, 2021

Large runs with SyntheticProtonRadiograph can take a while to run, so it's often necessary to save the output for later analysis. By design, the way to do this is to save the particle positions in the detector plane, so different histograms can be made from the same dataset. So far, I personally have been doing this by just pickling the entire SyntheticProtonRadiograph object. However, I've recently realized that pickling complex objects isn't great practice for long-term storage.

This PR includes the following changes

  • The module proton_radiography is now renamed charged_particle_radiography

  • The class SyntheticProtonRadiograph is now renamed Tracker, as in

from plasmapy.diagnostics import charged_particle_radiography as cpr
sim = cpr.Tracker()
  • Tracker now stores the particle positions and some metadata in a dict structure after running.

  • the save_result() method of Tracker can save this dictonary to a numpy npz file.

  • the synthetic_radiograph function is no longer a method of Tracker but instead a stand-alone function in charged_particle_radiography.py. This function takes either a Tracker object or an output dictionary (eg. like one loaded from a npz file), then creates the synthetic radiograph in the same way as before.

TODO

  • Update documentation notebook to show how to save the npz file? This is tricky because we can't actually save it in a ipynb?
  • Add an option (maybe new default option) to save in the open metadata PRADFORMAT. This will be delayed for a future PR

@pheuer pheuer requested a review from rocco8773 as a code owner May 8, 2021 14:34
@github-actions github-actions bot added the plasmapy.diagnostics Related to the plasmapy.diagnostics subpackage label May 8, 2021
@codecov
Copy link

codecov bot commented May 8, 2021

Codecov Report

Merging #1134 (c548e55) into main (d71d84b) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1134      +/-   ##
==========================================
+ Coverage   97.07%   97.09%   +0.02%     
==========================================
  Files          73       73              
  Lines        7106     7170      +64     
==========================================
+ Hits         6898     6962      +64     
  Misses        208      208              
Impacted Files Coverage Δ
...asmapy/diagnostics/charged_particle_radiography.py 100.00% <100.00%> (ø)
plasmapy/conftest.py
conftest.py 100.00% <0.00%> (ø)
plasmapy/particles/ionization_state.py 94.26% <0.00%> (+0.08%) ⬆️
plasmapy/particles/ionization_state_collection.py 91.51% <0.00%> (+0.41%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d71d84b...c548e55. Read the comment docs.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added notebooks Related to example Jupyter notebooks in docs/examples/ Breaking labels May 8, 2021
@StanczakDominik
Copy link
Member

I've written tests for most of this, but is there a way I can test writing and reading a file in the CI tests? Is there a way I can demonstrate this capability in the example notebooks? Anything wrong with just saving to os.getcwd()?

You could do:

result.save(filename)
loaded_result = load(filename)
assert result == loaded_result

Use https://docs.pytest.org/en/stable/tmpdir.html for the filename.

We could also add a sample dataset as a regression test. It looks like our tests' data files currently live in plasmapy/particles/data/test, which is pretty meh (e.g. plasmapy/plasma/sources tests depend on it) but we can refactor that later.

@pheuer
Copy link
Member Author

pheuer commented May 8, 2021

Great, thanks! I added a test using tmp_path.

Anything similar I can do in a jupyter notebook?

@namurphy
Copy link
Member

Thank you for doing this! These are very useful capabilities to add. Do you think it would also be worth having the option to save and load data in HDF5 format? I'm wondering about that since HDF5 is probably the best format to use for long-term archiving of data sets on sites like Zenodo.

@pheuer
Copy link
Member Author

pheuer commented May 10, 2021

Sure! h5py and HDF5 are my go-to formats, so I'd could easily add HDF5 as an option. That said, directly reading out a dict that can be directly passed into synthetic_radiograph is quick and convenient, which is why I originally went with npz here.

@pheuer
Copy link
Member Author

pheuer commented May 10, 2021

@namurphy There is a nascent effort to introduce a community input/output standard for proton radiograph led by @phyzicist. The formats are all based on HDF5. One of the proposed formats for ParticleLists could be used for this kind of output. Is there a way we could include this as an optional dependency to allow the PlasmaPy proton radiography particle tracer to read and write these standards?

@StanczakDominik
Copy link
Member

StanczakDominik commented May 11, 2021

@StanczakDominik
Copy link
Member

https://github.com/picmi-standard/picmi may also be relevant-ish

@pheuer
Copy link
Member Author

pheuer commented May 12, 2021

After looking into openPMD, I want to try and rework the pradformat spec to make it consistent. I'll leave HDF saving out of this PR, but I intend to add that once we've made changes to the pradformat spec.

@StanczakDominik
Copy link
Member

Getting to the review in a few minutes!

pre-commit.ci autofix

Copy link
Member

@rocco8773 rocco8773 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refinement of test_saving_output.

@pheuer
Copy link
Member Author

pheuer commented Oct 4, 2021

Ok, I made those changes. I think that, if we're at the point of re-naming variables in the test for clarity, it's time to merge this PR and move on to other work.

@namurphy
Copy link
Member

namurphy commented Oct 4, 2021

You have my continued thanks, @pheuer!

@rocco8773
Copy link
Member

I just have to make sure all of the cross-ref repairs to the change log entries work and then I'll merge it.

@rocco8773 rocco8773 enabled auto-merge (squash) October 4, 2021 22:30
@rocco8773 rocco8773 disabled auto-merge October 4, 2021 23:27
@rocco8773 rocco8773 dismissed namurphy’s stale review October 4, 2021 23:29

fixed and no longer applicable

@rocco8773 rocco8773 merged commit 189dc2b into PlasmaPy:main Oct 4, 2021
@pheuer pheuer deleted the easier_saving_prad branch October 5, 2021 00:23
Tlord18 pushed a commit to Tlord18/PlasmaPy that referenced this pull request Oct 6, 2021
Tlord18 pushed a commit to Tlord18/PlasmaPy that referenced this pull request Oct 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
notebooks Related to example Jupyter notebooks in docs/examples/ plasmapy.diagnostics Related to the plasmapy.diagnostics subpackage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants