This tutorial is aimed at climate scientists who work with data in netCDF files. Its aim is to show that you can use Python to carry out your analysis in a way that:
- scales better to large datasets
- is easier to write
- and is easier to read.
The installation instructions are here. The four sessions cover Pandas, Dask, Numba and Xarray. The names of the notebooks indicate the order of the sessions, e.g. pandas_1.ipynb is the first session. There are also a few datasets that you can play around with in the sessions. Feel free to substitute your own data instead!
If you have used MATLAB, R, or a similar language before, you should be able to follow the material. The materials are written as interactive Jupyter notebooks, so you can play around with the existing code as you go. There are links throughout to other material that I've found helpful for learning the basics.
You will not be an expert programmer by the end of the tutorial. Instead, the point is to show you that packages exist that provide good solutions to many of the problems you face in your own work. Using these packages will let you spend more time thinking about the scientific problems you really want to focus on, and less time working out which loop index corresponds to which dimension of an array.
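To make the "loop index" point concrete, here is a minimal sketch (using a small made-up temperature array, not one of the tutorial datasets) of how Xarray lets you refer to dimensions by name instead of by position:

```python
import numpy as np
import xarray as xr

# Hypothetical 3-D temperature field with dimensions time x lat x lon
data = np.random.rand(4, 3, 5)
temps = xr.DataArray(
    data,
    dims=("time", "lat", "lon"),
    coords={"lat": [-10.0, 0.0, 10.0], "lon": np.arange(0.0, 50.0, 10.0)},
)

# With plain NumPy you must remember that axis 1 is latitude...
numpy_mean = data.mean(axis=1)

# ...whereas Xarray lets you name the dimension you are averaging over
xarray_mean = temps.mean(dim="lat")

print(np.allclose(numpy_mean, xarray_mean.values))  # True
```

The two results are numerically identical; the difference is that the Xarray version still reads correctly if the data's dimension order changes.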
Finally, remember that these packages are open source and have all been written by a community of people like you, who started out taking baby steps themselves. In time, if you come across a bug in the software, take that as a chance to get involved and contribute back to the community. To learn more about open-source software for climate analysis in particular, see the Pangeo project.