Apache Spark using Python
-
https://github.com/dgadiraju/itversity-books/tree/master/starterkits/spark/python
-
A quick introduction to the Spark API https://lnkd.in/g8Y3tdhX
-
Overview of Spark - RDDs, accumulators, broadcast variables https://lnkd.in/g7fepuFF
-
Spark SQL, Datasets, and DataFrames https://lnkd.in/g3iZp7zk
-
PySpark - Processing data with Spark in Python https://lnkd.in/gBnh6PAi
-
Processing data with SQL on the command line https://lnkd.in/ggnxDaUu
-
Cluster Overview https://lnkd.in/guCQnJnv
-
Packaging and deploying applications https://lnkd.in/gUZpi2P9
-
Customize Spark via its configuration system https://lnkd.in/gZh8Vkmv
-
Monitoring - Track the behavior of your applications https://lnkd.in/grpGKFuP
-
Best practices to optimize performance and memory use https://lnkd.in/gTRYBDQu