Thursday 9 a.m.–12:20 p.m.
Introduction to Spark with Python
Orlando Karam
- Audience level: Intermediate
- Category: Other
Description
In this tutorial we will cover the basics of writing Spark programs in Python, starting in the pyspark shell and moving on to standalone applications. We will also discuss some of the theory behind Spark and some performance considerations when running Spark on a cluster.
Abstract
Spark is a distributed computing (big data) framework that many consider the successor to Hadoop. You can write Spark programs in Java, Scala, or Python. Spark uses a functional approach, similar to Hadoop’s MapReduce.
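To give a feel for the functional, MapReduce-like style, here is a minimal sketch of a word count in plain Python. It is not Spark code and does not come from the tutorial materials; the function name word_count and the sample input are made up for illustration. In PySpark the same three steps are usually expressed with the RDD methods flatMap, map, and reduceByKey, with the grouping step handled by the framework across the cluster.

```python
from collections import defaultdict
from functools import reduce


def word_count(lines):
    """Count words in an iterable of strings, map/reduce style (hypothetical helper)."""
    # "Map" step: split every line into words and emit a (word, 1) pair for each.
    pairs = [(word.lower(), 1) for line in lines for word in line.split()]

    # "Shuffle" step: group the emitted values by key.
    # A framework such as Spark would do this across machines.
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)

    # "Reduce" step: combine the values for each key with an associative function.
    return {
        word: reduce(lambda a, b: a + b, counts)
        for word, counts in grouped.items()
    }


if __name__ == "__main__":
    sample = ["spark is fast", "Spark is functional"]
    print(word_count(sample))
```

The point of the sketch is that each step is a pure function applied to a collection, which is what lets a framework distribute the work across many machines.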
Student Handout
No handouts have been provided yet for this tutorial