This week we introduce Hadoop, perhaps the dominant platform for big data processing in use today. Hadoop was originally developed to enable MapReduce tasks to be executed at scale. While that paradigm is still in place, newer paradigms such as Spark, which can run on top of Hadoop's infrastructure, are now available as well, so Hadoop is likely to remain an important part of the Big Data landscape for some time. This week, you will learn about Hadoop, and specifically the Hadoop Distributed File System (HDFS), the MapReduce programming paradigm, and the Pig platform, which simplifies the task of writing MapReduce applications.
- Understand the basic concepts of Cloud Computing.
- Understand the basics of the Hadoop platform.
- Understand how to use HDFS.
- Understand the MapReduce approach to computing, and how to use Pig to create MapReduce applications.
- Be able to execute MapReduce tasks in a Hadoop environment.
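As a preview of the MapReduce approach covered in Lessons 2 and 3, here is a minimal, self-contained Python sketch of the classic word-count example. It simulates the map and reduce phases locally in plain Python; it is not tied to any particular Hadoop API, and the function names are illustrative only.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word. (The shuffle step
    that groups pairs by key is folded into this function for simplicity.)"""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

lines = ["the quick brown fox", "the lazy dog"]
print(reduce_phase(map_phase(lines)))
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

In a real Hadoop job, the map and reduce functions run in parallel across many machines, with the framework handling the grouping (shuffle) between them; Pig lets you express the same computation at a higher level without writing the map and reduce code by hand.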
Activities and Assignments | Time Estimate | Deadline* | Points |
---|---|---|---|
Week 12 Introduction Video | 10 Minutes | Tuesday | N/A |
Week 12 Lesson 1: Introduction to Hadoop | 2 Hours | Thursday | 20 |
Week 12 Lesson 2: Introduction to MapReduce | 2 Hours | Thursday | 20 |
Week 12 Lesson 3: Introduction to Pig | 2 Hours | Thursday | 20 |
Week 12 Quiz | 45 Minutes | Friday | 70 |
Week 12 Assignment Submission | 4 Hours | The following Monday | 125 Instructor, 10 Peer |
Week 12 Completion of Peer Review | 2 Hours | The following Saturday | 15 |
\*Please note that unless otherwise noted, the due time is 6:00 pm Central Time.