Big data is high-volume, high-velocity, and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. Apache Spark is a fast, unified analytics engine for large-scale data processing, covering both big data and machine learning workloads.
This repository collects the projects I completed for the Big Data & Data Mining course in college. For the final exam, I built a project that classifies air quality in London using the Naive Bayes algorithm, with a dataset derived from https://datahub.io/core/london-air-quality.
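
To give a feel for how Naive Bayes classification works, here is a minimal sketch in plain Python. It is not the code from the final exam project: it uses the Gaussian variant of Naive Bayes written from scratch, and the feature names (NO2, PM10), the readings, and the "good"/"poor" labels are hypothetical toy values, not rows taken from the London dataset. The helper names `fit` and `predict` are also made up for this illustration.

```python
import math
from collections import defaultdict


def fit(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        n = len(members)
        stats = []
        # zip(*members) walks the data column by column (one feature at a time).
        for column in zip(*members):
            mean = sum(column) / n
            # Small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / n + 1e-9
            stats.append((mean, var))
        model[label] = (n / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one reading."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        # Naive assumption: features are independent given the class,
        # so their Gaussian log-likelihoods simply add up.
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy readings as (NO2, PM10) in ug/m3; NOT from the real dataset.
readings = [
    (20.0, 15.0), (25.0, 18.0), (22.0, 16.0),
    (60.0, 45.0), (65.0, 50.0), (58.0, 42.0),
]
labels = ["good", "good", "good", "poor", "poor", "poor"]

model = fit(readings, labels)
print(predict(model, (24.0, 17.0)))  # close to the low-pollution cluster
print(predict(model, (62.0, 47.0)))  # close to the high-pollution cluster
```

Working in log space avoids numeric underflow when many small probabilities are multiplied. In a Spark setting the same idea would normally be delegated to a library classifier rather than written by hand.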