PySpark-Tutorial provides basic algorithms using PySpark
Updated Jan 20, 2023 - Jupyter Notebook
Plain Stock Close-Price Prediction via Graves LSTM RNNs
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.
This repository contains Spark, MLlib, PySpark and Dataframes projects
Various data stream and batch processing demos with Apache Spark in Scala 🚀
Create a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster
Apache Spark Basics - Java Examples
A library of Java and Scala examples for Spark 2.x
Spark BigQuery Parallel
This project uses PySpark DataFrames and PySpark RDDs to implement item-based collaborative filtering. By calculating cosine similarity scores or identifying the movies with the highest number of shared viewers, the system recommends 10 movies similar to a given target movie, in line with users’ preferences.
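As a rough illustration of the cosine-similarity variant described above, here is a minimal plain-Python sketch. It is not the project's PySpark code: the function names `cosine_similarity` and `similar_movies`, the `(user, movie, rating)` tuple layout, and the sample ratings are all assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors (dicts of user -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_movies(ratings, target, top_n=10):
    """Return up to top_n (movie, score) pairs most similar to target."""
    # Build one sparse rating vector per movie.
    by_movie = defaultdict(dict)
    for user, movie, rating in ratings:
        by_movie[movie][user] = rating

    target_vec = by_movie.get(target, {})
    scores = [
        (movie, cosine_similarity(target_vec, vec))
        for movie, vec in by_movie.items()
        if movie != target
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical sample data: (user, movie, rating)
ratings = [
    ("u1", "A", 5.0), ("u1", "B", 4.0),
    ("u2", "A", 3.0), ("u2", "B", 3.0), ("u2", "C", 1.0),
    ("u3", "C", 5.0),
]
recommendations = similar_movies(ratings, "A")
```

In a Spark setting the per-movie vectors would typically be built with a group-by and the pairwise scores with a self-join; the arithmetic per pair stays the same.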
Big Data - Split a large CSV file into N smaller ones and save them into the local disk
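One way to split a CSV without loading it into memory is to stream rows round-robin into N output files. The sketch below uses only Python's standard library rather than Spark; the function name `split_csv`, the `part-00000.csv` naming scheme, and the assumption that the first row is a header are all hypothetical choices for illustration.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, n_parts):
    """Stream src into n_parts CSV files under out_dir, repeating the header in each."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part-{i:05d}.csv" for i in range(n_parts)]
    files = [p.open("w", newline="") for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        with open(src, newline="") as f:
            reader = csv.reader(f)
            header = next(reader, None)  # assumes the first row is a header
            if header is not None:
                for writer in writers:
                    writer.writerow(header)
            # Distribute data rows round-robin so parts differ by at most one row.
            for i, row in enumerate(reader):
                writers[i % n_parts].writerow(row)
    finally:
        for f in files:
            f.close()
    return paths
```

Round-robin keeps the parts balanced but does not preserve contiguous row ranges; if contiguous chunks matter, the rows would need to be counted first or written in fixed-size blocks instead.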
BCG GAMMA CASE STUDY
Data Science and Engineering project - Programming for Big Data @ Simon Fraser University (SFU)
Use this project to join data from multiple CSV files. It currently supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.
Repository for Spark structured streaming use case implementations.
This is our final project for SFU's CMPT 353 taught by Greg Baker during Summer 2023