spark-dataframes

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.

streaming consumer parquet kafka-producer spark-sql spark-kafka-integration spark-streaming-data spark-transformations spark-to-cassandra-connection spark-dataframes spark-joins spark-hive-context spark-jdbc-connection spark-with-mangodb spark-aggregations-using-dataframe spark-use-cases cassandra-installation spark-datadog spark-mangodb spark-catalog-api

Updated Nov 16, 2022
Scala

jubins / Spark-And-MLlib-Projects

This repository contains Spark, MLlib, PySpark and Dataframes projects

python spark pyspark spark-streaming mllib sparksql aws-ec2 spark-dataframes spark-ml

Updated Oct 22, 2017
Jupyter Notebook

yennanliu / spark-etl-pipeline

Various data stream and batch processing demos with Apache Spark in Scala 🚀

docker dockerfile scala twitter spark apache-spark sbt pipeline stream-processing sbt-plugin spark-streaming sbt-assembly spark-sql spark-dataframes spark-batch spark-rdd

Updated Feb 28, 2020
Scala

jkoth / Data-Lake-with-Spark-and-AWS-S3

Creates a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster

apache-spark aws-s3 aws-emr pyspark data-engineering data-lake json-format udacity-nanodegree spark-dataframes dimensional-model star-schema etl-pipeline

Updated Oct 10, 2019
Python

neerajkesav / SparkJavaExamples

Apache Spark Basics - Java Examples

java spark apache-spark hadoop hdfs sparkjava spark-java rdd sparkcontext spark-transformations spark-dataframes flatmap spark-example learn-spark spark-actions spark-basics javardd

Updated Sep 9, 2016
Java

NashTech-Labs / Sparkathon

A library of Java and Scala examples for Spark 2.x

scala spark apache-spark spark-streaming java-8 rdd spark-sql spark-mllib spark-dataframes spark-ml knoldus spark-dataset spark-structured-streaming

Updated Dec 29, 2016
Java

afzals2000 / spark-bigquery-parallel

Spark BigQuery Parallel

bigquery spark apache-spark pyspark google-cloud-platform spark-sql spark-dataframes spark-scala pyspark-python

Updated Jan 24, 2019
Scala

MaxineXiong / Item-based-collaborative-filtering

This project uses PySpark DataFrames and PySpark RDDs to implement item-based collaborative filtering. By calculating cosine similarity scores or identifying the movies with the highest number of shared viewers, the system recommends 10 movies similar to a given target movie that align with users' preferences.

python spark apache-spark collaborative-filtering pyspark movie-recommendation spark-dataframes spark-rdd

Updated Jun 29, 2024
Jupyter Notebook
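
The cosine-similarity step mentioned in the description above can be illustrated with a small sketch. This is plain Python rather than PySpark, and it is not taken from the repository: the function names `item_cosine_similarities` and `top_similar`, the `(user, item, rating)` input shape, and the choice to compute similarity only over users who rated both items are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_cosine_similarities(ratings):
    """Return {(item_a, item_b): cosine} for every co-rated item pair.

    `ratings` is an iterable of (user, item, rating) tuples (assumed shape).
    Similarity is computed only over users who rated both items.
    """
    # Group ratings by user so co-rated item pairs can be enumerated.
    by_user = defaultdict(dict)
    for user, item, rating in ratings:
        by_user[user][item] = rating

    dot = defaultdict(float)
    sq_a = defaultdict(float)
    sq_b = defaultdict(float)
    for items in by_user.values():
        # sorted() gives each pair a canonical order (item_a < item_b).
        for a, b in combinations(sorted(items), 2):
            dot[(a, b)] += items[a] * items[b]
            sq_a[(a, b)] += items[a] ** 2
            sq_b[(a, b)] += items[b] ** 2

    sims = {}
    for pair, numerator in dot.items():
        denominator = sqrt(sq_a[pair]) * sqrt(sq_b[pair])
        # Skip pairs whose ratings are all zero to avoid dividing by zero.
        if denominator > 0:
            sims[pair] = numerator / denominator
    return sims


def top_similar(sims, target, k=10):
    """Return up to k (item, score) tuples most similar to `target`."""
    scored = []
    for (a, b), score in sims.items():
        if a == target:
            scored.append((b, score))
        elif b == target:
            scored.append((a, score))
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:k]
```

Calling `top_similar(item_cosine_similarities(ratings), "some-movie-id")` would return the ten closest items by default, which mirrors the "10 similar movies" behaviour the description mentions. A distributed version would typically express the pair enumeration as a self-join on the user column.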

thenickben / SplitCSV-Spark

Big Data - Split a large CSV file into N smaller ones and save them to the local disk

scala big-data spark spark-dataframes

Updated Nov 3, 2018
Scala
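
As a rough illustration of the task this entry describes, the sketch below splits a CSV file into N parts using only the Python standard library. It is not the repository's Spark and Scala implementation: the function name `split_csv`, the `part-00000.csv` naming, the round-robin row assignment, and the assumption that the first row is a header are all choices made here for illustration.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, n):
    """Split the CSV at `src` into `n` files under `out_dir`.

    Rows are dealt out round-robin, and the header row (assumed to be
    the first row) is repeated in every part. Returns the part paths.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part-{i:05d}.csv" for i in range(n)]

    with open(src, newline="") as source:
        reader = csv.reader(source)
        header = next(reader, None)  # None when the source file is empty
        files = [open(path, "w", newline="") for path in paths]
        try:
            writers = [csv.writer(handle) for handle in files]
            if header is not None:
                for writer in writers:
                    writer.writerow(header)
            # Stream row by row so the whole file never sits in memory.
            for index, row in enumerate(reader):
                writers[index % n].writerow(row)
        finally:
            for handle in files:
                handle.close()
    return paths
```

Round-robin assignment keeps the parts within one row of each other in size without needing to count rows first. In Spark, a comparable effect is usually obtained by repartitioning a DataFrame to N partitions before writing it out.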

Vivek-Murali / CarCrashAnalysis

BCG GAMMA CASE STUDY

etl pyspark data-engineering spark-dataframes

Updated Jan 27, 2023
Jupyter Notebook

NashTech-Labs / spark-dataframes-meetup

meetup scala spark sbt spark-dataframes knoldus

Updated Apr 4, 2016
Scala

maziyarpanahi / spark-quickie

Getting started with Apache Spark

spark spark-dataframes

Updated Feb 16, 2024

ninjeanne / datastorm

Data Science and Engineering project - Programming for Big Data @ Simon Fraser University (SFU)

aws data-science data big-data spark aws-lambda aws-s3 bigdata data-visualization python3 aws-emr data-engineering aws-dynamodb spark-sql spark-mllib spark-dataframes

Updated Jan 2, 2023
Jupyter Notebook

mayankrawat / CSVJoin

Use this project to join data from multiple CSV files. It currently supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.

Updated Jul 1, 2022
Java

rajeshsantha / MonitoredStructuredStreaming

Repository for Spark structured streaming use case implementations.

scala kafka apache-spark spark-streaming spark-dataframes spark-streaming-kafka spark-structured-streaming

Updated Apr 13, 2020
Scala

LucasDLee / CMPT-353-Final-Project

This is our final project for SFU's CMPT 353 taught by Greg Baker during Summer 2023

python data-science statistics university-project spark-dataframes

Updated Aug 23, 2023
Python

spark-dataframes

Here are 43 public repositories matching this topic...

mahmoudparsian / pyspark-tutorial

26hzhang / StockPrediction

mahmoudparsian / big-data-mapreduce-course

Thomas-George-T / Movies-Analytics-in-Spark-and-Scala

spider-123-eng / Spark

jubins / Spark-And-MLlib-Projects

yennanliu / spark-etl-pipeline

jkoth / Data-Lake-with-Spark-and-AWS-S3

neerajkesav / SparkJavaExamples

NashTech-Labs / Sparkathon

afzals2000 / spark-bigquery-parallel

MaxineXiong / Item-based-collaborative-filtering

thenickben / SplitCSV-Spark

Vivek-Murali / CarCrashAnalysis

NashTech-Labs / spark-dataframes-meetup

maziyarpanahi / spark-quickie

ninjeanne / datastorm

mayankrawat / CSVJoin

rajeshsantha / MonitoredStructuredStreaming

LucasDLee / CMPT-353-Final-Project
