This repository showcases IPL data analysis using Apache Spark. The project demonstrates the power of Spark for data transformation, cleaning, SQL queries, and visualization, all performed with PySpark to handle large-scale data efficiently.
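The cleaning and aggregation pattern this project applies can be sketched in plain Python; the column names (`batsman`, `runs`) are assumptions for illustration, and the equivalent PySpark would be roughly `df.dropna().groupBy("batsman").agg(F.sum("runs"))`.

```python
# Plain-Python sketch of a clean-then-aggregate step like the PySpark
# code performs; column names (batsman, runs) are hypothetical.
from collections import defaultdict

deliveries = [
    {"batsman": "Kohli", "runs": "4"},
    {"batsman": "Kohli", "runs": "6"},
    {"batsman": "Rohit", "runs": None},   # dirty row to drop
    {"batsman": "Rohit", "runs": "2"},
]

# Cleaning step: drop rows with missing values and cast runs to int.
cleaned = [
    {**row, "runs": int(row["runs"])}
    for row in deliveries
    if row["runs"] is not None
]

# Aggregation step: total runs per batsman, as a GROUP BY would compute.
totals = defaultdict(int)
for row in cleaned:
    totals[row["batsman"]] += row["runs"]

print(dict(totals))  # {'Kohli': 10, 'Rohit': 2}
```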
The rapid pace of innovation in Artificial Intelligence (AI) is creating enormous opportunities to transform entire industries and our everyday lives. After completing this comprehensive 6-course Professional Certificate, you will have a practical understanding of Machine Learning and Deep Learning. You will master fundamental concepts of Machin…
ETL data pipeline that processes Washington's EV data using Apache Spark, Docker, Snowflake, Airflow, and AWS services, and visualizes the transformed Parquet data through Tableau dashboards.
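The transform stage of such a pipeline can be sketched in plain Python; the real project runs this on Spark and writes Parquet, and the field names (`make`, `model_year`, `ev_type`) are assumptions for illustration.

```python
# Hypothetical sketch of an ETL transform step: normalize makes,
# cast years, and drop records the pipeline cannot repair.
raw_rows = [
    {"make": "TESLA", "model_year": "2021", "ev_type": "BEV"},
    {"make": "nissan", "model_year": "2019", "ev_type": "BEV"},
    {"make": "FORD", "model_year": "", "ev_type": "PHEV"},  # missing year
]

def transform(rows):
    """Clean raw EV registration records into typed, normalized rows."""
    out = []
    for row in rows:
        if not row["model_year"]:
            continue  # drop incomplete records
        out.append({
            "make": row["make"].title(),          # normalize casing
            "model_year": int(row["model_year"]),  # cast to int
            "ev_type": row["ev_type"],
        })
    return out

print(transform(raw_rows))
```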
This is a distributed system that runs Apache Spark on Dataproc. We use the Spotify API to send song data to Apache Spark, which forwards the processed information to Google Cloud services. The system then uses the extracted features to recommend songs.
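Feature-based song recommendation can be sketched with cosine similarity over extracted audio features; the feature vectors and track names below are invented for illustration, and the real system distributes this computation over Spark on Dataproc.

```python
# Minimal single-machine sketch of similarity-based recommendation over
# hypothetical audio-feature vectors (e.g. danceability, energy, tempo).
import math

songs = {
    "song_a": [0.8, 0.9, 0.5],
    "song_b": [0.7, 0.85, 0.55],
    "song_c": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recommend(seed, catalog, k=1):
    """Return the k tracks most similar to the seed track."""
    scores = [(name, cosine(catalog[seed], vec))
              for name, vec in catalog.items() if name != seed]
    return [name for name, _ in sorted(scores, key=lambda t: -t[1])[:k]]

print(recommend("song_a", songs))  # ['song_b']
```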
As a Coursera-certified specialization completer, you will have a proven, deep understanding of massively parallel data processing, data exploration and visualization, and advanced machine learning and deep learning. You'll understand the mathematical foundations behind all machine learning and deep learning algorithms. You can apply this knowledge in practi…
Developed a real-time streaming analytics pipeline using Apache Spark that calculates and stores KPIs for e-commerce sales data, including total sales volume, orders per minute, rate of return, and average transaction size. Used Spark Streaming to read data from Kafka, Spark SQL to calculate the KPIs, and the Spark DataFrame API to write them to JSON files.
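The per-window KPI logic described above can be sketched in plain Python; in the real pipeline Spark SQL computes these over one-minute windows of the Kafka stream, and the event fields (`order_type`, `total_cost`) are assumptions for illustration.

```python
# Plain-Python sketch of the KPIs for one micro-batch of sales events;
# field names follow a common e-commerce schema and are hypothetical.
events = [
    {"order_type": "ORDER",  "total_cost": 120.0},
    {"order_type": "ORDER",  "total_cost": 80.0},
    {"order_type": "RETURN", "total_cost": 40.0},
]

def window_kpis(batch):
    """KPIs for one window: sales volume, OPM, return rate, avg size."""
    returns = [e for e in batch if e["order_type"] == "RETURN"]
    total_volume = sum(e["total_cost"] for e in batch)
    return {
        "total_sale_volume": total_volume,
        "orders_per_minute": len(batch),
        "rate_of_return": len(returns) / len(batch),
        "avg_transaction_size": total_volume / len(batch),
    }

print(window_kpis(events))
```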