apachespark

Here are 45 public repositories matching this topic...

ayyankalu / Ipl_Data_analysis_Spark

This repository showcases IPL data analysis using Apache Spark. The project demonstrates the power of Spark for data transformation, cleaning, SQL queries, and visualization, all performed with PySpark to handle large-scale data efficiently.

sql bigdata pyspark datavisualization apachespark data-analysis-python iplanalysis

Updated Sep 9, 2024
Jupyter Notebook

amit2014 / IBM-AI-Engineering-Professional-Certificate

The rapid pace of innovation in Artificial Intelligence (AI) is creating enormous opportunities to transform entire industries and our very existence. After completing this comprehensive 6-course Professional Certificate, you will have a practical understanding of Machine Learning and Deep Learning. You will master fundamental concepts of Machin…

ai spark bigdata keras ml coursera datascience deeplearning ibm ann apachespark

Updated Apr 20, 2020

ravishankar324 / Washington-state-electric-vehicles-ETL-pipeline

ETL data pipeline that processes Washington's EV data using Apache Spark, Docker, Snowflake, Airflow, and AWS services, and visualizes the transformed Parquet data in Tableau dashboards.

python emr docker airflow ec2 s3 iam snowflake pyspark sparksql tableau apachespark

Updated Aug 24, 2024
Python

meghna-cse / CloudComputingAndBigData-CSE6332

Projects completed as part of the CSE 6332 CCBD course at UTA, covering distributed computing, data processing frameworks, and cloud platforms.

big-data cloudcomputing apachespark

Updated Jun 22, 2024
Java

mehrdadalmasi2020 / ApacheSpark_ApacheZeppelin_SQL_Shell

Run your first analysis project on Apache Zeppelin using Scala (Spark), Shell, and SQL

visualization shell scala notebook sparksql zeppelin-notebook apachespark

Updated Feb 16, 2024
Scala

AbdelmajidLh / spark-functionality-repo

This GitHub repository contains a detailed document on the basics of the Scala language.

scala spark apache python3 pyspark databricks databricks-notebooks apachespark

Updated Apr 8, 2024

SwethaJoseph / Crime-Pattern-Analysis-Project

Analysis and visualization of open-source police data from two areas, Leicestershire Street and Northumbria Street, to derive data-driven insights.

python exploratory-data-analysis jupyter-notebook sql-query pyspark datavisualization apachespark datapreprocessing datamanipulation

Updated Jul 8, 2024

divithraju / divith-raju-Immigration-Data-Engineering

A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)

sql bigdata pandas dataset datapipeline datalake dataprocessing dataengineering capstone-project apachespark datacleaning bigdataproject datamodeling datawherehouse dataschema bigdataprocessing

Updated Dec 25, 2022
Jupyter Notebook

amit2014 / Advanced-Data-Science-with-IBM-Specialization

As a Coursera-certified specialization completer, you will have a proven, deep understanding of massively parallel data processing, data exploration and visualization, and advanced machine learning and deep learning. You'll understand the mathematical foundations behind all machine learning and deep learning algorithms. You can apply knowledge in practi…

datascience internetofthings deeplearning apachespark

Updated Dec 26, 2019

geazi-anc / dracula

A brief analysis of the most common words in Dracula, by Bram Stoker.

python spark jupyter dracula pyspark dataanalysis apachespark

Updated Jan 11, 2023
Jupyter Notebook
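
To illustrate the kind of word-frequency analysis the dracula entry above describes, here is a minimal sketch in plain Python rather than PySpark, so that it runs without a Spark installation. The function name `most_common_words`, the tokenization rule, and the sample sentence are all hypothetical and are not taken from the repository.

```python
import re
from collections import Counter


def most_common_words(text, n=3):
    """Return the n most frequent lowercase words in text as (word, count) pairs."""
    # Assumed tokenization: runs of letters and apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample text, not a quotation from the novel.
sample = "The night was dark. The castle was silent. The Count was awake."
top = most_common_words(sample, n=2)
print(top)
```

A real analysis would likely also filter out stop words such as "the" and "was" before counting, since they dominate raw frequencies.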

Orkhan-1 / Full-Course-Apache-Spark

This comprehensive course is designed for beginners and experienced developers alike, providing an in-depth exploration of Apache Spark

bigdata spark-streaming spark-sql apachespark

Updated Oct 3, 2024
Java

mayankrawat / CSVJoin

Use this project to join data from multiple CSV files. The project currently supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.

Updated Jul 1, 2022
Java

gilga001 / HPCandBigDataPipeline

A published paper in PEARC18: Combining HPC and Big Data Infrastructures in Large-Scale Post-Processing of Simulation Data: A Case Study

python simulation hpc bigdata mdtraj postprocessing apachespark

Updated Jul 23, 2018
Python

payamrastogi / SparkCourse

python apachespark

Updated Sep 15, 2017
Python

propelledanalytics / SparkSQL.jl

SparkSQL.jl enables Julia programs to work with Apache Spark data using just SQL.

spark julia-language julialang apachespark

Updated Jan 29, 2024
Julia

Arkaprabha-B / PySpark-GraphFrames

Implementation of GraphFrames using PySpark in Eclipse IDE

python3 pyspark-tutorial graphframes apachespark

Updated Aug 6, 2019

sarathchandrikak / ETL-Bank-Transcation

Data Analysis of bank transaction data

sql pyspark redshift sqoop apachespark s3bucket

Updated Jun 6, 2023
Jupyter Notebook

tspannhw / FLiPStackWeekly

FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...

streaming cloudera apachespark apachekafka timspann apachenifi lakehouse apacheflink apacheiceberg

Updated Nov 14, 2024

Az1m04 / Advance-Data-Science-with-IBM-Watson-Studio

This work is a Python notebook. It shows how to calculate covariance and correlations using PySpark.

python apachespark advance-data-science-ibm

Updated Jul 29, 2020
Jupyter Notebook
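
For readers unfamiliar with the two statistics named in the entry above, the following is a minimal sketch of sample covariance and Pearson correlation in plain Python. It is not the notebook's code: the function names and the input data are hypothetical, and PySpark is deliberately left out so the example runs anywhere.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data in which ys is exactly twice xs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
cov_xy = covariance(xs, ys)
corr_xy = correlation(xs, ys)
print(cov_xy, corr_xy)
```

Because `ys` is a perfect linear function of `xs`, the correlation comes out at 1 up to floating-point rounding, while the covariance depends on the scale of the data. In PySpark, comparable results are available through `DataFrame.stat.cov` and `DataFrame.stat.corr`.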

urvashiforreal / Retail-Data-Analysis

Developed a real-time streaming analytics pipeline using Apache Spark to calculate and store KPIs for e-commerce sales data, including total volume of sales, orders per minute, rate of return, and average transaction size. Used Spark Streaming to read data from Kafka, Spark SQL to calculate KPIs, and Spark DataFrame to write KPIs to JSON files.

sparksql sparkstreaming apachespark sparkdataframe

Updated Oct 15, 2023
Python
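
The four KPIs named in the Retail-Data-Analysis entry above can be sketched with simple arithmetic. The example below is plain Python, not the repository's Spark Streaming code, and the KPI definitions are assumptions: returns are subtracted from sales volume, orders per minute divides the record count by the number of distinct minutes seen, and rate of return is the share of records flagged as returns. The `Order` record layout and the `compute_kpis` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record layout; the repository's schema is not shown on this page.
    minute: int       # minute bucket in which the record arrived
    amount: float     # order value
    is_return: bool   # True when the record is a return


def compute_kpis(orders):
    """Compute the four KPIs under the assumed definitions stated above."""
    total = len(orders)
    returns = sum(1 for o in orders if o.is_return)
    # Assumption: a return reduces sales volume by its amount.
    sales_volume = sum(-o.amount if o.is_return else o.amount for o in orders)
    # Count distinct minute buckets; fall back to 1 to avoid dividing by zero.
    minutes = len({o.minute for o in orders}) or 1
    return {
        "total_sales_volume": sales_volume,
        "orders_per_minute": total / minutes,
        "rate_of_return": returns / total if total else 0.0,
        "average_transaction_size": sales_volume / total if total else 0.0,
    }


orders = [
    Order(minute=0, amount=100.0, is_return=False),
    Order(minute=0, amount=50.0, is_return=False),
    Order(minute=1, amount=30.0, is_return=True),
    Order(minute=1, amount=80.0, is_return=False),
]
kpis = compute_kpis(orders)
print(kpis)
```

In a streaming setting, the same aggregates would typically be computed per time window rather than over a fixed list.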