You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This repository showcases IPL data analysis using Apache Spark. The project demonstrates the power of Spark for data transformation, cleaning, SQL queries, and visualization, all performed with PySpark to handle large-scale data efficiently.
The rapid pace of innovation in Artificial Intelligence (AI) is creating enormous opportunity for transforming entire industries and our very existence. After competing this comprehensive 6 course Professional Certificate, you will get a practical understanding of Machine Learning and Deep Learning. You will master fundamental concepts of Machin…
ETL Datapipeline to process Washington's EV data using Apache Spark, Docker, Snowflake, Airflow, AWS services and visualize the transformed parquet data by creating Tableau Dashboards.
As a coursera certified specialization completer you will have a proven deep understanding on massive parallel data processing, data exploration and visualization, and advanced machine learning & deep learning. You'll understand the mathematical foundations behind all machine learning & deep learning algorithms. You can apply knowledge in practi…
Use this project to join data from multiple csv files. Currently in this project we support one to one and one to many join. Along with this you can find how to use kafka producer efficiently with spark.
Developed a real-time streaming analytics pipeline using Apache Spark to calculate and store KPIs for e-commerce sales data, including total volume of sales, orders per minute, rate of return, and average transaction size. Used Spark Streaming to read data from Kafka, Spark SQL to calculate KPIs, and Spark DataFrame to write KPIs to JSON files.