rdd
Here are 199 public repositories matching this topic...
This repository contains solutions for the final assignment of the Big Data Mining course (52002/52019), focusing on querying large datasets with BigQuery, network analysis with Python, and distributed data processing with Apache Spark.
Updated Sep 11, 2024 - Jupyter Notebook
Analysis of a college student dataset using Spark RDDs. Demonstrates RDD operations such as countByValue, groupBy, groupByKey, and reduceByKey, as well as map, flatMap, split, explicit, filter, type conversion, sum, count, distinct, aggregate, and RDD length. Also demonstrates string and int comparisons, and union and intersection on RDDs.
Updated Sep 5, 2018
This project covers a range of fundamental operations on Resilient Distributed Datasets (RDDs) and DataFrames, along with an exploration of a Big Recommender Dataset using Apache Spark's powerful tools.
Updated Aug 22, 2023 - Jupyter Notebook
This repository contains tutorials on natural language processing, machine learning, ontology creation, querying ontologies using DL-Query, and implementing a question-answering system.
Updated Jul 29, 2017 - JavaScript
Python RDD notebook for Apache Spark
Updated Dec 8, 2018 - Jupyter Notebook
Solved assignments from Coursera's Fundamentals of Scalable Data Science course
Updated Apr 22, 2020 - Jupyter Notebook
Scripts for retrieving a Pegase nomenclature and loading it into a Postgres database via a table
Updated Jun 3, 2022
Streaming data in Spark and running data analytics on it
Updated Sep 19, 2019 - Python