rdd
Here are 199 public repositories matching this topic...
Compares the execution times of RDD (Resilient Distributed Dataset) and DataFrame data structures in Apache Spark, also taking into account the input file format, such as CSV or Parquet.
Updated Nov 22, 2023 - Python
This repository contains solutions for the final assignment of the Big Data Mining course (52002/52019), focusing on querying large datasets with BigQuery, network analysis with Python, and distributed data processing with Apache Spark.
Updated Sep 11, 2024 - Jupyter Notebook
Analysis of a college student dataset using Spark RDDs. Demonstrates RDD operations such as countByValue, groupBy, groupByKey, and reduceByKey; map, flatMap, split, filter, and explicit type conversion; computing the sum, count, distinct elements, aggregate, and length of an RDD; string and integer comparisons; and union and intersection of RDDs.
Updated Sep 5, 2018
Updated Feb 24, 2023 - Jupyter Notebook
This project covers a range of fundamental operations on Resilient Distributed Datasets (RDDs) and DataFrames, along with an exploration of a Big Recommender Dataset using Apache Spark's powerful tools.
Updated Aug 22, 2023 - Jupyter Notebook
This repository contains tutorials on Natural Language Processing, Machine Learning, ontology creation, querying ontologies with DL-Query, and implementing a question-answering system.
Updated Jul 29, 2017 - JavaScript
Python RDD notebook for Apache Spark.
Updated Dec 8, 2018 - Jupyter Notebook
Exploring big data with Spark in JupyterLab.
Updated Jun 10, 2020 - Jupyter Notebook
This is a tutorial on Spark programming in Scala for analyzing CSV data in RDD and DataFrame formats.
Updated Apr 20, 2020 - Batchfile
Scripts for retrieving a pegase nomenclature and injecting it via a table into a PostgreSQL database.
Updated Jun 3, 2022
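Several repositories in this listing demonstrate core RDD operations such as countByValue, reduceByKey, groupByKey, flatMap, union, and intersection. The sketch below models the semantics of those operations in plain Python so it runs without a Spark installation. It is not Spark's implementation: the sample data and the helper names (`count_by_value`, `reduce_by_key`, and so on) are hypothetical, and the comments only name the PySpark RDD methods each helper imitates.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (student_name, department) pairs.
students = [("asha", "CS"), ("ben", "EE"), ("chen", "CS"), ("dara", "ME"), ("eli", "CS")]


def count_by_value(records):
    """Like rdd.countByValue(): count occurrences of each distinct element."""
    return dict(Counter(records))


def reduce_by_key(pairs, func):
    """Like rdd.reduceByKey(func): fold all values of each key with func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


def group_by_key(pairs):
    """Like rdd.groupByKey().mapValues(list): gather all values per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def flat_map(records, func):
    """Like rdd.flatMap(func): apply func to each record, then flatten one level."""
    return [item for record in records for item in func(record)]


def union(a, b):
    """Like rdd.union(other): plain concatenation, duplicates are kept."""
    return list(a) + list(b)


def intersection(a, b):
    """Like rdd.intersection(other): distinct elements present in both inputs.
    Sorted here only for deterministic output; Spark guarantees no order."""
    return sorted(set(a) & set(b))


# map: extract the department field from each record.
departments = [dept for _, dept in students]
print(count_by_value(departments))

# map to (key, 1) pairs, then sum the ones per key.
pairs = [(dept, 1) for dept in departments]
print(reduce_by_key(pairs, lambda x, y: x + y))

# groupByKey: department -> list of student names.
print(group_by_key([(dept, name) for name, dept in students]))

# flatMap with split: one output element per word.
print(flat_map(["big data", "spark rdd"], str.split))

print(union(["CS", "EE"], ["EE", "ME"]))
print(intersection(["CS", "EE"], ["EE", "ME"]))
```

In a real Spark job, reduceByKey combines values within each partition before shuffling data across the cluster, which is why it is usually preferred over groupByKey for aggregations such as counting.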