🚴‍♂️ Jcdecaux Bike Real-time Data Pipeline with Docker

📊 Overview

This pipeline captures real-time bike station data from the Jcdecaux public API and processes it with Dockerized services for messaging, stream processing, storage, and visualization. Kafka handles messaging, Spark Streaming processes the data, Elasticsearch stores it, and Kibana provides visualization.

🛠 Architecture

The pipeline consists of the following components:

- Jcdecaux API: Source of real-time bike station data.
- Kafka: Distributed messaging system that carries station updates between services.
- Spark Streaming: Processes the data in real time.
- Elasticsearch: Stores the processed data for easy querying.
- Kibana: Visualizes the data stored in Elasticsearch.
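
To make the flow above more concrete, here is a minimal sketch of the step between the API and Kafka: turning one station record into a keyed message. The field names (`number`, `contract_name`, `position`, `available_bikes`) are assumptions about the shape of the API response, and `to_kafka_message` is a hypothetical helper, not code taken from this repository.

```python
import json
from typing import Any, Dict, Tuple


def to_kafka_message(station: Dict[str, Any]) -> Tuple[bytes, bytes]:
    """Encode one station record as a (key, value) pair of bytes.

    Hypothetical helper: the field names used here are assumed,
    not confirmed by the repository.
    """
    # Key by contract and station number so that, with Kafka's default
    # partitioner, updates for the same station go to the same partition.
    key = f"{station['contract_name']}-{station['number']}".encode("utf-8")
    # sort_keys keeps the serialized payload deterministic.
    value = json.dumps(station, sort_keys=True).encode("utf-8")
    return key, value


# Illustrative record only; the values are made up for the example.
sample = {
    "number": 42,
    "contract_name": "lyon",
    "available_bikes": 7,
    "position": {"lat": 45.76, "lng": 4.85},
}
key, value = to_kafka_message(sample)
```

A Kafka client would then send `key` and `value` to the topic; the consumer side can recover the record with `json.loads(value)`.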

🧰 Prerequisites

All components run in Docker containers. Ensure Docker and Docker Compose are installed on your system.

⚙️ Installation and Setup

Clone the repository:

git clone https://github.com/Yasselo/Realtime_jcdecaux_pipeline
cd Realtime_jcdecaux_pipeline

Build and Start Services: Run the following command to start all services, including Kafka, Spark, Elasticsearch, and Kibana, in Docker containers:
```
docker-compose up --build
```
This command pulls the necessary Docker images, builds the custom bike pipeline container, and starts all services defined in `docker-compose.yml`.

🔧 Project Configuration

Kafka Topic: A Kafka topic named `velib_stations` will be created automatically for bike station data.
Environment Variables:
- `KAFKA_BROKER`: Kafka broker address (`kafka:9092`).
- `ELASTICSEARCH_HOST`: Elasticsearch host (`elasticsearch`).
- `SPARK_MASTER`: Spark master URL (`spark://spark:7077`).
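
As an illustration of how these settings might be read from Python, the sketch below loads the three variables with the defaults listed above. `PipelineConfig` and `load_config` are hypothetical names; the repository's own `config.py` may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PipelineConfig:
    """Connection settings for the pipeline (hypothetical structure)."""
    kafka_broker: str
    elasticsearch_host: str
    spark_master: str
    kafka_topic: str


def load_config(env: Optional[Mapping[str, str]] = None) -> PipelineConfig:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return PipelineConfig(
        kafka_broker=env.get("KAFKA_BROKER", "kafka:9092"),
        elasticsearch_host=env.get("ELASTICSEARCH_HOST", "elasticsearch"),
        spark_master=env.get("SPARK_MASTER", "spark://spark:7077"),
        # Topic name as documented above; kept fixed in this sketch.
        kafka_topic="velib_stations",
    )
```

Passing a plain dictionary instead of `os.environ`, for example `load_config({"KAFKA_BROKER": "localhost:9092"})`, makes it easy to try the pipeline scripts outside the Docker network.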

🚀 Running the Pipeline

Start the Docker containers (skip this if they are still running from the setup step):
```
docker-compose up
```
Open a shell inside the `bike_pipeline` container:
```
docker exec -it bike_pipeline bash
```

Wait for the services to finish initializing. Once they are running, launch the producer and consumer from the container's bash shell:

python3 producer.py & python3 consumer.py
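
Rather than guessing when the services are ready, one option is to poll their ports before launching the scripts. The sketch below is an assumption about how that could be done, not part of the repository; `wait_for_port` is a hypothetical helper. Note that an open port only shows the service is accepting connections, not that it is fully initialized.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Retries every `interval` seconds and returns False if the port is
    still unreachable after roughly `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the host cannot be
            # resolved or the connection is refused or times out.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the container, a call such as `wait_for_port("kafka", 9092)` would block until the broker address listed under Project Configuration accepts connections. A similar call for Elasticsearch would need its port, which this README does not state (9200 is the usual default).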

Kibana Access: Open http://localhost:5601 to view Kibana and check Elasticsearch data under "Index Management".

🛠 Docker Images and Setup Details

The following Docker images are used:

- Kafka: `wurstmeister/kafka:2.13-2.8.1`
- Spark: `bitnami/spark:3.2.4`
- Elasticsearch: `docker.elastic.co/elasticsearch/elasticsearch:8.8.2`
- Kibana: `docker.elastic.co/kibana/kibana:8.8.2`
- Custom Pipeline: Defined in `Dockerfile`, containing OpenJDK 11, Spark, Kafka, and Elasticsearch dependencies.

📌 Conclusion

With the setup complete, real-time bike station data should now be visible in Kibana. This project demonstrates the use of a streaming data pipeline for processing and visualizing real-time data using Docker.
