This project creates a pipeline that takes data from the Properati web page (Properati is a real estate search site), processes it using Lambda functions and, finally, stores it in a Redshift database. The pipeline is orchestrated using AWS Step Functions and scheduled with AWS EventBridge. We also build a Flask REST API to interact with the database. This Flask app allows us to retrieve data and is hosted in the AWS Lightsail container service.
The tools that were used for the project are:
- AWS for hosting the infrastructure.
- AWS S3 as our storage.
- AWS Lambda as the executor.
- AWS Redshift as our data warehouse.
- AWS Step Functions for orchestrating our pipeline.
- AWS EventBridge for scheduling our pipeline.
- AWS Lightsail Containers for hosting our Flask REST API App.
- Terraform as IaC for the infra provisioning.
- Docker for containerizing our Flask app.
- Insomnia and Flask for testing and developing our REST API.
- Pytest for testing the response we receive from the webpage.
- Python as the main programming language.
- Data is extracted from Properati.
- The extracted data is validated, cleaned and uploaded to Redshift (a rough sketch of this step follows the list below).
- A Flask REST API is created for the database so we can interact with the data inside our Data Warehouse.
- Users can now analyze the data using any visualization tool they prefer or use the API to develop new solutions.
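To make the flow more concrete, here is a rough sketch of what the extract step could look like inside a Lambda function. The bucket name, listings URL and key layout are placeholders, not the project's actual values; this is an illustration under those assumptions, not the pipeline's real code.

```python
# Hypothetical extract step: fetch a Properati listings page and stage the raw
# payload in S3 so a later step can clean it and load it into Redshift.
import json
from datetime import datetime, timezone

import boto3
import requests

S3_BUCKET = "properati-raw-data"                       # placeholder bucket name
LISTINGS_URL = "https://www.properati.com.pe/s/venta"  # placeholder listings URL


def lambda_handler(event, context):
    # Fetch one page of listings from Properati.
    response = requests.get(LISTINGS_URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()

    # Stage the raw payload in S3; the next state in the Step Functions
    # state machine would validate, clean and load it into Redshift.
    key = f"raw/{datetime.now(timezone.utc):%Y/%m/%d/%H%M%S}.html"
    boto3.client("s3").put_object(Bucket=S3_BUCKET, Key=key, Body=response.text)

    return {"statusCode": 200, "body": json.dumps({"s3_key": key})}
```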
The following requirements need to be installed locally for the solution to work correctly:
- AWS CLI for configuring the account and provisioning with Terraform.
- AWS CLI Lightsail plugin for deploying our containers and pushing the Docker images to the AWS Lightsail Containers' repository.
- Terraform to provision the infrastructure.
- Docker to containerize the Flask REST API App image.
For testing, let's go to our root folder and run:
pytest
: This will run some tests to make sure the web page behaves as we expect.
- The first test will make sure that we receive a 200 response, meaning that the webpage exists and we have access to it.
- The second test will make sure that the limit of elements per page is 30 (a sketch of both checks follows this list).
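As a rough idea of what these checks could look like, here is a minimal sketch. The listings URL and the way the per-page limit is counted are assumptions and may differ from the project's actual test code.

```python
# Hypothetical versions of the two checks described above.
import requests

LISTINGS_URL = "https://www.properati.com.pe/s/venta"  # placeholder listings page
HEADERS = {"User-Agent": "Mozilla/5.0"}


def test_page_returns_200():
    # The webpage should exist and be accessible.
    response = requests.get(LISTINGS_URL, headers=HEADERS, timeout=30)
    assert response.status_code == 200


def test_page_limit_is_30():
    # Each results page is expected to expose at most 30 listings.
    response = requests.get(LISTINGS_URL, headers=HEADERS, timeout=30)
    listings = response.text.count("listing-card")  # placeholder listing marker
    assert listings <= 30
```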
Now, to create the pipeline, Terraform will provision everything we need. Just clone the repo and execute the following commands inside the terraform folder:
aws configure
: This command is used to log in to an AWS account using your secret access keys.
terraform init
: This will initialize Terraform in the folder.
terraform apply
: This will create our infrastructure. You will be prompted to input a Redshift user and password.
- (Only run the next command if you want to destroy the infrastructure)
terraform destroy
: This destroys the created infrastructure.
This pipeline is scheduled hourly, so we can wait up to an hour for the pipeline to run, or start our Step Functions state machine manually, as shown in the sketch below.
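If you prefer to start it from Python rather than the console, this is a minimal sketch using boto3; the state machine ARN is a placeholder and should be replaced with the ARN of the state machine Terraform creates.

```python
# Hypothetical manual trigger for the pipeline's Step Functions state machine.
import boto3

sfn = boto3.client("stepfunctions")

execution = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:properati-pipeline"  # placeholder ARN
)
print(execution["executionArn"])
```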
Path | Request Type | Parameters |
---|---|---|
`/properties` | GET | No parameters required. This request retrieves all the data from our database. |
`/properties` | POST | id (int), type (str), title (str), bedrooms (int), bathrooms (int), price (int), surface (int), district (str), geo_lon (float), geo_lat (float), place_lon (float), place_lat (float) |
`/properties/<int:id>` | GET | No parameters required. This request retrieves a specific property from our database by its id. |
- The Flask API URL can be found in the AWS Lightsail container service.
- The path URL/swagger-ui will show the documentation of the Flask API.
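As an illustration, a client could call these endpoints from Python like this. The base URL is a placeholder for your Lightsail service URL, and sending the POST parameters as a JSON body is an assumption that may not match the actual API.

```python
# Hypothetical client calls against the endpoints listed in the table above.
import requests

BASE_URL = "https://<your-lightsail-service>.amazonaws.com"  # placeholder service URL

# Retrieve every property stored in the data warehouse.
all_properties = requests.get(f"{BASE_URL}/properties").json()

# Insert a new property; field names come from the table above, values are examples.
new_property = {
    "id": 101, "type": "apartment", "title": "2BR near the park",
    "bedrooms": 2, "bathrooms": 1, "price": 120000, "surface": 75,
    "district": "Miraflores", "geo_lon": -77.03, "geo_lat": -12.12,
    "place_lon": -77.03, "place_lat": -12.12,
}
requests.post(f"{BASE_URL}/properties", json=new_property)

# Retrieve a single property by its id.
one_property = requests.get(f"{BASE_URL}/properties/101").json()
```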