This project connects to the Spotify API to collect track, album, and artist information about The Beatles. An ETL pipeline then loads the track data into PostgreSQL, where it is normalized using a star schema.
Although the project was originally built as an ETL pipeline for The Beatles, it can be configured for any artist.
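As a rough illustration of the star schema idea, the sketch below builds a small fact table of tracks that references artist and album dimension tables. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server. The table names, column names, and sample rows are hypothetical and are not taken from this project's queries.py.

```python
import sqlite3

# Hypothetical star schema: one fact table (tracks) pointing at two
# dimension tables (artists, albums). All names are illustrative only.
SCHEMA = """
CREATE TABLE artists (
    artist_id TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE albums (
    album_id TEXT PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE tracks (
    track_id  TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    artist_id TEXT NOT NULL REFERENCES artists (artist_id),
    album_id  TEXT NOT NULL REFERENCES albums (album_id)
);
"""


def build_demo_db():
    """Create an in-memory database with the sketch schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Sample rows with made-up IDs, purely for demonstration.
    conn.execute("INSERT INTO artists VALUES (?, ?)", ("a1", "The Beatles"))
    conn.execute("INSERT INTO albums VALUES (?, ?)", ("b1", "Abbey Road"))
    conn.execute(
        "INSERT INTO tracks VALUES (?, ?, ?, ?)",
        ("t1", "Come Together", "a1", "b1"),
    )
    conn.commit()
    return conn


def track_report(conn):
    """Join the fact table to both dimensions and return (track, album, artist) rows."""
    query = """
        SELECT t.title, al.title, ar.name
        FROM tracks AS t
        JOIN albums AS al ON al.album_id = t.album_id
        JOIN artists AS ar ON ar.artist_id = t.artist_id
        ORDER BY t.title
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(track_report(build_demo_db()))
```

Keeping artist and album details in their own tables means each name is stored once, and the track table only carries the keys needed to join back to them.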
- ETL
- Data Modeling
- Normalization
- API Connection
- Python
- PostgreSQL
- pgAdmin
- Psycopg2
- Spotipy
- Pandas
The settings in the config.ini file need to be adjusted to match your local PostgreSQL setup (username, password, host, and database). The artist name can also be changed in config.ini if desired.
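For orientation, here is a minimal sketch of how such settings could be read with Python's standard configparser module. The section names (`postgresql`, `spotify`) and key names are assumptions made for this example; check the actual config.ini in the repository for the real layout.

```python
import configparser

# Hypothetical config.ini layout; the real file's sections and keys may differ.
SAMPLE_CONFIG = """
[postgresql]
host = localhost
database = spotify
username = postgres
password = change_me

[spotify]
artist = The Beatles
"""


def load_settings(text):
    """Parse INI text and return (database settings dict, artist name)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["postgresql"]), parser["spotify"]["artist"]


db_settings, artist_name = load_settings(SAMPLE_CONFIG)
print(db_settings["host"], artist_name)
```

With a real file on disk, `parser.read("config.ini")` would replace `read_string`. The resulting values could then be passed to `psycopg2.connect` as its `host`, `dbname`, `user`, and `password` arguments.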
To use the Spotipy package, API credentials (a client ID and a client secret) need to be obtained directly from Spotify. See Spotify's developer documentation for more information on this process.
For this project to run, the Spotify API access key must be set as an environment variable named "spotify_id", and the secret key must be set as an environment variable named "spotify_secret". Set these environment variables on the operating system where the project will run.
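The sketch below shows one way these two variables might be read and checked before a Spotipy client is created. The helper names `load_spotify_credentials` and `make_client` are hypothetical, and the use of the client-credentials flow is an assumption; the project's own setup.py may do this differently.

```python
import os


def load_spotify_credentials(environ=os.environ):
    """Return (client_id, client_secret) from the variables named in this README."""
    names = ("spotify_id", "spotify_secret")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return environ["spotify_id"], environ["spotify_secret"]


def make_client():
    """Build a Spotipy client using the client-credentials flow (an assumption)."""
    # Imported here so the credential check above works without Spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    client_id, client_secret = load_spotify_credentials()
    auth = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
    return spotipy.Spotify(auth_manager=auth)
```

Failing early with a clear message when a variable is missing tends to be easier to debug than an authentication error raised later by the API call.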
From the command line, navigate to the repository directory (ideally with a Python virtual environment activated).
Run the following command to install the requirements:
pip install -r requirements.txt
Run the following command to run this project:
python run.py
- Modules
main.py
- Organizes execution of all modules
queries.py
- Queries to drop, create, and insert data into tables
setup.py
- Creates connection to Spotify
spotify.py
- Pulls album, artist, and track information to create tables
sql.py
- Connects to PostgreSQL and executes queries to create the star schema structure
config.ini
- Configurations for PostgreSQL connection and Spotify artist
requirements.txt
- Python package requirements
schema_design.jpg
- Image of star schema
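To make the extraction step more concrete, the following sketch walks from an artist name to a flat list of track records using three Spotipy client methods (`search`, `artist_albums`, and `album_tracks`). The function name `collect_tracks` and the output field names are hypothetical, only the first page of results is read, and this is not a copy of the project's spotify.py.

```python
def collect_tracks(sp, artist_name):
    """Return one flat dict per track for the first artist matching artist_name.

    `sp` is expected to behave like a spotipy.Spotify client.
    Pagination is ignored here, so long discographies would be truncated.
    """
    found = sp.search(q="artist:" + artist_name, type="artist", limit=1)
    artists = found["artists"]["items"]
    if not artists:
        return []
    artist = artists[0]

    rows = []
    albums = sp.artist_albums(artist["id"], album_type="album", limit=50)
    for album in albums["items"]:
        tracks = sp.album_tracks(album["id"])
        for track in tracks["items"]:
            # Carry the album and artist keys on every row so the records
            # can later be split into fact and dimension tables.
            rows.append({
                "track_id": track["id"],
                "track_name": track["name"],
                "album_id": album["id"],
                "album_name": album["name"],
                "artist_id": artist["id"],
                "artist_name": artist["name"],
            })
    return rows
```

A list of dicts like this can be passed directly to `pandas.DataFrame(rows)` for cleaning before the rows are inserted into PostgreSQL.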