An analysis of the current market for data-field jobs and freelancing (data analysis, data science, ML development & data engineering), based on LinkedIn job postings and freelancer accounts on UpWork & Guru.
- Interactive report
- Job postings analysis notebook
- Freelancers analysis notebook
- Kaggle dataset*
- Analysis PDF reports
*: the Kaggle dataset is my own; for more details, look here
There's no need to reproduce the project if you just want to see the results; the quick links above should be enough. But if you want to see how it works and how it was made, follow the steps below:
The project was done using Python 3.12.11, so if you want the exact same results I'd recommend using that version, but a reasonably close Python version will probably work as well.
After you've cloned the repo and moved into it, create a Python venv
and activate it (how depends on your OS). Afterwards run:
pip install -r requirements.txt
Now you are done setting up the Python virtual environment for the project.
You can either download the dataset I made from Kaggle into the data directory,
or run the collection scripts yourself to generate the data there.
The data used in this project was collected with the mining scripts in the scripts directory, which were executed on various
days to collect different data. Those scripts output .csv files, which are aggregated by
scripts/sync_raw_databasse.py into the SQLite database data/raw_database.db. That database is then cleaned
using either notebooks/cleaning.ipynb or scripts/sync_cleaned_database.py, producing a new SQLite database,
data/clean_database.db, which is used in the analysis notebooks and the interactive web report.
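As a rough illustration of the aggregation step, here is a minimal sketch of what a script like scripts/sync_raw_databasse.py might do. The function name, table name, and all-TEXT schema are assumptions for illustration, not the repo's actual code:

```python
import csv
import sqlite3
from pathlib import Path


def aggregate_csvs(csv_dir: str, db_path: str, table: str = "postings") -> int:
    """Append every CSV in csv_dir into one SQLite table; returns rows written.

    Assumes all files share the schema of the first file's header row.
    """
    files = sorted(Path(csv_dir).glob("*.csv"))
    if not files:
        return 0
    # Take the column names from the first file's header
    with files[0].open(newline="", encoding="utf-8") as f:
        columns = next(csv.reader(f))
    con = sqlite3.connect(db_path)
    cols_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols_sql})')
    placeholders = ", ".join("?" for _ in columns)
    written = 0
    for path in files:
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        con.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [tuple(r.get(c, "") for c in columns) for r in rows],
        )
        written += len(rows)
    con.commit()
    con.close()
    return written
```

The idea is simply that CSVs mined on different days accumulate into a single queryable database instead of scattered flat files.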
If you chose to collect the data yourself, here are the commands you need to run:
- mkdir data
- Activate your Python venv
- python scripts/mine_linkedin.py
- python scripts/upwork_linkedin.py (remember to check the Cloudflare checkbox in the automated Chromium instance)
- python scripts/guru_linkedin.py
- python scripts/sync_raw_databasse.py
Now you should have data/raw_database.db (if that isn't the case, open a GitHub issue)
Now you can either run:
- ./notebooks/cleaning.ipynb with jupyter notebook, or
- the cleaning script ./scripts/sync_cleaned_database.py
Note that either option will take a long time because of the QA model that extracts salaries.
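The actual pipeline uses a QA model for salary extraction; as a much cruder stand-in to convey the idea, a regex-based extractor could look like the sketch below. The function name and pattern are illustrative assumptions, not code from the repo, and a regex will misfire on text a QA model handles correctly:

```python
import re
from typing import Optional, Tuple

# Crude pattern for figures like "$90,000", "45k", "120,000" in free text.
_SALARY_RE = re.compile(r"\$?\s*(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)\s*([kK])?")


def extract_salary_range(text: str) -> Optional[Tuple[float, float]]:
    """Return (low, high) of salary-like figures found in a posting, or None."""
    values = []
    for num, k_suffix in _SALARY_RE.findall(text):
        value = float(num.replace(",", ""))
        if k_suffix:  # "90k" means 90,000
            value *= 1_000
        values.append(value)
    if not values:
        return None
    return (min(values), max(values))
```

The QA-model approach asks a question like "what is the salary?" against each posting, which is far more robust but also why the cleaning step is slow.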
And that's it. If you ever want to update your data, repeat the steps above to recreate the clean database.
You can execute any of the project's notebooks locally from the notebooks directory, or just view them in the web viewer
via the links below.
The interactive report is available as a GitHub Pages site here, but if you want to host it yourself, here are the steps:
- Make sure you have data/clean_database.db
- Activate your venv and execute python scripts/sync_dashboard_database.py
- Now you should have dashboard/data/linkedin.json & dashboard/data/upwork.json
- Start a local server at localhost:5000 by running python ./dashboard/app.py
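For intuition on the sync step, here is a minimal sketch of what an export script like scripts/sync_dashboard_database.py might do: dump a SQLite table to a JSON file the dashboard can load. The function name, table name, and paths are assumptions, not the repo's actual code:

```python
import json
import sqlite3
from pathlib import Path


def export_table_to_json(db_path: str, table: str, out_path: str) -> int:
    """Dump one SQLite table as a JSON list of row objects; returns row count."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row  # rows become name-addressable mappings
    rows = [dict(r) for r in con.execute(f'SELECT * FROM "{table}"')]
    con.close()
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(rows, ensure_ascii=False))
    return len(rows)
```

Serving static JSON means the dashboard never needs a live database connection, which is what makes hosting it as a static GitHub Pages site possible.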
Here are some cool findings from the notebooks & the interactive report; for more, you can either look at the notebooks or the PDF reports.
- Over 50% of postings do not specify a degree requirement, indicating that many companies are flexible about formal education, relying instead on skills and experience.
- Machine learning roles show strong demand at both junior and senior levels, unlike data analyst roles, which are less open to entry-level candidates; data engineering roles skew most heavily toward senior positions.
- The highest-paying skills in job postings are data engineering or ML skills such as C++, PyTorch, TensorFlow & the R language.
- Data engineers earn the most per job, while ML freelancers earn about half as much per job as data engineers. Time or effort per job is unknown, so earnings per job should not be interpreted as earnings per hour.
