This repository has been archived by the owner on Mar 10, 2023. It is now read-only.
Derived (and augmented) dataset available in JSON, TSV and SQL formats #1281
Open
Description
I have written some scripts that take both the series and the daily reports files and output the following files:
- https://github.com/cipriancraciun/covid19-datasets -- the repository with the data and scripts;
- TSV -- https://github.com/cipriancraciun/covid19-datasets/blob/master/exports/jhu/v1/daily/values.tsv
- JSON -- https://github.com/cipriancraciun/covid19-datasets/blob/master/exports/jhu/v1/daily/values.json
- SQL -- https://github.com/cipriancraciun/covid19-datasets/blob/master/exports/jhu/v1/daily/values-sqlite.sql
If you want to automate the download (given how GitHub handles URLs to raw files), you can use the links listed on this page.
Some plots for these are also available at:
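For anyone scripting the download, here is a minimal sketch of turning one of the `blob` links above into a raw-file URL. It relies on GitHub's usual `raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>` layout; the helper name `to_raw_url` is hypothetical, and the links on the author's page may differ from what this produces.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_url(blob_url: str) -> str:
    """Map a github.com '/<owner>/<repo>/blob/<ref>/<path>' URL to its raw form.

    Assumption: the ref (branch or tag) contains no '/' characters, as is the
    case for 'master' in the links above.
    """
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _blob, ref, *path = segments
    raw_path = "/" + "/".join([owner, repo, ref, *path])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))
```

The resulting URL can then be passed to any HTTP client, for example `urllib.request.urlretrieve(to_raw_url(link), "values.tsv")`.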
What I've done:
- the original JHU dataset has the daily data points in columns, which basically doesn't work with 90% of the usual tools; thus I have "normalized" this in a more SQL-friendly format, with one data-point per row, keyed by location + date;
- for those countries with regions / provinces I have added a "total" row; (it's easier for plotting;)
- I have also provided totals for the US as a country and for the US states;
- I have added an `infected` column which is computed as `infected := confirmed - deaths - recovered`; (this data is available up to 2020-03-22;)
- for each of the metrics (`confirmed`, `recovered`, `deaths` and `infected`) I have added four additional metrics (i.e. in total 16 metrics):
  - `absolute_*` -- the original value from the JHU dataset, i.e. cumulative values;
  - `relative_*` -- the metric divided by `confirmed` in percentage; (i.e. how many recovered people from the total confirmed up to that date;)
  - `delta_*` -- the difference from the previous day; (in case of `infected` the number can be negative;)
  - `deltapct_*` -- the delta divided by the previous day value; (i.e. the speed in percentage;)
- I have also added the `day_index_*` columns which represent the day index since that country / region has reached either `1`, `10`, `100`, or `1000` confirmed cases; (it helps align countries and compare them to that;)
- I have normalized the country names (i.e. some countries are named differently in different rows, etc.);
- I have augmented the country data with ISO codes, continents, subcontinents and other useful information;
- I have added population columns based on CIA Factbook dataset;
- I have added total rows for continent and sub-continent levels;
- I have provided the date in ISO format;
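To make the derived columns above concrete, here is a rough Python sketch of how they could be computed for a single location. It is not the author's implementation (the actual scripts are mostly `jq`). The function names, the `day_index_<threshold>` column names, the choice to start the day index at 1, and the use of `None` for undefined values are all assumptions made for illustration.

```python
THRESHOLDS = (1, 10, 100, 1000)
METRICS = ("confirmed", "recovered", "deaths", "infected")


def pct(numerator, denominator):
    """Percentage, or None when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else None


def derive_rows(days):
    """Derive the augmented columns for ONE location.

    `days` is a list of dicts with cumulative 'confirmed', 'deaths' and
    'recovered' counts plus an ISO 'date', sorted by date with no gaps
    (assumption: exactly one row per calendar day).
    """
    rows = []
    previous = None          # metric values of the previous day
    first_position = {}      # threshold -> position of first day reaching it
    for position, day in enumerate(days):
        values = {name: day[name] for name in ("confirmed", "recovered", "deaths")}
        values["infected"] = values["confirmed"] - values["deaths"] - values["recovered"]
        row = {"date": day["date"]}
        for name in METRICS:
            value = values[name]
            row[f"absolute_{name}"] = value                            # cumulative value
            row[f"relative_{name}"] = pct(value, values["confirmed"])  # share of confirmed
            if previous is None:
                row[f"delta_{name}"] = None
                row[f"deltapct_{name}"] = None
            else:
                delta = value - previous[name]                         # may be negative
                row[f"delta_{name}"] = delta
                row[f"deltapct_{name}"] = pct(delta, previous[name])
        for threshold in THRESHOLDS:
            if threshold not in first_position and values["confirmed"] >= threshold:
                first_position[threshold] = position
            start = first_position.get(threshold)
            # Assumption: the first day at or above the threshold has index 1.
            row[f"day_index_{threshold}"] = None if start is None else position - start + 1
        rows.append(row)
        previous = values
    return rows
```

Each returned dict corresponds to one row keyed by location + date; aligning two countries on, say, `day_index_100` then amounts to joining their rows on that column.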
I will update these files twice per day, say at 06 UTC and 12 UTC.
Moreover, I have also added, in the same format, the NY Times US dataset and the ECDC one.
The scripts are available in the following repository and consist mainly of `jq` snippets.
If anyone has other ideas about what I can add to these augmented datasets please let me know.