Derived (and augmented) dataset available in JSON, TSV and SQL formats

I have written some scripts that take both the time-series and the daily-report files and output the following files:

* https://github.com/cipriancraciun/covid19-datasets -- the repository with the data and scripts;
* TSV -- https://github.com/cipriancraciun/covid19-datasets/blob/master/exports/jhu/v1/daily/values.tsv
* JSON -- https://github.com/cipriancraciun/covid19-datasets/blob/master/exports/jhu/v1/daily/values.json
* SQL -- https://github.com/cipriancraciun/covid19-datasets/blob/master/exports/jhu/v1/daily/values-sqlite.sql

If you want to automate the download (given how GitHub handles URLs to raw files), you can use the [links listed on this page](https://scratchpad.volution.ro/ciprian/eedf5eb117ec363ca4f88492b48dbcd3/#m_g).
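
For scripting, here is a minimal Python sketch of how a GitHub `blob` URL (like the ones above) maps to its raw counterpart. It relies on the usual `raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>` layout; the helper names `blob_to_raw` and `download` are made up for this example, and the links on the page above remain the authoritative ones.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlretrieve


def blob_to_raw(url):
    """Rewrite github.com/<owner>/<repo>/blob/<ref>/<path> to its raw URL.

    Assumption: <ref> is a single path segment (e.g. "master"); branch
    names that contain "/" are not handled by this sketch.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a GitHub blob URL: " + url)
    owner, repo, _, ref = segments[:4]
    path = "/".join([owner, repo, ref] + segments[4:])
    return urlunsplit(("https", "raw.githubusercontent.com", "/" + path, "", ""))


def download(blob_url, target):
    """Fetch the raw file behind a blob URL into a local file (hypothetical helper)."""
    urlretrieve(blob_to_raw(blob_url), target)


TSV_URL = ("https://github.com/cipriancraciun/covid19-datasets"
           "/blob/master/exports/jhu/v1/daily/values.tsv")
RAW_TSV_URL = blob_to_raw(TSV_URL)
```

Calling `download(TSV_URL, "values.tsv")` would then save the TSV export locally.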

Some plots of this data are also available at:
* https://scratchpad.volution.ro/ciprian/eedf5eb117ec363ca4f88492b48dbcd3/

----

What I've done:
* the original JHU dataset has the daily data points in columns, which most of the usual tools handle poorly;  thus I have "normalized" it into a more SQL-friendly format, with one data point per row, keyed by location + date;
* for those countries with regions / provinces I have added a "total" row;  (it's easier for plotting;)
* I have also provided totals for the US as a country and for each US state;
* I have added an `infected` column which is computed as `infected := confirmed - deaths - recovered`;  (this data is available up to 2020-03-22;)
* for each of the metrics (`confirmed`, `recovered`, `deaths` and `infected`) I provide four variants (i.e. 16 metrics in total):
  * `absolute_*` -- the original value from the JHU dataset, i.e. cumulative values;
  * `relative_*` -- the metric divided by `confirmed`, as a percentage;  (e.g. how many people have recovered out of the total confirmed up to that date;)
  * `delta_*` -- the difference from the previous day;  (in case of `infected` the number can be negative;)
  * `deltapct_*` -- the delta divided by the previous day's value;  (i.e. the speed, as a percentage;)
* I have also added the `day_index_*` columns, which represent the day index since that country / region reached either `1`, `10`, `100`, or `1000` confirmed cases;  (this helps align countries on a common starting point and compare them;)
* I have normalized the country names (i.e. some countries are named differently in different rows, etc.);
* I have augmented the country data with ISO codes, continents, subcontinents and other useful information;
* I have added population columns based on the CIA Factbook dataset;
* I have added total rows for continent and sub-continent levels;
* I have provided the date in ISO format;
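
As a rough illustration of the steps above (this is not the actual `jq` code from the repository), the normalization and the derived metrics could be sketched in Python as follows. The input shape, the location `Xland`, its numbers, and the function names are all made up for the example. It also makes assumptions that the list above does not settle: the day index starts at `0` on the day the threshold is first reached, and percentages are left empty (`None`) when the denominator is zero.

```python
from datetime import date

METRICS = ("confirmed", "deaths", "recovered", "infected")

# Hypothetical wide-format input, {metric: {location: {date: value}}},
# i.e. one column per day as in the JHU series files; values are made up.
wide = {
    "confirmed": {"Xland": {"2020-03-01": 0, "2020-03-02": 4, "2020-03-03": 12}},
    "deaths":    {"Xland": {"2020-03-01": 0, "2020-03-02": 0, "2020-03-03": 1}},
    "recovered": {"Xland": {"2020-03-01": 0, "2020-03-02": 1, "2020-03-03": 3}},
}


def normalize(wide):
    """Return one row per (location, date), sorted by location then date."""
    rows = {}
    for metric, by_location in wide.items():
        for location, by_date in by_location.items():
            for day, value in by_date.items():
                row = rows.setdefault((location, day),
                                      {"location": location, "date": day})
                row["absolute_" + metric] = value
    # ISO dates sort correctly as plain strings.
    return [rows[key] for key in sorted(rows)]


def augment(rows, thresholds=(1, 10, 100, 1000)):
    """Add infected, relative_*, delta_*, deltapct_* and day_index_* in place."""
    previous = None
    first_day = {}  # (location, threshold) -> first date the threshold was reached
    for row in rows:
        if previous is not None and previous["location"] != row["location"]:
            previous = None  # do not compute deltas across locations
        confirmed = row["absolute_confirmed"]
        row["absolute_infected"] = (
            confirmed - row["absolute_deaths"] - row["absolute_recovered"])
        for metric in METRICS:
            value = row["absolute_" + metric]
            # Assumption: undefined percentages are stored as None.
            row["relative_" + metric] = (
                100.0 * value / confirmed if confirmed else None)
            if previous is None:
                row["delta_" + metric] = None
                row["deltapct_" + metric] = None
            else:
                before = previous["absolute_" + metric]
                row["delta_" + metric] = value - before
                row["deltapct_" + metric] = (
                    100.0 * (value - before) / before if before else None)
        today = date.fromisoformat(row["date"])
        for threshold in thresholds:
            key = (row["location"], threshold)
            if confirmed >= threshold:
                first_day.setdefault(key, today)
            start = first_day.get(key)
            # Assumption: index 0 is the day the threshold was first reached.
            row["day_index_%d" % threshold] = (
                (today - start).days if start is not None else None)
        previous = row
    return rows


rows = augment(normalize(wide))
```

Keeping one row per location and date is what makes the per-day deltas and the `day_index_*` alignment a simple pass over sorted rows.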

I will update these files twice per day, say at 06 UTC and 12 UTC.

Moreover, I have also added the NY Times US dataset and the ECDC one, in the same format.

----

The scripts are available in the following repository and consist mainly of `jq` snippets.
* https://github.com/cipriancraciun/covid19-datasets

----

If anyone has other ideas about what I can add to these augmented datasets please let me know.

