This repository has been archived by the owner on Mar 10, 2023. It is now read-only.

Derived (and augmented) dataset available in JSON, TSV and SQL formats #1281

Open
@cipriancraciun

Description

I have written some scripts that take both the time series and the daily reports files and output the following two files:

If you want to automate the download (given how GitHub handles URLs to raw files), you can use the links listed on this page.

Some plots based on these files are also available at:


What I've done:

  • the original JHU dataset has the daily data points in columns, which doesn't work with most of the usual tools; thus I have "normalized" it into a more SQL-friendly format, with one data point per row, keyed by location + date;
  • for those countries with regions / provinces I have added a "total" row; (it's easier for plotting;)
  • I have also provided totals for the US as a country and for the US states;
  • I have added an infected column which is computed as infected := confirmed - deaths - recovered; (this data is available up to 2020-03-22;)
  • for each of the metrics (confirmed, recovered, deaths and infected) I have added four additional metrics (i.e. in total 16 metrics):
    • absolute_* -- the original value from the JHU dataset, i.e. cumulative values;
    • relative_* -- the metric divided by confirmed, as a percentage; (e.g. what share of the total confirmed cases up to that date have recovered;)
    • delta_* -- the difference from the previous day; (in the case of infected the number can be negative;)
    • deltapct_* -- the delta divided by the previous day's value, as a percentage; (i.e. the daily growth rate;)
  • I have also added the day_index_* columns, which represent the day index since that country / region reached either 1, 10, 100, or 1000 confirmed cases; (this helps align countries on a common starting point and compare them;)
  • I have normalized the country names (i.e. some countries are named differently in different rows, etc.);
  • I have augmented the country data with ISO codes, continents, subcontinents and other useful information;
  • I have added population columns based on the CIA Factbook dataset;
  • I have added total rows for continent and sub-continent levels;
  • I have provided the date in ISO format;
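
To make the column definitions above concrete, here is a minimal Python sketch of the normalization and of the derived metrics. It is not the actual implementation (the real scripts are jq snippets); the sample numbers, the location name, the function names, the `day_index_<threshold>` column names, the zero-based day index and the use of `None` for undefined values (first day, division by zero) are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical wide-format input (made-up cumulative counts):
# one entry per location, one value per date column.
CONFIRMED = {"Examplia": {"2020-03-01": 0, "2020-03-02": 4, "2020-03-03": 12, "2020-03-04": 30}}
DEATHS = {"Examplia": {"2020-03-01": 0, "2020-03-02": 0, "2020-03-03": 1, "2020-03-04": 2}}
RECOVERED = {"Examplia": {"2020-03-01": 0, "2020-03-02": 1, "2020-03-03": 3, "2020-03-04": 8}}

METRICS = ("confirmed", "deaths", "recovered", "infected")


def normalize(confirmed, deaths, recovered):
    """Turn the wide tables into one record per (location, date)."""
    rows = []
    for location, series in confirmed.items():
        for day in sorted(series):  # ISO dates sort chronologically as strings
            c = series[day]
            d = deaths[location][day]
            r = recovered[location][day]
            rows.append({
                "location": location,
                "date": day,
                "absolute_confirmed": c,
                "absolute_deaths": d,
                "absolute_recovered": r,
                # infected := confirmed - deaths - recovered
                "absolute_infected": c - d - r,
            })
    return rows


def augment(rows, thresholds=(1, 10, 100, 1000)):
    """Add relative_*, delta_*, deltapct_* and day_index_* columns in place.

    Rows must be in date order within each location.
    """
    previous = {}   # location -> previous day's row
    first_day = {}  # (location, threshold) -> date the threshold was reached
    for row in rows:
        location = row["location"]
        confirmed = row["absolute_confirmed"]
        prev_row = previous.get(location)
        for metric in METRICS:
            value = row[f"absolute_{metric}"]
            # Share of the confirmed cases, in percent (assumed None when confirmed is 0).
            row[f"relative_{metric}"] = 100.0 * value / confirmed if confirmed else None
            if prev_row is None:
                row[f"delta_{metric}"] = None
                row[f"deltapct_{metric}"] = None
            else:
                prev = prev_row[f"absolute_{metric}"]
                row[f"delta_{metric}"] = value - prev
                # Daily growth in percent (assumed None when the previous value is 0).
                row[f"deltapct_{metric}"] = 100.0 * (value - prev) / prev if prev else None
        today = date.fromisoformat(row["date"])
        for threshold in thresholds:
            key = (location, threshold)
            if confirmed >= threshold:
                first_day.setdefault(key, today)
            # Days elapsed since the threshold was first reached (0 on that day).
            row[f"day_index_{threshold}"] = (today - first_day[key]).days if key in first_day else None
        previous[location] = row
    return rows


rows = augment(normalize(CONFIRMED, DEATHS, RECOVERED))
```

Each element of `rows` is then one data point keyed by location + date, which maps directly to a JSON object, a TSV line or an SQL row.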

I will update these files twice per day, say at 06 UTC and 12 UTC.

Moreover, I have also added the NY Times US dataset and the ECDC one, in the same format.


The scripts are available in the following repository and consist mainly of jq snippets.


If anyone has other ideas about what I can add to these augmented datasets please let me know.
