Releases: GoogleCloudPlatform/public-datasets-pipelines
v5.2.0
5.2.0 (2022-11-01)
Features
- Add geom columns for thelook_ecommerce dataset (#307) (f39a177)
- Add Municipal Calendar to San Francisco Dataset (#480) (a21c2ef)
- Add PM25_FRM_DAILY_SUMMARY Pipeline To Epa_Historical_Air_Quality Dataset (#518) (4f66c05)
- Add Storms Database to Noaa Dataset (#498) (8d02866)
- Adding a tutorial for the Iowa Liquor dataset (#419) (b619b71)
- Adding New Pipelines To San Francisco Dataset. (#487) (58cda71)
- Extract the tabular metadata for Cloud Datasets program (#452) (1a3d59e)
- Launch AFDB v4 dataset (#522) (c6664a7)
- Migrate the dataset Covid19 Italy from Xenon (#488) (1ca6bd6)
- Migrate the World Bank datasets x 3 from Xenon (#506) (65295d0)
- Migrate the Xenon World Bank WDI dataset (#482) (35457a9)
- onboard chembl-30 dataset (#467) (ef9c57b)
- Onboard COVID-19 Genome Sequence dataset (#460) (0b7828f)
- Onboard dataset Open Buildings (#453) (739b6cf)
- Onboard EBI CHemBL Previous Data dataset (#470) (63b4012)
- Onboard FDIC dataset (#495) (e20e157)
- Onboard Fec dataset (#485) (2da413e)
- Onboard Human Variant Annotation dataset (#438) (ebfe4de)
- Onboard IDC v10 dataset (#433) (c2ffc77)
- onboard irs 990 ein dataset (#481) (65544a2)
- Onboard MERFISH Mouse Brain Receptor Map dataset (#457) (4333fca)
- Onboard Multilingual Spoken Words Corpus - MLCommons Association dataset (#461) (22cc27c)
- Onboard New Fec dataset (#486) (6ee1fa3)
- Onboard New FEC dataset (#513) (e770220)
- Onboard NHTSA Traffic Fatalities dataset (#454) (eb409c4)
- Onboard NOAA Passive Bioacoustic dataset (#471) (2ecd9ea)
- Onboard Uniref50 dataset (#443) (dbf2300)
- Onboard Uniref50 dataset (#473) (b44d572)
- YAML custom tag for interpolating GCR image URLs (#372) (ef901e5)
Bug Fixes
- Added "is_public" to cloud_datasets.tabular_datasets table (#501) (802cff6)
- Added Airport Fee To Schema Files And Pipeline.Yaml In New York Taxi Trips Dataset (#476) (d94105a)
- Adds BRL currency in Google Political Ads (#469) (edd3654)
- AlphaFold dataset - add accession_ids.csv to the bucket (#451) (cacd9f1)
- Change Destination Dataset in Noaa Pipelines (#479) (c7c047c)
- City Health Dashboard Schema Changes (#515) (1bdb0dd)
- deleting pod error (#511) (77fe479)
- Fixing the forecasting issue in the notebook. (#472) (de7f1fa)
- For COVID-19 Italy, resolve bucket variable in pipeline.yaml (#509) (1f913ac)
- For FDA Food Enforcement, Resolve invalid source DateTime data. ([#508](https://github.com/GoogleCloudPlatform/public-datasets-pipel...
v5.1.0
5.1.0 (2022-07-30)
Features
- Add scaffold script for directory + dataset.yaml setup (#412) (5bf354b)
- Adding a notebook tutorial for the EPA dataset: CO levels (#422) (f0bab59)
- Adds operators for Cloud SQL, Cloud Functions, and GCE (#429) (9b5da34)
- Support --async-builds flag for generate_dag.py (#424) (7536df9)
Datasets
- Onboard DeepMind AlphaFold DB (#431) (02c887e)
- Onboard CelebA dataset (#420) (0c28563)
- Adds BQ views to scalable_open_source dataset (#416) (2785234)
- Rename co2 columns to emissions to make it generic from Travel Impact Model dataset. (#418) (e1ac106)
Bug Fixes
- Change cms_medicare tables with column provider_zipcode from integer to string type (#417) (27b0a9b)
- Resolve conflicts on Census Bureau ACS (#414) (492b973)
- Resolve CRON value in Cloud Storage Geo Index dataset (#413) (8903e82)
- Resolve IP error when creating NOAA cluster (#423) (82d53f4)
- Use proper GCS prefix for custom data folder (#408) (9d56363)
v5.0.0
5.0.0 (2022-07-11)
⚠ BREAKING CHANGES
- Upgrade to Airflow 2.2.5 and Python 3.8.12 (#394)
Datasets
- Onboard Carbon-Free Energy Calculator dataset (#391) (f3a9447)
- Onboard Census Bureau ACS Dataset (#399) (98e0179)
- Onboard Fashion MNIST dataset (#387) (91b7f6a)
- Onboard IMDb dataset (#406) (2559838)
- Optimize tests for DAG and Terraform generation (#395) (ffcd18c)
- Remove co2e columns from Travel Impact Model dataset. (#400) (d7179ce)
Bug Fixes
v4.2.0
4.2.0 (2022-06-25)
Datasets
- Onboard COVID-19 dataset from The New York Times (#383) (9aac451)
- Onboard NOAA dataset (#378) (02cc038)
- Onboard San Jose Translation dataset (#377) (63ea9b9)
- Onboarding MIMIC-III dataset (#389) (baf6b8d)
- [datasets/gbif] Add a query to uncover species found in one region only (#388) (bd5a135)
Features
v4.1.1
v4.1.0
4.1.0 (2022-06-10)
Datasets
- Onboard City Health Dashboard dataset (#374) (c7cd9dd)
- Onboard Cloud Storage Geo Index (#367) (63cdb2a)
- Onboard EPA Historical Air Quality (#373) (4f4c87e)
- Onboard IDC v9 dataset (#364) (bfb9f23)
- Onboard NOAA datasets (#353) (0f1c696)
- Onboard The General Index Dataset (#342) (67d7216)
- Revised COVID-19 Google Mobility dataset (#363) (ddd3dac)
Documentation Set
v4.0.0
4.0.0 (2022-05-23)
⚠ BREAKING CHANGES
Datasets
- Onboard Census Opportunity Atlas Dataset (#263) (13ce71d)
- Onboard deps.dev (Open Source Insights) dataset (#356) (12143af)
- Onboard Diversity Annual Report and complementary datasets (#358) (4a8a2cd)
- Onboard EPA Historical Air Quality dataset (#301) (214a56f)
- Onboard GBIF dataset (#355) (ab4e208)
- Onboard IDC v8 dataset (#319) (0f112e0)
- Onboard International Search Terms for Google Trends (#323) (855aa7f)
- Onboard NASA wildfire (#275) (f593161)
- Onboard New York Trees dataset (#265) (2905308)
- Onboard Open Targets Genetics dataset (#318) (03b4f89)
- Onboard Open Targets Platform dataset (#313) (c5adce6)
- Onboard SEC Failure to Deliver dataset (#309) (afa6492)
- Rename Travel Sustainability to Travel Impact Model (#351) (83df285)
- Retrieve Composer bucket name when deploying DAGs (#312) (220f1d5)
- Update BLS - CPSAAT18 with 2021 data (#357) (a8f8856)
Features
- Added functionality to support a data folder to store schema files (#354) (f893dff)
- Unified variables and adds support for IAM policies (#341) (c4a45a0)
- Use poetry over pipenv (#337) (ca43066)
Bug Fixes
- Adds packages for docs dependency group (#339) (6721490)
- bump black version due to click dependency issue (#320) (cac6f18)
- Fix generating BQ views for IDC dataset (#324) (5896865)
- Removed unnecessary pathlib param from test_deploy_dag (#345) (45dd0b2)
- thelook_ecommerce - increase # of customers and revised order_items (#352) (ed1570d)
v3.0.0
3.0.0 (2022-03-24)
⚠ BREAKING CHANGES
- Reorganize pipelines and infra files into their respective folders (#292)
Features
- Reorganize pipelines and infra files into their respective folders (#292) (7408d44)
- Upgrade some pipelines to Airflow 2 and explicitly set pod storage (#283) (cbc3278)
Datasets
- Onboard Broad Genome References dataset (#316) (4f1f6db)
- Onboard Imaging Data Commons (IDC) v7 dataset (#287) (dfda5d9)
- Onboard ML dataset (#276) (48e51af)
- Onboard Travel Sustainability dataset (#280) (8e9731a)
- Onboard Travel Sustainability dataset (schema update) (#298) (7a13daa)
- Onboarding TheLook E-Commerce dataset (#294) (15f663a)
- Revise Google Political Ads due to new dataset version (#317) (6ffb0d0)
- Update "location" to GEOGRAPHY type for datasets/google_trends schema (#297) (9d9d3bd)
Docs
- Docs: Add SF 311 example (#310) (844a7fb)
- Docs: Add a query snippet to calculate the monthly average bike trips for san_francisco_bikeshare (#284) (7a009f6)
- Docs: Added a template for tutorials (#299) (ae23d4b)
- Docs: SF 311 Calls - Predicting the number of calls per category using LSTM (#293) (88637ca)
Bug Fixes
v2.8.0
2.8.0 (2022-01-27)
Features
- Onboard America Health Rankings dataset (#244) (8ecbfda)
- Onboard American Community Survey dataset (#222) (861d0e6)
- Onboard Census Opportunity Atlas dataset (#248) (0e62f27)
- Onboard Census tract 2019 dataset (#272) (d2b5e52)
- Onboard CFPB Complaints dataset (#225) (9051773)
- Onboard Chronic Disease Indicators dataset (#242) (48c96f2)
- Onboard City Health Dashboard dataset (#250) (8cc5286)
- Onboard COVID-19 CDS EU dataset (#261) (d710dec)
- Onboard EUMETSAT Solar Forecasting dataset (#273) (db479cf)
- Onboard FDA Drug Enforcement dataset (#245) (53c98ac)
- Onboard gnomAD dataset (#264) (804b440)
- Onboard MLCommons Multilingual Spoken Words Corpus (MSWC) dataset (#252) (ec93997)
- Onboard News Hate Crimes dataset (#238) (9b242ef)
- Onboard Race and Economic Opportunity dataset (#236) (fe6c826)
- Onboarding COVID-19 (UK) Government Response dataset (#262) (914d39c)
- Update IDC dataset with new views and v6 version (#266) (02cae2b)