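The GCP version of Dataform has finally gone GA. Scheduled execution shipped at the same time, so it feels like the core feature set is now complete. Time to migrate from the pre-acquisition SaaS version (Legacy) to the GCP version!!
That said, once you connect it to a GitHub repository there are a lot of moving parts, and I think it has become hard to follow, especially for people who are not familiar with GCP or whose main job is data analysis. The official documentation walks you through it step by step, but you end up configuring things without understanding why they are needed.
So this entry lays out the overall picture as a diagram and adds some supplementary explanation.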
What is Dataform
I'm not going to explain what Dataform is here. Read the official documentation or the blog posts out there.
With Dataform you can build data pipelines that run SQL in order based on the dependencies between tables, visualize those dependencies, and create staging environments by adding a suffix to dataset names.
There are complex requirements that Dataform cannot cover today, such as updates that have to be close to real time, or pipelines driven by external dependencies and triggers; for those you need Cloud Dataflow or Cloud Composer. Still, it is a service that covers a large part of what it takes to implement a data warehouse.
Relationship to Google Cloud resources
Here is a summary of the actors that appear in the following documents.
- Quickstart: Create and execute a SQL workflow  | Dataform  | Google Cloud
- Create a Dataform repository  | Google Cloud
- Connect to a third-party Git repository  | Dataform  | Google Cloud
① First, enable the Dataform API and create a Dataform repository from the console.
② Once you do, a Dataform service account appears in Cloud IAM, in the form
service-{PROJECT_NUMBER}@gcp-sa-dataform.iam.gserviceaccount.com
In the IAM list, principals that were not created by users are hidden by default, so tick "Include Google-provided role grants".
The important point is that Dataform runs with this service account's permissions.
So you need to grant it all the permissions it will need.
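As a rough sketch of how the address in ② is put together, the following shell functions build it from a project number and, optionally, look that number up with `gcloud projects describe`. The function names are made up for illustration; only the address format comes from the text above, and the lookup assumes the gcloud CLI is installed and authenticated.

```shell
# Print the Dataform service agent address for a numeric project number.
dataform_service_agent() {
  printf 'service-%s@gcp-sa-dataform.iam.gserviceaccount.com\n' "$1"
}

# Look up the project number for a project ID (assumes gcloud is available),
# then print the matching service agent address.
dataform_service_agent_for_project() {
  project_number=$(gcloud projects describe "$1" --format='value(projectNumber)') || return 1
  dataform_service_agent "$project_number"
}
```

Calling `dataform_service_agent_for_project my-project` (with a hypothetical project ID) prints the address you would then look for in the IAM list.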
⢠GitHub ã«è¡ããDataform ã®ã³ã¼ããä¿åãããªãã¸ããªãä½ãã
â£ãã®ãªãã¸ããªã GCP å´ããæä½ãããã Personal access token (PAT) ãæãåºã
ä» GitHub ã«ã¯ 2 種é¡ã® PAT ãããã
- personal access tokens (classic)
- fine-grained personal access token
classic ã¯ä»¥åãããããããªãã¸ããªåä½ã§ã®ã¢ã¯ã»ã¹å¶å¾¡ãã§ããªããæ°ãã fine-grained ã使ã ⢠ã§ä½ã£ããªãã¸ããªã®ã¿è¨±å¯ããã®ãè¯ããclassic ã§ã¯ãscope ã« repo
ããfine-grained ã§ã¯ Repository permissions 㧠Contents: read and write
ãä¸ããã°ããã
⤠åã³ GCP å´ã«æ»ããSecret Manager ã§ã·ã¼ã¯ã¬ãããä½æãã⣠ã§ä½æãã PAT ããã¼ã¸ã§ã³ã¨ãã¦ä¿åããã
⥠ã·ã¼ã¯ã¬ããã® "権é" ã¿ãã§ãâ¡ ã® Dataform ãµã¼ãã¹ã¢ã«ã¦ã³ãã« roles/secretmanager.secretAccessor
ãã¼ã«ãä»ä¸ã㦠Dataform ããã®ãã¼ã¯ã³ãå©ç¨ã§ããããã«ããã
è£è¶³ããã¨ãããã§ã¯åå¥ã®ã·ã¼ã¯ã¬ããã¸ã®ã¢ã¯ã»ã¹æ¨©ãä»ä¸ãã¦ãããIAM ã³ã³ã½ã¼ã«ã®ç®ç«ã¤ãã¿ã³ãããã¼ã«ãä»ä¸ããã¨ãããã¸ã§ã¯ãã¬ãã«ã®ã¢ã¯ã»ã¹æ¨©ã¨ãªããä¸éãã®ã·ã¼ã¯ã¬ããã«ã¢ã¯ã»ã¹ã§ãã権éãä»ä¸ãããã¨ã«ãªãã®ã§æ³¨æã
ãã㧠Dataform ã³ã³ã½ã¼ã«ã§ã³ã¼ããæ¸ã㦠GitHub 㸠push ã§ããããã«ãªã£ãã
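Steps ⑤ and ⑥ can also be done from the command line. This is a minimal sketch assuming the gcloud CLI is installed and authenticated; the function names are hypothetical, and the secret name and service account are whatever you pass in. The binding is made on the single secret rather than on the project, in line with the note on secret-level access.

```shell
# Step 5: create a secret and store the PAT as its first version.
# The token is read from stdin so it does not land in shell history.
store_pat() {
  secret_name=$1
  gcloud secrets create "$secret_name" --replication-policy=automatic </dev/null &&
    gcloud secrets versions add "$secret_name" --data-file=-
}

# Step 6: grant the accessor role on this one secret only (not project-wide).
# $1 = secret name, $2 = service account email of the Dataform service agent.
grant_secret_access() {
  gcloud secrets add-iam-policy-binding "$1" \
    --member="serviceAccount:$2" \
    --role="roles/secretmanager.secretAccessor"
}
```

A typical call would pipe the token in, for example `printf '%s' "$PAT" | store_pat my-secret`, followed by `grant_secret_access my-secret <service agent address>`, where `my-secret` is a placeholder name.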
⦠Dataform ã使ãç®ç㯠BigQuery ã®ãã¼ãã«ãä½æãããæ´æ°ããããããã¨ãªã®ã§ãIAM 㧠Dataform ãµã¼ãã¹ã¢ã«ã¦ã³ãã« BigQuery ãæä½ãã以ä¸ã®æ¨©éãä»ä¸ããã
- BigQuery ã¸ã§ãã¦ã¼ã¶ã¼ (
roles/bigquery.jobUser
) - BigQuery ãã¼ã¿ç·¨éè
(
roles/bigquery.dataEditor
)
ãã®ä»ãå¥ã®ããã¸ã§ã¯ãã®ãã¼ã¿ãåç
§ãããå ´åã¯ããã¡ãã§ã BigQuery ãã¼ã¿é²è¦§è
(roles/bigquery.dataViewer
) ãä»ä¸ãããæ´æ°ããããªããã¼ã¿ç·¨éè
ããDataform å®è¡ããã¸ã§ã¯ã㨠BigQuery ãªã½ã¼ã¹ã®ããã¸ã§ã¯ãã¯ç°ãªã£ã¦ãã¦ãæ§ããªãã
Google Spreadsheet ãªã© Google ãã©ã¤ãã®ãã¼ã¿ãå¤é¨ãã¼ãã«ã¨ãã¦æ±ãããå ´åã¯ã該å½ã®ãã¡ã¤ã«ã Dataform ãµã¼ãã¹ã¢ã«ã¦ã³ãã«å ±æããã°ããã
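The role grants in ⑦ can be scripted as well. The sketch below assumes the gcloud CLI and uses a hypothetical function name; it applies the two roles named above to one member on one project, so for a second project you would call it again with that project's ID (or grant roles/bigquery.dataViewer there by hand if read access is enough).

```shell
# Grant the two BigQuery roles from step 7 to a member on one project.
# $1 = project ID, $2 = IAM member string such as "serviceAccount:<email>".
grant_bigquery_roles() {
  project=$1
  member=$2
  for role in roles/bigquery.jobUser roles/bigquery.dataEditor; do
    gcloud projects add-iam-policy-binding "$project" \
      --member="$member" --role="$role" || return 1
  done
}
```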
(Optional) If you might run Dataform from CI or other workflows, create a dedicated service account for that.
The service account in ② is managed by GCP and is also called a Service Agent. You can grant it additional roles, but users cannot impersonate it or obtain its credentials, so it cannot be used from outside the Dataform console.
IAM permissions can also be granted to Google Groups. If you put both the Dataform service agent and the CI service account in a group and grant the BigQuery roles to that group, you do not have to grant them to each of the two service accounts separately, and their permissions cannot drift apart.
You can use it without connecting a GitHub repository
Everything so far assumed a connection to a GitHub repository, but you can use Dataform without one.
However, there is no UI for viewing diffs or going back to a specific revision, so you get almost none of the conveniences of a git repository. As of 2023-05-14, all you can do is commit to or revert to HEAD, and push Workspace changes to main.
If you just want to try it out, I don't think you need to worry about the connection part. You can connect a GitHub repository when you need it. Unlike the Legacy version, you can even connect a GitHub repository that already contains files.
CI
Once GitHub is connected, you will want to set up CI with GitHub Actions.
I use the service account created in the optional step above through OIDC federation to do things like:
- on push, check that compile succeeds and do a dry run
- on merge to main, run Dataform
One thing to watch out for when doing things in CI: the dependencies in the root package.json must be @dataform/core only, or kept to a minimum. When the Dataform console compiles SQLX, it appears to run npm install in the background, and as dependencies grow it times out and becomes practically unusable. Adding just @dataform/cli is enough to break it.
So I install it only inside CI, for example with $ npx -p @dataform/cli@latest dataform compile.
Can't install certain devDependencies [264598563] - Visible to Public - Issue Tracker
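To make the package.json constraint harder to break by accident, a CI job could run a guard like the one below before compiling. This is only a sketch: both function names are hypothetical, the guard is a plain text match rather than a real JSON parse, and the compile step simply wraps the npx command quoted above.

```shell
# Fail if package.json mentions @dataform/cli, which is reported above to be
# enough to make compilation in the Dataform console time out.
check_no_cli_dependency() {
  file=${1:-package.json}
  if grep -q '"@dataform/cli"' "$file"; then
    echo "@dataform/cli must not be listed in $file" >&2
    return 1
  fi
}

# Compile with a CLI fetched on the fly, so it never enters package.json.
# $1 = project directory (defaults to the current directory).
ci_compile() {
  npx -p @dataform/cli@latest dataform compile "${1:-.}"
}
```

In a workflow step this would be used as `check_no_cli_dependency && ci_compile`.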
Bonus
Here is the Terraform for just the Dataform-related parts. It requires google-beta.
resources.tf
    variable "project" {
      type = string
    }

    variable "region" {
      type    = string
      default = "asia-northeast1"
    }

    variable "dataform_repository_url" {
      type = string
    }

    variable "dataform_default_branch" {
      type    = string
      default = "main"
    }

    data "google_project" "this" {
      project_id = var.project
    }

    resource "google_secret_manager_secret" "dataform_repository_token" {
      secret_id = "dataform_thirdparty_repository_token"

      replication {
        automatic = true
      }
    }

    resource "google_secret_manager_secret_version" "dataform_repository_token" {
      secret      = google_secret_manager_secret.dataform_repository_token.id
      secret_data = "******"
    }

    # The secret value ends up in the state, so in practice I create the secret
    # and its version by hand in the console, then terraform import the secret
    # and reference the version through a data source, roughly like this:
    #
    # data "google_secret_manager_secret_version" "dataform_repository_token" {
    #   secret = google_secret_manager_secret.dataform_repository_token.id
    # }

    resource "google_dataform_repository" "dataform" {
      provider = google-beta

      name   = "my-dataform-repository"
      region = var.region

      git_remote_settings {
        url                                 = var.dataform_repository_url
        default_branch                      = var.dataform_default_branch
        authentication_token_secret_version = google_secret_manager_secret_version.dataform_repository_token.id
      }
    }

    # Role grants for the Dataform Service Agent
    locals {
      dataform_agent_member = "serviceAccount:service-${data.google_project.this.number}@gcp-sa-dataform.iam.gserviceaccount.com"
    }

    resource "google_project_iam_member" "dataform_agent_roles" {
      for_each = toset([
        "roles/bigquery.dataEditor",
        "roles/bigquery.jobUser",
        "roles/dataform.editor",
        "roles/dataform.serviceAgent",
      ])

      project = var.project
      role    = each.value
      member  = local.dataform_agent_member

      depends_on = [
        # The Dataform Service Agent is created after the first repository is created
        google_dataform_repository.dataform
      ]
    }

    resource "google_secret_manager_secret_iam_member" "dataform_repository_token" {
      secret_id = google_secret_manager_secret.dataform_repository_token.secret_id
      role      = "roles/secretmanager.secretAccessor"
      member    = local.dataform_agent_member

      depends_on = [
        google_dataform_repository.dataform
      ]
    }

    # Service account for running Dataform from CI and the like
    resource "google_service_account" "dataform" {
      account_id   = "dataform"
      display_name = "dataform"
      description  = "Service account for running Dataform"
    }

    resource "google_project_iam_member" "dataform_sa_roles" {
      for_each = toset([
        "roles/bigquery.dataEditor",
        "roles/bigquery.jobUser",
        "roles/dataform.editor",
      ])

      project = var.project
      role    = each.value
      member  = google_service_account.dataform.member
    }
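Closing thoughts
It feels like Dataform's rival dbt took off while Dataform's evolution stalled after the acquisition, but I hope it makes a comeback.
It is built into Google Cloud, you can start using it right away if BigQuery is your DWH, it is free, and it is not billed per user like dbt Cloud, so I think it will become the first choice.
If I feel like it, I'll write about scheduled execution and its notifications.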