Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/commands for validating file #24

Merged
merged 8 commits into from
Nov 22, 2022

Conversation

b-per
Copy link
Collaborator

@b-per b-per commented Nov 21, 2022

Solves #12 and most of #23 (except the validating that env_var already exists)

This PR creates a new command called validate with a flag to check on dbt Cloud that:

  • Project IDs
  • Environment IDs
  • Deferred to Job IDs

all exist in dbt Cloud

Usage: main.py validate [OPTIONS] CONFIG

  Check that the config file is valid

  CONFIG is the path to your jobs.yml config file.

Options:
  --online  Connect to dbt Cloud to check that IDs are correct.
  --help    Show this message and exit.

It also returns an exit code different from 0 if the validation fails

Supports offline (just the format) and online validations of IDs
@b-per
Copy link
Collaborator Author

b-per commented Nov 22, 2022

Merged with main to remove conflicts and a couple of fixes for the handling of env_vars following the refactor that happened a week ago.

@@ -242,7 +242,13 @@ def update_env_var(
return None

env_var_id: int = all_env_vars[custom_env_var.name].id
url = f"{self.base_url}/api/v3/accounts/{self.account_id}/projects/{project_id}/environment-variables/{env_var_id}/"

# the endpoint is different for updating an overwrite vs creating one
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trick here is update_env_var() does not create env vars, BUT, it might have to create an env var job overwrite if none exists today.

And depending on the case, the endpoint is different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent find, @b-per!


def __init__(self, **data: Any):
data["raw_value"] = data["value"]
data["raw_value"] = data["value"] if "value" in data else data["display_value"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some tests and the API gave me back a raw_value but no value, so I added the extra logic to handle this use case.

@click.argument("config", type=click.File("r"))
def cli(config):
def sync(config):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this one sync and the other one validate

Copy link
Contributor

@nicholasyager nicholasyager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@b-per Fantastic changes! I left a couple high-level architectural comments, but nothing blocking 👍🏻 ✅

src/main.py Show resolved Hide resolved
@@ -242,7 +242,13 @@ def update_env_var(
return None

env_var_id: int = all_env_vars[custom_env_var.name].id
url = f"{self.base_url}/api/v3/accounts/{self.account_id}/projects/{project_id}/environment-variables/{env_var_id}/"

# the endpoint is different for updating an overwrite vs creating one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent find, @b-per!

Comment on lines +141 to +176
logger.info(f"Checking that Project IDs are valid")
if config_project_ids - cloud_project_ids:
logger.error(
f"❌ The following project IDs are not valid: {config_project_ids - cloud_project_ids}"
)
online_check_issues = True

# Check if some Environment IDs in the config are not in Cloud
logger.info(f"Checking that Environments IDs are valid")
if config_environment_ids - cloud_environment_ids:
logger.error(
f"❌ The following environment IDs are not valid: {config_environment_ids - cloud_environment_ids}"
)
online_check_issues = True

# In case deferral jobs are mentioned, check that they exist
deferral_jobs = set(
[
job.deferring_job_definition_id
for job in defined_jobs
if job.deferring_job_definition_id
]
)
if deferral_jobs:
logger.info(f"Checking that Deferring Job IDs are valid")
cloud_jobs = dbt_cloud.get_jobs()
cloud_job_ids = set([job.id for job in cloud_jobs])
if deferral_jobs - cloud_job_ids:
logger.error(
f"❌ The following deferral job IDs are not valid: {deferral_jobs - cloud_job_ids}"
)
online_check_issues = True

if online_check_issues:
# return an error to handle with bash/CI
sys.exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a long-term architectural comment:

This logic is very sound, and could likely be abstracted into validation functions that can be composed easily. Something to consider for a MUCH later date 😄

@nicholasyager nicholasyager self-requested a review November 22, 2022 17:09
@b-per b-per merged commit 21d6418 into main Nov 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pydantic validations Add a validator or apply command to check that the current YAML file is correct
2 participants