Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingestion): add logging, make job more resilient to errors #4331

Merged
merged 12 commits into from
Mar 7, 2022

Conversation

anshbansal
Copy link
Collaborator

@anshbansal anshbansal commented Mar 7, 2022

Added

  • Snowflake SaaS version check to know what version someone is running https://community.snowflake.com/s/topic/0TO0Z000000Unu5WAC/releases
  • Snowflake Role (useful in case a default role from username is being used)
  • Grants to the snowflake role (skipped for accountadmin)
  • DataHub CLI version in Source reports
  • DataHub GMS version in DataHub Rest Sink Reports

Fix

  • Missing permission from the snowflake documentation
  • Report error properly instead of huge stacktrace in case of snowflake credentials, looping any sql based tables/views
  • Warnings and errors not being added to reports for snowflake, bigquery source
  • Missing connect_args for lineage for some cases in snowflake sources

The gms version is shown in sink report as follows

{'records_written': 55,
 'warnings': [],
 'failures': [],
 'downstream_start_time': datetime.datetime(2022, 3, 7, 20, 7, 8, 207437),
 'downstream_end_time': datetime.datetime(2022, 3, 7, 20, 7, 49, 110622),
 'downstream_total_latency_in_seconds': 40.903185,
 'gms_version': 'RC'}

The added fields are shown in report for snowflake as follows

'cli_version': 'unavailable (installed in develop mode)',
'saas_version': '6.5.1',
 'role': 'datahub_role',
 'role_grants': ['USAGE granted on DATABASE ASEEM_TEST_DB',
                 'SELECT granted on EXTERNAL_TABLE ASEEM_TEST_DB.PUBLIC.ET1',
                 'SELECT granted on EXTERNAL_TABLE ASEEM_TEST_DB.PUBLIC.ET2',
                 'SELECT granted on EXTERNAL_TABLE ASEEM_TEST_DB.PUBLIC.TEST_BUCKET_ASEEM_TEST_CSV',
                 'USAGE granted on SCHEMA ASEEM_TEST_DB.PUBLIC',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.NORMAL_TABLE_FROM_EXTERNAL',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.T1_V1_ET1',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.T1_V1_ET2',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.T1_V2_ET1',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.T1_V2_ET2',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.T2_V1_ET1',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.T2_V1_ET2',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.T2_V2_ET1',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.T2_V2_ET2',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.TABLE_EXTERNAL_TABLE_DOWNSTREAM',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.TABLE_VIEW_DOWNSTREAM',
                 'SELECT granted on TABLE ASEEM_TEST_DB.PUBLIC.TABLE_VIEW_DOWNSTREAM2',
                 'SELECT granted on VIEW ASEEM_TEST_DB.PUBLIC.NORMAL_VIEW_FROM_EXTERNAL_TABLE',
                 'SELECT granted on VIEW ASEEM_TEST_DB.PUBLIC.V1_ET1',
                 'SELECT granted on VIEW ASEEM_TEST_DB.PUBLIC.V1_ET2',
                 'SELECT granted on VIEW ASEEM_TEST_DB.PUBLIC.V2_ET1',
                 'SELECT granted on VIEW ASEEM_TEST_DB.PUBLIC.V2_ET2',
                 'OPERATE granted on WAREHOUSE COMPUTE_WH',
                 'USAGE granted on WAREHOUSE COMPUTE_WH']}

Failure report for wrong credentials snowflake

 'failures': {'version': ['Error: (snowflake.connector.errors.DatabaseError) 250001 (08001): Failed to connect to DB: '
                          'abcdef.snowflakecomputing.com:443. Incorrect username or password was specified.\n'
                          '(Background on this error at: http://sqlalche.me/e/13/4xp6)']},

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

Copy link
Contributor

@treff7es treff7es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! :)

@anshbansal anshbansal changed the title feat(snowflake): add role, sass version, and role grants check feat(snowflake): add role, saas version, and role grants check Mar 7, 2022
@github-actions
Copy link

github-actions bot commented Mar 7, 2022

Unit Test Results (metadata ingestion)

       5 files         5 suites   50m 36s ⏱️
   347 tests    347 ✔️   0 💤 0
1 579 runs  1 548 ✔️ 31 💤 0

Results for commit 6656e76.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Mar 7, 2022

Unit Test Results (build & test)

  39 files   -   36    39 suites   - 36   17s ⏱️ - 18m 1s
168 tests  - 379  168 ✔️  - 326  0 💤  - 52  0  - 1 

Results for commit 6656e76. ± Comparison against base commit 7efc88e.

♻️ This comment has been updated with latest results.

@anshbansal anshbansal changed the title feat(snowflake): add role, saas version, and role grants check feat(snowflake): add logging, make job more resilient to errors Mar 7, 2022
@anshbansal anshbansal changed the title feat(snowflake): add logging, make job more resilient to errors feat(ingestion): add logging, make job more resilient to errors Mar 7, 2022
@anshbansal anshbansal changed the title feat(ingestion): add logging, make job more resilient to errors fix(ingestion): add logging, make job more resilient to errors Mar 7, 2022
url,
connect_args=self.config.get_sql_alchemy_connect_args(),
**self.config.options,
)

def inspect_version(self) -> Any:
db_engine = self.get_metadata_engine()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible this will be null?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. Do you see any case?


def inspect_role_grants(self) -> Any:
db_engine = self.get_metadata_engine()
cur_role = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this equal to self.config.role

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not want to edit the existing config so used a different variable. I am putting it in self.report.role for reporting.

@@ -27,6 +27,9 @@ grant select on all views in database <your-database> to role datahub_role;
grant usage on future schemas in database "<your-database>" to role datahub_role;
grant select on future tables in database "<your-database>" to role datahub_role;

// Grant privileges on snowflake default database - needed for lineage
grant imported privileges on DATABASE snowflake to role datahub_role;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already on line 46

@@ -27,6 +27,9 @@ grant select on all views in database <your-database> to role datahub_role;
grant usage on future schemas in database "<your-database>" to role datahub_role;
grant select on future tables in database "<your-database>" to role datahub_role;

// Grant privileges on snowflake default database - needed for lineage
grant imported privileges on DATABASE snowflake to role datahub_role;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already on line 46

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in #4341. Thanks for catching

@shirshanka shirshanka merged commit beb51eb into datahub-project:master Mar 7, 2022
@anshbansal anshbansal deleted the check_snowflake_grants branch March 8, 2022 05:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants