-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bigquery #455
Bigquery #455
Conversation
drewbanin
commented
Jun 7, 2017
•
edited
Loading
edited
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in addition to comments here, this needs:
- unit tests on completely new functionality (adapter task)
- integration tests on BQ
dbt/adapters/bigquery.py
Outdated
@classmethod | ||
def get_bigquery_credentials(cls, config): | ||
method = config.get('method') | ||
Creds = google.oauth2.service_account.Credentials |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't upcase variable names
dbt/adapters/bigquery.py
Outdated
Creds = google.oauth2.service_account.Credentials | ||
|
||
if method == 'oauth': | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if they use the oauth method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We initialize a bigquery client with the credentials
argument if it's a service account (from a private key file or the private key raw json).
If the credentials object is None
(as with oauth), then the bq client lib will use the oauth flow to authenticate the user. We force this flow if the user isn't authenticated by shelling out to gcloud
here
More info on gcp auth flows here
dbt/adapters/bigquery.py
Outdated
return Creds.from_service_account_info(details) | ||
|
||
error = ('Bad `method` in profile: "{}". ' | ||
'Should be "oauth" or "service-account"'.format(method)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or service-account-json?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also can we put this validation into a voluptuous schema for the bigquery profile?
dbt/adapters/bigquery.py
Outdated
from_name, to_name) | ||
raise dbt.exceptions.NotImplementedException(message) | ||
|
||
# hack because of current API limitations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate more? if there's a link explaining why we need this, you should add it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it's actually not super well documented which is part of the problem. BQ defaults to the "Legacy SQL" syntax when views are created via the API. You can set a useLegacySql = False
flag on the Table
object, but it looks like it's presently ignored by the API.
The best workaround i could find was to use the #standardSQL
header line
dbt/adapters/bigquery.py
Outdated
_, cursor = cls.add_query(profile, sql, 'master') | ||
res = cursor.fetchone() | ||
|
||
logger.debug("Cancel query '{}': {}".format(connection_name, res)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the behavior on ctrl+c here?
if we need to just raise an exception, you should remove all this code.
also please change this and the exception for add_query
to dbt.exceptions.NotImplementedException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmcarthur hadn't gotten there yet :)
Looks like we can actually issue a cancel command for running queries: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmcarthur it doesn't look like it's possible to cancel a table/view creation request. Jobs can be canceled though, which may be good to keep in mind for the future. This branch will log an error that looks like this on ctrl-c:
The bigquery adapter does not support query cancellation. Some queries may still be running!
dbt/adapters/default.py
Outdated
for package, require in cls.requires.items(): | ||
logger.info("Installing {}".format(require)) | ||
pip_main(['install', require]) | ||
logger.info("Installed {} successfully!".format(require)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a unit test for this? i want to be sure it works on all platforms
dbt/contracts/connection.py
Outdated
@@ -33,10 +33,19 @@ | |||
Optional('role'): basestring, | |||
}) | |||
|
|||
bigquery_credentials_contract = Schema({ | |||
Required('method'): basestring, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All(basestring, Any(['oauth', 'service-account', 'service-account-json']))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can i just do one of these:
Any(['oauth', 'service-account', 'service-account-json'])
@@ -44,6 +43,7 @@ | |||
|
|||
|
|||
def make_log_dir_if_missing(log_dir): | |||
import dbt.clients.system | |||
dbt.clients.system.make_directory(log_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why move this import in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logger imports the system client, and the system client imports the logger. This breaks the circular import
|
||
dbt.adapters.postgres.PostgresAdapter.drop.assert_not_called() | ||
|
||
mock_adapter_truncate.assert_not_called() | ||
mock_adapter_rename.assert_not_called() | ||
mock_adapter_rename.assert_called_once() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes me nervous. was this not working properly before? my understanding that view models would just be created with the final name if they did not previously exist (vs created and then renamed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm really dissatisfied with our current drop/create/rename logic. It works ok, but that __dbt_tmp
suffix makes a lot of coordination between dbt components unduly difficult. I think this was an attempt at simplifying how it works, but let me revisit. Good catch
@cmcarthur this is pretty much good. I did change the way views are materialized (temp table, then renamed), so the changed unit test is warranted imo. I have a single integration test for BQ which passes on circle and appveyor. Think this is ready for another look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drewbanin this looks ok to ship. i did add one more comment.
dbt/adapters/bigquery.py
Outdated
|
||
@classmethod | ||
def get_status(cls, cursor): | ||
raise Exception("Not implemented") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NotImplementedException // remove code below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch 👍
* wip * minimum viable bq adapter * proper bq quoting * absolute madness * auto-install bq module * catch runtime exception for nodes * cleanup * more cleanup * pip critical logging * add adapter command * major wip * refactorin * closer * handle model errors + dependent skips * cleanup + test transactions (for now) * move model creation to materializers * fix for ephemeral models * override materializer for bq * error handling * bq tests working * commit tweaks for models * service accounts * service account json for bq * better error message if adapter is not installed * fix unit tests * pep8 * fix integration tests * codeclimate * fix typos * fix circular dep for imports * catch programming exception for runners * code review changes * refactoring for code climate * selector cleanup * fix bug for erin * comment * handle cancellation on ctrl-c for bq (log warning) * better bq validation * test bq validation * add uninstall flag to adapter task * remove pip hacking nonsense * bq integration tests * remove initialize call for pg * fix bq integration tests * pep8 * remove -x opt from toxfile * handle notimplemented for bq better * missing import for seed task * notimplemented for bq automatic commit by git-black, original commits: 8b8b3aa