BigQuery: Add ability to pass in a table ID instead of a query to the %%bigquery magic.#9170
Conversation
added default patch to unit tests
6761217 to
77e62c5
Compare
| rows = client.list_rows(table_id, max_results=max_results) | ||
| except Exception as ex: | ||
| error = str(ex) | ||
| if error: |
There was a problem hiding this comment.
could this be simplified to return from the exception?
|
|
||
| error = None | ||
|
|
||
| if not re.search(r"\s", query.rstrip()): |
There was a problem hiding this comment.
What scenario is this protecting against? Is there a scenario where unicode strings will not remove space characters via rstrip?
What do we expect in the else case?
There was a problem hiding this comment.
Related: Does this repository enforce code coverage? Is there a case where we test there being unstoppable whitespace and not taking this branch?
There was a problem hiding this comment.
This condition is actually testing for queries that contain whitespace characters that aren't removed by rstrip (so any whitespace that isn't a trailing newline). The assumption being made (as described in #9105 ) is that anything containing whitespace is a SQL query and won't take this branch, while anything string without whitespace is a table_id and will take this branch.
So anything regular SQL query would fall into the else case. There are tests that check whether query strings without spaces will be interpreted as table IDs and a test that checks if a string without whitespace that isn't a valid table_id raises an appropriate error message.
There was a problem hiding this comment.
I can add a comment in the code if that would be helpful.
There was a problem hiding this comment.
I would also write this down in a comment, i.e. that anything without whitespace is assumed to be table identifier which triggers a different use case of the magic command.
There was a problem hiding this comment.
The change generally works.
I did notice, however, that the logic is sensitive to the leading whitespace:
┌───────────────────────────────────────────┐
│ %%bigquery --max_results=6 │
| bigquery-public-data.samples.shakespeare │
└───────────────────────────────────────────┘
Executing query with job ID: a0f195e6-f748-4c42-8c3b-d89abaa37c9f
Query executing: 0.92s
ERROR:
400 Syntax error: Unexpected identifier "bigquery" at [1:3]
Wouldn't it be better to strip the whitespace on both sides first, or why we only do .rstrip()?
| query_job = _run_query(client, query, job_config=job_config) | ||
| except Exception as ex: | ||
| error = str(ex) | ||
| return _print_error(str(ex), args.destination_var) |
There was a problem hiding this comment.
This can be slightly confusing for the reader at first glance, because _print_error() itself does not return anything, it just has a side effect. I would simply express the same in two lines (a sole return in its own).
|
|
||
| error = None | ||
|
|
||
| if not re.search(r"\s", query.rstrip()): |
There was a problem hiding this comment.
I would also write this down in a comment, i.e. that anything without whitespace is assumed to be table identifier which triggers a different use case of the magic command.
| max_results = None | ||
|
|
||
| if not re.search(r"\s", query.rstrip()): | ||
| table_id = query.rstrip() |
There was a problem hiding this comment.
BTW, stripping whitespace from the same value again is not necessary, we could store the stripped string into a variable the first time we do it.
Third of 3 PRs towards resolving #9105 as described in review for #9147

Screenshot of feature in notebook: