Phase 1 for storing schemas for later use.#7761
Conversation
| file_obj = file_or_path | ||
| else: | ||
| try: | ||
| file_obj = open(file_or_path) |
There was a problem hiding this comment.
An important difference with this path is that we must close file_obj if we open it based on a path, but we must not close file_obj if we are given a file-like object.
I recommend using a with statement when we are provided a path to make sure we always close it. You may want to refactor this to add a _schema_from_json_file_object() private helper method that does the loading part without the file closing part.
There was a problem hiding this comment.
Added helper method per recommendation along with the statement to close the file.
| try: | ||
| file_obj = open(file_or_path) | ||
| except OSError: | ||
| raise TypeError(_NEED_JSON_FILE_ARGUMENT) |
There was a problem hiding this comment.
In general we use ValueError for unexpected input argument.
There was a problem hiding this comment.
Updated with correct exception.
| try: | ||
| json_data = json.load(file_obj) | ||
| except JSONDecodeError: | ||
| raise TypeError(_NEED_JSON_FILE_ARGUMENT) |
There was a problem hiding this comment.
I'd prefer if we let this raise (don't catch it). That way the user doesn't lose context about what is wrong with the input file.
There was a problem hiding this comment.
Took this piece out.
| file_obj = destination | ||
| else: | ||
| try: | ||
| file_obj = open(destination, mode="w") |
There was a problem hiding this comment.
Just as in schema_from_json, we must close file_obj if we open it based on a path, but we must not close file_obj if we are given a file-like object.
There was a problem hiding this comment.
Updated with with statement to close when appropriate.
bigquery/tests/unit/test_client.py
Outdated
| } | ||
| ]""" | ||
| expected = list() | ||
| json_data = json.loads(file_content) |
There was a problem hiding this comment.
You're basically repeating the function definition here. That's not ideal. You've basically written a change-detector test. I'd prefer to see actual schema.SchemaField constructors in tests.
I do like that you've mocked out open and the file contents, so keep that. Just change how you construct expected.
There was a problem hiding this comment.
Made changes to include schema.SchemaField as requested.
bigquery/tests/unit/test_client.py
Outdated
| with pytest.raises(ValueError): | ||
| client._do_multipart_upload(file_obj, {}, file_obj_len + 1, None) | ||
|
|
||
| def test__schema_from_json(self): |
There was a problem hiding this comment.
Nit: since the function name is schema_from_json, the test name should be test_schema_from_json (just one underscore between test and schema.)
Ditto for schema_to_json test function name.
There was a problem hiding this comment.
Updated test names appropriately.
|
|
||
| def schema_from_json(self, file_or_path): | ||
| """Takes a file object or file path that contains json that describes | ||
| a table schema. |
There was a problem hiding this comment.
Nit: This and the other docstrings (except for the first line) look a like they are indented 4 spaces too many.
There was a problem hiding this comment.
Fixed indentations.
| ) | ||
| return row_iterator | ||
|
|
||
| def _schema_from_json_file_object(self, file): |
There was a problem hiding this comment.
Nit: since file is a built-in, use file_ or file_obj as the name.
There was a problem hiding this comment.
Changed name as directed.
| schema_field_list = list() | ||
| json_data = json.load(file) | ||
|
|
||
| for field in json_data: |
There was a problem hiding this comment.
FYI: This loop could be replaced with a Python "list comprehension".
There was a problem hiding this comment.
Changed to be a list comprehension.
| return self._schema_from_json_file_object(file_or_path) | ||
| else: | ||
| try: | ||
| with open(file_or_path) as file: |
There was a problem hiding this comment.
Nit: file should be file_obj since file is a built-in.
There was a problem hiding this comment.
Updated name accordingly.
| """ | ||
| if isinstance(file_or_path, io.IOBase): | ||
| return self._schema_from_json_file_object(file_or_path) | ||
| else: |
There was a problem hiding this comment.
Nit: Since the above line returns, no need for else.
There was a problem hiding this comment.
Removed the else as recommended.
| json_schema_list.append(schema_field) | ||
|
|
||
| if isinstance(destination, io.IOBase): | ||
| destination.write(json.dumps(json_schema_list, indent=2, sort_keys=True)) |
There was a problem hiding this comment.
Use json.dump(schema_list, destination) instead. Then you can use BytesIO in the tests, too.
I'd prefer if json.dump wasn't repeated (here we do want to be DRY). Replace this with file_obj = destination and remove the else.
There was a problem hiding this comment.
Refactored into a helper function because I couldn't get it to work with the context manager below just by removing the else.
bigquery/tests/unit/test_client.py
Outdated
| ]""" | ||
|
|
||
| expected = list() | ||
| expected.append(SchemaField("qtr", "STRING", "REQUIRED", "quarter")) |
There was a problem hiding this comment.
No need for .append. Instead, construct the list inline.
expected = [
SchemaField(...),
SchemaField(...),
...
]
There was a problem hiding this comment.
Switched to suggested method.
bigquery/tests/unit/test_client.py
Outdated
| _mock_file().write.assert_called_once_with(file_content) | ||
| # This assert is to make sure __exit__ is called in the context | ||
| # manager that opens the file in the function | ||
| _mock_file().__exit__.assert_called_once_with(None, None, None) |
There was a problem hiding this comment.
We don't care about the actual arguments, so just assert_called_once will work.
There was a problem hiding this comment.
Removed the arguments from the assert.
bigquery/tests/unit/test_client.py
Outdated
| open_patch = mock.patch("builtins.open", mock.mock_open()) | ||
| with open_patch as _mock_file: | ||
| actual = client.schema_to_json(schema_list, mock_file_path) | ||
| _mock_file.assert_called_once_with(mock_file_path, mode="w") |
There was a problem hiding this comment.
Might need wb when using json.dump. (b for "binary" mode)
| List of schema field objects. | ||
| """ | ||
| json_data = json.load(file_obj) | ||
| return [SchemaField.from_api_repr(f) for f in json_data] |
There was a problem hiding this comment.
Wonderful!
One nit-pick, though. Style-wise in our client libraries and samples, we avoid single-letter variable names, even in list comprehensions. Let's rename f to field.
There was a problem hiding this comment.
Variable name has been updated!
bigquery/tests/unit/test_client.py
Outdated
|
|
||
| open_patch = mock.patch("builtins.open", mock.mock_open()) | ||
| with open_patch as _mock_file: | ||
| with mock.patch("json.dump") as _mock_dump: |
There was a problem hiding this comment.
I like your thinking here. It's definitely nicer to compare lists than it is to compare JSON strings.
Nit: Let's write both with statement on one line. https://stackoverflow.com/a/1073814/101923
Nit: No need for leading underscore, that's usually to indicate a "private" variable, which isn't relevant inside a test.
with open_patch as mock_file, mock.patch("json.dump") as mock_dump:
There was a problem hiding this comment.
Replace existing with statements with suggested line.
bigquery/tests/unit/test_client.py
Outdated
| client = self._make_client() | ||
|
|
||
| client.schema_to_json(schema_list, fake_file) | ||
| assert file_content == fake_file.getvalue() |
There was a problem hiding this comment.
Let's call json.loads(fake_file.getvalue()) and compare to a list of dictionaries like you do in the test of the path version.
There was a problem hiding this comment.
The test has been updated with the list of dictionaries and updated assert.
| try: | ||
| with open(file_or_path) as file_obj: | ||
| return self._schema_from_json_file_object(file_obj) | ||
| except OSError: |
There was a problem hiding this comment.
Oops, coverage is failing on this line and the similar line in the other function. We'd need a test where open() fails.
Honestly, I'd be okay removing the try block and letting these errors just raise, too.
There was a problem hiding this comment.
Removed the try block as suggested for both functions.
bigquery/tests/unit/test_client.py
Outdated
| client = self._make_client() | ||
| mock_file_path = "/mocked/file.json" | ||
|
|
||
| open_patch = mock.patch("builtins.open", mock.mock_open()) |
There was a problem hiding this comment.
From the test logs, it looks like this is tripping up Python 2. https://stackoverflow.com/a/34677735/101923
=================================== FAILURES ===================================
____________ TestClientUpload.test_schema_from_json_with_file_path _____________
self = <tests.unit.test_client.TestClientUpload object at 0x7f47e2dce4d0>
def test_schema_from_json_with_file_path(self):
from google.cloud.bigquery.schema import SchemaField
file_content = """[
{
"description": "quarter",
"mode": "REQUIRED",
"name": "qtr",
"type": "STRING"
},
{
"description": "sales representative",
"mode": "NULLABLE",
"name": "rep",
"type": "STRING"
},
{
"description": "total sales",
"mode": "NULLABLE",
"name": "sales",
"type": "FLOAT"
}
]"""
expected = [
SchemaField("qtr", "STRING", "REQUIRED", "quarter"),
SchemaField("rep", "STRING", "NULLABLE", "sales representative"),
SchemaField("sales", "FLOAT", "NULLABLE", "total sales"),
]
client = self._make_client()
mock_file_path = "/mocked/file.json"
open_patch = mock.patch(
"builtins.open", new=mock.mock_open(read_data=file_content)
)
> with open_patch as _mock_file:
tests/unit/test_client.py:5202:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.nox/unit-2-7/lib/python2.7/site-packages/mock/mock.py:1353: in __enter__
self.target = self.getter()
.nox/unit-2-7/lib/python2.7/site-packages/mock/mock.py:1523: in <lambda>
getter = lambda: _importer(target)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
target = 'builtins'
def _importer(target):
components = target.split('.')
import_path = components.pop(0)
> thing = __import__(import_path)
E ImportError: No module named builtins
Remember you can run against Python 2 by using nox, locally.
We could check the Python version with sys.version_info.major and change what object you are patching.
There was a problem hiding this comment.
Actually six.PY2 is probably a better way to detect. https://pythonhosted.org/six/
There was a problem hiding this comment.
Fixed all the issues and the code is compatible with python 2 now.
Part of feature request: #3419
@tswast @engelke