[Bug] Clone materialization raises an error when cloning Python models #645
Description
Is this a new bug in dbt-athena?
- I believe this is a new bug in dbt-athena
- I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Running dbt clone on a Python model raises the following error:
$ dbt clone --select reporting.ratio_stats --state master-cache
16:46:38 Running with dbt=1.7.11
16:46:38 Registered adapter: athena=1.7.1
16:46:39 Found 82 models, 5 seeds, 415 tests, 136 sources, 10 exposures, 0 metrics, 595 macros, 0 groups, 0 semantic models
16:46:39
16:46:44 Concurrency: 5 threads (target='dev')
16:46:44
Failed to execute query.
Traceback (most recent call last):
File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/pyathena/common.py", line 522, in _execute
query_id = retry_api_call(
File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/pyathena/util.py", line 85, in retry_api_call
return retry(func, *args, **kwargs)
File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
return fut.result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
result = fn(*args, **kwargs)
File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/botocore/client.py", line 565, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/botocore/client.py", line 1021, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 5:5: mismatched input 'None'. Expecting: <query>
Failed to execute query.
16:46:48
16:46:48 Completed with 1 error and 0 warnings:
16:46:48
16:46:48 Runtime Error in model reporting.ratio_stats (models/reporting/reporting.ratio_stats.py)
An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 5:5: mismatched input 'None'. Expecting: <query>
16:46:48
16:46:48 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
The root cause of the error is that dbt's built-in clone materialization macro calls the dbt-athena view materialization macro, which in turn calls create_or_replace_view. That macro references the sql context object, which returns None for Python models, instead of compiled_code. This results in a clone view query of the form create or replace view <clone_view> as None, which raises the above error. Here's the line in create_or_replace_view that causes the error:
And here are the definitions of compiled_code and sql in dbt-core (source):
@contextproperty()
def compiled_code(self) -> Optional[str]:
    # TODO: avoid routing on args.which if possible
    if getattr(self.model, "defer_relation", None) and self.config.args.which == "clone":
        # TODO https://github.com/dbt-labs/dbt-core/issues/7976
        return f"select * from {self.model.defer_relation.relation_name or str(self.defer_relation)}"  # type: ignore[union-attr]
    elif getattr(self.model, "extra_ctes_injected", None):
        # TODO CT-211
        return self.model.compiled_code  # type: ignore[union-attr]
    else:
        return None

@contextproperty()
def sql(self) -> Optional[str]:
    # only set this for sql models, for backward compatibility
    if self.model.language == ModelLanguage.sql:  # type: ignore[union-attr]
        return self.compiled_code
    else:
        return None
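To make the failure mode concrete, here is a minimal standalone sketch that mimics the two properties above outside of dbt. FakeModel, create_view_statement, and the relation names are hypothetical stand-ins invented for illustration, not dbt-core or dbt-athena code, and the extra_ctes_injected branch is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeModel:
    """Hypothetical stand-in for a dbt model node; not a dbt-core class."""

    language: str  # "sql" or "python"
    defer_relation_name: Optional[str]  # relation found in the --state manifest


def compiled_code(model: FakeModel, which: str) -> Optional[str]:
    # Mirrors the clone branch of the compiled_code property quoted above.
    if model.defer_relation_name and which == "clone":
        return f"select * from {model.defer_relation_name}"
    return None


def sql(model: FakeModel, which: str) -> Optional[str]:
    # Mirrors the language check: only SQL models get a value.
    if model.language == "sql":
        return compiled_code(model, which)
    return None


def create_view_statement(view_name: str, query: Optional[str]) -> str:
    # Interpolating None produces the literal text "None", which matches
    # the 'None' token reported in the Athena error message.
    return f"create or replace view {view_name} as {query}"


# Hypothetical relation names, chosen only for the example.
python_model = FakeModel(
    language="python",
    defer_relation_name="prod.reporting.ratio_stats",
)

broken = create_view_statement(
    "dev.reporting.ratio_stats", sql(python_model, "clone")
)
working = create_view_statement(
    "dev.reporting.ratio_stats", compiled_code(python_model, "clone")
)

print(broken)
print(working)
```

For a Python model, the first statement ends in "as None" while the second ends in a valid select, which is the difference this issue is about.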
Expected Behavior
dbt clone should not raise an error when cloning Python models. It should support clone materialization by referencing the compiled_code context object when generating the clone view query rather than the sql context object.
Steps To Reproduce
- Set up a dbt config with two targets, dev and prod
- Define and build a dummy Python model that just runs print("hello world") in the prod target
- Rename the target/ directory to prod-state/
- Run dbt clone --state prod-state
- Confirm you see the same error as listed above
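For reference, the dummy model in the second step could look something like the sketch below. The table materialization, the one-row placeholder DataFrame, and the assumption that session is a Spark session exposing createDataFrame are illustrative guesses rather than details taken from our actual model.

```python
def model(dbt, session):
    # Assumption: a table materialization; any materialization that
    # supports Python models should reproduce the issue once cloned.
    dbt.config(materialized="table")

    print("hello world")

    # dbt Python models must return a DataFrame, so return a one-row
    # placeholder. Assumes `session` is a Spark session.
    return session.createDataFrame([("hello world",)], ["message"])
```

Saved as a .py file under models/, this should be enough to produce a Python model node in the prod manifest that dbt clone then tries to clone.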
Environment
- OS: Ubuntu 22.04.4
- Python: Python 3.10.12
- dbt: 1.7.11
- dbt-athena-community: 1.7.2
Additional Context
This particular bug is blocking our use of clone materialization, but I think it also implicates the create_table_csv_upload macro and a few snapshot macros like hive_snapshot_merge_sql that also reference the sql context variable instead of compiled_code. I see that the docs for Python models explicitly list the lack of snapshot materialization support as a limitation, so I'm wondering if there's a deeper reason why these parts of the codebase haven't yet been transitioned from sql to compiled_code?
Either way, we've tested out switching to compiled_code for clone materialization in our environment and it seems to work, so I'm happy to put up a patch with our changes. I just want to make sure that I'm not barging into a discussion that has been put on the backburner for a good reason.