[Bug] Clone materialization raises an error when cloning Python models #645

Closed
@jeancochrane

Description

Is this a new bug in dbt-athena?

  • I believe this is a new bug in dbt-athena
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Running dbt clone on a Python model raises the following error:

$ dbt clone --select reporting.ratio_stats --state master-cache
16:46:38  Running with dbt=1.7.11
16:46:38  Registered adapter: athena=1.7.1
16:46:39  Found 82 models, 5 seeds, 415 tests, 136 sources, 10 exposures, 0 metrics, 595 macros, 0 groups, 0 semantic models
16:46:39
16:46:44  Concurrency: 5 threads (target='dev')
16:46:44
Failed to execute query.
Traceback (most recent call last):
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/pyathena/common.py", line 522, in _execute
    query_id = retry_api_call(
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/pyathena/util.py", line 85, in retry_api_call
    return retry(func, *args, **kwargs)
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/botocore/client.py", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/botocore/client.py", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 5:5: mismatched input 'None'. Expecting: <query>
Failed to execute query.
16:46:48
16:46:48  Completed with 1 error and 0 warnings:
16:46:48
16:46:48    Runtime Error in model reporting.ratio_stats (models/reporting/reporting.ratio_stats.py)
  An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 5:5: mismatched input 'None'. Expecting: <query>
16:46:48
16:46:48  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

The root cause is that dbt's built-in clone materialization macro calls the dbt-athena view materialization macro, which in turn calls create_or_replace_view. That macro references the sql context object rather than compiled_code, and sql returns None for Python models. The resulting clone view query has the form create or replace view <clone_view> as None, which raises the error above. Here's the line in create_or_replace_view that causes the error:

https://github.com/dbt-athena/dbt-athena/blob/59d005a46c4a97d8438d050a55ab499e669c0c04/dbt/include/athena/macros/materializations/models/view/create_or_replace_view.sql#L32

And here are the definitions of compiled_code and sql in dbt-core (source):

    @contextproperty()
    def compiled_code(self) -> Optional[str]:
        # TODO: avoid routing on args.which if possible
        if getattr(self.model, "defer_relation", None) and self.config.args.which == "clone":
            # TODO https://github.com/dbt-labs/dbt-core/issues/7976
            return f"select * from {self.model.defer_relation.relation_name or str(self.defer_relation)}"  # type: ignore[union-attr]
        elif getattr(self.model, "extra_ctes_injected", None):
            # TODO CT-211
            return self.model.compiled_code  # type: ignore[union-attr]
        else:
            return None


    @contextproperty()
    def sql(self) -> Optional[str]:
        # only set this for sql models, for backward compatibility
        if self.model.language == ModelLanguage.sql:  # type: ignore[union-attr]
            return self.compiled_code
        else:
            return None
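
To make the failure mode concrete, here is a minimal standalone sketch that mimics the two properties above without importing dbt. FakeModel, FakeContext, render_view_ddl, and the relation names are hypothetical stand-ins, not dbt-core or dbt-athena APIs, and only the clone branch of compiled_code is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeModel:
    """Hypothetical stand-in for a dbt model node."""
    language: str                          # "sql" or "python"
    defer_relation: Optional[str] = None   # relation name taken from --state


class FakeContext:
    """Hypothetical stand-in for the dbt rendering context."""

    def __init__(self, model: FakeModel, which: str) -> None:
        self.model = model
        self.which = which  # the dbt subcommand, e.g. "clone"

    @property
    def compiled_code(self) -> Optional[str]:
        # Mirrors only the clone branch quoted above; it ignores the language.
        if self.model.defer_relation and self.which == "clone":
            return f"select * from {self.model.defer_relation}"
        return None

    @property
    def sql(self) -> Optional[str]:
        # Only set for SQL models, so Python models always get None here.
        if self.model.language == "sql":
            return self.compiled_code
        return None


def render_view_ddl(relation: str, body: Optional[str]) -> str:
    # Like a template, string formatting renders None as the text "None".
    return f"create or replace view {relation} as {body}"


if __name__ == "__main__":
    ctx = FakeContext(FakeModel("python", "prod.ratio_stats"), which="clone")
    print(render_view_ddl("dev.ratio_stats", ctx.sql))            # broken query
    print(render_view_ddl("dev.ratio_stats", ctx.compiled_code))  # valid query
```

With a Python model, the first call produces a query ending in "as None", matching the error above, while the second produces a select against the deferred relation.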

Expected Behavior

dbt clone should not raise an error when cloning Python models. It should support clone materialization by referencing the compiled_code context object when generating the clone view query rather than the sql context object.
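
As a rough illustration of the expected behavior, the sketch below builds the clone view query from compiled_code and refuses to emit a query when no code is available. The function clone_view_ddl and its error message are hypothetical; the actual fix would live in the Jinja macro, and the guard is an extra assumption rather than something dbt-athena does today.

```python
from typing import Optional


def clone_view_ddl(relation: str, compiled_code: Optional[str]) -> str:
    """Build the clone view query from compiled_code (hypothetical helper)."""
    if compiled_code is None:
        # Fail early instead of sending "... as None" to the query engine.
        raise ValueError(f"no compiled code available for {relation}")
    return f"create or replace view {relation} as {compiled_code}"
```

Because compiled_code does not depend on the model language during a clone, the same call would cover both SQL and Python models.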

Steps To Reproduce

  1. Set up a dbt config with two targets, dev and prod
  2. Define a dummy Python model that just runs print("hello world"), and build it in the prod target
  3. Rename the target/ directory to prod-state/
  4. Run dbt clone --state prod-state
  5. Confirm you see the same error as listed above
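
For step 2, a dummy model along these lines should be enough. This is a sketch only: it assumes the session object that dbt-athena passes in is a Spark session exposing createDataFrame, and the single-row "id" column is made up so that the model returns a DataFrame.

```python
def model(dbt, session):
    # Materialize as a table so the prod build creates a relation to clone.
    dbt.config(materialized="table")
    print("hello world")
    # dbt Python models must return a DataFrame; this one-row frame is a placeholder.
    return session.createDataFrame([(1,)], ["id"])
```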

Environment

- OS: Ubuntu 22.04.4
- Python: Python 3.10.12
- dbt: 1.7.11
- dbt-athena-community: 1.7.2

Additional Context

This particular bug is blocking our use of clone materialization, but I think it also implicates the create_table_csv_upload macro and a few snapshot macros like hive_snapshot_merge_sql, which also reference the sql context variable instead of compiled_code. I see that the docs for Python models explicitly list the lack of snapshot materialization support as a limitation, so I'm wondering if there's a deeper reason why these parts of the codebase haven't yet been transitioned from sql to compiled_code?

Either way, we've tested out switching to compiled_code for clone materialization in our environment and it seems to work, so I'm happy to put up a patch with our changes. I just want to make sure that I'm not barging into a discussion that has been put on the backburner for a good reason.
