feat: Allow custom schema def for tmp tables generated by incremental #659

Merged
merged 7 commits into from
May 28, 2024

Conversation

pierrebzl

@pierrebzl pierrebzl commented May 24, 2024

Description

As mentioned in #613, I would like for all the temporary intermediate tables to be created in a different schema that we could specify as a config argument of the model.

This would let us control and limit access to, or visibility of, those tmp tables.
In some cases of parallel processing with the unique_tmp_table_suffix flag, we end up creating many of these transient tables, and we would like to keep them from appearing under the production/staging schemas of our Glue catalog.

Models used to test - Optional

Checklist

  • You followed contributing section
  • You kept your Pull Request small and focused on a single feature or bug fix.
  • You added unit testing when necessary
  • You added functional testing when necessary

@pierrebzl pierrebzl marked this pull request as ready for review May 24, 2024 20:24
@nicor88 nicor88 added the enable-functional-tests label May 25, 2024
@nicor88

nicor88 commented May 25, 2024

A few notes/considerations:

  • The integration tests failed because of an issue with snapshots. I retriggered the CI, but if the issue persists it must be addressed. Edit: the issue persists, please have a look.
  • I'm wondering if such a property could apply to materialized tables. There are a few cases where tmp tables are created:
    • in the case of Hive HA tables
    • when we have more than 100 partitions
    • in the case of Iceberg, where the initial table is a tmp table that is then renamed. That is the only case with an extra layer of complexity, because we need to guarantee that the final S3 layout is properly specified

@pierrebzl

pierrebzl commented May 27, 2024

Thank you for your feedback.

> I'm wondering if such property could apply to materialized tables

Yes, I think it definitely could, by adjusting the table materialization macro.
I was thinking of adding this:

  {# Read the optional tmp_schema config; fall back to the model's own schema when it is unset. #}
  {%- set tmp_schema = config.get('tmp_schema') -%}

  {%- if tmp_schema is none -%}
    {%- set tmp_schema = schema -%}
  {%- endif -%}

then pass schema=tmp_schema to api.Relation.create (https://github.com/dbt-athena/dbt-athena/blob/v1.8.1/dbt/include/athena/macros/materializations/models/table/table.sql#L30).
Maybe it can be part of a separate PR, what do you think?

I would need to explore more; I'm not very familiar with all the different use cases of materialized tables you mentioned.
Also, this would require writing new tests.
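
For readers following the idea above, here is a minimal sketch of how the fallback and the Relation.create call could fit together inside a materialization. It is not the adapter's actual macro code: the '__tmp' identifier suffix and the tmp_relation variable name are hypothetical, and the config key is the tmp_schema name used in this comment.

```sql
{# Hypothetical sketch of the idea discussed above, not the adapter's real macro. #}
{%- set tmp_schema = config.get('tmp_schema') -%}
{%- if tmp_schema is none -%}
  {%- set tmp_schema = schema -%}
{%- endif -%}

{# Build the temporary relation in tmp_schema rather than in the model's schema. #}
{%- set tmp_relation = api.Relation.create(
    identifier=this.identifier ~ '__tmp',
    schema=tmp_schema,
    database=database,
    type='table'
) -%}
```

When the config is unset, tmp_schema resolves to the model's own schema, so the temporary relation would be created where it is today.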

@nicor88

nicor88 commented May 27, 2024

Let's keep table materialization out, and even consider using another issue to track that idea :)

@nicor88

nicor88 commented May 27, 2024

@pierrebzl looks good, just a few nits. Also, let's consider using temp_schema as the model parameter for consistency with the code implementation, thanks
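
As a rough illustration of the option discussed in this PR, a model might opt in as sketched below. This assumes the parameter is named temp_schema, as suggested in the comment above; the model file name, source, column names, and the 'dbt_tmp' schema value are hypothetical.

```sql
-- models/events_incremental.sql (hypothetical model)
{# temp_schema: assumed parameter name; the schema that holds the tmp tables created during incremental runs. #}
{# unique_tmp_table_suffix: the flag mentioned in the PR description for parallel incremental runs. #}
{{
  config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partitioned_by=['event_date'],
    temp_schema='dbt_tmp',
    unique_tmp_table_suffix=true
  )
}}

select
    event_id,
    event_date
from {{ source('raw', 'events') }}
{% if is_incremental() %}
where event_date > (select max(event_date) from {{ this }})
{% endif %}
```

With a setup like this, the transient tables would be expected to land in the separate schema instead of next to the production/staging tables, which is the access and visibility concern raised in the description.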


@nicor88 nicor88 left a comment


Great job 💯

@nicor88 nicor88 added and removed the enable-functional-tests label May 28, 2024
@nicor88 nicor88 merged commit 97430f9 into dbt-labs:main May 28, 2024
11 of 13 checks passed
kodiakhq bot referenced this pull request in cloudquery/policies Jun 5, 2024
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [dbt-athena-community](https://togithub.com/dbt-athena/dbt-athena) | minor | `==1.7.2` -> `==1.8.2` |

---

### Release Notes

<details>
<summary>dbt-athena/dbt-athena (dbt-athena-community)</summary>

### [`v1.8.2`](https://togithub.com/dbt-athena/dbt-athena/releases/tag/v1.8.2)

[Compare Source](https://togithub.com/dbt-athena/dbt-athena/compare/v1.8.1...v1.8.2)

### What's Changed

#### Fixes

-   fix: Add wait_random_exponential for query retries by [@svdimchenko](https://togithub.com/svdimchenko) in [https://github.com/dbt-athena/dbt-athena/pull/655](https://togithub.com/dbt-athena/dbt-athena/pull/655)
-   fix: Resolve error when cloning Python models ([#645](https://togithub.com/dbt-athena/dbt-athena/issues/645)) by [@jeancochrane](https://togithub.com/jeancochrane) in [https://github.com/dbt-athena/dbt-athena/pull/651](https://togithub.com/dbt-athena/dbt-athena/pull/651)
-   fix: Fixed table_type for GOVERNED tables by [@svdimchenko](https://togithub.com/svdimchenko) in [https://github.com/dbt-athena/dbt-athena/pull/661](https://togithub.com/dbt-athena/dbt-athena/pull/661)

#### Features

-   feat: Set unique table suffix to allow parallel incremental executions by [@pierrebzl](https://togithub.com/pierrebzl) in [https://github.com/dbt-athena/dbt-athena/pull/650](https://togithub.com/dbt-athena/dbt-athena/pull/650)
-   feat: Allow custom schema def for tmp tables generated by incremental by [@pierrebzl](https://togithub.com/pierrebzl) in [https://github.com/dbt-athena/dbt-athena/pull/659](https://togithub.com/dbt-athena/dbt-athena/pull/659)
-   feat: Implement iceberg retry logic by [@svdimchenko](https://togithub.com/svdimchenko) in [https://github.com/dbt-athena/dbt-athena/pull/657](https://togithub.com/dbt-athena/dbt-athena/pull/657)

#### Dependencies

-   chore: Update moto requirement from ~=5.0.7 to ~=5.0.8 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/660](https://togithub.com/dbt-athena/dbt-athena/pull/660)
-   chore: Bumped version to 1.8.2 for release by [@svdimchenko](https://togithub.com/svdimchenko) in [https://github.com/dbt-athena/dbt-athena/pull/663](https://togithub.com/dbt-athena/dbt-athena/pull/663)

#### Docs

-   docs: Cleanup README grammar, punctuation, and capitalisation by [@dfsnow](https://togithub.com/dfsnow) in [https://github.com/dbt-athena/dbt-athena/pull/654](https://togithub.com/dbt-athena/dbt-athena/pull/654)

#### New Contributors

-   [@jeancochrane](https://togithub.com/jeancochrane) made their first contribution in [https://github.com/dbt-athena/dbt-athena/pull/651](https://togithub.com/dbt-athena/dbt-athena/pull/651)
-   [@pierrebzl](https://togithub.com/pierrebzl) made their first contribution in [https://github.com/dbt-athena/dbt-athena/pull/650](https://togithub.com/dbt-athena/dbt-athena/pull/650)

**Full Changelog**: dbt-labs/dbt-athena@v1.8.1...v1.8.2

### [`v1.8.1`](https://togithub.com/dbt-athena/dbt-athena/releases/tag/v1.8.1)

[Compare Source](https://togithub.com/dbt-athena/dbt-athena/compare/v1.7.2...v1.8.1)

#### What's Changed

##### Relevant notes

⚠️ Version 1.8.1 is equivalent to 1.8.0 in terms of features and fixes.

You can install the changes from this release via

    pip install dbt-athena-community==1.8.1

##### Features

-   feat: Add column meta to glue column parameters by [@SoumayaMauthoorMOJ](https://togithub.com/SoumayaMauthoorMOJ) in [https://github.com/dbt-athena/dbt-athena/pull/644](https://togithub.com/dbt-athena/dbt-athena/pull/644)

##### Dependencies

-   chore: Update moto requirement from ~=5.0.6 to ~=5.0.7 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/648](https://togithub.com/dbt-athena/dbt-athena/pull/648)

##### Docs

-   docs: Cleanup Python models section of README by [@dfsnow](https://togithub.com/dfsnow) in [https://github.com/dbt-athena/dbt-athena/pull/643](https://togithub.com/dbt-athena/dbt-athena/pull/643)

#### New Contributors

-   [@dfsnow](https://togithub.com/dfsnow) made their first contribution in [https://github.com/dbt-athena/dbt-athena/pull/643](https://togithub.com/dbt-athena/dbt-athena/pull/643)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).
Labels
enable-functional-tests Label to trigger functional testing