dbt Artifacts Less Package

Changes the dimensional models (exposures, models, seeds, snapshots, sources, tests) from always inserting every node each time a command is run, to only inserting if it has changed since the last run. Lots of macros have been edited and I removed all other stg and dim/fct models in favor of building out snapshots and marts in the actual dbt project, so I can run those models less frequently.

This requires a snapshot (in your own project) to capture changes over time, but results in much, much smaller source tables. See https://github.com/dbt-labs/dbt-proserv for examples there.

Why did we do this? An example customer has 2 Billion rows in their tests source model and 200 Million in their models source model. This is insane given they currently have 653 models and 6050 data tests.

Video walkthrough utilizing downstream models from dbt-proserv

dbt Artifacts Package

This package builds a mart of tables and views describing the project it is installed in. In pre V1 versions of the package, the artifacts dbt produces were uploaded to the warehouse, hence the name of the package. That's no longer the case, but the name has stuck!

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for more information.

Supported Data Warehouses

The package currently supports

Snowflake ✅

See the generated dbt docs site for documentation on each model.

Quickstart

Add this package to your packages.yml:

packages:
  - git: "https://github.com/dbt-labs/dbt-artifacts-less.git" # git URL
    revision: "main"

👷 Make sure to fix at least the minor version, to avoid issues when a new release is open. See the notes on upgrading below for more detail.

Run dbt deps to install the package

Add an on-run-end hook to your dbt_project.yml

on-run-end:
  - "{{ dbt_artifacts.upload_results(results) }}"

We recommend adding a conditional here so that the upload only occurs in your production environment, such as:

on-run-end:
  - "{% if target.name == 'prod' %}{{ dbt_artifacts.upload_results(results) }}{% endif %}"

Run the tables!
```
dbt run --select dbt_artifacts
```

Notes on upgrading

Due to the structure of the project, when additional fields are added, the package needs to be re-run to ensure the tables include the new field, or it will simply error on the hook. These changes will always be implemented within a new minor version, so make sure that the version you use in packages.yml reflects this.

To upgrade and re-build, update the version number within packages.yml and then run:

dbt deps
dbt run --select dbt_artifacts

Make sure this is updated in any database that you use your code base in.

Configuration

The following configuration can be used to specify where the raw (sources) data is uploaded, and where the dbt models are created:

models:
  ...
  dbt_artifacts:
    +database: your_destination_database # optional, default is your target database
    +schema: your_destination_schema # optional, default is your target schema
    staging:
      +database: your_destination_database # optional, default is your target database
      +schema: your_destination_schema # optional, default is your target schema
    sources:
      +database: your_sources_database # optional, default is your target database
      +schema: your sources_database # optional, default is your target schema

Note that model materializations and on_schema_change configs are defined in this package's dbt_project.yml, so do not set them globally in your dbt_project.yml (see docs on configuring packages):

Configurations made in your dbt_project.yml file will override any configurations in a package (either in the dbt_project.yml file of the package, or in config blocks).

Environment Variables

If the project is running in dbt Cloud, the following five columns (https://docs.getdbt.com/docs/dbt-cloud/using-dbt-cloud/cloud-environment-variables#special-environment-variables) will be automatically populated in the fct_dbt__invocations model:

dbt_cloud_project_id
dbt_cloud_job_id
dbt_cloud_run_id
dbt_cloud_run_reason_category
dbt_cloud_run_reason

To capture other environment variables in the fct_dbt__invocations model in the env_vars column, add them to the env_vars variable in your dbt_project.yml. Note that environment variables with secrets (DBT_ENV_SECRET_) can't be logged.

vars:
  env_vars: [
    'ENV_VAR_1',
    'ENV_VAR_2',
    '...'
  ]

dbt Variables

To capture dbt variables in the fct_dbt__invocations model in the dbt_vars column, add them to the dbt_vars variable in your dbt_project.yml.

vars:
  dbt_vars: [
    'var_1',
    'var_2',
    '...'
  ]

Creating custom marts tables

Multiple modelled dim and fct models have been provided for ease of use, but we recognise that some use cases may require custom ones. To this end, you can disable all but the raw sources tables using the following in your dbt_project.yml file:

# dbt_project.yml

models:
  dbt_artifacts:
    +enabled: false
    sources:
      +enabled: true

In these sources tables, you will find a JSON column all_results which contains a JSON blob of the results object used, which you can use in your own analysis:

exposures
models
seeds
snapshots
sources
tests

This column can cause queries to become too long - particularly in BigQuery. Therefore, if you want to disable this column, you can make use of the dbt_artifacts_exclude_all_results variable, and set this to true in your dbt_project.yml file.

# dbt_project.yml
vars:
  dbt_artifacts_exclude_all_results: true

There is additionally a variable, dbt_artifacts_environment_aware, which if enabled, will add columns to your dim source models (models, seeds, snapshots, sources, tests) to add environment columns:

dbt_cloud_environment_name: The name of the dbt Cloud environment in which dbt is running.
dbt_cloud_environment_type: The type of dbt Cloud environment in which dbt is running, e.g. 'dev', 'staging', or 'prod'.

This is helpful if you run in multiple environments, to show how dimensions might change depending on your selected environment.

# dbt_project.yml
vars:
  dbt_artifacts_environment_aware: true

Upgrading from 1.x to >=2.0.0

If you were using the following variables:

vars:
  dbt_artifacts_database: your_db
  dbt_artifacts_schema: your_schema

You must now move these to the following model configs:

models:
  ...
  dbt_artifacts:
    sources:
      +database: your_db
      +schema: your_schema

That's because the raw tables are now managed as dbt models. Be aware of any impact that generate_database_name and generate_schema_name macros may have on the final database/schema.

Migrating From <1.0.0 to >=1.0.0

To migrate your existing data from the dbt-artifacts versions <=0.8.0, a helper macro and guide is provided. This migration uses the old fct_* and dim_* models' data to populate the new sources. The steps to use the macro are as follows:

If not already completed, run dbt run-operation create_dbt_artifacts_tables to make your source tables.
Run dbt run-operation migrate_from_v0_to_v1 --args '<see-below-for-arguments>'.
Verify that the migration completes successfully.
Manually delete any database objects (sources, staging models, tables/views) from the previous dbt-artifacts version.

The arguments for migrate_from_v0_to_v1 are as follows:

argument	description
`old_database`	the database of the <1.0.0 output (`fct_`/`dim_`) models
`old_schema`	the schema of the <1.0.0 output (`fct_`/`dim_`) models
`new_database`	the target database that the artifact sources are in
`new_schema`	the target schema that the artifact sources are in

The old and new database/schemas do not have to be different, but it is explicitly defined for flexible support.

An example operation is as follows:

dbt run-operation migrate_from_v0_to_v1 --args '{old_database: analytics, old_schema: dbt_artifacts, new_database: analytics, new_schema: artifact_sources}'

Acknowledgements

Thank you to Tails.com for initial development and maintenance of this package. On 2021/12/20, the repository was transferred from the Tails.com GitHub organization to Brooklyn Data Co.

The macros in the early versions package were adapted from code shared by Kevin Chan and Jonathan Talmi of Snaptravel.

Thank you for sharing your work with the community!

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.github		.github
integration_test_project		integration_test_project
macros		macros
models		models
.editorconfig		.editorconfig
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
dbt_project.yml		dbt_project.yml
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

dbt Artifacts Less Package

dbt Artifacts Package

Contributing

Supported Data Warehouses

Quickstart

Notes on upgrading

Configuration

Environment Variables

dbt Variables

Creating custom marts tables

Upgrading from 1.x to >=2.0.0

Migrating From <1.0.0 to >=1.0.0

Acknowledgements

About

Releases

Packages

Languages

License

dbt-labs/dbt-artifacts-less

Folders and files

Latest commit

History

Repository files navigation

dbt Artifacts Less Package

dbt Artifacts Package

Contributing

Supported Data Warehouses

Quickstart

Notes on upgrading

Configuration

Environment Variables

dbt Variables

Creating custom marts tables

Upgrading from 1.x to >=2.0.0

Migrating From <1.0.0 to >=1.0.0

Acknowledgements

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages