v1.5 final release: quick hits (dbt-labs#3249)
resolves dbt-labs#3241
resolves dbt-labs#3201
resolves dbt-labs#3236
resolves dbt-labs#3207
resolves dbt-labs#3200
resolves dbt-labs#3212

## Checklist
Uncomment if you're publishing docs for a prerelease version of dbt
(delete if not applicable):
- [x] Add versioning components, as described in [Versioning
Docs](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-entire-pages)
- [x] Add a note to the prerelease version [Migration
Guide](https://github.com/dbt-labs/docs.getdbt.com/tree/current/website/docs/guides/migration/versions)

Removing or renaming existing pages (delete if not applicable):
- [x] Remove page from `website/sidebars.js`
- [x] Add an entry `_redirects`
- [x] [Ran link
testing](https://github.com/dbt-labs/docs.getdbt.com#running-the-cypress-tests-locally)
to update the links that point to the deleted page

---------

Co-authored-by: mirnawong1 <[email protected]>
jtcohen6 and mirnawong1 authored Apr 25, 2023
1 parent 30b0241 commit ac579e3
Showing 20 changed files with 221 additions and 103 deletions.
3 changes: 2 additions & 1 deletion _redirects
@@ -22,6 +22,7 @@ docs/dbt-cloud/using-dbt-cloud/cloud-model-timing-tab /docs/deploy/dbt-cloud-job
/docs/deploy/architecture /docs/cloud/about-cloud/architecture 301
/docs/deploy/single-tenant /docs/cloud/about-cloud/tenancy 301
/docs/deploy/multi-tenant /docs/cloud/about-cloud/tenancy 301
/docs/cloud/manage-access/about-access /docs/cloud/manage-access/about-user-access 301
/docs/collaborate/git/connect-github /docs/cloud/git/connect-github 301
/docs/collaborate/git/connect-gitlab /docs/cloud/git/connect-gitlab 301
/docs/collaborate/git/connect-azure-devops /docs/cloud/git/connect-azure-devops 301
@@ -32,7 +33,7 @@ docs/dbt-cloud/using-dbt-cloud/cloud-model-timing-tab /docs/deploy/dbt-cloud-job
/docs/collaborate/publish/model-contracts /docs/collaborate/govern/model-contracts 301
/docs/collaborate/publish/model-access /docs/collaborate/govern/model-access 301
/docs/collaborate/publish/model-versions /docs/collaborate/govern/model-versions 301
/docs/collaborate/manage-access/about-access /docs/cloud/manage-access/about-access 301
/docs/collaborate/manage-access/about-access /docs/cloud/manage-access/about-user-access 301
/docs/collaborate/manage-access/seats-and-users /docs/cloud/manage-access/seats-and-users 301
/docs/collaborate/manage-access/self-service-permissions /docs/cloud/manage-access/self-service-permissions 301
/docs/collaborate/manage-access/enterprise-permissions /docs/cloud/manage-access/enterprise-permissions 301
2 changes: 1 addition & 1 deletion website/docs/docs/cloud/about-cloud-setup.md
@@ -10,7 +10,7 @@ dbt Cloud is the fastest and most reliable way to deploy your dbt jobs. It conta
- [Connecting to a data platform](/docs/cloud/connect-data-platform/about-connections)
- Configuring access to [GitHub](/docs/cloud/git/connect-github), [GitLab](/docs/cloud/git/connect-gitlab), or your own [git repo URL](/docs/cloud/git/import-a-project-by-git-url).
- [Managing users and licenses](/docs/cloud/manage-access/seats-and-users)
- [Configuring secure access](/docs/cloud/manage-access/about-access)
- [Configuring secure access](/docs/cloud/manage-access/about-user-access)

These settings are intended for dbt Cloud administrators. If you need a more detailed first-time setup guide for specific data platforms, read our [quickstart guides](/docs/quickstarts/overview).

10 changes: 7 additions & 3 deletions website/docs/docs/cloud/manage-access/about-access.md
@@ -1,9 +1,13 @@
---
title: "About access"
id: "about-access"
title: "About user access in dbt Cloud"
id: "about-user-access"
---

## Overview
:::info "User access" is not "Model access"

**User groups and access** and **model groups and access** mean two different things. "Model groups and access" is a specific term used in the language of dbt-core. Refer to [Model access](/docs/collaborate/govern/model-access) for more info on what it means in dbt-core.

:::

dbt Cloud administrators can use dbt Cloud's permissioning model to control
user-level access in a dbt Cloud account. This access control comes in two flavors:
@@ -91,5 +91,5 @@ Usage notes:
## Granular permissioning

The dbt Cloud Enterprise plan supports Role-Based access controls for
configuring granular in-app permissions. See [access control](/docs/cloud/manage-access/about-access)
configuring granular in-app permissions. See [access control](/docs/cloud/manage-access/about-user-access)
for more information on Enterprise permissioning.
@@ -15,7 +15,7 @@ If you're interested in learning more about an Enterprise plan, contact us at sa

The dbt Cloud Enterprise plan supports a number of pre-built permission sets to
help manage access controls within a dbt Cloud account. See the docs on [access
control](/docs/cloud/manage-access/about-access) for more information on Role-Based access
control](/docs/cloud/manage-access/about-user-access) for more information on Role-Based access
control (RBAC).

## Permission Sets
8 changes: 4 additions & 4 deletions website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md
@@ -17,7 +17,7 @@ Currently supported features include:
* Just-in-time provisioning

This document details the steps to integrate dbt Cloud with an identity
provider in order to configure Single Sign On and [role-based access control](/docs/cloud/manage-access/about-access#role-based-access-control).
provider in order to configure Single Sign On and [role-based access control](/docs/cloud/manage-access/about-user-access#role-based-access-control).

## Generic SAML 2.0 integrations

@@ -59,7 +59,7 @@ Additionally, you may configure the IdP attributes passed from your identity pro
| first_name | Unspecified | ${user.first_name} | The user's first name |
| last_name | Unspecified | ${user.last_name} | The user's last name |

dbt Cloud's [role-based access control](/docs/cloud/manage-access/about-access#role-based-access-control) relies
dbt Cloud's [role-based access control](/docs/cloud/manage-access/about-user-access#role-based-access-control) relies
on group mappings from the IdP to assign dbt Cloud users to dbt Cloud groups. To
use role-based access control in dbt Cloud, also configure your identity
provider to provide group membership information in user attribute called
@@ -258,7 +258,7 @@ Expected **Attributes**:
| `Last name` | Unspecified | `last_name` | The user's last name. |
| `Primary email`| Unspecified | `email` | The user's email address. |

9. To use [role-based access control](/docs/cloud/manage-access/about-access#role-based-access-control) in dbt Cloud, enter the groups in the **Group membership** field during configuration:
9. To use [role-based access control](/docs/cloud/manage-access/about-user-access#role-based-access-control) in dbt Cloud, enter the groups in the **Group membership** field during configuration:

| Google groups | App attributes |
| -------------- | -------------- |
@@ -383,7 +383,7 @@ We recommend using the following values:
| first_name | Unspecified | First Name |
| last_name | Unspecified | Last Name |

dbt Cloud's [role-based access control](/docs/cloud/manage-access/about-access#role-based-access-control) relies
dbt Cloud's [role-based access control](/docs/cloud/manage-access/about-user-access#role-based-access-control) relies
on group mappings from the IdP to assign dbt Cloud users to dbt Cloud groups. To
use role-based access control in dbt Cloud, also configure OneLogin to provide group membership information in user attribute called
`groups`:
95 changes: 80 additions & 15 deletions website/docs/docs/collaborate/govern/model-access.md
@@ -9,27 +9,54 @@ description: "Define model access with group capabilities"
This functionality is new in v1.5.
:::

:::info "Model access" is not "User access"

**Model groups and access** and **user groups and access** mean two different things. "User groups and access" is a specific term used in dbt Cloud to manage permissions. Refer to [User access](/docs/cloud/manage-access/about-user-access) for more info.

The two concepts will be closely related, as we develop multi-project collaboration workflows this year:
- Users with access to develop in a dbt project can view and modify **all** models in that project, including private models.
- Users in the same dbt Cloud account _without_ access to develop in a project cannot view that project's private models, and they can take a dependency on its public models only.
:::

## Related documentation
* [`groups`](build/groups)
* [`access`](resource-properties/access)

## Groups

Models can be grouped under a common designation with a shared owner.
Models can be grouped under a common designation with a shared owner. For example, you could group together all models owned by a particular team, or all models related to a specific data source (`github`).

Why define model `groups`?
- It turns implicit relationships into an explicit grouping
- It enables you to mark specific models as "private" for use _only_ within that group
Why define model `groups`? There are two reasons:
- It turns implicit relationships into an explicit grouping, with a defined owner. By thinking about the interface boundaries _between_ groups, you can have a cleaner (less entangled) DAG. In the future, those interface boundaries could be appropriate as the interfaces between separate projects.
- It enables you to designate certain models as having "private" access—for use exclusively within that group. Other models will be restricted from referencing (taking a dependency on) those models. In the future, they won't be visible to other teams taking a dependency on your project—only "public" models will be.

If you follow our [best practices for structuring a dbt project](how-we-structure/1-guide-overview), you're probably already using subdirectories to organize your dbt project. It's easy to apply a `group` label to an entire subdirectory at once:

<File name="dbt_project.yml">

```yml
models:
  my_project_name:
    marts:
      customers:
        +group: customer_success
      finance:
        +group: finance
```
</File>
Each model can only belong to one `group`, and groups cannot be nested. If you set a different `group` in that model's yaml or in-file config, it will override the `group` applied at the project level.
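
As a rough illustration of the override described above, the sketch below sets `group` directly on one model in a properties file. The file path and the model name `fct_refunds` are hypothetical, and it assumes that both the `customer_success` and `finance` groups are defined elsewhere in the project.

```yml
# models/marts/customers/_models.yml (hypothetical path)
models:
  # This model sits under marts/customers, so it would inherit
  # `+group: customer_success` from dbt_project.yml.
  # The model-level setting below takes precedence instead.
  - name: fct_refunds  # hypothetical model name
    group: finance
```

Because each model belongs to exactly one group, the model ends up in `finance` only, not in both groups.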

## Access modifiers

Some models (not all) are designed to be referenced through the [ref](ref) function across groups. Models can set an [access modifier](https://en.wikipedia.org/wiki/Access_modifiers) to indicate their intended level of accessibility.
Some models are implementation details, meant for reference only within their group of related models. Other models should be accessible through the [ref](/reference/dbt-jinja-functions/ref) function across groups and projects. Models can set an [access modifier](https://en.wikipedia.org/wiki/Access_modifiers) to indicate their intended level of accessibility.

| Access | Referenceable by |
|-----------|-------------------------------|
| private | same group |
| protected | same project/package |
| public | any group, package or project |
| Access | Referenceable by |
|-----------|----------------------------------------|
| private | same group |
| protected | same project (or installed as package) |
| public | any group, package or project |

By default, all models are `protected`. This means that other models in the same project can reference them, regardless of their group. This is largely for backwards compatibility when assigning groups to an existing set of models, as there may already be existing references across group assignments.

@@ -38,20 +65,58 @@ However, it is recommended to set the access modifier of a new model to `private
<File name="models/marts/customers.yml">

```yaml
# First, define the group and owner
groups:
  - name: cx
  - name: customer_success
    owner:
      name: Customer Success Team
      email: [email protected]
# Then, add 'group' + 'access' modifier to specific models
models:
  # This is a public model -- it's a stable & mature interface for other teams/projects
  - name: dim_customers
    group: cx
    group: customer_success
    access: public
  # this is an intermediate transformation -- relevant to the CX team only
  - name: int__customer_history_rollup
    group: cx
  # This is a private model -- it's an intermediate transformation intended for use in this context *only*
  - name: int_customer_history_rollup
    group: customer_success
    access: private
  # This is a protected model -- it might be useful elsewhere in *this* project,
  # but it shouldn't be exposed elsewhere
  - name: stg_customer__survey_results
    group: customer_success
    access: protected
```

</File>

## FAQs

### How does model access relate to database permissions?

These are different!

Specifying `access: public` on a model does not trigger dbt to automagically grant `select` on that model to every user or role in your data platform when you materialize it. You have complete control over managing database permissions on every model/schema, as makes sense to you & your organization.

Of course, dbt can facilitate this by means of [the `grants` config](resource-configs/grants), and other flexible mechanisms. For example:
- Grant access to downstream queriers on public models
- Restrict access to private models, by revoking default/future grants, or by landing them in a different schema
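
A minimal sketch of the first pattern, assuming a project laid out like the `dbt_project.yml` example on this page and a database role named `reporter` (a hypothetical name). It uses the `grants` config to give that role `select` on every model under `marts`.

```yml
# dbt_project.yml
models:
  my_project_name:
    marts:
      # Grant select on every model under marts/ to a downstream role.
      # 'reporter' is a placeholder for a role that exists in your data platform.
      +grants:
        select: ['reporter']
```

This scopes grants by folder rather than by `access` level, since dbt does not link the two. If your public and private models share a folder, you would likely set `grants` per model instead.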

As we continue to develop multi-project collaboration, `access: public` will mean that other teams are allowed to start taking a dependency on that model. This assumes that they've requested access to select from the underlying dataset, and that you've granted it.

### What about referencing models from a package?

For historical reasons, it is possible to `ref` a protected model from another project, _if that protected model is installed as a package_. This is useful for packages containing models for a common data source; you can install the package as source code, and run the models as if they were your own.

dbt Core v1.6 will introduce a new kind of `project` dependency, distinct from a `package` dependency, defined in `dependencies.yml`:
```yml
projects:
- project: jaffle_finance
```

Unlike installing a package, the models in the `jaffle_finance` project will not be pulled down as source code, or selected to run during `dbt run`. Instead, `dbt-core` will expect stateful input that enables it to resolve references to those public models.

Models referenced from a `project`-type dependency must use [two-argument `ref`](#two-argument-variant), including the project name. Only public models can be accessed in this way. That holds true even if the `jaffle_finance` project is _also_ installed as a package (pulled down as source code), such as in a coordinated deployment. If `jaffle_finance` is listed under the `projects` in `dependencies.yml`, dbt will raise an error if a protected model is referenced from outside its project.
31 changes: 22 additions & 9 deletions website/docs/docs/collaborate/govern/model-contracts.md
@@ -20,6 +20,13 @@ Defining a dbt model is as easy as writing a SQL `select` statement. Your query

While this is ideal for quick and iterative development, for some models, constantly changing the shape of its returned dataset poses a risk when other people and processes are querying that model. It's better to define a set of upfront "guarantees" that define the shape of your model. We call this set of guarantees a "contract." While building your model, dbt will verify that your model's transformation will produce a dataset matching up with its contract, or it will fail to build.

## Where are contracts supported?

At present, model contracts are supported for:
- SQL models. Contracts are not yet supported for Python models.
- Models materialized as `table`, `view`, and `incremental` (with `on_schema_change: append_new_columns`). Views offer limited support for column names and data types, but not `constraints`. Contracts are not supported for `ephemeral`-materialized models.
- The most commonly used [data platforms](/docs/supported-data-platforms), including Snowflake, BigQuery, Databricks, and Redshift. However, the specific `constraints` that are supported or enforced can vary depending on the platform.

## How to define a contract

Let's say you have a model with a query like:
@@ -78,19 +85,25 @@ When building a model with a defined contract, dbt will do two things differentl

### Which models should have contracts?

Any model can define a contract. Defining contracts for "public" models that are being shared with other groups, teams, and (soon) dbt projects is especially important. For more, read about ["Model access"](model-access).
Any model meeting the criteria described above _can_ define a contract. We recommend defining contracts for ["public" models](model-access) that are being relied on downstream.
- Inside of dbt: Shared with other groups, other teams, and (in the future) other dbt projects.
- Outside of dbt: Reports, dashboards, or other systems & processes that expect this model to have a predictable structure. You might reflect these downstream uses with [exposures](exposures).

### How are contracts different from tests?

A model's contract defines the **shape** of the returned dataset.
A model's contract defines the **shape** of the returned dataset. If the model's logic or input data doesn't conform to that shape, the model does not build.

[Tests](docs/build/tests) are a more flexible mechanism for validating the content of your model. So long as you can write the query, you can run the test. Tests are also more configurable via `severity` and custom thresholds and are easier to debug after finding failures. The model has already been built, and the relevant records can be materialized in the data warehouse by [storing failures](resource-configs/store_failures).
[Tests](docs/build/tests) are a more flexible mechanism for validating the content of your model _after_ it's built. So long as you can write the query, you can run the test. Tests are more configurable, such as with [custom severity thresholds](severity). They are easier to debug after finding failures, because you can query the already-built model, or [store the failing records in the data warehouse](resource-configs/store_failures).

In a parallel for software APIs, the structure of the API response is the contract. Quality and reliability ("uptime") are also very important attributes of an API's quality, but not part of the contract per se, indicating a breaking change or requiring a version bump.
In some cases, you can replace a test with its equivalent constraint. This has the advantage of guaranteeing the validation at build time, and it probably requires less compute (cost) in your data platform. The prerequisites for replacing a test with a constraint are:
- Making sure that your data platform can support and enforce the constraint that you need. Most platforms only enforce `not_null`.
- Materializing your model as `table` or `incremental` (**not** `view` or `ephemeral`).
- Defining a full contract for this model by specifying the `name` and `data_type` of each column.
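
To make these prerequisites concrete, here is a sketch of a `not_null` test replaced by a `not_null` constraint. The column names and data types are illustrative assumptions, and `dim_customers` is borrowed from the model access examples. Whether the constraint is actually enforced depends on your data platform.

```yml
models:
  - name: dim_customers
    config:
      materialized: table   # constraints need `table` or `incremental`
      contract:
        enforced: true
    columns:
      # Full contract: every column declares a name and data_type
      - name: customer_id       # illustrative column
        data_type: int
        constraints:
          - type: not_null      # takes the place of a `not_null` test
      - name: customer_name     # illustrative column
        data_type: string       # type names vary by platform
```

With this in place, a null `customer_id` would be caught while the model builds, instead of by a test that runs afterwards.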

### Where are contracts supported?
**Why aren't tests part of the contract?** To draw a parallel with software APIs: the structure of the API response is the contract. Quality and reliability ("uptime") are also very important attributes of an API, but they are not part of the contract per se. When the contract changes in a backwards-incompatible way, it is a breaking change that requires a bump in major version.

At present, model contracts are supported for:
- SQL models (not yet Python)
- Models materialized as `table`, `view`, and `incremental` (with `on_schema_change: append_new_columns`)
- On the most popular data platforms — but which `constraints` are supported/enforced varies by platform
### Can I define a "partial" contract?

Currently, dbt contracts apply to **all** columns defined in a model, and they require declaring explicit expectations about **all** of those columns. The explicit declaration of a contract is not an accident—it's very much the intent of this feature.

We are investigating the feasibility of supporting "inferred" or "partial" contracts in the future. This would enable you to define constraints and strict data typing for a subset of columns, while still detecting breaking changes on other columns by comparing against the same model in production. If you're interested, please upvote or comment on [dbt-core#7432](https://github.com/dbt-labs/dbt-core/issues/7432).
2 changes: 1 addition & 1 deletion website/docs/docs/deploy/dbt-cloud-job.md
@@ -17,7 +17,7 @@ today.

- You must have a [dbt Cloud account](https://www.getdbt.com/signup/) and [Developer seat license](/docs/cloud/manage-access/seats-and-users)
- You must have a dbt project connected to a [data platform](/docs/cloud/connect-data-platform/about-connections)
- You must have [access permission](/docs/cloud/manage-access/about-access) to create, edit, and run jobs
- You must have [access permission](/docs/cloud/manage-access/about-user-access) to create, edit, and run jobs
- You must set up a [deployment environment](/docs/collaborate/environments/dbt-cloud-environments)
- Your deployment environment must be on dbt version 1.0 or higher
