Commit
update branch locally
mirnawong1 committed Jan 9, 2025
2 parents f876964 + 4745203 commit 2a22de2
Showing 62 changed files with 1,110 additions and 309 deletions.
12 changes: 6 additions & 6 deletions website/blog/2022-04-19-dbt-cloud-postman-collection.md
@@ -19,7 +19,7 @@ is_featured: true
The dbt Cloud API has well-documented endpoints for creating, triggering and managing dbt Cloud jobs. But there are other endpoints that aren’t well documented yet, and they’re extremely useful for end-users. These endpoints exposed by the API enable organizations not only to orchestrate jobs, but to manage their dbt Cloud accounts programmatically. This creates some really interesting capabilities for organizations to scale their dbt Cloud implementations.

The main goal of this article is to spread awareness of these endpoints as the docs are being built & show you how to use them.

<!--truncate-->

@@ -45,7 +45,7 @@ Beyond the day-to-day process of managing their dbt Cloud accounts, many organiz

*Below this you’ll find a series of example requests - use these to guide you or [check out the Postman Collection](https://dbtlabs.postman.co/workspace/Team-Workspace~520c7ac4-3895-4779-8bc3-9a11b5287c1c/request/12491709-23cd2368-aa58-4c9a-8f2d-e8d56abb6b1d) to try it out yourself.*

## Appendix

### Examples of how to use the Postman Collection

@@ -55,7 +55,7 @@ Let’s run through some examples on how to make good use of this Postman Collec

One common question we hear from customers is “How can we migrate resources from one dbt Cloud project to another?” Often, they’ll create a development project, in which users have access to the UI and can manually make changes, and then migrate selected resources from the development project to a production project once things are ready.

There are several reasons one might want to do this, including:

- Probably the most common is separating dev/test/prod environments across dbt Cloud projects to enable teams to build manually in a development project, and then automatically migrate those environments & jobs to a production project.
- Building “starter projects” they can deploy as templates for new teams onboarding to dbt from a learning standpoint.
@@ -90,10 +90,10 @@ https://cloud.getdbt.com/api/v3/accounts/28885/projects/86704/environments/75286

#### Push the environment to the production project

We take the response from the GET request above, and then do the following:

1. Adjust some of the variables for the new environment:
- Change the value of the “project_id” field from 86704 to 86711
- Change the value of the “name” field from “dev-staging” to “production–api-generated”
- Set the “custom_branch” field to “main”

@@ -116,7 +116,7 @@ We take the response from the GET request above, and then do the following:
}
```

3. Note the environment ID returned in the response, as we’ll use it to create a dbt Cloud job in the next step

#### Pull the job definition from the dev project

14 changes: 7 additions & 7 deletions website/blog/2022-05-17-stakeholder-friendly-model-names.md
@@ -29,7 +29,7 @@ In this article, we’ll take a deeper look at why model naming conventions are

>“[Data folks], what we [create in the database]… echoes in eternity.” -Max(imus, Gladiator)
Analytics Engineers are often centrally located in the company, sandwiched between data analysts and data engineers. This means everything AEs create might be read and need to be understood by both an analytics or customer-facing team and by teams who spend most of their time in code and the database. Depending on the audience, the scope of access differs, which means the user experience and context changes. Let’s elaborate on what that experience might look like by breaking end-users into two buckets:

- Analysts / BI users
- Analytics engineers / Data engineers
@@ -49,21 +49,21 @@ Here we have drag and drop functionality and a skin over top of the underlying `
**How model names can make this painful:**
The end users might not even know what tables the data refers to, as potentially everything is joined by the system and they don’t need to write their own queries. If model names are chosen poorly, there is a good chance that the BI layer on top of the database tables has been renamed to something more useful for the analysts. This adds an extra step of mental complexity in tracing the <Term id="data-lineage">lineage</Term> from data model to BI.

#### Read-only access to the dbt Cloud IDE docs
If Analysts want more context via documentation, they may traverse back to the dbt layer and check out the data models in either the context of the Project or Database. In the Project view, they will see the data models in the folder hierarchy present in your project’s repository. In the Database view, they will see the output of the data models as present in your database, i.e. `database / schema / object`.

![A screenshot depicting the dbt Cloud IDE menu's Database view which shows you the output of your data models. Next to this view, is the Project view.](/img/blog/2022-05-17-stakeholder-friendly-model-names/project-view.png)

**How model names can make this painful:**
For the Project view, using abstracted department or organizational structures as folder names presupposes that the reader/engineer already knows what is contained within each folder or what that department actually does, or it promotes haphazard clicking to open folders to see what is within. Organizing the final outputs by business unit or analytics function is great for end users, but it doesn't accurately represent all the sources and references that had to come together to build this output, as they often live in another folder.

For the Database view, pray your team has been declaring a logical schema bucketing or a logical model naming convention; otherwise, you will have a long, alphabetized list of database objects to scroll through, where staging, intermediate, and final output models are all intermixed. Clicking into a data model and viewing the documentation is helpful, but you would need to check out the DAG to see where the model lives in the overall flow.
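
If you're curious what that bucketing can look like, here's a minimal sketch of custom schema configuration in `dbt_project.yml` (the project, folder, and schema names below are placeholders, not a prescription):

```yaml
# dbt_project.yml (illustrative only: project, folder, and schema names are assumptions)
models:
  my_project:
    staging:
      +schema: staging        # stg_ models land together in a staging schema
    intermediate:
      +schema: intermediate   # building blocks stay out of analyst-facing schemas
    marts:
      +schema: core           # final outputs (dim_, fct_ models) live together in core
```

By default, dbt appends these values to your target schema name (for example `analytics_staging`), so staging, intermediate, and final output models stop piling up in one alphabetized list.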

#### The full dropdown list in their data warehouse

If they have access to Worksheets, SQL Runner, or another way to write ad hoc SQL queries, then they will have access to the data models as present in your database, i.e. `database / schema / object`, but with less documentation attached, and more proclivity towards querying tables to check out their contents, which costs time and money.

![A screenshot of the SQL Runner menu within Looker showcasing the dropdown list of all data models present in the database.](/img/blog/2022-05-17-stakeholder-friendly-model-names/data-warehouse-dropdown.png)

**How model names can make this painful:**
Without proper naming conventions, you will encounter `analytics.order`, `analytics.orders`, `analytics.orders_new` and not know which one is which, so you will open up a scratch statement tab and attempt to figure out which is correct:
@@ -73,9 +73,9 @@
-- select * from analytics.orders limit 10
select * from analytics.orders_new limit 10
```
Hopefully you get it right via sampling queries, or eventually find out there is a true source of truth defined in a totally separate area: `core.dim_orders`.

The problem here is that the only information you can use to determine what data is within an object, or the purpose of the object, is the schema and model name.

### The engineer’s user experience

Expand All @@ -98,7 +98,7 @@ There is not much worse than spending all week developing on a task, submitting
This is largely the same as the Analyst experience above, except they created the data models or are aware of their etymologies. They are likely more comfortable writing ad hoc queries, but also have the ability to make changes, which adds a layer of thought processing when working.

**How model names can make this painful:**
It takes time to become a subject matter expert in the database. You will need to know which schema a subject lives in, what tables are the source of truth and/or output models, versus experiments, outdated objects, or building blocks used along the way. Working within this context, engineers know the history and company lore behind why a table was named that way or how its purpose may differ slightly from its name, but they also have the ability to make changes.

Change management is hard; how many places would you need to update, rename, re-document, and retest to fix a poor naming choice from long ago? It is a daunting position, which can create internal strife when constrained for time over whether we should continually revamp and refactor for maintainability or focus on building new models in the same pattern as before.

2 changes: 1 addition & 1 deletion website/blog/2024-05-07-unit-testing.md
@@ -223,7 +223,7 @@ group by 1

### Caveats and pro-tips

See the docs for [helpful information before you begin](https://docs.getdbt.com/docs/build/unit-tests#before-you-begin), including unit testing [incremental models](https://docs.getdbt.com/docs/build/unit-tests#unit-testing-incremental-models), [models that depend on ephemeral model(s)](https://docs.getdbt.com/docs/build/unit-tests#unit-testing-a-model-that-depends-on-ephemeral-models), and platform-specific considerations like `STRUCT`s in BigQuery. In many cases, the [`sql` format](https://docs.getdbt.com/reference/resource-properties/data-formats#sql) can help solve tricky edge cases that come up.

Another advanced topic is overcoming issues when non-deterministic factors are involved, such as a current timestamp. To ensure that the output remains consistent regardless of when the test is run, you can set a fixed, predetermined value by using the [`overrides`](https://docs.getdbt.com/reference/resource-properties/unit-test-overrides) configuration.
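
To make that concrete, here's a minimal sketch of such an override. The model, columns, and exact macro key are hypothetical, so check the overrides docs for the names your project actually uses:

```yaml
unit_tests:
  - name: test_is_recent_flag_is_stable
    model: fct_orders  # hypothetical model that flags orders placed recently
    overrides:
      macros:
        # pin "now" to a fixed value so the result never depends on when the test runs
        current_timestamp: "cast('2024-01-02' as timestamp)"
    given:
      - input: ref('stg_orders')
        rows:
          - {order_id: 1, ordered_at: "2024-01-01"}
    expect:
      rows:
        - {order_id: 1, is_recent: true}
```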

92 changes: 92 additions & 0 deletions website/docs/docs/build/dimensions.md
@@ -22,6 +22,7 @@ All dimensions require a `name`, `type`, and can optionally include an `expr` parameter.
| `description` | A clear description of the dimension. | Optional | String |
| `expr` | Defines the underlying column or SQL query for a dimension. If no `expr` is specified, MetricFlow will use the column with the same name as the group. You can use the column name itself to input a SQL expression. | Optional | String |
| `label` | Defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Optional | String |
| [`meta`](/reference/resource-configs/meta) | Set metadata for a resource and organize resources. Accepts plain text, spaces, and quotes. | Optional | Dictionary |

Refer to the following for the complete specification for dimensions:

@@ -37,6 +38,8 @@ dimensions:
Refer to the following example to see how dimensions are used in a semantic model:
<VersionBlock firstVersion="1.9">
```yaml
semantic_models:
- name: transactions
@@ -59,13 +62,50 @@ semantic_models:
type_params:
time_granularity: day
label: "Date of transaction" # Recommend adding a label to provide more context to users consuming the data
config:
meta:
data_owner: "Finance team"
expr: ts
- name: is_bulk
type: categorical
expr: case when quantity > 10 then true else false end
- name: type
type: categorical
```
</VersionBlock>
<VersionBlock lastVersion="1.8">
```yaml
semantic_models:
- name: transactions
description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU.
model: {{ ref('fact_transactions') }}
defaults:
agg_time_dimension: order_date
# --- entities ---
entities:
- name: transaction
type: primary
...
# --- measures ---
measures:
...
# --- dimensions ---
dimensions:
- name: order_date
type: time
type_params:
time_granularity: day
label: "Date of transaction" # Recommend adding a label to provide more context to users consuming the data
expr: ts
- name: is_bulk
type: categorical
expr: case when quantity > 10 then true else false end
- name: type
type: categorical
```
</VersionBlock>
Dimensions are bound to the primary entity of the semantic model they are defined in. For example, the dimension `type` is defined in a model that has `transaction` as its primary entity. `type` is scoped to the `transaction` entity, and to reference this dimension you would use the fully qualified dimension name, i.e. `transaction__type`.
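
For instance, a metric filter that references this dimension would use that entity-prefixed name. The metric, measure, and filter value below are invented for illustration:

```yaml
metrics:
  - name: purchase_transaction_count
    label: Purchase transaction count
    type: simple
    type_params:
      measure: transaction_count  # assumes a measure of this name on the transactions semantic model
    filter: |
      {{ Dimension('transaction__type') }} = 'purchase'
```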

@@ -101,12 +141,28 @@ This section further explains the dimension definitions, along with examples. Di

Categorical dimensions are used to group metrics by different attributes, features, or characteristics such as product type. They can refer to existing columns in your dbt model or be calculated using a SQL expression with the `expr` parameter. An example of a categorical dimension is `is_bulk_transaction`, which is a group created by applying a case statement to the underlying column `quantity`. This allows users to group or filter the data based on bulk transactions.

<VersionBlock firstVersion="1.9">

```yaml
dimensions:
- name: is_bulk_transaction
type: categorical
expr: case when quantity > 10 then true else false end
config:
meta:
usage: "Filter to identify bulk transactions, like where quantity > 10."
```
</VersionBlock>

<VersionBlock lastVersion="1.8">

```yaml
dimensions:
- name: is_bulk_transaction
type: categorical
expr: case when quantity > 10 then true else false end
```
</VersionBlock>

## Time

@@ -130,12 +186,17 @@ You can set `is_partition` for time to define specific time spans. Additionally,

Use `is_partition: True` to show that a dimension exists over a specific time window. For example, a date-partitioned dimensional table. When you query metrics from different tables, the dbt Semantic Layer uses this parameter to ensure that the correct dimensional values are joined to measures.

<VersionBlock firstVersion="1.9">

```yaml
dimensions:
- name: created_at
type: time
label: "Date of creation"
expr: ts_created # ts_created is the underlying column name from the table
config:
meta:
notes: "Only valid for orders from 2022 onward"
is_partition: True
type_params:
time_granularity: day
Expand All @@ -156,6 +217,37 @@ measures:
expr: 1
agg: sum
```
</VersionBlock>

<VersionBlock lastVersion="1.8">

```yaml
dimensions:
- name: created_at
type: time
label: "Date of creation"
expr: ts_created # ts_created is the underlying column name from the table
is_partition: True
type_params:
time_granularity: day
- name: deleted_at
type: time
label: "Date of deletion"
expr: ts_deleted # ts_deleted is the underlying column name from the table
is_partition: True
type_params:
time_granularity: day
measures:
- name: users_deleted
expr: 1
agg: sum
agg_time_dimension: deleted_at
- name: users_created
expr: 1
agg: sum
```
</VersionBlock>

</TabItem>

