docs(feature-guide) Impact Analysis (#5765)
* update sidebar titles to remove About DataHub

* move impact analysis guide to new folder; update links

* update copy in Understand Data in Context section

* adding feature guide template to sidebar

* adding feature guide template

* update docs readme to link to feature guide template

* enhance docs-website readme

* add comments to feature guide template

* add links to graphql and lineage resources

* linter cleanup

* updating reference links

* update to graphql reference links

* add image and gif best practices

* update feature guide template with image details

* fix link

* update template from YouTube -> Videos

* Update docs-website/README.md

Co-authored-by: Harshal Sheth <[email protected]>

* update feature to Lineage Impact Analysis

Co-authored-by: Harshal Sheth <[email protected]>
maggiehays and hsheth2 authored Sep 1, 2022
1 parent 91f6084 commit 4956f5a
Showing 9 changed files with 337 additions and 12 deletions.
104 changes: 103 additions & 1 deletion docs-website/README.md
@@ -35,4 +35,106 @@ To regenerate GraphQL API docs, simply rebuild the docs-website directory.

```console
./gradlew docs-website:build
```

## Managing Content

Please use the following steps when adding/managing content for the docs site.

### Leverage Documentation Templates

* [Feature Guide Template](./docs/_feature-guide-template.md)
* [Metadata Ingestion Source Template](./metadata-ingestion/source-docs-template.md)

### Self-Hosted vs. Managed DataHub

The docs site includes resources for both self-hosted (aka open-source) DataHub and Managed DataHub.

* All Feature Guides should include the `FeatureAvailability` component within the markdown file itself
* Features only available via Managed DataHub should have the `saasOnly` class if they are included in `sidebars.js`; this displays the small "cloud" icon:

```js
{
  type: "doc",
  id: "path/to/document",
  className: "saasOnly",
},
```
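If you add several Managed DataHub-only pages, a small typed helper can keep these entries consistent. This is a hypothetical sketch: `docItem` and `SidebarDocItem` are illustrative names, not part of `sidebars.js` or Docusaurus, and the type only models the three fields used in the entry above.

```typescript
// Hypothetical typing for a sidebars.js doc entry; Docusaurus accepts
// more fields than the ones modelled here.
interface SidebarDocItem {
  type: "doc";
  id: string;
  className?: string;
}

// Build a sidebar entry; managed-only docs get the "saasOnly" class so the
// docs site renders the small cloud icon next to them.
function docItem(id: string, managedOnly = false): SidebarDocItem {
  const item: SidebarDocItem = { type: "doc", id };
  if (managedOnly) {
    item.className = "saasOnly";
  }
  return item;
}

const entries: SidebarDocItem[] = [
  docItem("path/to/self-hosted-document"),
  docItem("path/to/document", true),
];

console.log(JSON.stringify(entries[1]));
// {"type":"doc","id":"path/to/document","className":"saasOnly"}
```

Keeping the class name in one place avoids typos such as `saasonly` silently dropping the icon.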

### Sidebar Display Options

`generateDocsDir.ts` contains logic that auto-generates the docs site Sidebar; here are a few ways to control how documents are displayed.

1. Leverage the document's H1 value

By default, the Sidebar will display the H1 value of the Markdown file, not the file name itself.

**NOTE:** `generateDocsDir.ts` strips a leading `DataHub ` or `About DataHub ` from the title to avoid repeating "DataHub" throughout the sidebar.

2. Hard-code the section title in `generateDocsDir.ts`

Map the file to a hard-coded value in `const hardcoded_titles`

3. Assign a `title` separate from the H1 value

You can add the following details at the top of the markdown file:

```yaml
---
title: "<value to display in the sidebar>"
---
```

*This will be ignored if your H1 value begins with `DataHub ` or `About DataHub `.*

**NOTE:** Assigning a value for `label:` in `sidebars.js` is not reliable, for example:

```js
{ // Don't do this
  label: "Usage Guide",
  type: "doc",
  id: "path/to/document",
},
```
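Because `label` overrides are unreliable, it can help to catch them before they land. A minimal sketch, assuming a flat list of doc entries; real sidebars also nest categories and plain string ids, which this hypothetical `findLabelledDocs` helper ignores.

```typescript
// Minimal model of a sidebars.js doc entry (assumption: only these fields).
interface DocEntry {
  type: "doc";
  id: string;
  label?: string;
  className?: string;
}

// Return the ids of entries that set `label`, so they can be switched to an
// H1, a front-matter `title`, or a hardcoded_titles mapping instead.
function findLabelledDocs(entries: DocEntry[]): string[] {
  return entries.filter((e) => e.label !== undefined).map((e) => e.id);
}

const sidebarEntries: DocEntry[] = [
  { type: "doc", id: "path/to/document", label: "Usage Guide" },
  { type: "doc", id: "path/to/other-document" },
];

console.log(JSON.stringify(findLabelledDocs(sidebarEntries))); // ["path/to/document"]
```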

### Determine the Appropriate Sidebar Section

When adding a new document to the site, determine the appropriate sidebar section:

**What is DataHub?**

By the end of this section, readers should understand the core use cases that DataHub addresses, target end-users, high-level architecture, & hosting options.

**Get Started**

The goal of this section is to provide the bare-minimum steps required to:
- Get DataHub Running
- Optionally configure SSO
- Add/invite Users
- Create Policies & assign roles
- Ingest at least one source (e.g., a data warehouse)
- Understand high-level options for enriching metadata

**Ingest Metadata**

This section aims to provide a deeper understanding of how ingestion works. Readers should be able to find details for ingesting from all systems, apply transformers, understand sinks, and understand key concepts of the Ingestion Framework (Sources, Sinks, Transformers, and Recipes).

**Enrich Metadata**

The purpose of this section is to provide direction on how to enrich metadata when shift-left isn’t an option.

**Act on Metadata**

This section provides concrete examples of acting on metadata changes in real-time and enabling Active Metadata workflows/practices.

**Deploy DataHub**

The purpose of this section is to provide the minimum steps required to deploy DataHub to the vendor of your choosing.

**Developer Guides**

The purpose of this section is to provide developers & technical users with concrete tutorials on how to work with the DataHub CLI & APIs.

**Feature Guides**

This section aims to provide plain-language feature overviews for technical and non-technical readers alike.
3 changes: 3 additions & 0 deletions docs-website/generateDocsDir.ts
@@ -246,6 +246,9 @@ function markdown_guess_title(

```diff
   if (sidebar_label.startsWith("DataHub ")) {
     sidebar_label = sidebar_label.slice(8).trim();
   }
+  if (sidebar_label.startsWith("About DataHub ")) {
+    sidebar_label = sidebar_label.slice(14).trim();
+  }
   if (sidebar_label != title) {
     contents.data.sidebar_label = sidebar_label;
   }
```
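To sanity-check the new behavior, the prefix handling in `markdown_guess_title` can be reproduced in isolation. This is a simplified sketch rather than the actual function: `stripSidebarPrefix` and `sidebarLabelFor` are illustrative names, and the real code works on parsed front matter (`contents.data`).

```typescript
// Mirrors the two checks in markdown_guess_title(): strip a leading
// "DataHub " and then a leading "About DataHub ".
function stripSidebarPrefix(title: string): string {
  let label = title;
  if (label.startsWith("DataHub ")) {
    label = label.slice("DataHub ".length).trim(); // 8 characters
  }
  if (label.startsWith("About DataHub ")) {
    label = label.slice("About DataHub ".length).trim(); // 14 characters
  }
  return label;
}

// Like the original, only report a sidebar label when it differs from the title.
function sidebarLabelFor(title: string): string | undefined {
  const label = stripSidebarPrefix(title);
  return label !== title ? label : undefined;
}

console.log(sidebarLabelFor("About DataHub Lineage Impact Analysis")); // Lineage Impact Analysis
console.log(sidebarLabelFor("Metadata Tests")); // undefined
```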
3 changes: 2 additions & 1 deletion docs-website/sidebars.js
@@ -217,7 +217,7 @@ module.exports = {

```diff
 // className: "saasOnly",
 // },
 // "docs/wip/metadata-analytics",
-// "docs/wip/impact-analysis",
+"docs/act-on-metadata/impact-analysis",
 // {
 // type: "doc",
 // id: "docs/wip/events-bridge",
```

@@ -513,6 +513,7 @@ module.exports = {

```diff
 // - "perf-test/README",
 // "metadata-jobs/README",
 // "docs/how/add-user-data",
+// "docs/_feature-guide-template"
 // ],
 },
 };
```
2 changes: 1 addition & 1 deletion docs-website/src/pages/docs/index.js
@@ -98,7 +98,7 @@ const featureGuideContent = [

```diff
 { title: "UI-Based Ingestion", icon: <ApiTwoTone />, to: "docs/ui-ingestion" },
 { title: "Search", icon: <SearchOutlined />, to: "docs/how/search" },
 // { title: "Browse", icon: <CompassTwoTone />, to: "/docs/quickstart" },
-{ title: "Impact Analysis", icon: <NodeExpandOutlined />, to: "docs/wip/impact-analysis" },
+{ title: "Lineage Impact Analysis", icon: <NodeExpandOutlined />, to: "docs/act-on-metadata/impact-analysis" },
 { title: "Metadata Tests", icon: <CheckCircleTwoTone />, to: "docs/wip/metadata-tests" },
 { title: "Approval Flows", icon: <SafetyCertificateTwoTone />, to: "docs/wip/approval-workflows" },
 { title: "Personal Access Tokens", icon: <LockTwoTone />, to: "docs/authentication/personal-access-tokens" },
```
4 changes: 2 additions & 2 deletions docs-website/src/pages/index.js
@@ -104,8 +104,8 @@ function Home() {

```diff
 </h2>
 <p>
 DataHub is the one-stop shop for documentation, schemas,
-ownership, lineage, pipelines and usage information. Data
-quality and data preview information coming soon.
+ownership, lineage, pipelines, data quality, usage information,
+and more.
 </p>
 </div>
 <div className="col col--6 col--offset-1">
```
50 changes: 50 additions & 0 deletions docs/README.md
@@ -1 +1,51 @@
# DataHub Docs Overview

DataHub's project documentation is hosted at [datahubproject.io](https://datahubproject.io/docs)

## Types of Documentation

### Feature Guide

A Feature Guide should follow the [Feature Guide Template](/_feature-guide-template.md), and should provide the following value:

* At a high level, what is the concept/feature within DataHub?
* Why is the feature useful?
* What are the common use cases of the feature?
* What are the simple steps one needs to take to use the feature?

When creating a Feature Guide, please remember to:

* Provide plain-language descriptions for both technical and non-technical readers
* Avoid using industry jargon, abbreviations, or acronyms
* Provide descriptive screenshots, links out to relevant YouTube videos, and any other relevant resources
* Provide links out to Tutorials for advanced use cases

*Not all Feature Guides will require a Tutorial.*

### Tutorial

A Tutorial is meant to provide very specific steps to accomplish complex workflows and advanced use cases that are out of scope of a Feature Guide.

Tutorials should be written to accommodate the targeted persona, e.g., Developer, Admin, End-User, etc.

*Not all Tutorials require an associated Feature Guide.*

## Docs Best Practices

### Embedding GIFs and/or Screenshots

* Store GIFs and screenshots in [datahub-project/static-assets](https://github.com/datahub-project/static-assets); this minimizes unnecessarily large image/file sizes in the main repo
* Center-align screenshots and size them down to 70% width; this improves readability and skimmability within the site

Example snippet:

```html
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-export-full-list.png"/>
</p>
```

* Use the "raw" GitHub image link (right click image from GitHub > Open in New Tab > copy URL):

* Good: https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/dbt-test-logic-view.png
* Bad: https://github.com/datahub-project/static-assets/blob/main/imgs/dbt-test-logic-view.png
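Converting a copied `blob` link by hand is easy to get wrong, so a helper can do it and produce the centered snippet at the same time. A hypothetical sketch: `toRawGitHubUrl` and `centeredImage` are illustrative names, and only URLs of the form `https://github.com/<owner>/<repo>/blob/<ref>/<path>` are rewritten.

```typescript
// Convert a GitHub "blob" page URL into the raw.githubusercontent.com URL.
// Anything that does not look like a blob URL is returned unchanged.
function toRawGitHubUrl(url: string): string {
  const match = url.match(/^https:\/\/github\.com\/([^/]+)\/([^/]+)\/blob\/(.+)$/);
  if (!match) {
    return url;
  }
  const [, owner, repo, rest] = match;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest}`;
}

// Wrap an image in the centered, 70%-width snippet recommended for the docs.
function centeredImage(url: string, width = "70%"): string {
  return [
    '<p align="center">',
    `<img width="${width}" src="${toRawGitHubUrl(url)}"/>`,
    "</p>",
  ].join("\n");
}

console.log(
  centeredImage(
    "https://github.com/datahub-project/static-assets/blob/main/imgs/dbt-test-logic-view.png",
  ),
);
```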
83 changes: 83 additions & 0 deletions docs/_feature-guide-template.md
@@ -0,0 +1,83 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# About DataHub [Feature Name]

<!-- All Feature Guides should begin with `About DataHub ` to improve SEO -->

<!--
Update feature availability; by default, feature availability is Self-Hosted and Managed DataHub
Add in `saasOnly` for Managed DataHub-only features
-->

<FeatureAvailability/>

<!-- This section should provide a plain-language overview of the feature. Consider the following:
* What does this feature do? Why is it useful?
* What are the typical use cases?
* Who are the typical users?
* In which DataHub Version did this become available? -->

## [Feature Name] Setup, Prerequisites, and Permissions

<!-- This section should provide plain-language instructions on how to configure the feature:
* What special configuration is required, if any?
* How can you confirm you configured it correctly? What is the expected behavior?
* What access levels/permissions are required within DataHub? -->

## Using [Feature Name]

<!-- Plain-language instructions on how to use the feature
Provide a step-by-step guide to using the feature, including relevant screenshots and/or GIFs
* Where/how do you access it?
* What best practices exist?
* What are common code snippets?
-->

## Additional Resources

<!-- Comment out any irrelevant or empty sections -->

### Videos

<!-- Use the following format to embed YouTube videos:
**Title of YouTube video in bold text**
<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/VIDEO_ID" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</p>
-->

<!--
NOTE: Find the iframe details in YouTube by going to Share > Embed
-->

### GraphQL

<!-- Bulleted list of relevant GraphQL docs; comment out section if none -->

### DataHub Blog

<!-- Bulleted list of relevant DataHub Blog posts; comment out section if none -->

## FAQ and Troubleshooting

<!-- Use the following format:
**Question in bold text**
Response in plain text
-->

*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*

### Related Features

<!-- Bulleted list of related features; comment out section if none -->
93 changes: 93 additions & 0 deletions docs/act-on-metadata/impact-analysis.md
@@ -0,0 +1,93 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# About DataHub Lineage Impact Analysis

<FeatureAvailability/>

Lineage Impact Analysis is a powerful workflow for understanding the complete set of upstream and downstream dependencies of a Dataset, Dashboard, Chart, and many other DataHub Entities.

This allows Data Practitioners to proactively identify the impact of breaking schema changes or failed data pipelines on downstream dependencies, rapidly discover which upstream dependencies may have caused unexpected data quality issues, and more.

Lineage Impact Analysis is available via the DataHub UI and GraphQL endpoints, supporting manual and automated workflows.
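For automated workflows, the dependency list can be requested from the `searchAcrossLineage` GraphQL query. The sketch below only builds the request; it assumes a GMS endpoint at `http://localhost:8080/api/graphql`, a personal access token sent as a Bearer token, and the result fields shown in the query, so check them against the GraphQL reference for your DataHub version. `buildImpactRequest` is a hypothetical helper and the URN is illustrative.

```typescript
// Values of DataHub's LineageDirection GraphQL enum.
type LineageDirection = "UPSTREAM" | "DOWNSTREAM";

// Requests each dependency's degree plus its URN and entity type.
const IMPACT_QUERY = `
  query impactAnalysis($input: SearchAcrossLineageInput!) {
    searchAcrossLineage(input: $input) {
      start
      count
      total
      searchResults {
        degree
        entity {
          urn
          type
        }
      }
    }
  }
`;

interface GraphQLRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) the request. The default endpoint assumes a
// local GMS instance; point it at your own deployment.
function buildImpactRequest(
  urn: string,
  direction: LineageDirection,
  token: string,
  start = 0,
  count = 100,
  endpoint = "http://localhost:8080/api/graphql",
): GraphQLRequest {
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({
        query: IMPACT_QUERY,
        // "*" is used as a match-all search term; start/count page results.
        variables: { input: { urn, direction, query: "*", start, count } },
      }),
    },
  };
}

// Illustrative URN from DataHub's sample data; the token is a placeholder.
const req = buildImpactRequest(
  "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
  "DOWNSTREAM",
  "<personal-access-token>",
);
console.log(req.init.body);
```

With Node 18+ or a browser, the request can then be sent with `fetch(req.url, req.init)`; the dependencies should appear under `data.searchAcrossLineage.searchResults` in the JSON response.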

## Lineage Impact Analysis Setup, Prerequisites, and Permissions

Lineage Impact Analysis is enabled for any Entity that has associated Lineage relationships with other Entities and does not require any additional configuration.

Any DataHub user with “View Entity Page” permissions is able to view the full set of upstream or downstream Entities and export results to CSV from the DataHub UI.

## Using Lineage Impact Analysis

Follow these simple steps to understand the full dependency chain of your data entities.

1. On a given Entity Page, select the **Lineage** tab

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-lineage-tab.png"/>
</p>

2. Easily toggle between **Upstream** and **Downstream** dependencies

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-choose-upstream-downstream.png"/>
</p>

3. Choose the **Degree of Dependencies** you are interested in. The default filter is “1 Degree of Dependency” to minimize processor-intensive queries.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-filter-dependencies.png"/>
</p>

4. Slice and dice the result list by Entity Type, Platform, Owner, and more to isolate the relevant dependencies

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-apply-filters.png"/>
</p>

5. Export the full list of dependencies to CSV

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-export-full-list.png"/>
</p>

6. View the filtered set of dependencies via CSV, with details about assigned ownership, domain, tags, terms, and quick links back to those entities within DataHub

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/impact-analysis-view-export-results.png"/>
</p>
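The UI export already includes ownership, domain, tags, and terms. If you script an export from GraphQL results instead, a minimal CSV writer might look like this. It is a hypothetical sketch that assumes each result carries `degree` and `entity { urn type }` and writes only those three columns; fields are quoted because dataset URNs contain commas.

```typescript
// Shape assumed for each item in searchAcrossLineage.searchResults.
interface LineageResult {
  degree: number;
  entity: { urn: string; type: string };
}

// RFC 4180-style quoting: wrap in double quotes and double embedded quotes.
function csvField(value: string): string {
  return `"${value.replace(/"/g, '""')}"`;
}

// One header row plus one row per dependency; degree is left unquoted.
function toCsv(results: LineageResult[]): string {
  const header = "urn,type,degree";
  const rows = results.map((r) =>
    [csvField(r.entity.urn), csvField(r.entity.type), String(r.degree)].join(","),
  );
  return [header, ...rows].join("\n");
}

// Hypothetical result used only to show the output format.
const sample: LineageResult[] = [
  {
    degree: 1,
    entity: {
      urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
      type: "DATASET",
    },
  },
];
console.log(toCsv(sample));
```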

## Additional Resources

### Videos

**DataHub 201: Impact Analysis**

<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/BHG_kzpQ_aQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</p>

### GraphQL

* [searchAcrossLineage](../../graphql/queries.md#searchacrosslineage)
* [searchAcrossLineageInput](../../graphql/inputObjects.md#searchacrosslineageinput)

### DataHub Blog

* [Dependency Impact Analysis, Data Validation Outcomes, and MORE! - Highlights from DataHub v0.8.27 & v.0.8.28](https://blog.datahubproject.io/dependency-impact-analysis-data-validation-outcomes-and-more-1302604da233)


### FAQ and Troubleshooting

**The Lineage Tab is greyed out - why can’t I click on it?**

This means you have not yet ingested Lineage metadata for that entity. Please see the Lineage Guide to get started.

**Why is my list of exported dependencies incomplete?**

We currently limit the list of dependencies to 10,000 records; we suggest applying filters to narrow the result set if you hit that limit.
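When scripting against the API, paging with `start` and `count` keeps each request small. This is an illustration under stated assumptions: `planPages` is a hypothetical helper, the page size is arbitrary, and the 10,000 cap simply mirrors the export limit described here; the API itself may enforce different limits.

```typescript
// A single page request: offset plus page size.
interface Page {
  start: number;
  count: number;
}

// Plan page offsets up to min(total, cap). pageSize and cap are assumptions;
// the default cap mirrors the 10,000-record export limit.
function planPages(total: number, pageSize = 1000, cap = 10000): Page[] {
  if (pageSize <= 0) {
    throw new RangeError("pageSize must be positive");
  }
  const limit = Math.min(total, cap);
  const pages: Page[] = [];
  for (let start = 0; start < limit; start += pageSize) {
    // The last page may be shorter than pageSize.
    pages.push({ start, count: Math.min(pageSize, limit - start) });
  }
  return pages;
}

console.log(planPages(2500).map((p) => p.count)); // [ 1000, 1000, 500 ]
console.log(planPages(25000).length); // 10
```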

*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*

### Related Features

* [DataHub Lineage](../lineage/intro.md)
7 changes: 0 additions & 7 deletions docs/wip/impact-analysis.md

This file was deleted.
