feat(ingestion) Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job #4689

Merged
merged 37 commits into from
Apr 29, 2022

Conversation

Jiafi
Contributor

@Jiafi Jiafi commented Apr 19, 2022

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions

github-actions bot commented Apr 19, 2022

Unit Test Results (build & test)

98 files, 98 suites, 25m 44s ⏱️
718 tests: 645 passed ✔️, 67 skipped 💤, 6 failed

For more details on these failures, see this check.

Results for commit 544ea97.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Apr 19, 2022

Unit Test Results (metadata ingestion)

5 files, 5 suites, 1h 1m 52s ⏱️
436 tests: 436 passed ✔️, 0 skipped 💤, 0 failed
2 105 runs: 2 035 passed ✔️, 70 skipped 💤, 0 failed

Results for commit 544ea97.

♻️ This comment has been updated with latest results.

@Jiafi Jiafi changed the title Ingest Tags from s3 bucket on an AWS Glue job Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job Apr 19, 2022
@Jiafi Jiafi changed the title Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job feat(ingest) Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job Apr 19, 2022
@Jiafi
Contributor Author

Jiafi commented Apr 19, 2022

Bucket tag ingestion is done. I still need to ingest object tags, which shouldn't be too much extra work. I could use some feedback on the bucket tag work.

@Jiafi Jiafi changed the title feat(ingest) Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job feat(ingestion) Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job Apr 19, 2022
@Jiafi
Contributor Author

Jiafi commented Apr 20, 2022

The last thing to do is add unit tests for the object tagging functionality.

@Jiafi Jiafi requested a review from mayurinehate April 20, 2022 19:10
@Jiafi
Contributor Author

Jiafi commented Apr 20, 2022

Unit tests added and updated for object tagging. @mayurinehate the PR is ready.

@Jiafi Jiafi requested a review from mayurinehate April 25, 2022 22:42
Collaborator

@mayurinehate mayurinehate left a comment

Hey @Jiafi
This is great! Thank you. Wondering if you have tested these changes with an actual s3 bucket and glue setup?

metadata-ingestion/source_docs/s3_data_lake.md (outdated, resolved)
if current_tags:
    tags_to_add.extend(
        [current_tag.tag for current_tag in current_tags.tags]
    )
Collaborator

Do we skip updating the tags aspect when tags_to_add is empty?
Also, if the configs to add tags are set in the source config but the tags cannot be added because self.ctx.graph is None, we should stop execution with a configuration error, similar to this: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/transformer/add_dataset_ownership.py#L64

Collaborator

This also applies to the s3 source.

Contributor Author

Changed to not add tags to the aspect if they are empty. Also added the ctx.graph configuration error.

Contributor Author

@Jiafi Jiafi Apr 26, 2022

Looking back at this, would we not want to add the tags to the aspects, even if ctx.graph is None, in the case of outputting the tags to a file?
I don't necessarily think it warrants an error being raised because of this. Maybe if the graph is None we can log a warning or a debug message?

Collaborator

@Jiafi ctx.graph = None does not mean that there are no tags to maintain. The graph is None when using a sink other than datahub-rest (like the file or datahub kafka sink), in which case datahub_api needs to be explicitly set in the recipe file to be able to instantiate ctx.graph.

With the current code, if older tags were present and the datahub kafka sink is used, the tags will be overridden.

I understand your concern that someone who has not set up datahub and uses the file sink for testing (and not for ingesting the file into datahub later) might end up with a ConfigurationError with the suggested changes. Maybe we can continue with just a warning message like you suggested, or take an additional config option for whether to ignore the ctx unavailable error.

Contributor Author

I personally think a warning is the correct call. If there are extra configurations for the source to tell it that it's using a datahub api, they can very easily get out of sync with the sink actually in use. There's more room for configuration errors, as opposed to the warning message being pretty clear about what is going on.
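
For reference, a minimal sketch of the behavior this thread converges on: merge newly discovered tags with any tags already stored, log a warning (rather than raise) when no graph client is available, and skip emitting the aspect when there is nothing to add. The names `merge_tags` and `fetch_existing_tags` are hypothetical and not taken from the PR; passing `None` for the callable stands in for `ctx.graph` being None.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


def merge_tags(
    new_tags: List[str],
    fetch_existing_tags: Optional[Callable[[], List[str]]],
) -> Optional[List[str]]:
    """Return the tag list to emit, or None when no aspect should be emitted.

    `fetch_existing_tags` is a hypothetical stand-in for a lookup through
    ctx.graph; None models a sink setup where ctx.graph is unavailable.
    """
    tags = list(new_tags)
    if fetch_existing_tags is None:
        # Warn instead of raising, as suggested in the thread: existing tags
        # cannot be read, so they may be overwritten downstream.
        logger.warning(
            "No graph client available; existing tags cannot be fetched "
            "and may be overwritten."
        )
    else:
        tags.extend(fetch_existing_tags())
    if not tags:
        # Nothing to add: skip updating the tags aspect entirely.
        return None
    # De-duplicate while preserving first-seen order.
    return list(dict.fromkeys(tags))
```

Returning None rather than an empty list lets the caller distinguish "do not emit an aspect" from "emit an aspect with no tags", which is the case the reviewer asked about.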

@Jiafi
Contributor Author

Jiafi commented Apr 26, 2022

> Hey @Jiafi
>
> This is great! Thank you. Wondering if you have tested these changes with an actual s3 bucket and glue setup?

Yes I have! Feel free to try it locally on your machine as well

Jiafi added 2 commits April 26, 2022 11:44
… if there are no tags. Make readmes more explicit in not adding tags to folders or databases the tables live in
@Jiafi Jiafi requested a review from mayurinehate April 26, 2022 15:45
@mayurinehate mayurinehate requested a review from treff7es April 27, 2022 11:36
@Jiafi
Contributor Author

Jiafi commented Apr 27, 2022

Seeing unrelated tests failing on CI/CD. Not sure what is up with those.

)
tags_to_add = []
if self.source_config.use_s3_bucket_tags:
    bucket_tags = self.s3_client.get_bucket_tagging(Bucket=bucket_name)
Contributor

Can you please handle the case where the bucket has no tags?
For me it throws an exception if the tags do not exist.

Contributor Author

On it.

Contributor Author

Updated. Added a logger.warn for when there are no tags. Not sure if it should be a warn or debug message.
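
For reference, a minimal sketch of handling a bucket without tags, not the code merged in this PR. boto3's `get_bucket_tagging` raises a `ClientError` with error code `NoSuchTagSet` when a bucket has no tag set. So that the sketch runs without boto3 installed, it inspects the exception's `response` attribute instead of importing `botocore.exceptions.ClientError`; real code would normally catch `ClientError` directly. The function name `get_bucket_tag_strings` and the `key:value` output format are assumptions for illustration.

```python
import logging
from typing import Any, List

logger = logging.getLogger(__name__)


def get_bucket_tag_strings(s3_client: Any, bucket_name: str) -> List[str]:
    """Return bucket tags as "key:value" strings, or [] if the bucket has none.

    `s3_client` is expected to behave like a boto3 S3 client. The
    "key:value" format is an assumption made for this sketch.
    """
    try:
        response = s3_client.get_bucket_tagging(Bucket=bucket_name)
    except Exception as exc:
        # botocore's ClientError carries the service error code in
        # exc.response["Error"]["Code"]; anything else is re-raised.
        error_code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if error_code != "NoSuchTagSet":
            raise
        logger.warning("No tags found for bucket=%s", bucket_name)
        return []
    return [f"{tag['Key']}:{tag['Value']}" for tag in response.get("TagSet", [])]
```

Only the missing-tag-set case is swallowed; other failures such as access errors still propagate so they are not mistaken for an untagged bucket.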

@Jiafi Jiafi requested review from treff7es and mayurinehate April 28, 2022 16:01
Contributor

@treff7es treff7es left a comment

lgtm, thanks for the contribution! 🎉

@treff7es treff7es merged commit bbac4a7 into datahub-project:master Apr 29, 2022
alexey-kravtsov added a commit to infobip/datahub that referenced this pull request Jun 3, 2022
* fix(ingest): bigquery - Fix BigQuery Datetime/Timestamp type column partition table profile bug (datahub-project#4658)

* fix BigQuery Datetime type column partition table profile bug

* inplace datetime replace

* extract out 'if' blocks and write a unit-test

* parse logic inside get_partition_range func

* docs: add missing PR numbers (datahub-project#4742)

* docs: add missing PR numbers & specific version where deprecation was done

* fix(azure_ad): silently discard other Azure AD object types (datahub-project#4693) (datahub-project#4704)

* fix(datahub-frontend): OIDC discovery URL will not have NONE as auth_methods_supported (datahub-project#4710)

* fix(docs): fix links (datahub-project#4703)

* feat(ingest): feast - add support for Feast 0.18, deprecate older integration (datahub-project#4094)

* rephrasing soft delete banner (datahub-project#4753)

* feat(ebeans): Add metrics to track connection pool (datahub-project#4755)

* fix(ingest): aws - When using aws_profile, grab temporary credentials from the session. (datahub-project#4751)

* allow for temporary credentials generated when using an aws_profile.  Mostly used for SSO and temporary credentials

* feat(ingestion): aws - Custom endpoint url and proxies in S3. (datahub-project#4708)

This feature will allow user to specify custom endpoint url with custom proxies to connect to dedicated S3 bucket not associated with the amazon aws.

* fix(tableau): miscellaneous tableau fixes for lineage, browse path, non-embedded datasets (datahub-project#4724)

* fix(tableau): add config whether to emit aspects for external datasets

other changes:
- do not set browse path in absence of datasource or project name
- remove unused nodes from tableau metadata query

* fix(tableau): remove redundant (transitive) lineage edges between tables, datasource, sheet

other changes:
- update subtypes for datasource to be more specific

* fix(tableau): fix browse paths for custom sql and embedded datasource

other changes:
- do not set browse path if any intermediate folder level in browse path is empty

* docs(tableau): update tableau doc

* docs(dev): add warning for JDK (datahub-project#4761)

* fix(ui): fix expandedName for dataset (datahub-project#4762)

* fix(ui): Users and Groups UI bug fixes (datahub-project#4746)

* fix(azure_ad): make redirect and graph_url optional parameters and update docs (datahub-project#4754)

* docs(ingestion): glue - clarify that table regex patterns should be fully-qualified (datahub-project#4747)

* fixing ml model feature tab (datahub-project#4769)

* fix(lint): lib upgrade caused (datahub-project#4773)

* fix(lineage) Filter dataset -> dataset lineage edges if data is transformed (datahub-project#4732)

Co-authored-by: Chris Collins <[email protected]>

* Fix breaking changes from GE 0.15.3 that are affecting our Python3.6 smoke_tests. (datahub-project#4779)

* fix(ingest): fwk - fix how we import DataHub actions (datahub-project#4784)

* fix(ingest): fwk - datahub_api should be initialized by datahub-rest … (datahub-project#4786)

* feat(ingestion): glue/s3 - Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job (datahub-project#4689)

* fix(snowflake): improve debug log for external tables (datahub-project#4772)

* feat(snowflake): add option to disable checking role grants (datahub-project#4760)

* feat(ingest): add option to disable checking role grants

* fix(m1): tweak m1 preflight (datahub-project#4771)

* feat(ingestion): add Pulsar source (datahub-project#4721)

* fix(gms): Fixes delete logic in MAE consumer (datahub-project#4790)

* feat(analytics): display glossary term percentage coverage (datahub-project#4782)

* refactor(entity change events): Removing unused source field (datahub-project#4781)

* feat(versionedDataset): adds a versionStamp to timeline response & adds versionStamp param to dataset graphql (datahub-project#4727)

* fix(s3): improved handling for corner cases (datahub-project#4774)

* fix(ingest): databricks - hive ingestion should not fail on table comment (datahub-project#4787)

* fix(ui ingest): Unschedule all sources on ingestion source refresh, fix delete not being enforced (datahub-project#4792)

* feat(tracking) Configure whether mixpanel is enabled with env variable (datahub-project#4768)

* feat(ingest): docs - overhaul source connector docs to make it code driven (datahub-project#4798)

Co-authored-by: MugdhaHardikar-GSLab <[email protected]>

* fix(docs): Fixing outdated language in policies doc (datahub-project#4799)

* fix(ui): update default preview component with new ui design (datahub-project#4783)

* feat(operation): display the reported time for last updated in the UI (datahub-project#4800)

* feat(blame) - add schema history blame UI (datahub-project#4793)

* fix(ingest): avro - fix schema field type for avro logical types (datahub-project#4801)

* Create sample_pii_glossary.yml (datahub-project#4795)

* fix(ci): fix presto_on_hive tests. (datahub-project#4802)

* fix(bigquery): improve handling of extracted audit log sql queries (datahub-project#4735)

* fix(snowflake): get external tables when there is default namespace (datahub-project#4803)

* fix(snowflake): passing connect args should not cause failures (datahub-project#4764)

* fix(snowflake): passing connect args should not cause failures

Co-authored-by: Ravindra Lanka <[email protected]>

* fix(scrolling) Fixes scrolling and weird heights for embeddedListSearch across entities (datahub-project#4805)

* fix(ui): update default preview card description text (datahub-project#4796)

* fix(ui): preview card UI design update (datahub-project#4808)

* fix(blame): make view blame prior to button work properly (datahub-project#4810)

* fix(docgen): fix failure count incrementing during doc generation (datahub-project#4806)

* fix(search) Fixes a UI issue so results and filters are always separated (datahub-project#4811)

Co-authored-by: Chris Collins <[email protected]>

* feat(platform): openapi - initial post, get, and delete endpoints for entities (datahub-project#4775)

* feat(protobuf): adding deprecation support for datasets and fields (datahub-project#4634)

* fix(build) Bumps hadoop-client to 3.2.1 which has no security vulnerability (datahub-project#4816)

* fix(doc): improving docs across multiple sources (datahub-project#4815)

* fix(ui): Fix/UI bug on default preview component (datahub-project#4818)

* fix(docs): fix the metadata service auth documentation and frontend clarifications (datahub-project#4722)

* feat(ingestion): kafka - add protobuf schema support (datahub-project#4819)

Co-authored-by: Luis Angel Vicente Sanchez <[email protected]>

* removing unused module (datahub-project#4823)

* fix(ci):  split out expensive build steps, increase memory (datahub-project#4825)

* fix(ingest): great-expectations - fix failure to serialize type Decimal (datahub-project#4763)

* refactor(deps): upgrade Jackson Databind to avoid CVE (datahub-project#4822)

* fix(security): update glue dependency (datahub-project#4828)

* fix(ingestion): bigquery - extract temp table prefix as config, fix reporting, logging (datahub-project#4766)

* feat: updates for 0.8.34 (datahub-project#4829)

* fix(blame) - fixes UI issues on small viewports for the schema blame view (datahub-project#4827)

Co-authored-by: Shirshanka Das <[email protected]>

* feat(ingest): great-expectations - add more logs (datahub-project#4832)

* fix(docs): Adds access policy documentation (datahub-project#4813)

* chore(deps): upgrade spring and parquet dependencies (datahub-project#4807)

* feat(ingest): s3 - add support for multiple pathspecs in one recipe (datahub-project#4777)

* chore(deps): pinning jackson dataformat cbor (datahub-project#4826)

* refactor(metadata-service): remove redundant file (datahub-project#4836)

* hide soft deleted entities in lineage (datahub-project#4835)

* docs(schema-history): add usage guide for schema history (datahub-project#4817)

* feat(ui): entity profile add copy url option update (datahub-project#4821)

* chore(deps): move from velocity 1.7 to 2.3 (datahub-project#4837)

* fix(policies): change order of operations for policies bootstrap step to update index after database (datahub-project#4841)

* chore(deps): upgrade dependency io.netty:netty-all to address vulnerability (datahub-project#4840)

* fix(jetty): upgrade jetty dependency for CVE (datahub-project#4838)

* Revert "fix(jetty): upgrade jetty dependency for CVE (datahub-project#4838)" (datahub-project#4844)

This reverts commit 1697bd0.

* fix(ingestion): Allow profiling of only those tables that are allowed by the table_pattern. (datahub-project#4842)

* implemented kafka ingester

* chore(deps): upgrade play dependencies to remove CVE vulnerabilities (datahub-project#4820)

* chore(deps): bump minimist from 1.2.5 to 1.2.6 in /docs-website (datahub-project#4847)

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump async from 2.6.3 to 2.6.4 in /docs-website (datahub-project#4846)

Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4.
- [Release notes](https://github.com/caolan/async/releases)
- [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md)
- [Commits](caolan/async@v2.6.3...v2.6.4)

---
updated-dependencies:
- dependency-name: async
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Revert "chore(deps): upgrade play dependencies to remove CVE vulnerabilities (datahub-project#4820)" (datahub-project#4861)

This reverts commit fa4abea.

* ssl configuration support for elasticsearch source (datahub-project#4843)

* chore(deps): upgrade play to remove CVEs (datahub-project#4864)

* fix(bigquery-usage): dataset allow filter impl (datahub-project#4776)


Co-authored-by: Ravindra Lanka <[email protected]>

* chore(jetty): upgrade jetty to 9.4.46 for CVE (datahub-project#4857)

* Revert "chore(deps): upgrade play to remove CVEs (datahub-project#4864)" (datahub-project#4868)

This reverts commit 84a026b.

* Use ingest proposal to submit status updates (datahub-project#4600)

* fix(ingestion): dependencies - Downgrading typing-extension dependency to work with Airflow 2.0.2 (datahub-project#4855)

* Downgrading typing-extension dependency to work with Airflow 2.0.2 restricting typing-extension on python 3.7

* fix(ui): search filter entity ui update (datahub-project#4866)

* fix: update const list in SearchFilter

* fix(ui): update inline css with styled components

* fix(ui): update space between title and description

* fix(docs): ingest - sort modules, fix small typos (datahub-project#4880)

* feat(dataPlatformInstance) - Resolve and display dataPlatformInstance on entities (datahub-project#4867)

Co-authored-by: Chris Collins <[email protected]>

* feat(ci): docker actions simplify, add vulnerability scanner, simplify smoke-tests (datahub-project#4881)

co-authored-by: Dexter Lee <[email protected]>

* fix(ci): remove multiplatform builds from containers that don't support it (datahub-project#4883)

* fix(sql-parsing): improve error handling (datahub-project#4862)

* fix(ci): remove buildx and qemu for non multi-platform images (datahub-project#4885)

* fix(ci): docker - either load or push, don't do both (datahub-project#4887)

* fix(ingest): lookml - add view definitions for all views (datahub-project#4875)

* fix(ci): clean up docker workflow for multi-tags (datahub-project#4889)

* chore(deps): play - upgrade for CVEs (datahub-project#4891)

* fix(ci): remove logging statement (datahub-project#4893)

* fix(ingestion): bigquery-usage: Fix biquery usage table deny pattern template (datahub-project#4898)

* default values

* default values

* default values

* query

* chore(deps): bump axios from 0.21.1 to 0.21.4 in /datahub-web-react (datahub-project#4865)

Bumps [axios](https://github.com/axios/axios) from 0.21.1 to 0.21.4.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/master/CHANGELOG.md)
- [Commits](axios/axios@v0.21.1...v0.21.4)

---
updated-dependencies:
- dependency-name: axios
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix(ingestion): ElasticSearch when no properties from elastic_mappings, gracefully continue (datahub-project#4853)

* when no properties from elastic_mappings, gracefully continue

Co-authored-by: Ravindra Lanka <[email protected]>

* fix(mxe-consumer): exclude CassandraAutoConfiguration from consumer boot (datahub-project#4890)

* fix(ui): fix side panel resize css (datahub-project#4892)

* fix(docs): Update developing.md to mention directory context (datahub-project#4899)

* fix(usage): pull usage from environment source rather than args (datahub-project#4824)

* revert(bigquery-usage): dataset allow filter impl (datahub-project#4901)

* Revert "fix(ingestion): bigquery-usage: Fix biquery usage table deny pattern template (datahub-project#4898)"

* doc(ingestion): add note for UI ingestion & custom sources (datahub-project#4902)

* fix(deps): reduce frontend dependency (datahub-project#4884)

* query

* query

* query

* browse paths fix

* limit

* rid-limit

* fix(build): Fix breaking changes from GE 0.15.3 (datahub-project#4905)

* docs(website): add banner and nav item for metadata day 2022 (datahub-project#4906)

* doc(biqquery): add caveat for materialized view (datahub-project#4859)

* fix(docs): Metadata day 2022: Fix year (datahub-project#4908)

* feat(ingestion): For all usage connectors, allow exclusion of top_n_queries from ingestion via a config param. (datahub-project#4839)

* feat(redshift-usage): allow users to not ingest top_n_queries

Co-authored-by: Ravindra Lanka <[email protected]>

* docs(frontend): update build command for partial build (datahub-project#4911)

* feat(gms): Add support for deleting reference pointers when deleting by urn (datahub-project#4791)

* feat(containers) Get and display all parent containers in header and search (datahub-project#4910)

Co-authored-by: Chris Collins <[email protected]>

* fix(doc): update doc url to generated docs (datahub-project#4860)

* refactor(API): Add "Filter" support for Assertion Run Events, Dataset Profiles, Dataset Operations (datahub-project#4869)

* refactor(actions): Migrate to use new datahub-actions container (datahub-project#4903)

* docs(transformer): update custom transform example to add missing super init (datahub-project#4912)

* docs(ingest): remove incorrectly annotated lineage capability (datahub-project#4914)

* fix(ui) Fix some spacing issues on the search card (datahub-project#4916)

Co-authored-by: Chris Collins <[email protected]>

* fix(idea): change location of coercer to make intellij not complain about classes (datahub-project#4918)

* feat(spark-lineage): add support for iceberg and cache based plans (datahub-project#4882)

Co-authored-by: magzhu <[email protected]>

* fix(kafka-setup): Add ssl.keystore.type and ssl.truststore.type  (datahub-project#4923)

* feat(ui): Adding Search Bar to all List Views (groups, users, domains, policies, ingestion) (datahub-project#4919)

* fix(kafka-setup): Check if keystore/truststore location env variables are set (datahub-project#4924)

* OIDC discovery URL will not have none as auth method

* Fix code style

* fix compilation error

* add keystore/truststore type env variables

* check keystore/truststore location is set

Co-authored-by: Dexter Lee <[email protected]>

* feat(graphql): Adding resolvers for adding multiple tags, terms, and owners (datahub-project#4917)

* feat(telemetry): add server side telemetry (datahub-project#4925)

Co-authored-by: Kevin Hu <[email protected]>

* fix(lint): lint failure due to mypy upgrade (datahub-project#4933)

* fix(lint): lint failure due to mypy upgrade

* feat(dbt): enable data platform instance on dbt (datahub-project#4926)

* fix(env): provide default for unset telemetry variable (datahub-project#4937)

* make graphql OperationType enum match up w/ pdl (datahub-project#4944)

* Fix docker unified (datahub-project#4948)

* feat(transformers): add transformers to provide tags & terms to schema fields based on regex patterns (datahub-project#4936)

* add tag & term transformers for schemas

* added documentation

* lint fixes

* add clarification that only first set of matching terms is applied

* fix(frontend): Update run-local-frontend to reflect the new Play changes (datahub-project#4951)

* fix(workflow): fix mysql credentials (datahub-project#4947)

* fix(ci): add artifact cleaner, make docker publish sections consistent (datahub-project#4950)

* fix(ci): docker - remove multiplatform builds for unsupported images (datahub-project#4953)

* fix(metadata-service): timeline - ignore platform and schema changes (datahub-project#4952)

* fix(bigquery): add dataset_id for bigquery (datahub-project#4932)

* fix(ci): remove scheduled artifact deletion run to avoid api rate limiting (datahub-project#4954)

* Revert "feat(spark-lineage): add support for iceberg and cache based plans (datahub-project#4882)" (datahub-project#4945)

This reverts commit 46760a7.

* feat: updates for 0.8.35 (datahub-project#4960)

* feat(release): update CLI version (datahub-project#4962)

* feat(release): update CLI version

* add typscript

* fix(ui): do not show copy URN buttons when Clipboard API is not available (datahub-project#4963)

* feat(cli): raise error if get entity api fails (datahub-project#4922)

* fix(data platforms): Update data_platforms.json (datahub-project#4966)

Fixed clickHouse data source icon not being displayed after DataHub startup because there is no network

* docs(datahub-kafka-sink): add topic_routes config to doc of datahub-kafka-sink (datahub-project#4965)

* docs(website): Remove banner and nav item for metadata day 2022 (datahub-project#4968)

* fix(ui): policy outside modal click issue update (datahub-project#4909)

* feat(bigquery): reduce logging (datahub-project#4961)

* feat(bigquery): reduce logging

* doc: add entry for behaviour change

* fix(datahub-client): support utf8 encoding (datahub-project#4878)

fixes datahub-project#4700

* doc(telemetry): fix telemetry doc (datahub-project#4969)

* doc(ingest): mysql - describe required grants (datahub-project#4958)

* fix(cli): graph - get_aspect_v2 method fails to deserialize aspects correctly (datahub-project#4971)

* fix(bigquery): add rate limiting for api calls made (datahub-project#4967)

* feat(bigquery): add partition key tag (datahub-project#4974)

* feat(spark-lineage): support for persist API (datahub-project#4980)

* docs(townhall): update invite links and townhall history (datahub-project#4977)

* fix(UI) Fix multiple UI usability issues (datahub-project#4975)

* fix(ingest): mode - dashboards without creator info fails to process (datahub-project#4983)

* fix(metadata-service): telemetry - fix hardcoded aspect name, suppress errors when producing MAE (datahub-project#4981)

* chore(deps): upgrade datastax libs version (datahub-project#4986)

* fix(redash): use dashboard id if slug does not work (datahub-project#4985)

* fix(ingest): remove new schema field usage (datahub-project#4987)

* feat(graphql) Add new Revokable Token API (datahub-project#4970)

* doc(ingestion): default boolean fix, broken bigquery docgen (datahub-project#4984)

* feat(authorization): Adding AuthorizerContext + ResourceSpecResolver to context (datahub-project#4982)

* refactor(metadata-io): introduce a storage-independent in-memory entity aspect model (datahub-project#4957)

* feat(great-expectations): allow DATAHUB_DEBUG env var to enable debug logs in GE Action (datahub-project#4972)

* feat(ingestion): optionally disable some kafka schema warnings (datahub-project#4169)

Co-authored-by: Claudio Benfatto <[email protected]>
Co-authored-by: Ravindra Lanka <[email protected]>

* feat(run): Create a describe run endpoint for fetching aspects created by the ingestion run (datahub-project#4964)

* fix(smoke-tests) Increases sleep timeout in rollback test to prevent flakiness (datahub-project#4979)

* feat(ingest): s3 - speeding up ingestion with sampling (datahub-project#4927)

* doc(ingest): update golden file command (datahub-project#4992)

* metabase chart are missing from dashboard (datahub-project#4942)

* refactor(redash): emit charts first and try with id based dashboard API first (datahub-project#4991)

* feat(model): add created, lastModified auditstamps to SchemaField (datahub-project#4943)

* fix(ingest): tableau - fix chart custom properties None key error, update docs (datahub-project#4931)

* feat(airflow): Airflow lineage ingestion plugin (datahub-project#4833)

feat(airflow): Airflow lineage ingestion plugin (datahub-project#4833)

* feat(dbt): enable dbt read artifacts from s3 (datahub-project#4935)

Co-authored-by: Shirshanka Das <[email protected]>

* chore(deps): upgrade gson version (datahub-project#4993)

* fix(airflow): Fix for Airflow 1 support (datahub-project#4995)

* feat(Tests): Metadata Tests Models + APIs + UI (Part 1)  (datahub-project#4989)

* fix(docs): Fixes token docs (datahub-project#5000)

* fix(cli): timeline - adjust for timeline API changes on server (datahub-project#4998)

* feat(DataHub Operations): Adding GraphQL mutation for reporting Dataset operations (datahub-project#4988)

* doc(delete): add example for dataflow and datajob (datahub-project#4994)

* doc(delete): add example for dataflow and datajob

* fix(ui): ui bug fix - fixing search card vertical margin (datahub-project#5002)

* fix(gms): Fix incorrect StatefulTokenService init (datahub-project#5004)

StatefulTokenService was not getting correctly initialised in GMS due to a typo on the configuration for the salt leading to inconsistent hashing logic.

* fix(bigquery): restrict protobuf version (datahub-project#5007)

* fix(dbt): missing aws dependency (datahub-project#5008)

* feat(ingest): Add Source from Vertica (datahub-project#4555)


Co-authored-by: Ravindra Lanka <[email protected]>

* Fix pulsar source docs. (datahub-project#5011)

* feat(ingest): Added new ingestion source SAP HANA (datahub-project#4376)


Co-authored-by: Ravindra Lanka <[email protected]>

* fix(redash): improve logging for debugging, add validation for dataset urn, some refactoring (datahub-project#4997)

* fix(bigquery-usage): fix audit metadata query template  (datahub-project#5001)

* feat(ingestion): Add Iceberg source (datahub-project#5010)


Co-authored-by: cccs-eric <[email protected]>
Co-authored-by: Shirshanka Das <[email protected]>

* fix(ingestion): use raw strings for regexes (datahub-project#5006)

* test(ingestion): change class names to avoid unittest warnings (datahub-project#5005)

* feat(Tests): Make DataHub Tests Feature configurable via env variable (datahub-project#5020)

* fix(build): fix for hana build failure for aarch64. (datahub-project#5019)

* fix(ui): arrow click position update (datahub-project#5016)

* fix(bigquery): reduce number of calls for details of partitioning (datahub-project#5014)

* fix(ingestion): Remove hana from base_dev_requirements to unblock m1 users (datahub-project#5024)

* adjusted logback

* fix(parsing): improve sql parsing, some debugging redash (datahub-project#5025)

* refactor(ui): UI Integration to add multiple tags, terms and owners (datahub-project#4938)

* fix(parsing): incorrect parsing for commas (datahub-project#5027)

* don't set platform instances for sources (datahub-project#5028)

* feat: telemetry improvements (datahub-project#5029)

* fix(telemetry): exclude configuration from standalone apps (datahub-project#5034)

* fix(timelineAPI): fix issue with semantic versioning (datahub-project#5033)

* fix(build): docgen should fail if plugin is not loadable (datahub-project#5038)

* docs(townhall): update townhall rsvp link and add may townhall detail (datahub-project#5021)

* fix(cli): don't use env for container, add example (datahub-project#5012)

* fix(ingest): common - fix nullability determination for the AVRO fixed type. (datahub-project#5023)

* fix(spark-lineage): remove need for sparksession.stop call (datahub-project#4940)

Co-authored-by: Shirshanka Das <[email protected]>

* feat(glossary) Business Glossary updates (datahub-project#5026)

* fix(build): m1 build fails to install hdb-cli (datahub-project#5040)

* fix(profiling): bigquery - Fix for Bigquery temp table creation on GE >= 0.15.3 (datahub-project#5035)

* feat(ingest): glue - enable profiling (datahub-project#4879)

* fix(doc) - Specify docker-compose version to avoid compatibility issues (datahub-project#5030)

* doc(bigquery): fix missing permissions (datahub-project#5041)

* chore(deps): bump minimist in /smoke-test/tests/cypress (datahub-project#4845)

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shirshanka Das <[email protected]>

* fix(restore): Add RESTATE ChangeType to MCL / MCP to permit restore indices (datahub-project#5022)

* fix(redash): fix bug with names, add option for page size, debugging info (datahub-project#5045)

* updated to 0.8.35

Co-authored-by: Sebo Kim <[email protected]>
Co-authored-by: Aseem Bansal <[email protected]>
Co-authored-by: cccs-eric <[email protected]>
Co-authored-by: chen4119 <[email protected]>
Co-authored-by: David Haglund <[email protected]>
Co-authored-by: Danilo Peixoto <[email protected]>
Co-authored-by: Gabe Lyons <[email protected]>
Co-authored-by: Dexter Lee <[email protected]>
Co-authored-by: Jordan Wolinsky <[email protected]>
Co-authored-by: Paweł Iwiński <[email protected]>
Co-authored-by: mayurinehate <[email protected]>
Co-authored-by: Shubham Thakre <[email protected]>
Co-authored-by: Aditya Radhakrishnan <[email protected]>
Co-authored-by: Chris Collins <[email protected]>
Co-authored-by: Chris Collins <[email protected]>
Co-authored-by: Ravindra Lanka <[email protected]>
Co-authored-by: John Joyce <[email protected]>
Co-authored-by: Shirshanka Das <[email protected]>
Co-authored-by: Jordan Wolinsky <[email protected]>
Co-authored-by: vanmeete <[email protected]>
Co-authored-by: Pedro Silva <[email protected]>
Co-authored-by: RyanHolstien <[email protected]>
Co-authored-by: MugdhaHardikar-GSLab <[email protected]>
Co-authored-by: mitchelllovessoftware123 <[email protected]>
Co-authored-by: Vladislavs Gaidass <[email protected]>
Co-authored-by: Chris Collins <[email protected]>
Co-authored-by: leifker <[email protected]>
Co-authored-by: Tamas Nemeth <[email protected]>
Co-authored-by: Ronald Angel <[email protected]>
Co-authored-by: Luis Angel Vicente Sanchez <[email protected]>
Co-authored-by: Ethan Claassen <[email protected]>
Co-authored-by: akravtsov <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Cristian Calugaru <[email protected]>
Co-authored-by: Zach Bluhm <[email protected]>
Co-authored-by: Justin Marozas <[email protected]>
Co-authored-by: Sagar Tiwari <[email protected]>
Co-authored-by: Jeff Merrick <[email protected]>
Co-authored-by: BZ <[email protected]>
Co-authored-by: maggie-zhu <[email protected]>
Co-authored-by: magzhu <[email protected]>
Co-authored-by: Kevin Hu <[email protected]>
Co-authored-by: Aditya Radhakrishnan <[email protected]>
Co-authored-by: Felix Lüdin <[email protected]>
Co-authored-by: liyuhui666 <[email protected]>
Co-authored-by: Mert Tunç <[email protected]>
Co-authored-by: Maggie Hays <[email protected]>
Co-authored-by: Pedro Silva <[email protected]>
Co-authored-by: Justin Marozas <[email protected]>
Co-authored-by: Claudio Benfatto <[email protected]>
Co-authored-by: Claudio Benfatto <[email protected]>
Co-authored-by: mohdsiddique <[email protected]>
Co-authored-by: Ebu (えぶ) <[email protected]>
Co-authored-by: buggythepirate <[email protected]>
Co-authored-by: Patrick Franco Braz <[email protected]>
Co-authored-by: Harshal Sheth <[email protected]>
Co-authored-by: Ankit keshari <[email protected]>
Co-authored-by: Ndamulelo Nemakhavhani <[email protected]>
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022