Skip to content

Conversation

@treff7es
Copy link
Contributor

@treff7es treff7es commented May 28, 2025

This pull request adds Unity Catalog tag extraction.

To support Tag extraction, it introduces new abstractions that handle external entities, tags, and text processing within the DataHub ingestion framework. The changes include the addition of a repository for managing platform resources, utilities for external tags, and a new RestrictedText type for handling sanitized and truncated strings.

External Entities and Resources:

External Tag Management:

  • ExternalTag Class: Added a class to handle external tags that integrate with systems like DataHub. It supports parsing, creating, and converting tags to and from DataHub URNs, while ensuring proper sanitization and validation. (metadata-ingestion/src/datahub/api/entities/external/external_tag.py, metadata-ingestion/src/datahub/api/entities/external/external_tag.pyR1-R145)

Text Processing:

Thank you for contributing to DataHub!

Before you submit your PR, please go through the checklist below:

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

-->

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label May 28, 2025
@codecov
Copy link

codecov bot commented May 28, 2025

Codecov Report

Attention: Patch coverage is 71.51052% with 149 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...src/datahub/ingestion/source/unity/tag_entities.py 42.71% 59 Missing ⚠️
...c/datahub/api/entities/external/restricted_text.py 70.78% 26 Missing ⚠️
...stion/src/datahub/ingestion/source/unity/source.py 65.71% 24 Missing ⚠️
...estion/src/datahub/ingestion/source/unity/proxy.py 67.18% 21 Missing ⚠️
...datahub/api/entities/external/external_entities.py 91.08% 9 Missing ⚠️
...ntities/external/unity_catalog_external_entites.py 84.31% 8 Missing ⚠️
.../src/datahub/api/entities/external/external_tag.py 95.45% 2 Missing ⚠️

❌ Your patch status has failed because the patch coverage (71.51%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

📢 Thoughts on this report? Let us know!

@alwaysmeticulous
Copy link

alwaysmeticulous bot commented May 28, 2025

🔴 Meticulous spotted visual differences in 3 of 1475 screens tested: view and approve differences detected.

Meticulous evaluated ~9 hours of user flows against your PR.

Last updated for commit 9e8d54b. This comment will update as new commits are pushed.

Copy link
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments to PR review. How did you design, what choices you made, what are the critical parts to review.

Did you get alignment on whether we need to mint the TagKey aspect?

@jjoyce0510
Copy link
Collaborator

Please also take some time to address Sergio's comments

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jun 5, 2025
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jun 5, 2025
@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Jun 6, 2025
@treff7es treff7es merged commit 0eef7a0 into master Jun 6, 2025
71 of 74 checks passed
@treff7es treff7es deleted the databricks_tag_ingestion branch June 6, 2025 11:24
kartikey-visa pushed a commit to kartikey-visa/datahub that referenced this pull request Jul 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants