feat(ingest/unity-catalog): Tag extraction #13642
Codecov Report: All tests successful. No failed tests found. ❌ The patch status check failed because patch coverage (71.51%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

🔴 Meticulous spotted visual differences in 3 of 1475 screens tested. Meticulous evaluated ~9 hours of user flows against this PR. Last updated for commit 9e8d54b.
jjoyce0510 left a comment:
Please add comments to the PR to guide the review: how you designed it, what choices you made, and which parts are critical to review.
Did you get alignment on whether we need to mint the TagKey aspect?
Please also take some time to address Sergio's comments.
This pull request adds Unity Catalog tag extraction.
To support tag extraction, it introduces new abstractions for handling external entities, tags, and text processing within the DataHub ingestion framework. The changes add a repository for managing platform resources, utilities for external tags, and a new RestrictedText type for sanitized and truncated strings.

External Entities and Resources:
- PlatformResourceRepository: a repository class for managing platform resources, with methods for searching, creating, retrieving, and deleting resources. It includes caching to improve performance. (metadata-ingestion/src/datahub/api/entities/external/external_entities.py, R1-R239)
- ExternalEntity and related classes: introduces ExternalEntity, ExternalEntityId, and MissingExternalEntity to represent entities external to DataHub. These classes provide methods for linking external entities to DataHub resources and for handling missing entities. (metadata-ingestion/src/datahub/api/entities/external/external_entities.py, R1-R239)

External Tag Management:
- ExternalTag class: handles external tags that integrate with systems such as DataHub. It supports parsing tags, creating them, and converting them to and from DataHub URNs, with sanitization and validation. (metadata-ingestion/src/datahub/api/entities/external/external_tag.py, R1-R145)

Text Processing:
- RestrictedText class: a custom Pydantic type for strings with configurable truncation, character replacement, and preservation of the original value. This helps keep text input sanitized and standardized. (metadata-ingestion/src/datahub/api/entities/external/restricted_text.py, R1-R247)
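To make the caching described for PlatformResourceRepository, and the role of MissingExternalEntity, easier to review, here is a minimal stdlib-only sketch of a read-through cache that returns a "missing" placeholder instead of None. It is not the PR's implementation: CachedEntityRepository, the platform/primary_key/properties fields, the exists() method, and the plain dict standing in for the real backend are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ExternalEntityId:
    """Hypothetical id: a platform name plus the platform-native key (hashable)."""

    platform: str
    primary_key: str


@dataclass
class ExternalEntity:
    """Hypothetical record for an object that lives outside DataHub."""

    id: ExternalEntityId
    properties: Dict[str, str] = field(default_factory=dict)

    def exists(self) -> bool:
        return True


@dataclass
class MissingExternalEntity(ExternalEntity):
    """Null object returned when the backend has no record for an id."""

    def exists(self) -> bool:
        return False


class CachedEntityRepository:
    """Hypothetical read-through cache in front of a backing store.

    A plain dict stands in for the real backend (e.g. the DataHub graph).
    """

    def __init__(self, backend: Dict[ExternalEntityId, ExternalEntity]) -> None:
        self._backend = backend
        self._cache: Dict[ExternalEntityId, ExternalEntity] = {}
        self.backend_reads = 0  # how many get() calls missed the cache

    def get(self, entity_id: ExternalEntityId) -> ExternalEntity:
        cached = self._cache.get(entity_id)
        if cached is not None:
            return cached
        self.backend_reads += 1
        entity = self._backend.get(entity_id)
        if entity is None:
            entity = MissingExternalEntity(entity_id)
        self._cache[entity_id] = entity  # negative results are cached too
        return entity

    def create(self, entity: ExternalEntity) -> None:
        self._backend[entity.id] = entity
        self._cache[entity.id] = entity  # write-through keeps the cache fresh

    def delete(self, entity_id: ExternalEntityId) -> None:
        self._backend.pop(entity_id, None)
        self._cache.pop(entity_id, None)  # drop the stale entry

    def search(self, platform: str) -> List[ExternalEntity]:
        # Searches always scan the backend; each hit also warms the cache.
        hits = [e for e in self._backend.values() if e.id.platform == platform]
        for hit in hits:
            self._cache[hit.id] = hit
        return hits
```

Caching negative lookups avoids re-querying the backend for entities that do not exist yet, but it means every write path must update or invalidate the cache; whether the PR caches misses is not stated in the description.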
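The ExternalTag bullet mentions parsing tags and converting them to and from DataHub URNs. The sketch below shows one way such a round trip could work, assuming key/value tags are flattened as key:value into a tag URN of the form urn:li:tag:<name>. The colon separator, the set of characters treated as unsafe, and the ExternalTagSketch name are assumptions for illustration, not the PR's actual rules.

```python
from dataclasses import dataclass
from typing import Optional

TAG_URN_PREFIX = "urn:li:tag:"
# Assumption for this sketch: characters replaced before building a tag name.
_UNSAFE = {",": "_", "(": "_", ")": "_"}


def _sanitize(text: str) -> str:
    """Trim whitespace and replace the (assumed) unsafe characters."""
    return "".join(_UNSAFE.get(ch, ch) for ch in text.strip())


@dataclass(frozen=True)
class ExternalTagSketch:
    """Hypothetical key/value tag with an optional value."""

    key: str
    value: Optional[str] = None

    @classmethod
    def parse(cls, raw: str) -> "ExternalTagSketch":
        """Split 'key:value' on the first colon; a bare 'key' has no value."""
        key, sep, value = raw.partition(":")
        if not key.strip():
            raise ValueError(f"tag has an empty key: {raw!r}")
        return cls(_sanitize(key), _sanitize(value) if sep else None)

    def to_tag_name(self) -> str:
        return self.key if self.value is None else f"{self.key}:{self.value}"

    def to_urn(self) -> str:
        return TAG_URN_PREFIX + self.to_tag_name()

    @classmethod
    def from_urn(cls, urn: str) -> "ExternalTagSketch":
        if not urn.startswith(TAG_URN_PREFIX):
            raise ValueError(f"not a tag URN: {urn!r}")
        return cls.parse(urn[len(TAG_URN_PREFIX):])
```

Splitting on the first colon only keeps values that themselves contain colons intact, so parse(to_tag_name()) returns an equal tag.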
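RestrictedText is described as a custom Pydantic type with truncation, character replacement, and original-value preservation. The sketch below reproduces those three behaviors as a plain str subclass so it runs without Pydantic; the parameter names max_length, replacements, and suffix, and the "..." truncation marker, are hypothetical and may differ from the PR.

```python
from typing import Dict, Optional


class RestrictedTextSketch(str):
    """Hypothetical str subclass: sanitized, length-limited text that keeps its original."""

    original: str  # the unmodified input, kept for display or debugging

    def __new__(
        cls,
        raw: str,
        max_length: int = 50,
        replacements: Optional[Dict[str, str]] = None,
        suffix: str = "...",
    ) -> "RestrictedTextSketch":
        mapping = replacements or {}
        # Replace disallowed characters one at a time.
        cleaned = "".join(mapping.get(ch, ch) for ch in raw)
        if len(cleaned) > max_length:
            if max_length > len(suffix):
                # Leave room for the suffix so the result is exactly max_length long.
                cleaned = cleaned[: max_length - len(suffix)] + suffix
            else:
                # Limit too small to fit a suffix: hard cut.
                cleaned = cleaned[:max_length]
        obj = super().__new__(cls, cleaned)
        obj.original = raw
        return obj
```

Because the sanitized value is itself a str, it can be passed anywhere a string is expected, while the original input stays available on the original attribute.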