-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest): add ingestion source for SAP Analytics Cloud #10958
feat(ingest): add ingestion source for SAP Analytics Cloud #10958
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe recent updates significantly enhance the integration of SAP Analytics Cloud (SAC) into the application, focusing on improved data ingestion and user configuration capabilities. Key changes include the introduction of SAC-related constants, a dedicated source connector for SAC, and comprehensive documentation. These enhancements empower users to seamlessly connect to SAC, manage their data sources effectively, and streamline workflows, ultimately boosting productivity. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant App
participant SACSource
participant SACAPI
User->>App: Configure SAC
App->>SACSource: Initialize with config
SACSource->>SACAPI: Authenticate
SACAPI-->>SACSource: Access Token
SACSource->>SACAPI: Retrieve Resources
SACAPI-->>SACSource: Resource Metadata
SACSource->>App: Return Resources
App->>User: Display Resources
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
datahub-web-react/src/images/saclogo.svg
is excluded by!**/*.svg
Files selected for processing (19)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (3 hunks)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (1 hunks)
- datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1 hunks)
- datahub-web-react/src/app/ingest/source/builder/constants.ts (3 hunks)
- datahub-web-react/src/app/ingest/source/builder/sources.json (1 hunks)
- datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1 hunks)
- datahub-web-react/src/app/ingest/source/conf/sources.tsx (2 hunks)
- metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
- metadata-ingestion/docs/sources/sac/sac_recipe.yml (1 hunks)
- metadata-ingestion/setup.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/init.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1 hunks)
- metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
- metadata-ingestion/tests/integration/sac/metadata.xml (1 hunks)
- metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1 hunks)
- metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
- metadata-service/war/src/main/resources/boot/data_platforms.json (1 hunks)
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/sac/__init__.py
1-1:
datahub.ingestion.source.sac.sac.SACSource
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
yamllint
metadata-ingestion/docs/sources/sac/sac_recipe.yml
[error] 11-11: syntax error: found unknown escape character '$'
(syntax)
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md
[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...(ENGLISH_WORD_REPEAT_RULE)
[formatting] ~40-~40: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...Story or an Application will be ingested, because there is no dedicated API to retrieve m...(COMMA_BEFORE_BECAUSE)
[formatting] ~41-~41: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...rowse Paths for models cannot be created, because the folder where the models are saved i...(COMMA_BEFORE_BECAUSE)
[formatting] ~42-~42: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ... is only ingested for Import Data Models, because there is no possibility to get the sche...(COMMA_BEFORE_BECAUSE)
[formatting] ~43-~43: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...or Import Data Models cannot be ingested, because the API is not providing any informatio...(COMMA_BEFORE_BECAUSE)
[uncategorized] ~44-~44: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...t providing any information about it. - Currently only SAP BW and SAP HANA are supported ...(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[typographical] ~45-~45: The word “therefore” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...the models are Live Data or Import Data Models, therefore these models will be ingested only with...(THUS_SENTENCE)
Additional comments not posted (68)
datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1)
1-26
: LGTM!The configuration for the new SAP Analytics Cloud ingestion source is well-structured and follows best practices. The placeholder recipe provides clear guidance on configuring the source.
metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1)
1-45
: LGTM!The data models are well-defined and follow best practices. The use of
dataclass(frozen=True)
ensures immutability, and optional fields are correctly annotated.metadata-ingestion/docs/sources/sac/sac_recipe.yml (5)
1-5
: LGTM!The source type and stateful ingestion configuration are correctly specified.
7-8
: LGTM!The tenant and token URL configuration are correctly specified with placeholders and comments.
14-18
: LGTM!The ingestion configuration for stories and applications is correctly specified.
20-30
: LGTM!The resource ID, name, and folder pattern configuration are correctly specified to allow all.
32-40
: LGTM!The connection mapping configuration is correctly specified for SAP Analytics Cloud connections.
datahub-web-react/src/app/ingest/source/conf/sources.tsx (2)
20-20
: LGTM!The import statement for
sacConfig
is correctly specified.
52-52
: LGTM!The
sacConfig
is correctly added to theSOURCE_TEMPLATE_CONFIGS
array.metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2)
21-23
: LGTM!The new constants
SAC_MODEL
,SAC_IMPORT_DATA_MODEL
, andSAC_LIVE_DATA_MODEL
are correctly specified.
79-80
: LGTM!The new constants
SAC_STORY
andSAC_APPLICATION
are correctly specified.metadata-ingestion/docs/sources/sac/sac_pre.md (3)
28-37
: LGTM!The concept mapping section is clear and accurate.
38-45
: LGTM!The limitations section is clear and accurate.
Tools
LanguageTool
[formatting] ~40-~40: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...Story or an Application will be ingested, because there is no dedicated API to retrieve m...(COMMA_BEFORE_BECAUSE)
[formatting] ~41-~41: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...rowse Paths for models cannot be created, because the folder where the models are saved i...(COMMA_BEFORE_BECAUSE)
[formatting] ~42-~42: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ... is only ingested for Import Data Models, because there is no possibility to get the sche...(COMMA_BEFORE_BECAUSE)
[formatting] ~43-~43: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...or Import Data Models cannot be ingested, because the API is not providing any informatio...(COMMA_BEFORE_BECAUSE)
[uncategorized] ~44-~44: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...t providing any information about it. - Currently only SAP BW and SAP HANA are supported ...(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[typographical] ~45-~45: The word “therefore” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...the models are Live Data or Import Data Models, therefore these models will be ingested only with...(THUS_SENTENCE)
5-5
: Possible typo: repeated wordThe word "Access" is repeated.
- - Access: - - Data Import Service + - Access: Data Import ServiceLikely invalid or redundant comment.
Tools
LanguageTool
[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...(ENGLISH_WORD_REPEAT_RULE)
datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1)
107-117
: LGTM!The sorting logic is correct and efficiently places "custom" at the end while sorting the remaining items alphabetically.
datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (12)
1-12
: LGTM!The
SAC_TENANT_URL
field is correctly defined and the configuration is accurate.
14-23
: LGTM!The
SAC_TOKEN_URL
field is correctly defined and the configuration is accurate.
25-34
: LGTM!The
SAC_CLIENT_ID
field is correctly defined and the configuration is accurate.
36-45
: LGTM!The
SAC_CLIENT_SECRET
field is correctly defined and the configuration is accurate.
47-55
: LGTM!The
INGEST_STORIES
field is correctly defined and the configuration is accurate.
57-65
: LGTM!The
INGEST_APPLICATIONS
field is correctly defined and the configuration is accurate.
67-81
: LGTM!The
RESOURCE_ID_ALLOW
field is correctly defined and the configuration is accurate.
83-97
: LGTM!The
RESOURCE_ID_DENY
field is correctly defined and the configuration is accurate.
99-113
: LGTM!The
RESOURCE_NAME_ALLOW
field is correctly defined and the configuration is accurate.
115-129
: LGTM!The
RESOURCE_NAME_DENY
field is correctly defined and the configuration is accurate.
131-145
: LGTM!The
FOLDER_ALLOW
field is correctly defined and the configuration is accurate.
147-161
: LGTM!The
FOLDER_DENY
field is correctly defined and the configuration is accurate.datahub-web-react/src/app/ingest/source/builder/constants.ts (4)
37-37
: Import statement forsacLogo
looks good.The import path is consistent with other logo imports.
126-126
: Constant declaration forSAC
looks good.The constant follows the existing pattern and is correctly defined.
127-127
: URN declaration forSAC
looks good.The URN follows the existing pattern and is correctly defined.
167-167
: Mapping ofSAC_URN
tosacLogo
looks good.The mapping is correctly added and follows the existing pattern.
metadata-ingestion/src/datahub/utilities/logging_manager.py (1)
281-283
: Logging configurations forpyodata
look good.The logging levels are correctly set to
WARNING
for the specified components.metadata-ingestion/tests/integration/sac/test_sac.py (10)
1-18
: Imports and mock constants look good.The imports and constants are correctly defined and necessary for the tests.
20-51
:test_sac
function and mock token URL matching look good.The function and mock request are correctly defined and necessary for the tests.
53-65
: Metadata mock request and matching function look good.The mock request and matching function are correctly defined and necessary for the tests.
67-120
: Resources mock request and matching function look good.The mock request and matching function are correctly defined and necessary for the tests.
122-178
: Resource mock request and matching function look good.The mock request and matching function are correctly defined and necessary for the tests.
180-199
: Models mock request and matching function look good.The mock request and matching function are correctly defined and necessary for the tests.
201-273
: Model metadata mock request and matching function look good.The mock request and matching function are correctly defined and necessary for the tests.
275-296
: Pipeline creation and execution look good.The pipeline creation and execution are correctly defined and necessary for the tests.
297-302
: Golden file check looks good.The golden file check is correctly defined and necessary for the tests.
305-310
: Authorization check function looks good.The authorization check function is correctly defined and necessary for the tests.
metadata-service/war/src/main/resources/boot/data_platforms.json (1)
668-676
: Ensure the new SAP Analytics Cloud entry follows the correct format.The new entry for SAP Analytics Cloud seems correctly formatted and consistent with other entries in the JSON structure. Ensure that the
logoUrl
points to a valid location and that theurn
and other attributes are accurate and necessary for integration.datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (4)
86-86
: New import for SAP Analytics Cloud constants.The new import statement for SAP Analytics Cloud constants is correctly added. Ensure that the imported constants are defined and used appropriately in the rest of the file.
174-187
: New constants for SAP Analytics Cloud.The new constants related to SAP Analytics Cloud are correctly imported from the
sac
module. Ensure that these constants are defined in thesac
module and are used appropriately in theRECIPE_FIELDS
object.
536-549
: New entry inRECIPE_FIELDS
for SAP Analytics Cloud.The new entry for SAP Analytics Cloud in the
RECIPE_FIELDS
object is correctly added. Ensure that thefields
,filterFields
, andadvancedFields
are correctly defined and consistent with other entries.
554-561
: Update toCONNECTORS_WITH_TEST_CONNECTION
to include SAP Analytics Cloud.The update to include SAP Analytics Cloud in the
CONNECTORS_WITH_TEST_CONNECTION
set is correctly added. Ensure that the test connection functionality for SAP Analytics Cloud is implemented and works as expected.metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1)
1-662
: New test data for SAP Analytics Cloud integration.The new JSON file contains comprehensive test data for SAP Analytics Cloud integration. Ensure that the test data covers all necessary aspects and scenarios for SAP Analytics Cloud integration, including dashboards, datasets, and upstream lineage.
The structure and content of the test data appear to be consistent and well-formed. Verify that the
entityUrn
,aspectName
, and other attributes are accurate and necessary for the intended tests.datahub-web-react/src/app/ingest/source/builder/sources.json (6)
291-291
: LGTM! Theurn
attribute is correctly formatted.The
urn
uniquely identifies the data platform as "urn:li:dataPlatform:sac".
292-292
: LGTM! Thename
attribute is correctly formatted.The
name
attribute is set to "sac".
293-293
: LGTM! ThedisplayName
attribute is correctly formatted.The
displayName
attribute is set to "SAP Analytics Cloud".
294-294
: LGTM! Thedescription
attribute is clear and informative.The
description
provides an overview of the functionality, indicating that it allows for the importation of stories, applications, and models from SAP Analytics Cloud.
295-295
: LGTM! ThedocsUrl
attribute is correctly formatted.The
docsUrl
points to the relevant documentation for this data source.
296-296
: LGTM! Therecipe
attribute is clear and informative.The
recipe
outlines the configuration parameters required for integration, includingtenant_url
,token_url
,client_id
, andclient_secret
, with placeholders for sensitive information.metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (8)
1-11
: LGTM! The imports are necessary and relevant.The imports include standard libraries, third-party libraries, and internal modules.
92-105
: LGTM! The ConnectionMappingConfig class is well-defined.The class includes fields for platform, platform_instance, and env, providing the necessary configuration for connection mapping.
108-161
: LGTM! The SACSourceConfig class is comprehensive.The class includes various fields for configuration, such as tenant_url, token_url, client_id, client_secret, and patterns for selecting resources.
164-166
: LGTM! The SACSourceReport class is well-defined.The class is a dataclass that extends StaleEntityRemovalSourceReport.
169-221
: LGTM! The SACSource class is comprehensive and well-implemented.The class includes various methods for handling data ingestion, such as
__init__
,create
,test_connection
,get_workunit_processors
,get_workunits_internal
,get_resources
, and others.
421-630
: LGTM! The get_model_workunits method is well-implemented.The method generates work units for datasets based on the provided model, handling various aspects of dataset ingestion, such as dataset properties, schema metadata, and upstream lineage.
632-733
: LGTM! The get_resources method is well-implemented.The method retrieves resources from the SAP Analytics Cloud tenant and generates Resource objects, handling various aspects of resource retrieval, such as filtering, pattern matching, and parsing resource models.
735-805
: LGTM! The utility methods are well-implemented.The methods include
get_import_data_model_ids
,get_import_data_model_columns
,get_query_name
,get_view_name
,get_schema_field_data_type
, and_add_sap_sac_custom_auth_header
, providing necessary functionality for the SAC source connector.metadata-ingestion/tests/integration/sac/metadata.xml (4)
1-2
: LGTM! The XML header and root element are correctly formatted.The XML header and root element provide the necessary metadata for the XML document.
3-402
: LGTM! The Schema element is well-defined.The Schema element defines the namespace and includes various EntityType and Association elements, providing the necessary structure for the XML document.
300-401
: LGTM! The EntityContainer element is well-defined.The EntityContainer element defines various EntitySet and AssociationSet elements, providing the necessary structure for the XML document.
403-404
: LGTM! The closing tags are correctly formatted.The closing tags for the Schema and root elements provide the necessary structure for the XML document.
metadata-ingestion/setup.py (2)
306-310
: Add SAC dependency group.The
sac
dependency group is correctly added with the necessary dependencies.
728-728
: Register SAC entry point.The
sac
entry point is correctly registered, linking to theSACSource
class.
metadata-ingestion/src/datahub/ingestion/source/sac/__init__.py
Outdated
Show resolved
Hide resolved
88da592
to
8b9509c
Compare
8b9509c
to
a125fa2
Compare
a125fa2
to
6befa6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (5)
metadata-ingestion/docs/sources/sac/sac_pre.md (5)
3-3
: Add comma after 'Currently'.A comma is needed after "Currently".
- Currently only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed. + Currently, only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.
12-12
: Add missing comma.A comma is needed after "environments".
- To map individual connections in SAP Analytics Cloud to platforms, platform instances or environments the `connection_mapping` configuration can be used within the recipe: + To map individual connections in SAP Analytics Cloud to platforms, platform instances, or environments, the `connection_mapping` configuration can be used within the recipe:Tools
LanguageTool
[uncategorized] ~12-~12: Possible missing comma found.
Context: ...oud to platforms, platform instances or environments theconnection_mapping
configuration ...(AI_HYDRA_LEO_MISSING_COMMA)
40-43
: Remove comma before 'because'.The comma before "because" is unnecessary and should be removed.
- Only models which are used in a Story or an Application will be ingested, because there is no dedicated API to retrieve models (only for Stories and Applications). - Browse Paths for models cannot be created, because the folder where the models are saved is not returned by the API. - Schema metadata is only ingested for Import Data Models, because there is no possibility to get the schema metadata of the other model types. - Lineages for Import Data Models cannot be ingested, because the API is not providing any information about it. + Only models which are used in a Story or an Application will be ingested because there is no dedicated API to retrieve models (only for Stories and Applications). + Browse Paths for models cannot be created because the folder where the models are saved is not returned by the API. + Schema metadata is only ingested for Import Data Models because there is no possibility to get the schema metadata of the other model types. + Lineages for Import Data Models cannot be ingested because the API is not providing any information about it.
45-45
: Separate 'therefore' from the sentence.The word "therefore" should be separated from the sentence.
- For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models, therefore these models will be ingested only with the `Story` subtype. + For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models. Therefore, these models will be ingested only with the `Story` subtype.
5-5
: Possible typo: repeated word.The word "Access" is repeated.
- Purpose: API Access - Access: - Data Import Service + Purpose: API Access + Data Import ServiceTools
LanguageTool
[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...(ENGLISH_WORD_REPEAT_RULE)
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
datahub-web-react/src/images/saclogo.svg
is excluded by!**/*.svg
Files selected for processing (18)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (3 hunks)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (1 hunks)
- datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1 hunks)
- datahub-web-react/src/app/ingest/source/builder/constants.ts (3 hunks)
- datahub-web-react/src/app/ingest/source/builder/sources.json (1 hunks)
- datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1 hunks)
- datahub-web-react/src/app/ingest/source/conf/sources.tsx (2 hunks)
- metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
- metadata-ingestion/docs/sources/sac/sac_recipe.yml (1 hunks)
- metadata-ingestion/setup.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1 hunks)
- metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
- metadata-ingestion/tests/integration/sac/metadata.xml (1 hunks)
- metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1 hunks)
- metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
- metadata-service/war/src/main/resources/boot/data_platforms.json (1 hunks)
Files skipped from review due to trivial changes (1)
- metadata-ingestion/docs/sources/sac/sac_recipe.yml
Files skipped from review as they are similar to previous changes (15)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts
- datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx
- datahub-web-react/src/app/ingest/source/builder/constants.ts
- datahub-web-react/src/app/ingest/source/builder/sources.json
- datahub-web-react/src/app/ingest/source/conf/sac/sac.ts
- datahub-web-react/src/app/ingest/source/conf/sources.tsx
- metadata-ingestion/setup.py
- metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
- metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py
- metadata-ingestion/src/datahub/utilities/logging_manager.py
- metadata-ingestion/tests/integration/sac/metadata.xml
- metadata-ingestion/tests/integration/sac/sac_mces_golden.json
- metadata-ingestion/tests/integration/sac/test_sac.py
Additional context used
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md
[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~12-~12: Possible missing comma found.
Context: ...oud to platforms, platform instances or environments theconnection_mapping
configuration ...(AI_HYDRA_LEO_MISSING_COMMA)
Additional comments not posted (1)
metadata-service/war/src/main/resources/boot/data_platforms.json (1)
667-676
: Ensure logo URL is correct.Verify that the logo URL
/assets/platforms/saclogo.svg
is correct and the file exists in the specified location.
6befa6a
to
3beb17a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (5)
metadata-ingestion/docs/sources/sac/sac_pre.md (5)
1-1
: Add a comma after 'Currently'.A comma is needed after "Currently" for better readability.
- Currently only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed. + Currently, only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.
12-12
: Add a comma after 'environments'.A comma is needed after "environments" for better readability.
- To map individual connections in SAP Analytics Cloud to platforms, platform instances or environments the `connection_mapping` configuration can be used within the recipe: + To map individual connections in SAP Analytics Cloud to platforms, platform instances or environments, the `connection_mapping` configuration can be used within the recipe:Tools
LanguageTool
[uncategorized] ~12-~12: Possible missing comma found.
Context: ...oud to platforms, platform instances or environments theconnection_mapping
configuration ...(AI_HYDRA_LEO_MISSING_COMMA)
40-43
: Remove comma before 'because'.The comma before "because" is unnecessary and should be removed.
- Only models which are used in a Story or an Application will be ingested, because there is no dedicated API to retrieve models (only for Stories and Applications). - Browse Paths for models cannot be created, because the folder where the models are saved is not returned by the API. - Schema metadata is only ingested for Import Data Models, because there is no possibility to get the schema metadata of the other model types. - Lineages for Import Data Models cannot be ingested, because the API is not providing any information about it. + Only models which are used in a Story or an Application will be ingested because there is no dedicated API to retrieve models (only for Stories and Applications). + Browse Paths for models cannot be created because the folder where the models are saved is not returned by the API. + Schema metadata is only ingested for Import Data Models because there is no possibility to get the schema metadata of the other model types. + Lineages for Import Data Models cannot be ingested because the API is not providing any information about it.
44-44
: Add comma after 'Currently'.A comma is needed after "Currently" for better readability.
- Currently only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed. + Currently, only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.
45-45
: Separate 'therefore' from the sentence.The word "therefore" should be separated from the sentence.
- For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models, therefore these models will be ingested only with the `Story` subtype. + For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models. Therefore, these models will be ingested only with the `Story` subtype.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
datahub-web-react/src/images/saclogo.svg
is excluded by!**/*.svg
Files selected for processing (18)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (3 hunks)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (1 hunks)
- datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1 hunks)
- datahub-web-react/src/app/ingest/source/builder/constants.ts (3 hunks)
- datahub-web-react/src/app/ingest/source/builder/sources.json (1 hunks)
- datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1 hunks)
- datahub-web-react/src/app/ingest/source/conf/sources.tsx (2 hunks)
- metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
- metadata-ingestion/docs/sources/sac/sac_recipe.yml (1 hunks)
- metadata-ingestion/setup.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1 hunks)
- metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
- metadata-ingestion/tests/integration/sac/metadata.xml (1 hunks)
- metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1 hunks)
- metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
- metadata-service/war/src/main/resources/boot/data_platforms.json (1 hunks)
Files skipped from review due to trivial changes (2)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts
- metadata-ingestion/docs/sources/sac/sac_recipe.yml
Files skipped from review as they are similar to previous changes (14)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts
- datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx
- datahub-web-react/src/app/ingest/source/builder/constants.ts
- datahub-web-react/src/app/ingest/source/builder/sources.json
- datahub-web-react/src/app/ingest/source/conf/sac/sac.ts
- datahub-web-react/src/app/ingest/source/conf/sources.tsx
- metadata-ingestion/setup.py
- metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
- metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py
- metadata-ingestion/src/datahub/utilities/logging_manager.py
- metadata-ingestion/tests/integration/sac/metadata.xml
- metadata-ingestion/tests/integration/sac/sac_mces_golden.json
- metadata-ingestion/tests/integration/sac/test_sac.py
Additional context used
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md
[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~12-~12: Possible missing comma found.
Context: ...oud to platforms, platform instances or environments theconnection_mapping
configuration ...(AI_HYDRA_LEO_MISSING_COMMA)
Additional comments not posted (1)
metadata-service/war/src/main/resources/boot/data_platforms.json (1)
669-676
: LGTM!The new entry for SAP Analytics Cloud is correctly added and consistent with other entries.
3beb17a
to
b56983f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
datahub-web-react/src/images/saclogo.svg
is excluded by!**/*.svg
Files selected for processing (18)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (3 hunks)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (1 hunks)
- datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1 hunks)
- datahub-web-react/src/app/ingest/source/builder/constants.ts (3 hunks)
- datahub-web-react/src/app/ingest/source/builder/sources.json (1 hunks)
- datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1 hunks)
- datahub-web-react/src/app/ingest/source/conf/sources.tsx (2 hunks)
- metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
- metadata-ingestion/docs/sources/sac/sac_recipe.yml (1 hunks)
- metadata-ingestion/setup.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1 hunks)
- metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
- metadata-ingestion/tests/integration/sac/metadata.xml (1 hunks)
- metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1 hunks)
- metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
- metadata-service/war/src/main/resources/boot/data_platforms.json (1 hunks)
Files skipped from review due to trivial changes (2)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts
- metadata-ingestion/src/datahub/utilities/logging_manager.py
Files skipped from review as they are similar to previous changes (15)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts
- datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx
- datahub-web-react/src/app/ingest/source/builder/constants.ts
- datahub-web-react/src/app/ingest/source/builder/sources.json
- datahub-web-react/src/app/ingest/source/conf/sac/sac.ts
- datahub-web-react/src/app/ingest/source/conf/sources.tsx
- metadata-ingestion/docs/sources/sac/sac_recipe.yml
- metadata-ingestion/setup.py
- metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
- metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py
- metadata-ingestion/tests/integration/sac/metadata.xml
- metadata-ingestion/tests/integration/sac/sac_mces_golden.json
- metadata-ingestion/tests/integration/sac/test_sac.py
- metadata-service/war/src/main/resources/boot/data_platforms.json
Additional context used
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md
[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...(ENGLISH_WORD_REPEAT_RULE)
Additional comments not posted (7)
metadata-ingestion/docs/sources/sac/sac_pre.md (7)
40-40
: Remove comma before 'because'The comma before "because" is unnecessary and should be removed.
- Only models which are used in a Story or an Application will be ingested, because there is no dedicated API to retrieve models (only for Stories and Applications). + Only models which are used in a Story or an Application will be ingested because there is no dedicated API to retrieve models (only for Stories and Applications).
41-41
: Remove comma before 'because'The comma before "because" is unnecessary and should be removed.
- Browse Paths for models cannot be created, because the folder where the models are saved is not returned by the API. + Browse Paths for models cannot be created because the folder where the models are saved is not returned by the API.
42-42
: Remove comma before 'because'The comma before "because" is unnecessary and should be removed.
- Schema metadata is only ingested for Import Data Models, because there is no possibility to get the schema metadata of the other model types. + Schema metadata is only ingested for Import Data Models because there is no possibility to get the schema metadata of the other model types.
43-43
: Remove comma before 'because'The comma before "because" is unnecessary and should be removed.
- Lineages for Import Data Models cannot be ingested, because the API is not providing any information about it. + Lineages for Import Data Models cannot be ingested because the API is not providing any information about it.
44-44
: Add comma after 'Currently'A comma is needed after "Currently".
- Currently only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed. + Currently, only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.
45-45
: Separate 'therefore' from the sentenceThe word "therefore" should be separated from the sentence.
- For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models, therefore these models will be ingested only with the `Story` subtype. + For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models. Therefore, these models will be ingested only with the `Story` subtype.
5-5
: Possible typo: repeated word "Access"The word "Access" is repeated, which might be a typo.
- - Access: + - Data Import ServiceLikely invalid or redundant comment.
Tools
LanguageTool
[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...(ENGLISH_WORD_REPEAT_RULE)
b56983f
to
7d5ee53
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
datahub-web-react/src/images/saclogo.svg
is excluded by!**/*.svg
Files selected for processing (18)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (3 hunks)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (1 hunks)
- datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1 hunks)
- datahub-web-react/src/app/ingest/source/builder/constants.ts (3 hunks)
- datahub-web-react/src/app/ingest/source/builder/sources.json (1 hunks)
- datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1 hunks)
- datahub-web-react/src/app/ingest/source/conf/sources.tsx (2 hunks)
- metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
- metadata-ingestion/docs/sources/sac/sac_recipe.yml (1 hunks)
- metadata-ingestion/setup.py (4 hunks)
- metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1 hunks)
- metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
- metadata-ingestion/tests/integration/sac/metadata.xml (1 hunks)
- metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1 hunks)
- metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
- metadata-service/war/src/main/resources/boot/data_platforms.json (1 hunks)
Files skipped from review due to trivial changes (1)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts
Files skipped from review as they are similar to previous changes (14)
- datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts
- datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx
- datahub-web-react/src/app/ingest/source/builder/constants.ts
- datahub-web-react/src/app/ingest/source/builder/sources.json
- datahub-web-react/src/app/ingest/source/conf/sac/sac.ts
- datahub-web-react/src/app/ingest/source/conf/sources.tsx
- metadata-ingestion/docs/sources/sac/sac_recipe.yml
- metadata-ingestion/setup.py
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
- metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py
- metadata-ingestion/src/datahub/utilities/logging_manager.py
- metadata-ingestion/tests/integration/sac/metadata.xml
- metadata-ingestion/tests/integration/sac/sac_mces_golden.json
- metadata-ingestion/tests/integration/sac/test_sac.py
Additional context used
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md
[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...(ENGLISH_WORD_REPEAT_RULE)
Additional comments not posted (9)
metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2)
21-23
: New constants for SAP Analytics Cloud added toDatasetSubTypes
.The new constants
SAC_MODEL
,SAC_IMPORT_DATA_MODEL
, andSAC_LIVE_DATA_MODEL
are correctly defined and consistent with existing constants.
77-80
: New constants for SAP Analytics Cloud added toBIAssetSubTypes
.The new constants
SAC_STORY
andSAC_APPLICATION
are correctly defined and consistent with existing constants.metadata-ingestion/docs/sources/sac/sac_pre.md (6)
30-36
: Concept mapping is accurate.The mappings between SAP Analytics Cloud and DataHub concepts are accurate and consistent.
40-40
: Remove comma before 'because'.The comma before "because" is unnecessary and should be removed.
44-44
: Add comma after 'Currently'.A comma is needed after "Currently".
45-45
: Separate 'therefore' from the sentence.The word "therefore" should be separated from the sentence.
1-1
: Possible typo: repeated word "Access".There seems to be a repeated word "Access" in the configuration notes.
- - Access: + -Likely invalid or redundant comment.
5-5
: Possible typo: repeated word "Access".There seems to be a repeated word "Access" in the configuration notes.
- - Access: + -Likely invalid or redundant comment.
Tools
LanguageTool
[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...(ENGLISH_WORD_REPEAT_RULE)
metadata-service/war/src/main/resources/boot/data_platforms.json (1)
669-676
: New entry for SAP Analytics Cloud added.The new entry for SAP Analytics Cloud is correctly defined and consistent with existing entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly focused on the python code in this pass - this is looking pretty good
@@ -471,6 +477,7 @@ | |||
"fivetran": snowflake_common | bigquery_common | sqlglot_lib, | |||
"qlik-sense": sqlglot_lib | {"requests", "websocket-client"}, | |||
"sigma": sqlglot_lib | {"requests"}, | |||
"sac": sac, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo sac isn't a sufficiently well-know abbreviation - let's name the source sap-analytics-cloud
or sap-sac
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in context of SAP, SAC ist a common abbreviation for SAP Analytics Cloud...if I search for SAC in Google or Bing in an incognito window, besides the Swiss Alpine Club (😂), the next results are for the SAP Analytics Cloud (but of course there is still some kind of geo localization and the results are probably specific for Germany).
-> sap-sac is definitely wrong (the S in SAC stands for SAP) and sap-analytics-cloud is too long...
As the abbreviation would be also used for the platform urn, I would like to stick with SAC...;-)
|
||
# SAP Analytics Cloud | ||
SAC_STORY = "Story" | ||
SAC_APPLICATION = "Application" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
env: PROD | ||
``` | ||
|
||
The key in the connection mapping is the technical name of the connection resp. its id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bit confused by this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it a bit, not sure if it is less confusing then...
You have to maintain the connections to the different systems in the SAC...there you have to specify a name: https://developers.sap.com/tutorials/hana-cloud-analytics-cloud-consume-2/jcr:content.github-proxy.1633841405.file/step7-I7.png
The key in the connection mapping dictionary is this name and is used to find the connection in the dictionary...
platform_instance = connection.platform_instance | ||
env = connection.env | ||
else: | ||
platform = model.system_type.lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to print some debug logs here? e.g. no connection_mapping found, doing best-effort inference
or are the logs below sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a log with info level (I think debug is not enough?) - besides the log message you would notice that the best-effort approach was used, because there will be platform instances which are named after the connections in SAC...
elif data_type in ("decimal", "int32"): | ||
return SchemaFieldDataTypeClass(type=NumberTypeClass()) | ||
else: | ||
logger.warning(f"Unknown data type {data_type} found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.warning(f"Unknown data type {data_type} found") | |
logger.info(f"Unknown data type {data_type} found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have replaced the log with report_warning -> I think it is better to have this as a warning...usually there are not many possibilities for the data types in import data models - in case there will be ever an unknown data type, we should know it to enhance the method accordingly...I am affraid a log with info level will be missed easily...
if property_type == "DATE": | ||
return SchemaFieldDataTypeClass(type=DateTypeClass()) | ||
else: | ||
if data_type == "string": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so property_type is capitalized and data_type is lower case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we shouldn't really question things like this...SAP and REST APIs is a "complicated" topic (at least in context of SAC)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
- metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
- metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
- metadata-ingestion/src/datahub/utilities/logging_manager.py
- metadata-ingestion/tests/integration/sac/test_sac.py
Additional context used
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md
[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~12-~12: Possible missing comma found.
Context: ...oud to platforms, platform instances or environments theconnection_mapping
configuration ...(AI_HYDRA_LEO_MISSING_COMMA)
Additional comments not posted (7)
metadata-ingestion/docs/sources/sac/sac_pre.md (7)
40-40
: Remove comma before 'because'The comma before "because" is unnecessary and should be removed.
41-41
: Remove comma before 'because'The comma before "because" is unnecessary and should be removed.
42-42
: Remove comma before 'because'The comma before "because" is unnecessary and should be removed.
43-43
: Remove comma before 'because'The comma before "because" is unnecessary and should be removed.
44-44
: Add comma after 'Currently'A comma is needed after "Currently".
45-45
: Separate 'therefore' from the sentenceThe word "therefore" should be separated from the sentence.
3-3
: Possible typo: repeated wordThe word "the" might be repeated here.
- The OAuth client is required to have the following properties: + The OAuth client is required to have the following properties:Likely invalid or redundant comment.
@hsheth2 I have noticed, that the ingestion of the schema metadata of Import Data Models is extremely slow (20 seconds vs. 15 minutes with our productive SAC tenant), because the Import Data Service of the SAC is extremely slow...therefore I have added a config (by default it is enabled) to be able to disable the ingestion of schema metadata. Additionally the Import Data Service of the SAC is frequently returning an 500 Internal Server Error...therefore I have also added a HTTPAdapter with a Retry to the Session object...ingesting the schema metadata is relatively new (I have implemented it before creating the PR, while we are using the rest of the source for some time): In the last two weeks the Internal Server Error already occurred three times, not sure if the Retry will help to solve this issue, but we will see... |
…odels and Retry to session HTTPAdapter
b43b6e0
to
74e0c7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
aspect=BrowsePathsV2Class( | ||
path=[ | ||
BrowsePathEntryClass(id=folder_name) | ||
for folder_name in resource.ancestor_path.split("/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any danger of folder names having /
in them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/
are replaced with %2F
, when the folders which are returned by the API are evaluated:
datahub/metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
Lines 620 to 623 in 74e0c7f
ancestors = json.loads(entity.ancestorPath) | |
ancestor_path = "/".join( | |
ancestor.replace("/", "%2F") for ancestor in ancestors | |
) |
The replacement was originally in there because of the browse paths v1, which are still ingested along the browse paths v2 - do we need the v1 browse paths anymore or should I remove them? - ...in the old browse paths the slashes were used to separate the levels, I don't think the replacement is needed for v2 anymore...? However for the config.folder_pattern it's probably easier if the folders are joined using the slash inkl. the replacement of slashes in folder names...:/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need v1 browse paths anymore, but I agree that having this around makes folder_pattern
simpler so it's fine to leave it in
|
||
if (b.name === 'custom') { | ||
return -1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just to push the "custom" button to the bottom, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Outside diff range, codebase verification and nitpick comments (4)
metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (4)
282-374
: Consider adding debug logs for traceability.Adding debug logs in the
get_resource_workunits
method can help trace the generation of workunits and diagnose issues more effectively.logger.debug(f"Generating workunits for resource: {resource.resource_id}")
585-686
: Add error handling for API calls.Consider adding error handling for API calls in the
get_resources
method to enhance robustness and manage potential exceptions.try: entities = self.client.entity_sets.Resources.get_entities()... except Exception as e: logger.error(f"Failed to retrieve resources: {e}")
688-697
: Add error handling for API call.Consider adding error handling for the API call in
get_import_data_model_ids
to manage potential exceptions and improve robustness.try: response = self.session.get(...) response.raise_for_status() except requests.RequestException as e: logger.error(f"Failed to retrieve import data model IDs: {e}")
699-724
: Add error handling for API call.Consider adding error handling for the API call in
get_import_data_model_columns
to enhance robustness and manage potential exceptions.try: response = self.session.get(...) response.raise_for_status() except requests.RequestException as e: logger.error(f"Failed to retrieve columns for model {model_id}: {e}")
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
Additional comments not posted (10)
metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (10)
93-106
: LGTM!The
ConnectionMappingConfig
class is well-defined and uses Pydantic'sField
appropriately.
109-167
: LGTM!The
SACSourceConfig
class is well-structured, with comprehensive configuration options and a useful validator for URLs.
200-205
: LGTM!The
__init__
method correctly initializes the SACSource with configuration and context.
207-209
: LGTM!The
close
method properly manages resource cleanup by closing the session and calling the superclass'sclose
method.
211-214
: LGTM!The
create
method is well-implemented, using a factory pattern to instantiateSACSource
.
217-238
: LGTM!The
test_connection
method is robust, with proper error handling and connectivity checks.
240-250
: LGTM!The
get_workunit_processors
method effectively extends the base processors with additional functionality for incremental lineage and stale entity removal.
279-280
: LGTM!The
get_report
method correctly returns the source report.
544-583
: LGTM!The
get_sac_connection
method is well-implemented, with robust retry logic and proper session configuration.
771-775
: LGTM!The
_add_sap_sac_custom_auth_header
function is simple and correctly adds the custom authentication header.
class SACSource(StatefulIngestionSourceBase, TestableSource): | ||
config: SACSourceConfig | ||
report: SACSourceReport | ||
platform = "sac" | ||
|
||
session: OAuth2Session | ||
client: pyodata.Client | ||
|
||
ingested_dataset_entities: Set[str] = set() | ||
ingested_upstream_dataset_keys: Set[str] = set() | ||
|
||
def __init__(self, config: SACSourceConfig, ctx: PipelineContext): | ||
super().__init__(config, ctx) | ||
self.config = config | ||
self.report = SACSourceReport() | ||
|
||
self.session, self.client = SACSource.get_sac_connection(self.config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider refactoring lengthy methods.
The SACSource
class is well-organized, but some methods, such as get_workunits_internal
, are lengthy and could be broken down into smaller helper methods for better readability and maintainability.
def get_workunits_internal(self) -> Iterable[MetadataWorkUnit]: | ||
if self.config.ingest_stories or self.config.ingest_applications: | ||
resources = self.get_resources() | ||
|
||
for resource in resources: | ||
datasets = [] | ||
|
||
for resource_model in resource.resource_models: | ||
dataset_urn = make_dataset_urn_with_platform_instance( | ||
platform=self.platform, | ||
name=f"{resource_model.namespace}:{resource_model.model_id}", | ||
platform_instance=self.config.platform_instance, | ||
env=self.config.env, | ||
) | ||
|
||
if dataset_urn not in datasets: | ||
datasets.append(dataset_urn) | ||
|
||
if dataset_urn in self.ingested_dataset_entities: | ||
continue | ||
|
||
self.ingested_dataset_entities.add(dataset_urn) | ||
|
||
yield from self.get_model_workunits(dataset_urn, resource_model) | ||
|
||
yield from self.get_resource_workunits(resource, datasets) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor get_workunits_internal
for clarity.
The get_workunits_internal
method is lengthy and could be refactored into smaller helper methods to enhance readability and maintainability.
namespace: Optional[str] = None | ||
if len(schema) < len(namespace_with_schema): | ||
namespace = namespace_with_schema[len(f"{schema}.") :] | ||
view = view[:-1] | ||
upstream_dataset_name = self.get_view_name(schema, namespace, view) | ||
|
||
if upstream_dataset_name is not None: | ||
if model.connection_id in self.config.connection_mapping: | ||
connection = self.config.connection_mapping[model.connection_id] | ||
platform = ( | ||
connection.platform | ||
if connection.platform | ||
else model.system_type.lower() | ||
) | ||
platform_instance = connection.platform_instance | ||
env = connection.env | ||
else: | ||
platform = model.system_type.lower() | ||
platform_instance = model.connection_id | ||
env = DEFAULT_ENV | ||
|
||
logger.info( | ||
f"No connection mapping found for connection with id {model.connection_id}, connection id will be used as platform instance" | ||
) | ||
|
||
upstream_dataset_urn = make_dataset_urn_with_platform_instance( | ||
platform=platform, | ||
name=upstream_dataset_name, | ||
platform_instance=platform_instance, | ||
env=env, | ||
) | ||
|
||
if upstream_dataset_urn not in self.ingested_upstream_dataset_keys: | ||
mcp = MetadataChangeProposalWrapper( | ||
entityUrn=upstream_dataset_urn, | ||
aspect=dataset_urn_to_key(upstream_dataset_urn), | ||
) | ||
|
||
yield mcp.as_workunit(is_primary_source=False) | ||
|
||
self.ingested_upstream_dataset_keys.add(upstream_dataset_urn) | ||
|
||
mcp = MetadataChangeProposalWrapper( | ||
entityUrn=dataset_urn, | ||
aspect=UpstreamLineageClass( | ||
upstreams=[ | ||
UpstreamClass( | ||
dataset=upstream_dataset_urn, | ||
type=DatasetLineageTypeClass.COPY, | ||
), | ||
], | ||
), | ||
) | ||
|
||
yield mcp.as_workunit() | ||
else: | ||
self.report.report_warning( | ||
"unknown-upstream-dataset", | ||
f"Unknown upstream dataset for model with id {model.namespace}:{model.model_id} and external id {model.external_id}", | ||
) | ||
elif model.system_type is not None: | ||
self.report.report_warning( | ||
"unknown-system-type", | ||
f"Unknown system type {model.system_type} for model with id {model.namespace}:{model.model_id} and external id {model.external_id}", | ||
) | ||
|
||
mcp = MetadataChangeProposalWrapper( | ||
entityUrn=dataset_urn, | ||
aspect=StatusClass( | ||
removed=False, | ||
), | ||
) | ||
|
||
yield mcp.as_workunit() | ||
|
||
if model.external_id and model.connection_id and model.system_type: | ||
type_name = DatasetSubTypes.SAC_LIVE_DATA_MODEL | ||
elif model.is_import: | ||
type_name = DatasetSubTypes.SAC_IMPORT_DATA_MODEL | ||
else: | ||
type_name = DatasetSubTypes.SAC_MODEL | ||
|
||
mcp = MetadataChangeProposalWrapper( | ||
entityUrn=dataset_urn, | ||
aspect=SubTypesClass( | ||
typeNames=[type_name], | ||
), | ||
) | ||
|
||
yield mcp.as_workunit() | ||
|
||
mcp = MetadataChangeProposalWrapper( | ||
entityUrn=dataset_urn, | ||
aspect=DataPlatformInstanceClass( | ||
platform=make_data_platform_urn(self.platform), | ||
instance=self.config.platform_instance, | ||
), | ||
) | ||
|
||
yield mcp.as_workunit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor get_model_workunits
for readability.
The get_model_workunits
method is comprehensive but lengthy. Consider breaking it into smaller helper methods to improve readability and maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/setup.py (4 hunks)
Additional comments not posted (3)
metadata-ingestion/setup.py (3)
311-315
: Verify the necessity and compatibility ofsac
dependencies.Ensure that the added dependencies "requests", "pyodata>=1.11.1", and "Authlib" are necessary for the SAC integration and compatible with existing dependencies.
Verification successful
Dependencies for
sac
are used and necessaryThe
sac
dependencies "requests", "pyodata>=1.11.1", and "Authlib" are utilized in various parts of the codebase, confirming their necessity for the SAC integration. Specifically,pyodata
andAuthlib
are used in thesac
source implementation, whilerequests
is widely used across the codebase. No conflicts were identified.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the necessity and compatibility of `sac` dependencies. # Test: Check for any conflicting dependencies in the project. # Expect: No conflicting dependencies. pip checkLength of output: 82
Script:
#!/bin/bash # Description: Search for usage of "requests", "pyodata", and "Authlib" in the codebase. # Search for "requests" usage rg "requests" -A 3 # Search for "pyodata" usage rg "pyodata" -A 3 # Search for "Authlib" usage rg "Authlib" -A 3Length of output: 258418
734-734
: Verify the correctness of thesac
entry point definition.Ensure that the
sac
entry point is correctly defined and points to the intended module and classSACSource
.Verification successful
The
sac
entry point is correctly defined.The
SACSource
class is present in the expected moduledatahub.ingestion.source.sac.sac
, confirming that the entry point insetup.py
is accurate. No further action is needed.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the correctness of the `sac` entry point definition. # Test: Check for the existence and correctness of the `SACSource` class in the specified module. # Expect: The `SACSource` class should be defined in the `datahub.ingestion.source.sac.sac` module. ast-grep --lang python --pattern $'class SACSource { $$$ }'Length of output: 506
Script:
#!/bin/bash # Description: Verify the correctness of the `sac` entry point definition. # Test: Correctly check for the existence of the `SACSource` class in the specified module. # Expect: The `SACSource` class should be defined in the `datahub.ingestion.source.sac.sac` module. ast-grep --lang python --pattern 'class SACSource:'Length of output: 53
Script:
#!/bin/bash # Description: Search for the `SACSource` class across the entire codebase. # Test: Use `rg` to find any occurrence of the `SACSource` class. # Expect: Locate the definition of `SACSource` if it exists anywhere in the repository. rg 'class SACSource' --type pyLength of output: 349
480-480
: Verify the integration and naming conventions of thesac
plugin.Ensure that the
sac
plugin is correctly integrated into theplugins
dictionary and follows the project's naming conventions.
@Masterchen09 looks like the cypress tests that are failing are |
@hsheth2 Thanks for mentioning it, I didn't noticed that some tests are failing. For I have currently no idea regarding |
@hsheth2 The fix for
This is not the reason for the first attempt to fail, only the reason for the second and third attempt...the error message is caused by the failing first attempt, because the secret already exists and will not be created in the second and third attempt - the reason for failing the first attempt is the same reason why There is also another fix for the frozen dataclasses...from a functional point view it makes no difference, but is not correct to have a set in a frozen dataclass - now it's a frozenset. ;-) |
8870594
to
ee22f17
Compare
@Masterchen09 merged master to fix a separate issue with CI. Let's see if the cypress tests pass now |
CI failures are unrelated (nifi, nested domains), so merging through. |
@hsheth2 As discussed, here is the ingestion source for the SAP Analytics Cloud with everything included (docs, tests, configuration in the UI, ...).
Two remarks regarding two changes which are also included in this PR, but have nothing to do with the SAP Analytics Cloud source:
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests