Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(integration/dagster): dagster connector to emit dagster pipeline #8384

Conversation

shubhamjagtap639
Copy link
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata docs Issues and Improvements to docs labels Jul 10, 2023
@treff7es
Copy link
Contributor

@shubhamjagtap639, please, can you fix the linter errors?

@shubhamjagtap639
Copy link
Contributor Author

@shubhamjagtap639, please, can you fix the linter errors?

@treff7es There is no linter errors. CI is getting failed bacause of below error:

ERROR: Cannot install apache-airflow[snowflake]==2.0.2, apache-airflow[snowflake]==2.1.0, apache-airflow[snowflake]==2.1.1, apache-airflow[snowflake]==2.1.2, apache-airflow[snowflake]==2.1.3, apache-airflow[snowflake]==2.1.4, apache-airflow[snowflake]==2.2.5, apache-airflow[snowflake]==2.3.0, apache-airflow[snowflake]==2.3.1, apache-airflow[snowflake]==2.3.2, apache-airflow[snowflake]==2.3.3, apache-airflow[snowflake]==2.3.4, apache-airflow[snowflake]==2.4.0, apache-airflow[snowflake]==2.4.1, apache-airflow[snowflake]==2.4.2, apache-airflow[snowflake]==2.4.3, apache-airflow[snowflake]==2.5.0, apache-airflow[snowflake]==2.5.1, apache-airflow[snowflake]==2.5.2, apache-airflow[snowflake]==2.5.3, apache-airflow[snowflake]==2.6.0, apache-airflow[snowflake]==2.6.1, apache-airflow[snowflake]==2.6.2, apache-airflow[snowflake]==2.6.3 and apache-airflow~=2.2.0 because these package versions have conflicting dependencies.

Copy link
Collaborator

@mayurinehate mayurinehate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed only build and test(CI) files. Needs some changes.

- python-version: "3.10"
extraPythonRequirement: "dagster>=1.3.3"
- python-version: "3.10"
extraPythonRequirement: "dagster>=1.3.3"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matrix doesn't look right. We've added such test matrix for airflow as different versions of airflow have very different dependencies and some changes in logic.
For dagster, looks like its always dagster<=1.3.3 so extraPythonRequirement is not needed. The only variations are in python-version to run on python 3.7 and python 3.10 - similar to this - https://github.com/datahub-project/datahub/blob/master/.github/workflows/metadata-ingestion.yml#L34.

@@ -392,6 +392,9 @@ def get_long_description():
"powerbi-report-server": powerbi_report_server,
"vertica": sql_common | {"vertica-sqlalchemy-dialect[vertica-python]==0.0.8"},
"unity-catalog": databricks | sqllineage_lib,
"dagster": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be required. I think, we should allow only - pip install acryl-datahub-dagster-plugin and not pip install acryl-datahub[dagster]. Its there in case of airflow for backward compatibility.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @hsheth2

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup agreed

@@ -647,6 +650,7 @@ def get_long_description():
"file = datahub.ingestion.reporting.file_reporter:FileReporter",
],
"apache_airflow_provider": ["provider_info=datahub_provider:get_provider_info"],
"dagster_provider": ["provider_info=datahub_dagster_plugin:get_provider_info"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this is airflow specific thing and can be removed unless dagster also needs it.

@@ -0,0 +1,48 @@
from dagster import (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required to be included in plugin source code that gets included in pypi package ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Issues and Improvements to docs ingestion PR or Issue related to the ingestion of metadata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants