Commit abd92aa

harsha-mandadi-4026 authored and shirshanka committed
feat(ingest/s3): support path_specs of different S3 buckets in the same recipe (datahub-project#7514)
1 parent 5d53fdd · commit abd92aa

15 files changed: +2547 −43 lines

gradle.properties (0 additions, 1 deletion)

```diff
@@ -12,4 +12,3 @@ org.gradle.internal.repository.initial.backoff=1000
 
 # Needed to publish to Nexus from a sub-module
 gnsp.disableApplyOnlyOnRootProjectEnforcement=true
-
```

metadata-ingestion/examples/recipes/csv_enricher_to_datahub_rest.dhub.yml (1 addition, 1 deletion)

```diff
@@ -13,4 +13,4 @@ source:
 sink:
   type: "datahub-rest"
   config:
-    server: "http://localhost:8080"
+    server: "http://localhost:8080"
```

metadata-ingestion/examples/recipes/hana_to_datahub.dhub.yaml (1 addition, 1 deletion)

```diff
@@ -18,4 +18,4 @@ source:
 sink:
   type: "datahub-rest"
   config:
-    server: "http://localhost:8080"
+    server: "http://localhost:8080"
```

metadata-ingestion/examples/recipes/mode_to_datahub.dhub.yaml (1 addition, 1 deletion)

```diff
@@ -17,4 +17,4 @@ source:
 sink:
   type: "datahub-rest"
   config:
-    server: "http://localhost:8080"
+    server: "http://localhost:8080"
```

metadata-ingestion/examples/recipes/okta_to_datahub.dhub.yaml (1 addition, 1 deletion)

```diff
@@ -6,4 +6,4 @@ source:
 sink:
   type: "datahub-rest"
   config:
-    server: "http://localhost:8080"
+    server: "http://localhost:8080"
```

metadata-ingestion/examples/recipes/s3_to_file.dhub.yaml (0 additions, 1 deletion)

```diff
@@ -16,4 +16,3 @@ sink:
   type: "file"
   config:
     filename: "./s3_data_lake_mces.json"
-
```

metadata-ingestion/examples/recipes/snowflake_to_datahub.dhub.yaml (1 addition, 1 deletion)

```diff
@@ -18,4 +18,4 @@ source:
     email_domain: mycompany.com
 
     classification:
-      enabled: True
+      enabled: True
```

metadata-ingestion/examples/recipes/tableau_to_datahub.dhub.yaml (1 addition, 1 deletion)

```diff
@@ -17,4 +17,4 @@ source:
 sink:
   type: "datahub-rest"
   config:
-    server: "http://localhost:8080"
+    server: "http://localhost:8080"
```

metadata-ingestion/scripts/datahub_preflight.sh (0 additions, 1 deletion)

```diff
@@ -117,4 +117,3 @@ fi
 
 
 printf "\n\e[38;2;0;255;0m✅ Preflight was successful\e[38;2;255;255;255m\n"
-
```

metadata-ingestion/src/datahub/ingestion/source/s3/config.py (0 additions, 11 deletions)

```diff
@@ -12,7 +12,6 @@
 from datahub.configuration.validate_field_rename import pydantic_renamed_field
 from datahub.ingestion.source.aws.aws_common import AwsConnectionConfig
 from datahub.ingestion.source.aws.path_spec import PathSpec
-from datahub.ingestion.source.aws.s3_util import get_bucket_name
 from datahub.ingestion.source.s3.profiling import DataLakeProfilerConfig
 
 # hide annoying debug errors from py4j
@@ -92,16 +91,6 @@ def check_path_specs_and_infer_platform(
     )
     guessed_platform = guessed_platforms.pop()
 
-    # If platform is s3, check that they're all the same bucket.
-    if guessed_platform == "s3":
-        bucket_names = set(
-            get_bucket_name(path_spec.include) for path_spec in path_specs
-        )
-        if len(bucket_names) > 1:
-            raise ValueError(
-                f"All path_specs should reference the same s3 bucket. Got {bucket_names}"
-            )
-
     # Ensure s3 configs aren't used for file sources.
     if guessed_platform != "s3" and (
         values.get("use_s3_object_tags") or values.get("use_s3_bucket_tags")
```
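With the single-bucket validation removed above, one s3 recipe can now list path_specs pointing at different buckets, which previously raised "All path_specs should reference the same s3 bucket". A minimal illustrative recipe sketch (the bucket names, key patterns, and region here are made up for the example, not taken from the commit):

```yaml
# Hypothetical recipe: two path_specs, each in a different S3 bucket.
# Before this commit, the s3 source rejected this combination at config time.
source:
  type: s3
  config:
    path_specs:
      - include: "s3://analytics-bucket/logs/{table}/*.parquet"
      - include: "s3://marketing-bucket/exports/{table}/*.csv"
    aws_config:
      aws_region: us-east-1
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```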
