Ingestion for superset failed on LDAP #10566

@fzhan

Describe the bug
Superset is set up with AzureAD as the only login method. I supplied the AzureAD username and password in the ingestion recipe with ldap as the provider, but the run failed with the following message: datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (superset): 'access_token'

To Reproduce
Steps to reproduce the behavior:

  1. Set up Superset with AzureAD-only access
  2. Configure the ingestion with:
       source:
         type: superset
         config:
           connect_uri: 'http://superset.data-platform:8088'
           display_uri: 'https://bi.company'
           username: user@company
           password: password-from-ad
           provider: ldap
  3. Run the ingestion
  4. See the error:
[2024-05-22 02:56:36,069] DEBUG    {datahub.entrypoints:206} - Python version: 3.10.13 (main, Jan 17 2024, 06:53:56) [GCC 12.2.0] at /tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/bin/python3 on Linux-5.15.0-94-generic-x86_64-with-glibc2.36
[2024-05-22 02:56:36,069] DEBUG    {datahub.entrypoints:211} - GMS config {'models': {}, 'patchCapable': True, 'versions': {'acryldata/datahub': {'version': 'v0.13.2', 'commit': '0a8ec376b7c6963772a167e08837dce8b480af7c'}}, 'managedIngestion': {'defaultCliVersion': '0.13.1.2', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'timeZone': 'GMT', 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'prod'}, 'noCode': 'true'}
[exec_id=abd62b81-d969-485e-b0cb-d17ca27cb888] 2024-05-22 03:59:28.099623 INFO: Starting execution for task with name=RUN_INGEST
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] Obtaining venv creation lock...
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] Acquired venv creation lock
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] venv is already set up
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] venv setup time = 0 sec
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] This version of datahub supports report-to functionality
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] + exec datahub --debug ingest run -c /tmp/datahub/ingest/abd62b81-d969-485e-b0cb-d17ca27cb888/recipe.yml --report-to /tmp/datahub/ingest/abd62b81-d969-485e-b0cb-d17ca27cb888/ingestion_report.json
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:29,546] DEBUG    {datahub.telemetry.telemetry:286} - Sending init Telemetry
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,217] DEBUG    {datahub.telemetry.telemetry:315} - Sending telemetry for function-call
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,513] INFO     {datahub.cli.ingest_cli:147} - DataHub CLI version: 0.13.1.2
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,589] DEBUG    {datahub.ingestion.sink.datahub_rest:111} - Setting env variables to override config
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,590] DEBUG    {datahub.ingestion.sink.datahub_rest:113} - Setting gms config
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,590] DEBUG    {datahub.ingestion.run.pipeline:238} - Sink type datahub-rest (<class 'datahub.ingestion.sink.datahub_rest.DatahubRestSink'>) configured
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,590] INFO     {datahub.ingestion.run.pipeline:239} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-datahub-gms:8080
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,597] DEBUG    {datahub.ingestion.run.pipeline:313} - Reporter type:file,<class 'datahub.ingestion.reporting.file_reporter.FileReporter'> configured.
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,620] INFO     {datahub.ingestion.source.state.stateful_ingestion_base:241} - Stateful ingestion will be automatically enabled, as datahub-rest sink is used or `datahub_api` is specified
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,630] DEBUG    {datahub.ingestion.source.state.stateful_ingestion_base:286} - Successfully created datahub state provider.
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,671] DEBUG    {datahub.telemetry.telemetry:315} - Sending telemetry for function-call
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,975] ERROR    {datahub.entrypoints:201} - Command failed: Failed to configure the source (superset): 'access_token'
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] Traceback (most recent call last):
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 121, in _add_init_error_context
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     yield
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 252, in __init__
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     self.source = source_class.create(
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/ingestion/source/superset.py", line 220, in create
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     return cls(ctx, config)
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/ingestion/source/superset.py", line 193, in __init__
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     self.access_token = login_response.json()["access_token"]
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] KeyError: 'access_token'
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] 
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] The above exception was the direct cause of the following exception:
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] 
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] Traceback (most recent call last):
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/entrypoints.py", line 188, in main
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     sys.exit(datahub(standalone_mode=False, **kwargs))
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     return self.main(*args, **kwargs)
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/click/core.py", line 1078, in main
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     rv = self.invoke(ctx)
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     return _process_result(sub_ctx.command.invoke(sub_ctx))
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     return _process_result(sub_ctx.command.invoke(sub_ctx))
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     return ctx.invoke(self.callback, **ctx.params)
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/click/core.py", line 783, in invoke
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     return __callback(*args, **kwargs)
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 454, in wrapper
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     raise e
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 403, in wrapper
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     res = func(*args, **kwargs)
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 201, in run
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     return future.result()
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 170, in run_ingestion_and_check_upgrade
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     pipeline = Pipeline.create(
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 363, in create
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     return cls(
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 251, in __init__
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     with _add_init_error_context(f"configure the source ({source_type})"):
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     self.gen.throw(typ, value, traceback)
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]   File "/tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 123, in _add_init_error_context
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs]     raise PipelineInitError(f"Failed to {step}: {e}") from e
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (superset): 'access_token'
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,978] DEBUG    {datahub.entrypoints:203} - DataHub CLI version: 0.13.1.2 at /tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/lib/python3.10/site-packages/datahub/__init__.py
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,978] DEBUG    {datahub.entrypoints:206} - Python version: 3.10.13 (main, Jan 17 2024, 06:53:56) [GCC 12.2.0] at /tmp/datahub/ingest/venv-superset-2b9c1ab97dc6cd7f/bin/python3 on Linux-5.15.0-94-generic-x86_64-with-glibc2.36
[abd62b81-d969-485e-b0cb-d17ca27cb888 logs] [2024-05-22 03:59:30,978] DEBUG    {datahub.entrypoints:211} -
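
The traceback shows the failure is a bare KeyError raised when the login response JSON has no "access_token" key, so the actual reason for the rejected login never reaches the log. As a minimal sketch of how that lookup could surface a clearer message (this is not the DataHub implementation; `LoginError` and `extract_access_token` are hypothetical names, and the "message" field is an assumption about what the response body might carry):

```python
from typing import Any, Mapping


class LoginError(RuntimeError):
    """Hypothetical error type: the login response carried no access token."""


def extract_access_token(payload: Mapping[str, Any], status_code: int) -> str:
    """Return the access token from a decoded login response body.

    Raises LoginError with the HTTP status and any server-provided message
    instead of letting a bare KeyError('access_token') escape.
    """
    token = payload.get("access_token")
    if not token:
        # Assumption: a rejected login may include a "message" field.
        detail = payload.get("message", "no message in response body")
        raise LoginError(
            f"Login failed (HTTP {status_code}): {detail}. "
            "Check username, password and provider."
        )
    return str(token)
```

With a check like this, a rejected login would report the HTTP status and the server's message rather than only 'access_token'.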

Expected behavior
Provide suggestions for configuring Superset ingestion when LDAP is the only enabled access method.
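
To narrow this down, it may help to send the login request by hand and inspect what Superset returns. The sketch below only builds the request; it assumes the ingestion source posts to Superset's `/api/v1/security/login` endpoint with `username`, `password`, `provider` and `refresh` fields, which I have not confirmed against the DataHub source. `build_login_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_login_request(
    connect_uri: str,
    username: str,
    password: str,
    provider: str = "ldap",
) -> urllib.request.Request:
    """Build (but do not send) a JSON login request for a Superset instance."""
    body = {
        "username": username,
        "password": password,
        "provider": provider,  # assumed field, mirroring the recipe's `provider`
        "refresh": True,       # assumed field
    }
    return urllib.request.Request(
        url=connect_uri.rstrip("/") + "/api/v1/security/login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_login_request(
    "http://superset.data-platform:8088", "user@company", "password-from-ad"
)
```

Sending it with `urllib.request.urlopen(req)` from inside the cluster should show whether the response contains an `access_token`. Note that `urlopen` raises `urllib.error.HTTPError` on a 4xx response; calling `.read()` on that exception returns the response body.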

Desktop:

  • OS: Windows
  • Browser: Edge
  • Version: 125

Additional context
DataHub is running on a local k8s cluster, with Superset in another namespace.
