Skip to content

Conversation

@yingyingqiqi
Copy link

@yingyingqiqi yingyingqiqi commented Oct 15, 2024

A session can consist of multiple query_id. If multiple SQL commands are used to create several temp tables in a session, only one temp table lineage will be generated. This PR fixes this issue.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata community-contribution PR or Issue raised by member(s) of DataHub Community labels Oct 15, 2024
Copy link
Contributor

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly seems reasonable, just had one question

REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_REPLACE(query_text,'\\\\n','\\n'), '(CREATE(?:[\\n\\s\\t]+(?:temp|temporary))?(?:[\\n\\s\\t]+)table(?:[\\n\\s\\t]+)[^\\n\\s\\t()-]+)', 0, 1, 'ipe'),'[\\n\\s\\t]+',' ',1,'p') AS create_command,
ROW_NUMBER() OVER (
PARTITION BY session_id, query_text
PARTITION BY session_id, query_text, query_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to partition by both?

Suggested change
PARTITION BY session_id, query_text, query_id
PARTITION BY session_id, query_id

@datahub-cyborg datahub-cyborg bot added the pending-submitter-response Issue/request has been reviewed but requires a response from the submitter label Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata pending-submitter-response Issue/request has been reviewed but requires a response from the submitter

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants