-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingest): replace sqllineage/sqlparse with our SQL parser #12020
fix(ingest): replace sqllineage/sqlparse with our SQL parser #12020
Conversation
029ed63
to
8f9db11
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable.
Awaiting response on comments and few changes around reporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will also need updates to gx-plugin
"column-timestamp", | ||
] | ||
assert sorted(SqlLineageSQLParser(sql_query).get_tables()) == expected_tables | ||
assert sorted(SqlLineageSQLParser(sql_query).get_columns()) == expected_columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's keep these tests around, and make it test with the new parser
see
class BigQuerySQLParser: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @hsheth2,
I've updated the tests to align with SQLGlot, and I noticed some changes in the outputs. Could you please verify them?
CC. @mayurinehate
@@ -1,5 +0,0 @@ | |||
Note! The integration can use an SQL parser to try to parse the tables the chart depends on. This parsing is disabled by default, | |||
but can be enabled by setting `parse_table_names_from_sql: true`. The default parser is based on the [`sqllineage`](https://pypi.org/project/sqllineage/) package. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we still require them to set parse_table_names_from_sql
right?
let's update the docs here instead of removing them entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we need parse_table_names_from_sql
, Added again
try: | ||
runner = LineageRunner(query) | ||
sql_parser_in_tables = create_lineage_sql_parsed_result( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably be setting the default db / schema when calling create_lineage_sql_parsed_result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @hsheth2 , We don’t need it here since we only require table names. In the previous implementation as well, we didn’t include the database/schema.
tables_list.sort() | ||
assert tables_list == ["my_view.SQL_TABLE_NAME"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change in particular seems problematic - can you look at the git history to understand why this test was added, and what we need to do to actually maintain most of the previous functionality? we'll likely need to tweak one of our sources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @hsheth2
It was added for looker token, but right now its not being used anymore
so technically this Test-Case is invalid
Let me your thought on this ?
CC. @mayurinehate
ans.add(col_ref.column) | ||
return list(ans) | ||
|
||
def get_downstream_columns(self) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like the new get_downstream_columns
is similar in functionality to the original get_columns
? or am I mistaken on that?
overall I'm a bit confused why you added that extra method, when we previously didn't have anything like it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @hsheth2 ,
I added the get_downstream_columns
method to verify what we’re getting in downstream columns since our new parser provides both upstream and downstream column lineage, which wasn’t possible with sqllineage
.
Additionally, I noticed that the get_columns
method from the sqllineage
parser isn’t used anywhere in the source code—it’s only referenced in test cases.
I’ve now removed the get_downstream_columns
method.
103eb28
to
c1d7596
Compare
c1d7596
to
135561c
Compare
Checklist