Skip to content

Conversation

@acrylJonny
Copy link
Collaborator

No description provided.

@alwaysmeticulous
Copy link

alwaysmeticulous bot commented May 22, 2025

✅ Meticulous spotted visual differences in 3 of 1318 screens tested, but all differences have already been approved: view differences detected.

Meticulous evaluated ~9 hours of user flows against your PR.

Last updated for commit f42a9ef. This comment will update as new commits are pushed.

@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label May 22, 2025
@codecov
Copy link

codecov bot commented May 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

❌ Unsupported file format

Upload processing failed due to unsupported file format. Please review the parser error message:
Error parsing JUnit XML in /home/runner/work/datahub/datahub/metadata-io/build/test-results/test/TEST-com.linkedin.metadata.graph.search.elasticsearch.SearchGraphServiceElasticSearchTest.xml at 117:1058

Caused by:
RuntimeError: Error converting computed name to ValidatedString

Caused by:
    string is too long</code></pre>

For more help, visit our troubleshooting guide.

:loudspeaker: Thoughts on this report? Let us know!

Map<String, MetadataChangeProposalWrapper> schemaMap) {
List<MetadataChangeProposalWrapper> mcps = new ArrayList<>();

String pipelineName = conf.getOpenLineageConf().getPipelineName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this out to a separate method? This method became quite long

boolean isMergeIntoCommand = job.getName().contains("execute_merge_into_command_edge");
String tableName = null;

// If this is a MERGE INTO command and enhanced extraction is enabled, try to extract the target table name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this as well into a separate method to make this method a bit more digestable

// Method 1: Check for table name in the SQL facet (most reliable)
if (job.getFacets() != null && job.getFacets().getSql() != null) {
String sqlQuery = job.getFacets().getSql().getQuery();
if (sqlQuery != null && sqlQuery.toUpperCase().contains("MERGE INTO")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move MERGE INTO to a constant?

}
}

// Method 3: Check for table identifiers in symlinks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like these methods can be one helper method

if (isMergeIntoCommand && tableName != null && datahubConf.isEnhancedMergeIntoExtraction()) {
// Create modified job names that include the table name
String tablePart = tableName.replace(".", "_").replace(" ", "_").toLowerCase();
String enhancedJobName = job.getName() + "." + tablePart;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does jobname contain the operation (I wonder if we should add merge_into to the job name to identify these jobs


// Try to get table from 'catalogTable' if it exists
try {
Method getCatalogTable = relation.getClass().getMethod("catalogTable");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we identify the class, cast, and use the catalog table method?
I'm afraid of how to detect if Openlineage will rename this method to catalogTable2 or something.

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels May 26, 2025
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels May 29, 2025
@acrylJonny acrylJonny changed the title feat(integration/spark): extract table name on delta merges adding table location to urn feat(integration/spark): extract table name on delta merges adding table location to urn & spark streaming lineage fixes Jun 4, 2025
}

/** Install a Logback appender using reflection. */
private static boolean installLogbackAppender(Object logger) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These looks magical. I hope this works :)

/** Extract the query ID from a log message. */
private static String extractQueryId(String logMessage) {
// The query ID is typically in the format [queryId]
java.util.regex.Pattern pattern = java.util.regex.Pattern.compile("\\[([0-9a-f-]+)\\]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to static to make sure it is recompiled only once?

// Precompiled patterns for better performance
  private static final java.util.regex.Pattern BRACKET_QUERY_ID_PATTERN = 
      java.util.regex.Pattern.compile("\\[([0-9a-f-]+)\\]");
  private static final java.util.regex.Pattern EQUALS_QUERY_ID_PATTERN = 
      java.util.regex.Pattern.compile("id = ([0-9a-f-]+)");

// Extract offset information for commit messages
if (logMessage.contains("Committing offset")) {
java.util.regex.Pattern offsetPattern =
java.util.regex.Pattern.compile("offset: (\\{[^\\}]+\\})");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

metadata.put("logicalPlan", plan);

// Look for table references in the plan
java.util.regex.Pattern tablePattern =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

}
}

// Method 3: Check for table identifiers in symlinks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you split this into multiple methods?

@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Jun 5, 2025
@acrylJonny acrylJonny closed this Jun 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants