Skip to content

Implements sequence method for SourceInfo in Oracle connector#5998

Closed
JuanmaBM wants to merge 1 commit intodebezium:mainfrom
JuanmaBM:DBZ-4489
Closed

Implements sequence method for SourceInfo in Oracle connector#5998
JuanmaBM wants to merge 1 commit intodebezium:mainfrom
JuanmaBM:DBZ-4489

Conversation

@JuanmaBM
Copy link
Contributor

This PR implemented the following feature request: https://issues.redhat.com/projects/DBZ/issues/DBZ-4489?filter=allopenissues

@github-actions
Copy link

Hi @JuanmaBM, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.

Comment on lines +193 to +200
@Override
protected String sequence() {
return Arrays.asList(
this.scn,
this.sourceTime != null ? this.sourceTime.toEpochMilli() : null
)
.toString();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oracle's change time is always second-based precision, so I don't believe it will help identify the order of transactions if they're committed within proximity.

Is there a reason why the transaction metadata topic cannot be used to resolve the order for post-processing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so perhaps using SCN and TransactionID as metadata to identify the order would be a better approach?

Regarding your question, I noticed in the feature request that the reporter mentioned they couldn't follow the order of transactions because they occur within milliseconds. They couldn't use the metadata because, as they stated: "the transaction ID in the metadata is alphanumeric and doesn’t allow us to identify the order of transactions with matching timestamps, as it’s not like a sequence generator."

He requested a way to identify the order, and in the comments, an implementation of io.debezium.connector.oracle.SourceInfo.sequence() was proposed to address this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JuanmaBM, in all honesty, you can't rely on SCN either because it's not unique. Oracle can assign up to 16k change vectors to the same SCN on Oracle 12 R2 and 96k on Oracle 19+. This means one or more transactions can share the same SCN.

The Oracle transaction identifier is also unsuitable for ordering since it is a composite value derived from 3 rollback segment attributes, which, when combined, are always unique. The transaction identifier does not monotonically increase either.

One way to achieve this would be to write all table rows to the same topic, where this topic has a single partition. This can be a bottleneck in high-volume traffic scenarios, but you get the ordering guarantee automatically out of the box. In short, the Kafka topic will be chronologically ordered by commits.

Another approach would be to use KStreams to consume the transaction metadata topic, using the transaction commit order from that topic to enrich and write all table-level topic events to a new topic that has a single partition and would be chronologically ordered by commit. In short, you end up with the same type of topic as the above, except you don't take the performance bottleneck of the single partition while producing the data to Kafka; you take it during the post-processing.

The final option would be to consume the transaction metadata and table-level topics as-is and reorder the changes by transaction in Spark as part of the data analysis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Naros

So, if you agree, I think I will close this PR and add this information to the issue to document these approaches. This way, if anyone encounters the same problems, they can use or test them as a solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JuanmaBM Chris is on PTO this week. The documentation sounds like a good plan!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpechane Oh, I see. Great! I'll go ahead and add the information to the issue, then close the PR. Thanks!

@JuanmaBM JuanmaBM closed this Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants