Implements sequence method for SourceInfo in Oracle connector #5998

JuanmaBM wants to merge 1 commit into debezium:main
Conversation
Hi @JuanmaBM, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.
```java
@Override
protected String sequence() {
    return Arrays.asList(
            this.scn,
            this.sourceTime != null ? this.sourceTime.toEpochMilli() : null)
            .toString();
}
```
Oracle's change time is always second-based precision, so I don't believe it will help identify the order of transactions if they're committed within proximity.
Is there a reason why the transaction metadata topic cannot be used to resolve the order for post-processing?
I see, so perhaps using SCN and TransactionID as metadata to identify the order would be a better approach?
Regarding your question, I noticed in the feature request that the reporter mentioned they couldn't follow the order of transactions because they occur within milliseconds. They couldn't use the metadata because, as they stated: "the transaction ID in the metadata is alphanumeric and doesn’t allow us to identify the order of transactions with matching timestamps, as it’s not like a sequence generator."
The reporter requested a way to identify the order, and in the comments an implementation of io.debezium.connector.oracle.SourceInfo.sequence() was proposed to address this.
Hi @JuanmaBM, in all honesty, you can't rely on SCN either because it's not unique. Oracle can assign up to 16k change vectors to the same SCN on Oracle 12 R2 and 96k on Oracle 19+. This means one or more transactions can share the same SCN.
The Oracle transaction identifier is also unsuitable for ordering since it is a composite value derived from 3 rollback segment attributes, which, when combined, are always unique. The transaction identifier does not monotonically increase either.
One way to achieve this would be to write all table rows to the same topic, where this topic has a single partition. This can be a bottleneck in high-volume traffic scenarios, but you get the ordering guarantee automatically out of the box. In short, the Kafka topic will be chronologically ordered by commits.
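A rough sketch of that single-topic approach: the ByLogicalTableRouter SMT can reroute all captured table topics into one shared topic, and the connector's topic auto-creation settings can pin that topic to a single partition. The regex, topic name, and partition/replication values below are illustrative assumptions, not part of this PR:

```properties
# Reroute every captured table topic (illustrative catch-all regex)
# into one shared topic so all changes land in a single place
transforms=reroute
transforms.reroute.type=io.debezium.transforms.ByLogicalTableRouter
transforms.reroute.topic.regex=(.*)
transforms.reroute.topic.replacement=all-changes

# Have the connector create topics with a single partition, so Kafka
# preserves the chronological commit order of all events in that topic
topic.creation.default.partitions=1
topic.creation.default.replication.factor=1
```

By default the router also adds a field to the message key to keep keys unique across the merged tables, which matters if the shared topic is ever compacted.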
Another approach would be to use KStreams to consume the transaction metadata topic, using the transaction commit order from that topic to enrich and write all table-level topic events to a new topic that has a single partition and would be chronologically ordered by commit. In short, you end up with the same type of topic as the above, except you don't take the performance bottleneck of the single partition while producing the data to Kafka; you take it during the post-processing.
The final option would be to consume the transaction metadata and table-level topics as-is and reorder the changes by transaction in Spark as part of the data analysis.
Hello @Naros
So, if you agree, I think I will close this PR and add this information to the issue to document these approaches. This way, if anyone encounters the same problems, they can use or test them as a solution.
@JuanmaBM Chris is on PTO this week. The documentation sounds like a good plan!
@jpechane Oh, I see. Great! I'll go ahead and add the information to the issue, then close the PR. Thanks!
This PR addressed the following feature request: https://issues.redhat.com/projects/DBZ/issues/DBZ-4489?filter=allopenissues