-
Notifications
You must be signed in to change notification settings - Fork 14.9k
KAFKA-19975 Fix blocked reassignment completion due to ISR missing replicas #21114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
KAFKA-19975 Fix blocked reassignment completion due to ISR missing replicas #21114
Conversation
e225092 to
5d530e9
Compare
metadata/src/main/java/org/apache/kafka/controller/PartitionReassignmentReplicas.java
Show resolved
Hide resolved
|
|
||
| @Test | ||
| public void testDoesNotCompleteReassignmentIfIsrDoesNotHaveAllTargetReplicas() { | ||
| public void testCompleteReassignmentIfIsrDoesNotHaveAnExistingTargetReplica() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit on name: testCanCompleteReassignment....
ahuang98
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the changes! just requesting for maybe one or two more test cases. otherwise lgtm
|
|
||
| @Test | ||
| public void testDoesNotCompleteReassignmentIfIsrDoesNotHaveAllTargetReplicas() { | ||
| public void testCanCompleteReassignmentIfIsrDoesNotHaveAnExistingTargetReplica() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we test when replica 0 is missing as well? when both replica 0 and replica 1 are missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added testCanCompleteReassignmentIfIsrDoesNotHaveBothExistingTargetReplica, should be sufficient.
junrao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgoyal-confluent : Thanks for the PR. Just a minor comment.
|
|
||
| // Replica 1 is not in sync | ||
| Optional<PartitionReassignmentReplicas.CompletedReassignment> reassignmentOptional = | ||
| replicas.maybeCompleteReassignment(List.of(0, 2, 3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just use {3} as the target ISR as in the original test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the newly added testCanCompleteReassignmentIfIsrDoesNotHaveBothExistingTargetReplica test cover the case?
ahuang98
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the improvement Deepak!
The reassignment completion logic currently requires that all target replicas be present in the in-sync replica set (ISR). If an unhealthy broker exists in both the current and target replica sets (i.e., it is neither added nor removed during the reassignment), it still blocks the reassignment from completing.
For example, the following reassignments would fail to complete if replica 2 were unhealthy and unable to join the ISR, even though completing the reassignment would not reduce the number of replicas in the ISR:
We should allow reassignments to complete if there are replicas in the target replica set that are not in the ISR and are not part of the adding set. This is safe because completing the reassignment would not decrease the ISR size. However, if the target replica set is smaller than the current replica set, we will continue to require that all replicas in the target set are in the ISR to prevent the reassignment from unintentionally shrinking the ISR.