-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch Deployments to new hashing algo w/ collision avoidance mechanism #44774
Conversation
pkg/apis/apps/v1beta1/types.go
Outdated
|
||
// Monotonically increasing counter that tracks hash collisions for | ||
// the Deployment. Used as a collision avoidance mechanism by the | ||
// Deployment controller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need this tag in the internal types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we do
// Add uniquifier in the hash if it exists. | ||
if uniquifier != nil { | ||
uniquifierBytes := make([]byte, 8) | ||
binary.PutVarint(uniquifierBytes, *uniquifier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll panic with a larger uniquifier
. Use binary.LittleEndian.PutUint64
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can an int64 integer be larger than 8 bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a side note, I still need to unit test this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some experiment https://play.golang.org/p/3DKSA8wYup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Updated and also added a simple unit test.
We sync status by calling |
Yeah, on my list to fix. |
@janetkuo @kubernetes/sig-apps-pr-reviews this is ready for reviews. I will follow up with unit tests for getNewReplicaSet in a different PR because I want to break down that function into pieces before unit-testing it and I don't want to make reviews harder here. |
@lukaszo can you also have a look? |
pkg/apis/apps/v1beta1/types.go
Outdated
// the Deployment. Used as a collision avoidance mechanism by the | ||
// Deployment controller. | ||
// +optional | ||
Uniquifier *int64 `json:"uniquifier,omitempty" protobuf:"varint,8,opt,name=uniquifier"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we keep this in status? It seems little strange.
Also, who can update status? Only controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting it in status was discussed in the proposal for hash collision. We don't want users to set it or even think about it.
API server can update status as well (if we implement it).
@@ -346,6 +346,9 @@ func ValidateDeploymentStatus(status *extensions.DeploymentStatus, fldPath *fiel | |||
allErrs = append(allErrs, apivalidation.ValidateNonnegativeField(int64(status.ReadyReplicas), fldPath.Child("readyReplicas"))...) | |||
allErrs = append(allErrs, apivalidation.ValidateNonnegativeField(int64(status.AvailableReplicas), fldPath.Child("availableReplicas"))...) | |||
allErrs = append(allErrs, apivalidation.ValidateNonnegativeField(int64(status.UnavailableReplicas), fldPath.Child("unavailableReplicas"))...) | |||
if status.Uniquifier != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Sorry, didn't have time to review this further. Taking a look now. |
test/e2e/deployment.go
Outdated
framework.Logf("cannot get deployment %q: %v", deploymentName, err) | ||
return false, nil | ||
} | ||
framework.Logf("deployment status: %v", d.Status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uniquifier
is a pointer and this won't print its int64
value for debugging. It's easier to debug if field names are printed too.
Use framework.Logf(spew.Sprintf("deployment status: %#v", d.Status))
to address both. You'll need to import "github.com/davecgh/go-spew/spew"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -346,6 +346,9 @@ func ValidateDeploymentStatus(status *extensions.DeploymentStatus, fldPath *fiel | |||
allErrs = append(allErrs, apivalidation.ValidateNonnegativeField(int64(status.ReadyReplicas), fldPath.Child("readyReplicas"))...) | |||
allErrs = append(allErrs, apivalidation.ValidateNonnegativeField(int64(status.AvailableReplicas), fldPath.Child("availableReplicas"))...) | |||
allErrs = append(allErrs, apivalidation.ValidateNonnegativeField(int64(status.UnavailableReplicas), fldPath.Child("unavailableReplicas"))...) | |||
if status.Uniquifier != nil { | |||
allErrs = append(allErrs, apivalidation.ValidateNonnegativeField(*status.Uniquifier, fldPath.Child("uniquifier"))...) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we want something like validating that uniquifier won't decrease and will only be increased by 1 at a time in ValidateDeploymentStatusUpdate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strong for tightening this to check that the field increases by 1 - either way if we stabilize to a value, we won't care. Added validation that it won't decrease.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My worry is that some stale Deployment status in cache may cause uniquifier to decrease or increase too much. We need the value to be stable for idempotent RS names. This can be a follow-up.
@@ -3646,7 +3648,7 @@ type updateRsFunc func(d *extensions.ReplicaSet) | |||
func UpdateReplicaSetWithRetries(c clientset.Interface, namespace, name string, applyUpdate updateRsFunc) (*extensions.ReplicaSet, error) { | |||
var rs *extensions.ReplicaSet | |||
var updateErr error | |||
pollErr := wait.PollImmediate(10*time.Millisecond, 1*time.Minute, func() (bool, error) { | |||
pollErr := wait.PollImmediate(1*time.Second, 1*time.Minute, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) create const
for poll interval and timeout so that we have consistent value in all func UpdateXxxWithRetries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to leave this change for a follow-up
I will add the appropriate labels at 12:00 CET tomorrow if there are no comments until then. |
@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this |
I've seen "uniquifier" referred to as "fingerprint" in other apis. Is there significance to the monotonically incrementing or is that an implementation detail? Is it guaranteed dense? |
There is significance in that it's incrementing so in case of a collision we are essenitally changing the algo used to compute a hash and it should be a new one. It is used as a counter of the actual collisions. |
@k8s-bot pull-kubernetes-federation-e2e-gce test this |
I vote for
|
|
PSA: I don't receive ANY github notifications now. If something needs my attention, please resort to other means, such as slack or email. Clayton asked for my opinion via slack. I said I was ok with The fact that it is sequential is not really material, nor strongly guaranteed, since there are scenarios where it could double count or reset, but "count" makes it easy to understand what it's doing. I oppose any name containing "index", "revision", or "generation", that might imply stronger semantics, be confused with other fields of the same names, or otherwise lead to an expectation that the value would appear in labels, annotations, or resource names. |
Please remove the word "monotonically" from the API field descriptions. |
Also: @ncdc suggested hashCollisions, which isn't bad, though I'm not 100% committed to the idea of keeping a hash forever. Just |
I'm concerned that this is kept in Status, but doesn't really pass the litmus test of "could be rebuilt by observation", |
I see your point and we could potentially annotate ReplicaSets with the collisionCounter they were created and reflect the highest adopted value back to the Deployment status but eventually this specific field doesn't need to be rebuilt, mostly due to our defaults (new Deployments in the apps group are defaulted to retaining 3 old ReplicaSets, coupled with fnv it's unlike that we will ever have a collision again). |
Signed-off-by: Michail Kargakis <[email protected]>
Signed-off-by: Michail Kargakis <[email protected]>
Signed-off-by: Michail Kargakis <[email protected]>
Signed-off-by: Michail Kargakis <[email protected]>
Signed-off-by: Michail Kargakis <[email protected]>
Updated to use |
Approving assuming consensus on the name. Adding lgtm based on prior lgtm. |
Automatic merge from submit-queue |
Automatic merge from submit-queue Implement Daemonset history ~Depends on #45867 (the 1st commit, ignore it when reviewing)~ (already merged) Ref kubernetes/community#527 and kubernetes/community#594 @kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews @erictune @kow3ns @lukaszo @Kargakis --- TODOs: - [x] API changes - [x] (maybe) Remove rollback subresource if we decide to do client-side rollback - [x] deployment controller - [x] controller revision - [x] owner ref (claim & adoption) - [x] history reconstruct (put revision number, hash collision avoidance) - [x] de-dup history and relabel pods - [x] compare ds template with history - [x] hash labels (put it in controller revision, pods, and maybe deployment) - [x] clean up old history - [x] Rename status.uniquifier when we reach consensus in #44774 - [x] e2e tests - [x] unit tests - [x] daemoncontroller_test.go - [x] update_test.go - [x] ~(maybe) storage_test.go // if we do server side rollback~ kubectl part is in #46144 --- **Release note**: ```release-note ```
Implements kubernetes/community#477
@kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews
Fixes #29735
Fixes #43948