Skip to content

Conversation

@wangzhen127
Copy link
Member

What this PR does / why we need it:
When node lease feature is enabled, kubelet reports node status to api server only if there is some change or it didn't report over last report interval. This is part of KEP-0009, feature #589 and issue #14733.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

kubelet: When node lease feature is enabled, kubelet reports node status to api server only if there is some change or it didn't report over last report interval.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. area/kubeadm area/kubelet sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 12, 2018
@wangzhen127
Copy link
Member Author

/assign @wojtek-t
/assign @yujuhong

@wangzhen127
Copy link
Member Author

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 12, 2018
@neolit123
Copy link
Member

/cc @kubernetes/sig-cluster-lifecycle-pr-reviews

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall - minor comments mostly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there is any change or enough time passed from the last sync, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if updatePodCIDR would fail, theoretically node could not change.
But I'm assuming in that case, it doesn't cost us much to report status anyway, so better to be on the safe side.

Maybe worth adding a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic here. PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I'm not a fan of reflection if it's possible to avoid that.
How about something like that:

if nodeConditionsHaveChanged(originalStatus.Conditions, status.Conditions) {
  return true
}
originalConditions, conditions := originalStatus.Conditions. status.Conditions
originalStatus.Conditions = nil
status.Conditions = nil
defer func() {
  originalStatus.Conditions = originalConditions
  status.Conditions = conditions
}
return !apiequality.Semantic.DeepEqual(originalStatus, status)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Just curious, why is reflection not preferred?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reflection is really slow - this is the reason why we autogenerate a bunch of code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we generating conditions in the same deterministic order?
If so, maybe we can avoid sorting them?

@yujuhong - wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubelet does generate node conditions in a deterministic order, but there are other components that could patch the node status (e.g., NPD).

To be safe, I think we should still sort the conditions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - sgtm. thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the assignment of nodeHasChanged to itself intentional here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the assignment happens after ||. Anyway, I changed the logic here, so the original if statement is obsolete.

@spiffxp
Copy link
Contributor

spiffxp commented Oct 17, 2018

/milestone v1.13
I'm adding this to the v1.13 milestone since it relates to kubernetes/enhancements#589 which is currently planned for v1.13

@k8s-ci-robot k8s-ci-robot added this to the v1.13 milestone Oct 17, 2018
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you still need to compare here if one of them is nil? Am I missing something?...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The full version is:

if originalStatus == nil && status == nil {
    return false
} else if originalStatus == nil && status != nil {
    return true
} else if originalStatus != nil && status == nil {
    return true
}

So we still need the comparison here if we use a shorter version of code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels quite unnecessary to invoke apiequality.Semantic.DeepEqual to check for the special cases (one of them is nil). I'd probably go with the code below, but I'll leave the decision to you.

if originalStatus == nil && status == nil {
    return false
}
if originalStatus == nil || status == nil {
    return true
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubelet does generate node conditions in a deterministic order, but there are other components that could patch the node status (e.g., NPD).

To be safe, I think we should still sort the conditions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Member Author

@wangzhen127 wangzhen127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. PTAL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic here. PTAL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the assignment happens after ||. Anyway, I changed the logic here, so the original if statement is obsolete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The full version is:

if originalStatus == nil && status == nil {
    return false
} else if originalStatus == nil && status != nil {
    return true
} else if originalStatus != nil && status == nil {
    return true
}

So we still need the comparison here if we use a shorter version of code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Just curious, why is reflection not preferred?

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 18, 2018
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note the change here. We need this return value to know if pod CIDR is actually changed or not.

Copy link
Member Author

@wangzhen127 wangzhen127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt, thanks for the review! I have updated the PR. PTAL

/test pull-kubernetes-node-e2e-alpha
/test pull-kubernetes-e2e-gce-alpha-features

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see your point. Updated the comments to make it more clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update frequency and report frequency are independent according to KEP-0009. For example, when node lease feature is enabled, I think it is possible that one use wants to override update frequency to 20s and still keep the default (1 min) report frequency. In such case, the user would just specify the update frequency and the report frequency should not derive from the update frequency.

/cc @wojtek-t

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryUpdateNodeStatus does not touch label. I don't think we need to check labels here. Also confirmed with @yujuhong offline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conditions are merged by keys, where the keys are the type: "patchStrategy:"merge" patchMergeKey:"type"". See

Conditions []NodeCondition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,4,rep,name=conditions"`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done every update frequency, i.e., once every 10 sec with default configuration. I updated it to use the copy now.

@k8s-ci-robot k8s-ci-robot requested a review from wojtek-t November 6, 2018 19:59
@wangzhen127
Copy link
Member Author

/retest
/test pull-kubernetes-node-e2e-alpha
/test pull-kubernetes-e2e-gce-alpha-features

@wangzhen127
Copy link
Member Author

/test pull-kubernetes-node-e2e-alpha
/test pull-kubernetes-e2e-gce-alpha-features

@wangzhen127
Copy link
Member Author

I have updated NodeStatusReportFrequency's default value to derive from NodeStatusUpdateFrequency if it is set explicitly. PTAL

/test pull-kubernetes-node-e2e-alpha
/test pull-kubernetes-e2e-gce-alpha-features

@wojtek-t
Copy link
Member

wojtek-t commented Nov 7, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 7, 2018
@wojtek-t
Copy link
Member

wojtek-t commented Nov 7, 2018

/approve

Copy link
Member

@liggitt liggitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

/hold
noted a few documentation issues, @wojtek-t can re-lgtm once those are resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nodeStatusReportFrequency, as this value is used in documentation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document the default uses the explicit nodeStatusUpdateFrequency value if provided, otherwise, 1m

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include the improved documentation about what happens if a status update is computed in this comment as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 7, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, wangzhen127, wojtek-t, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 7, 2018
When node lease feature is enabled, kubelet reports node status to api server
only if there is some change or it didn't report over last report interval.
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 7, 2018
Copy link
Member Author

@wangzhen127 wangzhen127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the documentation. PTAL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@wangzhen127
Copy link
Member Author

/test pull-kubernetes-node-e2e-alpha
/test pull-kubernetes-e2e-gce-alpha-features

@wojtek-t
Copy link
Member

wojtek-t commented Nov 7, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 7, 2018
@wangzhen127
Copy link
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 7, 2018
@yujuhong
Copy link
Contributor

yujuhong commented Nov 7, 2018

/lgtm

@wangzhen127
Copy link
Member Author

/retest

for i := range conditionsCopy {
originalConditionsCopy[i].LastHeartbeatTime = replacedheartbeatTime
conditionsCopy[i].LastHeartbeatTime = replacedheartbeatTime
if !apiequality.Semantic.DeepEqual(&originalConditionsCopy[i], &conditionsCopy[i]) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this work if additional conditions got added or removed in the node ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should just work, right?

},
status: &v1.NodeStatus{
Conditions: []v1.NodeCondition{readyCondition, memoryPressureCondition},
},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a UT for a condition getting removed. So original has 2 or 3 and current node status has less or removed ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Do you want to add one? I can help review. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.