
Ensure desired state of world populator runs before volume reconstructor #44781

Merged (1 commit, May 11, 2017)

Conversation

@wongma7 (Contributor) commented Apr 21, 2017

If the kubelet's volumemanager reconstructor for the actual state of world runs before the desired state of world has been populated, the pods in the actual state of world will have some incorrect volume information: namely outerVolumeSpecName. An incorrect outerVolumeSpecName leads to part of the issue in #43515, because WaitForVolumeAttachAndMount searches the actual state of world with the correct outerVolumeSpecName, won't find it, and so reports 'timeout waiting....' forever for existing pods. The comments acknowledge that this is a known issue.

The all-sources-ready check doesn't work, because the sources being ready doesn't necessarily mean the desired state of world populator has added pods from those sources. So instead, let's put the all-sources-ready check in the populator: when the sources are ready, it will be able to populate the desired state of world and make HasAddedPods() return true. THEN the reconstructor may run.
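In sketch form, the ordering being proposed looks like this (illustrative only; findAndAddNewPods and hasAddedPods follow the names used in this discussion, and rc.sync stands in for whatever the reconstruct call is named):

// Populator side: only flip the flag once a pass has run with all sources ready.
if dswp.sourcesReady.AllReady() {
	dswp.findAndAddNewPods() // populate the desired state of world
	dswp.hasAddedPods = true // HasAddedPods() now returns true
}

// Reconstructor side: do not reconstruct until the populator reports done.
if rc.populatorHasAddedPods() {
	rc.sync() // hypothetical name for the reconstruct step
}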

@jingxu97 PTAL, you wrote all of the reconstruction stuff

Fix erroneous FailedSync and FailedMount events being periodically and indefinitely posted on pods after the kubelet is restarted

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Apr 21, 2017
@k8s-reviewable: This change is Reviewable

@k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Apr 21, 2017
@k8s-ci-robot (Contributor):

Hi @wongma7. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot added the size/M (denotes a PR that changes 30-99 lines, ignoring generated files) and release-note-none (denotes a PR that doesn't merit a release note) labels on Apr 21, 2017
@jingxu97 (Contributor):

@wongma7 Thanks for fixing the issue. Your approach should work; I just have one small concern. If we put the allSources.Ready check before the loop through the pod list that populates the desired state, it might delay the reconciler starting to mount volumes. If there is some error and allSources only becomes ready after a long period, the desiredStatePopulator will not process any pods or their volumes in the meantime.

I am thinking maybe we could put the allSourcesReady check in the HasAddedPods function. This function would return true only if allSourcesReady is true and the populator has finished looping through the pod list. In populatorLoopFunc, hasAddedPods would be set to false at the beginning and changed to true at the end of the function.
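In sketch form, that suggestion is roughly the following (hypothetical shape using the names from this comment, not the merged code):

func (dswp *desiredStateOfWorldPopulator) HasAddedPods() bool {
	// True only once all sources are ready and a full pass has completed.
	return dswp.sourcesReady.AllReady() && dswp.hasAddedPods
}

func (dswp *desiredStateOfWorldPopulator) populatorLoopFunc() func() {
	return func() {
		dswp.hasAddedPods = false // reset at the beginning of the pass
		dswp.findAndAddNewPods()
		dswp.hasAddedPods = true // set once the pod list has been walked
	}
}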

@wongma7 (Contributor, Author) commented Apr 24, 2017

@jingxu97 in my testing, checking allSourcesReady in HasAddedPods still doesn't guarantee that the desired state of world is ready before the reconciler's reconstruction. The timing is quite tight; see this example:

I0424 11:48:09.384200 19213 volume_manager.go:241] The desired_state_of_world populator starts
I0424 11:48:14.399392 19213 reconciler.go:153] Desired state of world has been populated with pods, starting reconstruct state function
I0424 11:48:14.400094 19213 reconciler.go:483] Reconciler sync states: could not find pod information in desired or actual states or pending operation, update it in both states: &{volumeName:kubernetes.io/aws-ebs/aws://us-east-1d/vol-06425954fe553e31d podName:2d103d0c-2905-11e7-ae76-0e06794f72a4 volumeSpec:0xc421083280 outerVolumeSpecName:pvc-291faaf2-2905-11e7-ae76-0e06794f72a4 pod:0xc421ba7200 pluginIsAttachable:true volumeGidValue: devicePath: reportedInUse:false mounter:0xc421babbc0}
I0424 11:48:14.404771 19213 desired_state_of_world_populator.go:286] Added volume "dynamic" (volSpec="pvc-291faaf2-2905-11e7-ae76-0e06794f72a4") for pod "2d103d0c-2905-11e7-ae76-0e06794f72a4" to desired state.

It's like this:

  1. Start kubelet
  2. The populator runs multiple times but probably doesn't find any pods, because sources aren't ready. After x runs, HasAddedPods returns true because sources are ready and the populator loop has already run many times. Now that sources are ready, the populator will surely find all pods in the next loop, 100 ms later, but chances are it hasn't found any pods up to this point.
  3. The reconciler is running on the same 100 ms period. It now sees HasAddedPods return true and so runs. But the DSW doesn't have all the pods, so some volumes in the ASW get wrong information.
  4. The populator runs again and finds pods because sources are ready. This is only <100 ms too late, but still too late: the ASW has the wrong information.

I think to avoid a long delay, we could add a 15-second timeout by which the populator must run even if allSourcesReady hasn't happened. And then the reconciler's reconstruction should only run after the populator has run once while sources were ready. Let me update the patch...

@wongma7 (Contributor, Author) commented Apr 24, 2017

@jingxu97 patch updated, PTAL.

I made it so the desired state of world populator is not blocked by sources.AllReady, as recommended. I didn't change the HasAddedPods function because, as I said, in my testing that didn't work out; instead I changed how dswp.hasAddedPods gets set: it is set right after the populator has run with all sources ready.

@jingxu97 (Contributor):

Just a small comment, otherwise lgtm.
Thanks for the fix!

if sourcesReady.AllReady() {
	defer func() { dswp.hasAddedPods = true }()
}

Member:

Are we using defer unnecessarily here? How about adding:

if sourcesReady.AllReady() {
	dswp.hasAddedPods = true
}

on the line after the findAndAddNewPods invocation?

Contributor Author:

It's not the same, because we have to make sure to set dswp.hasAddedPods = true only if sources were ready at the time findAndAddNewPods was called. Otherwise it's possible that sources weren't ready, findAndAddNewPods gets called and finds nothing, and then dswp.hasAddedPods is set to true one loop too soon.

Contributor Author:

Whoops, my above comment is addressing the case where we do

if sourcesReady.AllReady() {
	dswp.hasAddedPods = true
}

after findAndAddNewPods.

If we do it before, there is the possibility that the ASW reconciler will run too early, while the populator is still in the middle of adding pods from all the sources.
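Put another way, the deferred assignment in the patch is equivalent to capturing readiness before the call and setting the flag after it (illustrative sketch):

sourcesWereReady := sourcesReady.AllReady() // readiness at the time findAndAddNewPods is called
dswp.findAndAddNewPods()
if sourcesWereReady {
	dswp.hasAddedPods = true // safe: a full pass ran while sources were ready
}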

@wongma7 (Contributor, Author) commented Apr 24, 2017

@jingxu97 I don't see your comment, could you try submitting it again? Thanks : )

@@ -80,6 +80,8 @@ type Reconciler interface {
// nodeName - the Name for this node, used by Attach and Detach methods
// desiredStateOfWorld - cache containing the desired state of the world
// actualStateOfWorld - cache containing the actual state of the world
// populatorHasAddedPods - checker for whether the populator has added pods to
// the desiredStateOfWorld yet
Contributor:

In the comments, maybe add a little more detail, such as whether the populator has finished adding pods at least once after all sources are ready.

Contributor Author:

I updated this comment.

@@ -80,6 +80,9 @@ type Reconciler interface {
// nodeName - the Name for this node, used by Attach and Detach methods
// desiredStateOfWorld - cache containing the desired state of the world
// actualStateOfWorld - cache containing the actual state of the world
// populatorHasAddedPods - checker for whether the populator has finished
// adding pods to the desiredStateOfWorld cache at least once after sources
Contributor:

remove the extra empty space?

Contributor Author:

I'm copying the other multi-line parameter comments, which are indented by 2 spaces, if you mean the spaces at the beginning.

@@ -94,6 +97,7 @@ func NewReconciler(
nodeName types.NodeName,
desiredStateOfWorld cache.DesiredStateOfWorld,
actualStateOfWorld cache.ActualStateOfWorld,
populatorHasAddedPods func() bool,
Contributor:

Just want to double check: what is the benefit of using a function instead of passing a pointer to the desiredStatePopulator?

Contributor Author:

IMO it's cleaner to expose it via the interface.

And actually this made me think: the bool is not thread-safe, so we could read and write to it at the same time? I think I need to add a lock.

Member:

I don't think we need a lock before reading/writing that variable. AFAICT it is not being called concurrently by any other goroutine.

The volume manager calls dswp.Run, and that in turn loops the function infinitely between the sleep durations. I don't see any concurrent execution of dswp.Run.

Contributor Author:

OK, I added a lock, which justifies using a function. It's quite ugly to have to lock, unlock, set, and check it forever when we only need to do it once, but yeah, this should prevent any concurrency problem between the populator & reconciler at startup.
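A minimal sketch of the locked flag (assumed shape; field and method names other than HasAddedPods are hypothetical, and the merged code may differ):

import "sync"

type desiredStateOfWorldPopulator struct {
	hasAddedPods     bool
	hasAddedPodsLock sync.RWMutex
	// ... other fields elided
}

// Populator-side write; markPodsProcessed is a hypothetical name.
func (dswp *desiredStateOfWorldPopulator) markPodsProcessed() {
	dswp.hasAddedPodsLock.Lock()
	defer dswp.hasAddedPodsLock.Unlock()
	dswp.hasAddedPods = true
}

// Reconciler-side read, exposed through the populator interface.
func (dswp *desiredStateOfWorldPopulator) HasAddedPods() bool {
	dswp.hasAddedPodsLock.RLock()
	defer dswp.hasAddedPodsLock.RUnlock()
	return dswp.hasAddedPods
}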

Contributor Author:

@gnufied I highly, highly doubt it is ever going to happen, but I don't think it's impossible for dswp.Run's write and reconciler.Run's read to coincide. :/ You're right that it would have to happen the first time at startup; after that there's no chance of a concurrent read/write.

Member:

Ah, I missed that the reconciler is reading the value. In that case, yeah, we perhaps do need the lock.

@euank (Contributor) commented Apr 25, 2017

/unassign

@wongma7 (Contributor, Author) commented Apr 26, 2017

I've updated the patch, PTAL.

With the addition of the lock, things got a tad more complicated, so I reworked the patch to be absolutely clear about what it's doing. The populator needs to set hasAddedPods only once; then it's done touching that variable and can resume normal operation. Similarly, the reconciler needs to check hasAddedPods only once; then it's done and can resume.

@wongma7 force-pushed the outervolumespec branch 3 times, most recently from 41af8ba to bccd11f, on April 26, 2017 at 22:35
@wongma7 (Contributor, Author) commented Apr 27, 2017

@jingxu97 ping, PTAL when you get a chance :) The spirit of the patch is the same; I just tried to make it clearer & safer, prompted by the comments. @saad-ali can you please /approve this PR? You are the only approver in this directory, and for some reason the bot decided to pick on somebody from a parent dir.

I would like to get this cherry-picked into 1.6.3 once it's good to go; it is trivial to reproduce and leads to really bad UX IMO.

rc.reconciliationLoopFunc(sourcesReady, rc.populatorHasAddedPods())()
return rc.populatorHasAddedPods(), nil
}, stopCh)
wait.Until(rc.reconciliationLoopFunc(sourcesReady, true), rc.loopSleepDuration, stopCh)
Contributor:

Is the purpose of this change to avoid repeatedly checking populatorHasAddedPods? Is that really necessary?

Contributor Author:

Yes. No, it's not technically necessary; I thought it would make the code clearer where/why populatorHasAddedPods is getting called, since after the discussion with gnufied it was not immediately clear. I can trim it back if you want; I can see how this looks like premature optimization :)
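Annotating the hunk quoted above, the startup flow is roughly this (the comments are explanatory, not from the patch):

// Phase 1: poll every loopSleepDuration, running one reconcile pass per poll
// with the current flag value, until the populator reports it has added pods.
wait.PollUntil(rc.loopSleepDuration, func() (bool, error) {
	rc.reconciliationLoopFunc(sourcesReady, rc.populatorHasAddedPods())()
	return rc.populatorHasAddedPods(), nil
}, stopCh)
// Phase 2: the flag can never become false again, so pass the literal true
// and stop taking the lock on every iteration.
wait.Until(rc.reconciliationLoopFunc(sourcesReady, true), rc.loopSleepDuration, stopCh)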

// reconstruct process may add incomplete volume information and cause confusion. In addition, if the
// desired state of world has not been populated yet, the reconstruct process may clean up pods' volumes
// that are still in use because desired state of world does not contain a complete list of pods.
if sourcesReady.AllReady() && populatorHasAddedPods && time.Since(rc.timeOfLastSync) > rc.syncDuration {
Contributor:

I think populatorHasAddedPods returning true implies sourcesReady.AllReady()?

Contributor Author:

Yes, so I have removed the sourcesReady arg from reconciler.Run.
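So the reconstruct gate reduces to the following (illustrative; rc.sync stands in for the reconstruct call):

if populatorHasAddedPods && time.Since(rc.timeOfLastSync) > rc.syncDuration {
	rc.sync() // populatorHasAddedPods being true already implies sourcesReady.AllReady()
}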

@k8s-github-robot removed the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on May 9, 2017
@wongma7 (Contributor, Author) commented May 9, 2017

Please re-add lgtm; the (bazel) tests should pass now, thank you!

@jingxu97 (Contributor):

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on May 10, 2017
@k8s-github-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jingxu97, saad-ali, wongma7

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@wongma7 (Contributor, Author) commented May 10, 2017

@k8s-bot bazel test this
W0510 00:49:48.944] fatal: Unable to create '/root/.cache/git/k8s.io/kubernetes/index.lock': File exists.

@k8s-ci-robot (Contributor):

@wongma7: you can't request testing unless you are a kubernetes member.

In response to this:

@k8s-bot bazel test this
W0510 00:49:48.944] fatal: Unable to create '/root/.cache/git/k8s.io/kubernetes/index.lock': File exists.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@gnufied (Member) commented May 10, 2017

@k8s-bot bazel test this

1 similar comment
@gnufied (Member) commented May 10, 2017

@k8s-bot bazel test this

@k8s-github-robot:

Automatic merge from submit-queue (batch tested with PRs 45382, 45384, 44781, 45333, 45543)

@jingxu97 (Contributor):

@wongma7 could you cherry-pick the PR to 1.5 and 1.6?

@k8s-cherrypick-bot:

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@wongma7 (Contributor, Author) commented May 11, 2017 via email

@wongma7 (Contributor, Author) commented May 11, 2017

/release-note

@k8s-ci-robot added the release-note label (denotes a PR that will be considered when it comes time to generate release notes) and removed the release-note-none label on May 11, 2017
k8s-github-robot pushed a commit that referenced this pull request May 15, 2017
…1-upstream-release-1.6

Automatic merge from submit-queue

Automated cherry pick of #44781

Cherry pick of #44781 on release-1.6.

#44781: Ensure desired state of world populator runs before volume
k8s-github-robot pushed a commit that referenced this pull request Sep 6, 2017
…1-upstream-release-1.5

Automatic merge from submit-queue

Automated cherry pick of #44781

Cherry pick of #44781 on release-1.5.

#44781: Ensure desired state of world populator runs before volume
@DreadPirateShawn:

Am I reading correctly that the cherry-pick back into Kube 1.6 failed?

So should I expect to see this in Kube 1.7 but no earlier?

@wongma7 (Contributor, Author) commented Oct 18, 2017

It was cherry-picked to both 1.5 and 1.6; I'm not sure of the exact .z patch number.

@DreadPirateShawn:

Oh? The "Merge pull request" for 1.6 has a red X with hover text "Failure: k8s-merge-robot needs CLA" -- was that a transient issue, and GitHub simply doesn't update the status to remove the failure text?

@DreadPirateShawn:

Update -- I see the changelog citing this PR here -- https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.6.md#v165

Fair enough. Filing a fresh ticket shortly...

sutinski pushed a commit to hpcloud/kubernetes that referenced this pull request Apr 4, 2018
…tor runs before volume reconstructor)