-
Notifications
You must be signed in to change notification settings - Fork 573
Only reap zombie stunnel processes #514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: wongma7 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
klog.V(4).Infof("Waited for child process %d", childPid) | ||
for _, p := range procs { | ||
reaped := waitIfZombieStunnel(p) | ||
if reaped { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will stop after encountering first zombie process. What happens if there are more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
each SIGCHLD received from the channel should result in just one process being reaped. It doesn't matter if the process we are reaping is the same process that got terminated and resulted in us getting a SIGCHLD as long as we always clean up one zombie per SIGCHLD. I'll add a comment that reaping one process per signal received over channel is intentional
github.com/google/uuid v1.1.1 | ||
github.com/kubernetes-csi/csi-test v1.1.1 | ||
github.com/kubernetes-csi/csi-test/v3 v3.1.1 // indirect | ||
github.com/mitchellh/go-ps v0.0.0-20170309133038-4fdf99ab2936 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cleaning up
/lgtm |
Is this a bug fix or adding new feature?
What is this PR about? / Why do we need it?
I suspect our reaper is causing (rarely) mounts to error with 'wait: no child processes'' because it might be catching SIGCHLD and waiting for
mount
child processes before os.exec can wait for them: https://cs.opensource.google/go/go/+/refs/tags/go1.16.6:src/os/exec_unix.go;l=43;drc=refs%2Ftags%2Fgo1.16.6.Since the intention of reaper is to clean up zombie stunnel processes which as explained in code comment "become zombies because their parent is the
// driver process but they get terminated by the efs-utils watchdog process:", this PR limits our reaper to cleaning up those.
Arguably maintaining this reaper and having this reaper constantly reading the process table is overkill. In most cases it should be fine to simply let the zombie processes be, the pid limit is 4194303, but in interest of not changing too much to o quickly I'd rather we try this reaper change before we remove the reaper altogether.
(previous try to simply remove reaper: #509)
What testing is done?
build and deploy image.
use dynamic provisioning example (with my own fs of course)
delete the pod from the example and then check logs.