Commit 5bec0a3
sys: fix pidfd leak in UnshareAfterEnterUserns
UnshareAfterEnterUserns() creates a pidfd via os.StartProcess() with
CLONE_PIDFD but fails to close the file descriptor in any code path,
resulting in a file descriptor leak for every container that uses user
namespace isolation.
The leak occurs because:
- The pidfd is created when the PidFD field is set in SysProcAttr
- The original defer block only calls PidfdSendSignal() and
pidfdWaitid()
- No code path calls unix.Close(pidfd) to release the file descriptor
This causes one pidfd leak per container launch when user namespace
isolation is enabled (e.g., Kubernetes pods with hostUsers: false). In
production environments with high container churn, this can exhaust the
system's file descriptor limit.
Fix the leak by adding a defer statement immediately after process
creation that ensures unix.Close(pidfd) is always called, regardless of
which code path is taken. This guarantees cleanup even if the function
returns early due to errors or lack of pidfd support.
This follows the same cleanup pattern already established in
core/mount/mount_idmapped_utils_linux.go:getUsernsFD() which properly
closes its pidfd.
Closes: #12166
Signed-off-by: Jose Fernandez <[email protected]>
[Move SupportsPidFD up to handle dupfd in Go 1.23.{0,1} and simplify backport]
Signed-off-by: Wei Fu <[email protected]>

1 parent e240415 commit 5bec0a3
1 file changed
Lines changed: 8 additions & 2 deletions
Directory: pkg/sys
Diff hunks (line content unavailable):
@@ -35,6 +35,10 @@  4 lines added after original line 37 (new lines 38-41)
@@ -65,7 +69,7 @@  original line 68 replaced (new line 72)
@@ -81,11 +85,13 @@  original line 84 replaced (new line 88); 2 lines added after original line 88 (new lines 93-94)