OwnerReferencesPermissionEnforcement ignores pods/status #45747
Conversation
/cc @deads2k @smarterclayton @eparis -- i think this should stop the bleeding we are experiencing right now until we understand the root cause better.
@derekwaynecarr I'm against this. We haven't determined that the kubelet isn't really mutating it. How about a reproducer that uses node permissions with a different user and then updates status, without any change, via yaml to show it? That would be an example of a simple story where the kubelet modifies the ownerRefs.
Dropping new metadata fields on status updates because the client doesn't know about them practically guarantees GC problems.
Also, if you just want to clear them out (though getting a reproduction REST request seems really important), simply create a role with the permission to delete pods/status and bind it to the kubelets. That should clear them out, but I'd expect it to do bad things to GC, since it will allow whatever mutation is happening.
@deads2k -- I was under the impression that all nodes had been upgraded in this environment. For your scenario, are you basically saying that old kubelets can no longer communicate with new api servers if this plug-in is enabled? Isn't that a problem?
It's another reason why #45552 is a good idea. @lavalamp You could try patching pod/status from the kubelet.
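A patch only touches the fields the client sends, so fields the client has never heard of survive the write. A minimal sketch of that idea, using a shallow JSON merge as a stand-in for the API server's real merge-patch logic (all field names here are illustrative, not the real API types):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchStatus applies a shallow JSON merge of patch onto obj, the way a
// PATCH lets an old client touch only the fields it knows about.
// This is a simplified stand-in for the API server's merge-patch handling.
func patchStatus(obj, patch []byte) []byte {
	var full, delta map[string]interface{}
	json.Unmarshal(obj, &full)
	json.Unmarshal(patch, &delta)
	for k, v := range delta {
		full[k] = v // only patched keys change; keys unknown to the client survive
	}
	out, _ := json.Marshal(full)
	return out
}

func main() {
	// The server-side object carries a field an old kubelet has never seen.
	obj := []byte(`{"blockOwnerDeletion":true,"phase":"Pending"}`)
	// The kubelet patches only the field it wants to change.
	fmt.Println(string(patchStatus(obj, []byte(`{"phase":"Running"}`))))
	// Prints: {"blockOwnerDeletion":true,"phase":"Running"}
}
```

Contrast this with a full PUT, where the client re-sends the entire object and anything it failed to decode is lost.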
@deads2k -- this does appear tied to upgrade scenarios. I think it's worth asking whether users should expect their 1.5 kubelets to stop working when upgrading to 1.6 masters with n-2 version skew. To properly fix this, we would need patches to 1.4, 1.5, and 1.6 kubelets. I think we should evaluate taking this change in addition to lining up work to change how the kubelet sends status.
This change allows data corruption. This admission plugin caught a data corruption problem by accident. "Fixing" the admission plugin to allow the current actors to corrupt data is not a reasonable solution. Something like #45826 would prevent the kubelet from accidentally destroying data like this, but the kubelet should be updated because you may not be able to fix the next one as easily.
@deads2k -- telling users to upgrade old kubelets before upgrading masters seems untenable. I agree we need to fix future kubelets, but requiring an upgrade of old kubelets looks like a non-starter to me.
@kubernetes/sig-node-bugs -- I added the underlying problem this PR exposed to the agenda for our next SIG meeting. It appears that kubelets cannot safely update status without doing a patch.
@deads2k -- I updated this to use the whitelist as discussed.
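The whitelist approach amounts to skipping the ownerReferences check for specific resource/subresource pairs. A sketch of that shape (the types and names below are illustrative, not the actual plug-in code):

```go
package main

import "fmt"

// attrs is a minimal stand-in for the admission attributes the real
// plug-in receives; the field names here are illustrative.
type attrs struct {
	Resource    string
	Subresource string
}

// retainAllowed whitelists resource/subresource pairs whose updates skip
// the ownerReferences permission check, per the approach discussed in
// this PR. Only pods/status is whitelisted here.
var retainAllowed = map[attrs]bool{
	{Resource: "pods", Subresource: "status"}: true,
}

// skipEnforcement reports whether the ownerReferences check should be
// bypassed for this request.
func skipEnforcement(a attrs) bool {
	return retainAllowed[a]
}

func main() {
	fmt.Println(skipEnforcement(attrs{Resource: "pods", Subresource: "status"})) // true
	fmt.Println(skipEnforcement(attrs{Resource: "pods", Subresource: ""}))       // false
}
```

The narrow whitelist keeps the plug-in's protection intact for ordinary object updates while letting the skewed kubelet's status writes through.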
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: deads2k, derekwaynecarr
Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing
@derekwaynecarr could you link to the bug that's being fixed? IIUC, this change can be reverted once all kubelets are updated to use patch when updating node status, right? Can you open an issue to remind ourselves to remove the whitelist in a future release?
Automatic merge from submit-queue (batch tested with PRs 45826, 45747, 45548, 45606, 41766)
It was found in https://bugzilla.redhat.com/show_bug.cgi?id=1450461 which I think has basically no public data. I'll see what I can open up.
Commit found in the "release-1.6" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
What this PR does / why we need it:
It makes the OwnerReferencesPermissionEnforcement admission plug-in ignore pods/status operations. If a cluster has version skew between its master (1.6) and nodes (1.6-N), the kubelet does a pods/status operation, and that code path causes an error because ownerRef has new fields unknown to the kubelet.

Release note: