-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kube-proxy: filter INPUT as well as OUTPUT #43972
kube-proxy: filter INPUT as well as OUTPUT #43972
Conversation
@k8s-bot gce etcd3 e2e test this |
@k8s-bot cvm gce e2e test this |
The bad news:
The good news: if I then do I think you also want it here: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L793 |
We need to apply filter rules on the way in (nodeports) and out (cluster IPs). Testing here is insufficient to have caught this - will come back for that.
b7894cd
to
9a423b6
Compare
urg, I thought I hit both places. |
re-pushed |
ping @justinsb |
It works for me - thanks for fixing :-) /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: justinsb, thockin
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue |
…72-upstream-release-1.6 Automatic merge from submit-queue Automated cherry pick of #43972 Cherry pick of #43972 on release-1.6. #43972: kube-proxy: filter INPUT as well as OUTPUT ```release-note kube-proxy handling of services with no endpoints now applies to both INPUT and OUTPUT chains, preventing socket leak ```
Commit found in the "release-1.6" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Split KUBE-SERVICES chain to re-shrink the INPUT chain **What this PR does / why we need it**: #43972 added an iptables rule "`-A INPUT -j KUBE-SERVICES`" to make NodePort ICMP rejection work. (Previously the KUBE-SERVICES chain was only run from OUTPUT, not INPUT.) #44547 extended that patch for ExternalIP rejection as well. However, the KUBE-SERVICES chain may potentially have a very large number of ICMP reject rules for plain ClusterIP services (the ones that get run from OUTPUT), and it seems that for some reason the kernel is much more sensitive to the length of the INPUT chain than it is to the length of the OUTPUT chain. So a node that worked fine with kube 1.6 (when KUBE-SERVICES was only run from OUTPUT) might fall over with kube 1.7 (with KUBE-SERVICES being run from both INPUT and OUTPUT). (Specifically, a node with about 5000 ClusterIP reject rules that ran fine with OpenShift 3.6 [kube 1.6] slowed almost to a complete halt with OpenShift 3.7 [kube 1.7].) This PR fixes things by splitting out the "new" part of KUBE-SERVICES (NodePort and ExternalIP reject rules) into a separate KUBE-EXTERNAL-SERVICES chain run from INPUT, and moves KUBE-SERVICES back to being only run from OUTPUT. (So, yes, this assumes that you don't have 5000 NodePort/ExternalIP services, but, if you do, there's not much we can do, since those rules *have* to be run on the INPUT side.) Oh, and I left in the code to clean up the "`-A INPUT -j KUBE-SERVICES`" rule even though we don't generate it any more, so it gets fixed on upgrade. **Release note**: ```release-note Reorganized iptables rules to fix a performance regression on clusters with thousands of services. ``` @kubernetes/sig-network-bugs @kubernetes/rh-networking
Task: https://app.asana.com/0/181015504277729/1108553820840840/f Original Issue: kubernetes#31435 Backported fix: kubernetes#43972 kube-proxy: filter INPUT as well as OUTPUT We need to apply filter rules on the way in (nodeports) and out (cluster IPs). Testing here is insufficient to have caught this - will come back for that. (cherry picked from commit 9a423b6) Original Author: Tim Hockin <[email protected]>
We need to apply filter rules on the way in (nodeports) and out (cluster
IPs). Testing here is insufficient to have caught this - will come back
for that.
Fixes #43969
@justinsb since you have the best repro, can you test? It passes what I think is repro.
@ethernetdan we will want this in 1.6.x
Fix bug with service nodeports that have no backends not being rejected, when they should be. This is not a regression vs v1.5 - it's a fix that didn't quite fix hard enough.