-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make iptables and ipvs modes of kube-proxy MASQUERADE --random-fully if possible #78547
Conversation
@kubernetes/sig-network-bugs |
/remove-label needs-sig |
/woof |
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @cmluciano |
pkg/proxy/iptables/proxier.go
Outdated
@@ -308,6 +313,16 @@ func NewProxier(ipt utiliptables.Interface, | |||
klog.Warning("missing br-netfilter module or unset sysctl br-nf-call-iptables; proxy may not work as intended") | |||
} | |||
|
|||
masqueradeRandomFully := false | |||
iptVersionString, err := ipt.GetVersion() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we log this error on startup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. Done.
pkg/proxy/iptables/proxier.go
Outdated
masqueradeRandomFully := false | ||
iptVersionString, err := ipt.GetVersion() | ||
if err == nil { | ||
iptVersion, err := utilversion.ParseGeneric(iptVersionString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call is duplicated with the first check on L317.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am having trouble understanding the words here; this line invokes a different function than line 317. Can you please elaborate on the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got my lines confused. I meant to comment on L318-328 where we run ParseGeneric a few times in different nested ifs. Can we consolidate these lines and utilize something like version.Compare. It might just be me but I got a little lost walking through all of these nested conditions. I also had to reread the final comparison on L326 a few times before I understood that we are setting that flag to true based on the other conditions.
Perhaps @dcbw or @danwinship will have other thoughts on if it looks ok to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmluciano : in lines 318--328 there are two different strings parsed with utilversion.ParseGeneric
. We coud remove one of those by using version.Compare
--- which does a parse internally --- instead of version.AtLeast
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danwinship : it looks like #78547 (comment) was intended to be attached here, right?
I am a little surprised that pkg/util/iptables
does not cache the version string. Is that deliberate preparation for coping with live kernel patching? Should we continue to eschew caching the version string and things derived from it for that reason?
There is no direct external consumer for the "supports check" bit, so "SupportsRandomFully" (data bit or query function) would be the first of its kind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danwinship : it looks like #78547 (comment) was intended to be attached here, right?
oops. yes
I am a little surprised that
pkg/util/iptables
does not cache the version string. Is that deliberate preparation for coping with live kernel patching?
Nothing so planned. Originally it was only needed in one place. Later whoever added the second usage must have felt it would be easier to just call the "get version" function again rather than cache it. it's not like it takes that long to run "iptables --version
", and you only end up doing it a few times anyway...
There is no direct external consumer for the "supports check" bit, so "SupportsRandomFully" (data bit or query function) would be the first of its kind.
There are external consumers (like pkg/proxy/iptables/proxier.go
), they just have to implement it by getting the iptables version and manually comparing it against utiliptables.MinCheckVersion
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have refactored, consolidating the version string parsing and comparison logic.
pkg/proxy/iptables/proxier.go
Outdated
"-A", string(kubePostroutingChain), | ||
"-m", "comment", "--comment", `"kubernetes service traffic requiring SNAT"`, | ||
"-m", "mark", "--mark", proxier.masqueradeMark, | ||
"-j", "MASQUERADE", | ||
}...) | ||
} | ||
if proxier.masqueradeRandomFully { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an e2e to be sure this is parsed correctly when this feature is enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the existing code, if there is any error in any generated rule then (a) none of the generate rules will take effect (and thus the kube-proxy will be completely ineffective) and (b) an error message will be logged.
Presumably there are already e2e tests that exercise kube-proxy functionality. So those will test the rules that the revised code generates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may be the case, could you reference the kube-proxy test that leverages this generated code for the test cases
I'm mainly trying to find unit test that just validates the expected lines added by this flag or an e2e that enables the flag and ensures the baseline iptables functionality is correct. I can imagine that having the e2e for the scale tests would be useful to see if we can replicate the issue without this flag enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code isn't that confusing, though maybe it would be nice to just have pkg/util/iptables/
just check the version once and export a bunch of variables like SupportsCheck
and SupportsRandomFully
for everyone else to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the original point, of testing:
-
I added a unit test that the generated
-j MASQUERADE
includes--random-fully
if the iptables version supports it. -
The existing e2e tests surely test that all the generated rules parse, for the version of iptables where the tests run. If any rule does not parse then none are installed, which causes massive damage --- for example, no client will be able to reach the
kubernetes
Service.
/retest |
ad402d1
to
7740a36
Compare
/test pull-kubernetes-verify |
2 similar comments
/test pull-kubernetes-verify |
/test pull-kubernetes-verify |
/retest |
@danwinship , @cmluciano : I think I have addressed all the comments. Look good? |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cblecker, MikeSpreitzer, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@cblecker : thanks! |
/retest |
…ng if possible Work around Linux kernel bug that sometimes causes multiple flows to get mapped to the same IP:PORT and consequently some suffer packet drops. Also made the same update in kubelet. Also added cross-pointers between the two bodies of code, in comments. Some day we should eliminate the duplicate code. But today is not that day.
89d3019
to
d86d1de
Compare
Dang! I had to push another revision to fix a mis-spelling. |
/lgtm |
thanks again! |
👋 1.16 release lead here. I've been asked to consider this PR to be added to the 1.16 milestone. Can you please answer the following? Does this fix address a bug that was introduced by an enhancement in 1.16? If not, are you planning on getting this cherry-picked in the current supported releases? What are the associated risks of merging this? |
@lachie83 : the bug that this fixes was reported in April, but has been around longer than that. This is a work-around for a Linux kernel deficiency or bug that has been there forever (as far as I know). Yes, cherry-picking this to supported releases would be good. Regarding risks: this is the usual work-around for the kernel issue, and I am not aware of any reported problems with this work-around. |
@lachie83, another thought regarding risks: I am not sure how |
@lachie83 @MikeSpreitzer I'd also comment that this flag has become standard (for a while now) in recent releases of CNI plugins such as Calico and Flannel that create iptables rules. Calico: projectcalico/felix#1901 Edit: considering the "illusive" impact of the underlying issue this flag fixes, I would be very pleased to see this in 1.16 as well as 1.15 🙏 |
Also FYI: Here is an explanation of the problem and how |
That same code structure is in the latest kernel in the 4.X line. |
Thanks for the updates. Can we get issue #76699 appropriately labelled with the correct priority from sig-network so that I can evaluate the urgency of getting this milestoned in 1.16? |
@lachie83 : can you please elaborate on what that means / how to get it done? |
BTW, on further investigation: it looks like in the 5.X kernel they have made this randomization the normal behavior; the |
/milestone v1.16 |
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR makes kube-proxy in iptables or ipvs mode include
--random-fully
in the iptables rule it emits to-j MASQUERADE
, if the local version of iptables supports that additional flag. This greatly reduces the problems caused by a Linux kernel bug that bites when multiple flows are started at the same time. Due to this kernel bug the flows can get mapped to the same port, and all but one of them will suffer packet drops. This flag makes the port choice fully random, greatly reducing the probability of this port collision and consequent packet drop.Which issue(s) this PR fixes:
Fixes #76699
Special notes for your reviewer:
Does this PR introduce a user-facing change?: