dockershim hostport manager should use HostIP #94382
Conversation
/assign @thockin @danwinship @dcbw I need to check if there is some job exercising this code with e2e tests; a lot of jobs were moved to containerd and (I think) are using the CNI portmap plugin (at least the KIND ones).
I am frankly surprised this works at all in the scheduler - does it? Do we really want it to? What's the use case? This seems verrrrry subtle.
There is a conformance test covering exactly this use case: kubernetes/test/e2e/scheduling/predicates.go, lines 654 to 660 at 62b8fae.
It seems that works (at least in CRI-O, cri-o/cri-o#4130 (comment)) because, after the first attempt to allocate the HostPort, it removes ALL sockets listening on that port, so the second attempt works.
I can't find any job in the CI running this code :/ I was reading the issue tracking the hostport implementation, #35457, and it seems this was temporary until the CNI plugins implement the portmapping. @thockin, do you think it makes sense to not pursue this and leave it as is? The fact is nobody noticed this, based on a quick GitHub search. If we don't fix it here, @danwinship, I can make it a package in CRI-O instead of vendoring it.
Well, besides the fact that we know some people are vendoring this code, there's also the fact that it means that people using kubernetes with dockershim get non-conforming behavior. The fact that we don't currently run any CI jobs in that configuration doesn't change the fact that it's broken.

The fact that the conformance test fails to accurately test the behavior it's supposed to be testing is... interesting. We should probably fix the conformance test in 1.20 (by making it check that the first hostport still connects to the first pod after the second pod has been created) but not backport the fix, so as to not retroactively de-conform systems that previously passed conformance in earlier releases. At least, I think that's the right approach?
@@ -35,13 +35,15 @@ func TestOpenCloseHostports(t *testing.T) {
		podPortMapping *PodPortMapping
		expectError    bool
	}{
		// no portmaps
if you're going to identify all the test cases you might as well identify them as strings in the test cases so you can then print the description with the error when an error occurs?
Naming is not useful here because the test cases are not independent. Personally, I don't like using this test structure, but I didn't want to change the whole test, so I carried on with the current structure. At least the comments explain better what is supposed to happen.
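For reference, a rough sketch of what the suggested naming could look like (the name field, t.Run wrapper, and case values are illustrative, not part of the PR; as noted above, the cases build on each other, so real subtests would need more reworking):

	// Hypothetical restructuring: give each case a name and run it as a
	// subtest so a failure reports which case broke.
	openPortCases := []struct {
		name           string
		podPortMapping *PodPortMapping
		expectError    bool
	}{
		{name: "no portmaps", podPortMapping: &PodPortMapping{Namespace: "ns1", Name: "n0"}},
		// ... remaining cases ...
	}
	for _, tc := range openPortCases {
		t.Run(tc.name, func(t *testing.T) {
			_, err := manager.openHostports(tc.podPortMapping)
			if (err != nil) != tc.expectError {
				t.Errorf("case %q: unexpected error state: %v", tc.name, err)
			}
		})
	}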
for _, tc := range openPortCases {
	mapping, err := manager.openHostports(tc.podPortMapping)
	for hostport, socket := range mapping {
		manager.hostPortMap[hostport] = socket
Why is this needed here? This seems like something that should either happen in the actual manager code, or else not be needed.
If it's just for the counting below, then that's really only testing the behavior of the test program, not the behavior of the manager...
This was actually a bug in the test.
The same loop that adds the open sockets to the hostPortMap does happen in the manager.Add() method: https://github.com/kubernetes/kubernetes/blob/12fc5e50b9394cc3fa498b24a96c894d96b3bff2/pkg/kubelet/dockershim/network/hostport/hostport_manager.go#L107-L114
We need it here because later we test the closeHostports() method, which iterates over that map (https://github.com/kubernetes/kubernetes/blob/12fc5e50b9394cc3fa498b24a96c894d96b3bff2/pkg/kubelet/dockershim/network/hostport/hostport_manager.go#L310-L327)
in
for _, tc := range closePortCases {
err := manager.closeHostports(tc.portMappings)
If we don't populate the map, closeHostports() will always succeed because it iterates over an empty map.
I've also added the counting below as a safety check and to make it easier to debug.
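Putting those pieces together, the flow of the test is roughly this (a simplified sketch of the snippets above, not the exact diff; expectedPorts is a placeholder count):

	// Open the host ports and record the returned sockets in the manager's map,
	// mirroring what manager.Add() does in production code. Without this step,
	// closeHostports() walks an empty map and trivially succeeds.
	for _, tc := range openPortCases {
		mapping, err := manager.openHostports(tc.podPortMapping)
		if (err != nil) != tc.expectError {
			t.Fatalf("openHostports: unexpected error state: %v", err)
		}
		for hostport, socket := range mapping {
			manager.hostPortMap[hostport] = socket
		}
	}

	// Sanity check: the number of open sockets should match the ports requested.
	if len(manager.hostPortMap) != expectedPorts {
		t.Fatalf("expected %d open hostports, got %d", expectedPorts, len(manager.hostPortMap))
	}

	// Now exercise closeHostports(), which iterates over manager.hostPortMap.
	for _, tc := range closePortCases {
		if err := manager.closeHostports(tc.portMappings); err != nil {
			t.Fatalf("closeHostports: %v", err)
		}
	}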
I did some quick tests on my own cluster and, shock, it does seem to work.
I dislike this very much. It seems like a great opportunity to make a mistake (honestly, hostIP at all seems poorly thought out, but that dates to 2014, sigh).

So the choice is really a) fix it in dockershim to match containerd or b) don't allow same-port-different-IP between pods.

I want to say B, but I suspect the answer is A. Can we document this behavior on the API types.go, please?
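Something along these lines in types.go could spell out the behavior (the wording and field layout below are only a sketch, not an actual proposed change):

	// ContainerPort represents a network port in a single container.
	type ContainerPort struct {
		// Number of the port to expose on the host.
		// If HostIP is set, the port is only bound and forwarded on that
		// address, so two pods on the same node may request the same
		// HostPort as long as their HostIP values do not overlap.
		HostPort int32
		// What host IP to bind the external port to. Empty, "0.0.0.0" and
		// "::" all mean "all addresses on the node".
		HostIP string
		// ... other fields elided ...
	}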
"-m", "comment", "--comment", fmt.Sprintf(`"%s hostport %d"`, podFullName, pm.HostPort), | ||
"-m", protocol, "-p", protocol, | ||
"-j", "DNAT", fmt.Sprintf("--to-destination=%s", hostPortBinding)) | ||
if pm.HostIP == "" || pm.HostIP == "0.0.0.0" || pm.HostIP == "::" { |
Does kubelet know to use iptables for IPv6 if HostIP is :: here?
kubelet is ALWAYS single-stack as of today, and its IP family depends on the nodeIP:
kubernetes/pkg/kubelet/kubelet.go, lines 468 to 473 at c236285
parsedNodeIP := net.ParseIP(nodeIP)
protocol := utilipt.ProtocolIPv4
if utilnet.IsIPv6(parsedNodeIP) {
	klog.V(0).Infof("IPv6 node IP (%s), assume IPv6 operation", nodeIP)
	protocol = utilipt.ProtocolIPv6
}
The hostport_manager has its own iptables handler but depends on the kubelet for some rules; however, due to #82125, those rules are added by kube-proxy.
Now that you mention it, without kube-proxy this won't work for dual-stack @danwinship
I guess it is kubernetes/pkg/apis/core/types.go, lines 1762 to 1764 at adac1bf.
"-A KUBE-HOSTPORTS -m comment --comment \"pod5_ns5 hostport 8888\" -m tcp -p tcp --dport 8888 -j KUBE-HP-TUKTZ736U5JD5UTK": true, | ||
"-A KUBE-HOSTPORTS -m comment --comment \"pod5_ns5 hostport 8888\" -m tcp -p tcp --dport 8888 -j KUBE-HP-CAAJ45HDITK7ARGM": true, |
Actually, this does not seem to work as expected; it should load balance between the different backend HostIPs.
It does work; it has the HostIP in the iptables rule.
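To illustrate how the HostIP ends up in the rule, here is a hedged sketch of the rule construction (variable names come from the diff above; the exact flag placement and ordering are an assumption, not the merged code):

	// Build the DNAT rule args for this pod port mapping.
	args := []string{
		"-m", "comment", "--comment", fmt.Sprintf(`"%s hostport %d"`, podFullName, pm.HostPort),
		"-m", protocol, "-p", protocol,
	}
	// "", "0.0.0.0" and "::" all mean "any address": skip the destination
	// match so the rule keeps catching traffic on every host IP.
	if pm.HostIP != "" && pm.HostIP != "0.0.0.0" && pm.HostIP != "::" {
		// Restrict the rule to the requested host IP, which is what lets
		// another pod claim the same HostPort on a different address.
		args = append(args, "-d", pm.HostIP)
	}
	args = append(args, "-j", "DNAT", fmt.Sprintf("--to-destination=%s", hostPortBinding))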
I just realized that the hostport_syncer code was removed recently. In the kubernetes repo, only kubenet seems to use this code 🤔
The hostport manager was not taking the hostIP into consideration when binding the socket of the hostPort, so the same HostPort could not be used with different IP addresses.
The e2e test, included as part of Conformance, "validates that there is no conflict between pods with same hostPort but different hostIP and protocol", was only testing that the pods were scheduled without conflict, never the functionality. The test should check that pods with containers forwarding the same hostPort can be scheduled without conflict, and that those exposed HostPorts actually forward traffic to the corresponding pods.
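A rough sketch of the kind of check the strengthened e2e test could add, outside the real e2e framework (addresses, port, pod names, and the /hostname endpoint are placeholders):

	import (
		"fmt"
		"io"
		"net"
		"net/http"
		"strings"
	)

	// checkHostPortForwarding verifies that each (hostIP, hostPort) pair
	// forwards to the pod that claimed it, not merely that both pods scheduled.
	func checkHostPortForwarding() error {
		checks := []struct {
			hostIP      string // placeholder addresses
			expectedPod string // placeholder pod names
		}{
			{"127.0.0.1", "pod1"},
			{"127.0.0.2", "pod2"},
		}
		for _, c := range checks {
			url := "http://" + net.JoinHostPort(c.hostIP, "8080") + "/hostname"
			resp, err := http.Get(url)
			if err != nil {
				return fmt.Errorf("hostPort on %s unreachable: %w", c.hostIP, err)
			}
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			if !strings.Contains(string(body), c.expectedPod) {
				return fmt.Errorf("traffic for %s reached %q, want %s", c.hostIP, body, c.expectedPod)
			}
		}
		return nil
	}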
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: aojea. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
My PR was just a follow-up cleanup after this one: #62212. I wasn't sure about hostport functionality and whether it is actually used. One more discussion point here: there is leftover duplicate-suppression logic for nodeport mappings. Is it needed in kubelet if CNI should be responsible for setting it all up?
/retest
@thockin I think that this is only being used by kubenet in k8s; the rest of the imports are just for the types. Is kubenet going to move out of the kubelet?
/kind bug
What this PR does / why we need it:
The hostport manager was not taking the hostIP into consideration when binding the socket of the hostPort, so the same HostPort could not be used with different IP addresses.
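To illustrate the underlying issue, a minimal standalone sketch of binding per host IP (standard library only, not the manager's actual code; it assumes a Linux loopback where the whole 127.0.0.0/8 range is locally bindable):

	package main

	import (
		"fmt"
		"net"
	)

	func main() {
		// Binding a hostPort without a host IP claims the port on all
		// addresses, so a second pod asking for the same port on a different
		// IP collides. Binding to hostIP:port leaves other addresses free.
		l1, err := net.Listen("tcp", net.JoinHostPort("127.0.0.1", "8080"))
		if err != nil {
			panic(err)
		}
		defer l1.Close()

		// Same port, different host IP: succeeds because the first listener
		// is scoped to 127.0.0.1 rather than the wildcard address.
		l2, err := net.Listen("tcp", net.JoinHostPort("127.0.0.2", "8080"))
		if err != nil {
			panic(err)
		}
		defer l2.Close()

		fmt.Println("both listeners bound on port 8080 with different host IPs")
	}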