Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dockershim hostport manager should use HostIP #94382

Closed
wants to merge 2 commits into from

Conversation

aojea
Copy link
Member

@aojea aojea commented Aug 31, 2020

/kind bug

What this PR does / why we need it:

the hostport manager was not taking into consideration the hostIP
when binding the socket of the hostPort, causing that the same
HostPort can not be used with different IP addresses.

fix a bug where the hostport manager used by dockershim didn't allow to bind the same HostPort on different HostIPs

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 31, 2020
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 31, 2020
@aojea aojea changed the title dockershim hostport manager use HostIP dockershim hostport manager should use HostIP Aug 31, 2020
@aojea
Copy link
Member Author

aojea commented Aug 31, 2020

/assign @thockin @danwinship @dcbw

I need to check if there is some job exercising this code with e2e tests, lot of jobs were move to containerd and (I think) using the CNI portmap plugin (at least the KIND ones)

@thockin
Copy link
Member

thockin commented Aug 31, 2020

I am frankly surprised this works at all in the scheduler - does it? Do we really want it to? What's the use case? This seems verrrrry subtle.

@aojea
Copy link
Member Author

aojea commented Sep 1, 2020

I am frankly surprised this works at all in the scheduler - does it? Do we really want it to? What's the use case? This seems verrrrry subtle.

There is a conformance test covering exactly this use case

/*
Release: v1.16
Testname: Scheduling, HostPort matching and HostIP and Protocol not-matching
Description: Pods with the same HostPort value MUST be able to be scheduled to the same node
if the HostIP or Protocol is different.
*/
framework.ConformanceIt("validates that there is no conflict between pods with same hostPort but different hostIP and protocol", func() {

It seems that works (at least in CRIO cri-o/cri-o#4130 (comment)) because, after the first attempt to allocate the HostPort, it removes ALL sockets listening in that port, so the second attempt works

@aojea
Copy link
Member Author

aojea commented Sep 1, 2020

I can't find any job in the CI running this code :/

I was reading the issue tracking the hostport implementation #35457, and it seems that this was temporary until the CNI plugins implements the portmapping,

@thockin do you think that does not make sense to pursue this and let it as is, fact is nobody noticed this based on a quick github search.

If we don fix it here., @danwinship I can make it a package in crio instead of vendoring it

@danwinship
Copy link
Contributor

do you think that does not make sense to pursue this and let it as is, fact is nobody noticed this based on a quick github search.

Well, besides the fact that we know some people are vendoring this code, there's also the fact that it means that people using kubernetes with dockershim get non-conforming behavior. The fact that we don't currently run any CI jobs in that configuration doesn't change the fact that it's broken.

The fact that the conformance test fails to accurately test the behavior it's supposed to be testing is... interesting. We should probably fix the conformance test in 1.20 (by making it check that the first hostport still connects to the first pod after the second pod has been created) but not backport the fix, so as to not retroactively de-conform systems that previously passed conformance in earlier releases. At least, I think that's the right approach?

@@ -35,13 +35,15 @@ func TestOpenCloseHostports(t *testing.T) {
podPortMapping *PodPortMapping
expectError bool
}{
// no portmaps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're going to identify all the test cases you might as well identify them as strings in the test cases so you can then print the description with the error when an error occurs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is not useful here because the test cases are not independent. Personally, I don't like using this test structure but didn't want to change the whole test,so I carried over with current status. At least the comments explain better what is supposed to happen

for _, tc := range openPortCases {
mapping, err := manager.openHostports(tc.podPortMapping)
for hostport, socket := range mapping {
manager.hostPortMap[hostport] = socket
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed here? This seems like something that should either happen in the actual manager code, or else not be needed.

If it's just for the counting below, then that's really only testing the behavior of the test program, not the behavior of the manager...

Copy link
Member Author

@aojea aojea Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was actually a bug in the test.
The same loop to add the open sockets to the hostPortMap map does happen in the manager.Add() method, https://github.com/kubernetes/kubernetes/blob/12fc5e50b9394cc3fa498b24a96c894d96b3bff2/pkg/kubelet/dockershim/network/hostport/hostport_manager.go#L107-L114

We need this because later we test the closeHostports() method that iterates over that map https://github.com/kubernetes/kubernetes/blob/12fc5e50b9394cc3fa498b24a96c894d96b3bff2/pkg/kubelet/dockershim/network/hostport/hostport_manager.go#L310-L327
in

	for _, tc := range closePortCases {
		err := manager.closeHostports(tc.portMappings)

if we don't populate the map the closeHostports will always succeed because it's an empty map.
I've also added the counting below as a safety check and to make it easier to debug

@thockin
Copy link
Member

thockin commented Sep 1, 2020 via email

"-m", "comment", "--comment", fmt.Sprintf(`"%s hostport %d"`, podFullName, pm.HostPort),
"-m", protocol, "-p", protocol,
"-j", "DNAT", fmt.Sprintf("--to-destination=%s", hostPortBinding))
if pm.HostIP == "" || pm.HostIP == "0.0.0.0" || pm.HostIP == "::" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does kubelet know to use iptables for IPv6 if HostIP is :: here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubelet is ALWAYS single stack as today, and its IP family depends on the nodeIP,

parsedNodeIP := net.ParseIP(nodeIP)
protocol := utilipt.ProtocolIPv4
if utilnet.IsIPv6(parsedNodeIP) {
klog.V(0).Infof("IPv6 node IP (%s), assume IPv6 operation", nodeIP)
protocol = utilipt.ProtocolIPv6
}

The hostport_manager has its own iptables handler, but depends on the kubelet for some rules, however, due to this #82125 , those rules are added by kube-proxy

Now that you mention it, without kube-proxy this won't work for dual-stack @danwinship

@aojea
Copy link
Member Author

aojea commented Sep 3, 2020

I want to say B, but I suspect the answer is A. Can we document this behavior on the API types.go, please?

I guess it is

// Optional: What host IP to bind the external port to.
// +optional
HostIP string

Comment on lines +450 to +451
"-A KUBE-HOSTPORTS -m comment --comment \"pod5_ns5 hostport 8888\" -m tcp -p tcp --dport 8888 -j KUBE-HP-TUKTZ736U5JD5UTK": true,
"-A KUBE-HOSTPORTS -m comment --comment \"pod5_ns5 hostport 8888\" -m tcp -p tcp --dport 8888 -j KUBE-HP-CAAJ45HDITK7ARGM": true,
Copy link
Member Author

@aojea aojea Sep 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this does not seem to work as expected, it should load balance between the different backend HostIPs
it does work, it has the HostIP in the iptables rule

@aojea
Copy link
Member Author

aojea commented Sep 3, 2020

I just realized that the hostport_ syncer code was removed recently
#93581

In the kubernetes repo only the kubenet seems to use this code 🤔

Antonio Ojea added 2 commits September 3, 2020 16:22
the hostport manager was not taking into consideration the hostIP
when binding the socket of the hostPort, causing that the same
HostPort can not be used with different IP addresses.
The e2e test, included as part of Conformance,
"validates that there is no conflict between
 pods with same hostPort but different hostIP and protocol"
was only testing that the pods were scheduled without conflict
but was never testing the functionality.

The test should check that pods with containers forwarding the same
hostPort can be scheduled without conflict, and that those exposed
HostPort are forwarding the ports to the corresponding pods.
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea
To complete the pull request process, please assign thockin
You can assign the PR to them by writing /assign @thockin in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/test sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Sep 3, 2020
@SergeyKanzhelev
Copy link
Member

I just realized that the hostport_ syncer code was removed recently
#93581

In the kubernetes repo only the kubenet seems to use this code 🤔

My PR was just a follow up clean up after this one: #62212 I wasn't sure about hostport functionality and whether it is actually used.

One more discussion point here - there is a leftover of duplicates suppression logic for nodeport mappings. Is it needed in kubelet if CNI should be responsible for setting it all up?

@aojea
Copy link
Member Author

aojea commented Sep 4, 2020

/retest

@aojea
Copy link
Member Author

aojea commented Sep 4, 2020

One more discussion point here - there is a leftover of duplicates suppression logic for nodeport mappings. Is it needed in kubelet if CNI should be responsible for setting it all up?

@thockin I think that this is only being used by kubenet in k8s, the rest of the imports are just for the type []*hostport.PortMapping

is kubenet going to move out of the kubelet?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants