Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syncNetworkUtil in kubelet and fix loadbalancerSourceRange on GCE #30486

Merged
merged 3 commits into from
Aug 16, 2016

Conversation

freehan
Copy link
Contributor

@freehan freehan commented Aug 12, 2016

fixes: #29997 #29039

@yujuhong Can you take a look at the kubelet part?

@girishkalele KUBE-MARK-DROP is the chain for dropping connections. Marked connection will be drop in INPUT/OUTPUT chain of filter table. Let me know if this is good enough for your use case.


This change is Reviewable

@freehan freehan added the release-note-none Denotes a PR that doesn't merit a release note. label Aug 12, 2016
@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 12, 2016
@girishkalele
Copy link

Review status: 0 of 15 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


cmd/kube-proxy/app/server.go, line 414 [r1] (raw file):

func getNodeIP(client *kubeclient.Client, hostname string) net.IP{
  var nodeIP net.IP
  node, err := client.Nodes().Get(hostname)

Is this a client to connect to API server ? Is this a new dependency for kube-proxy to have a good connection to api server or we already have other calls ?


Comments from Reviewable

@freehan
Copy link
Contributor Author

freehan commented Aug 12, 2016

cmd/kube-proxy/app/server.go, line 414 [r1] (raw file):

Previously, girishkalele (Girish Kalele) wrote…

Is this a client to connect to API server ? Is this a new dependency for kube-proxy to have a good connection to api server or we already have other calls ?

This is not a new dependency. It has been used for watchers.

Comments from Reviewable

@freehan freehan force-pushed the lbsrcfix branch 3 times, most recently from 49c35fd to fb93e09 Compare August 12, 2016 18:51
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 12, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 12, 2016
@girishkalele
Copy link

Review status: 0 of 15 files reviewed at latest revision, 2 unresolved discussions.


cmd/kubelet/app/options/options.go, line 160 [r1] (raw file):

  fs.BoolVar(&s.CPUCFSQuota, "cpu-cfs-quota", s.CPUCFSQuota, "Enable CPU CFS quota enforcement for containers that specify CPU limits")
  fs.BoolVar(&s.EnableControllerAttachDetach, "enable-controller-attach-detach", s.EnableControllerAttachDetach, "Enables the Attach/Detach controller to manage attachment/detachment of volumes scheduled to this node, and disables kubelet from executing any attach/detach operations")
  fs.BoolVar(&s.ConfigIPTablesUtil, "config-iptables-util", s.ConfigIPTablesUtil, "If true, kubelet will ensure iptables utility rules are present on host.")

We should also notify Michael Taufen who is working on kubelet flags in configmaps (I think).


Comments from Reviewable

@girishkalele
Copy link

Reviewed 2 of 15 files at r1.
Review status: 0 of 15 files reviewed at latest revision, 2 unresolved discussions.


cmd/kubelet/app/options/options.go, line 160 [r1] (raw file):

Previously, girishkalele (Girish Kalele) wrote…

We should also notify Michael Taufen who is working on kubelet flags in configmaps (I think).

Oh, just noticed that change down below. Nevermind.

Comments from Reviewable

@girishkalele
Copy link

Review status: 0 of 15 files reviewed at latest revision, 3 unresolved discussions.


pkg/apis/componentconfig/types.go, line 408 [r1] (raw file):

  // Warning: Please match the value of corresponding parameter in kube-proxy
  // TODO: clean up IPTablesMasqueradeBit in kube-proxy
  IPTablesMasqueradeBit *int32 `json:"iptablesMasqueradeBit"`

the other int32 declarations in this file are not "*int32" ?


Comments from Reviewable

@girishkalele
Copy link

Reviewed 3 of 15 files at r1.
Review status: 0 of 15 files reviewed at latest revision, 4 unresolved discussions.


pkg/apis/componentconfig/v1alpha1/defaults.go, line 344 [r1] (raw file):

  }
  if obj.IPTablesMasqueradeBit == nil {
      temp := int32(14)

Bit 14 matches 0x4000, Bit 15 will be 0x8000. Do we have a range of bits for our use ? Are there other consumers of these mark bits ?


Comments from Reviewable

@thockin
Copy link
Member

thockin commented Aug 15, 2016

Review status: 0 of 15 files reviewed at latest revision, 15 unresolved discussions.


cmd/kubelet/app/options/options.go, line 160 [r2] (raw file):

  fs.BoolVar(&s.CPUCFSQuota, "cpu-cfs-quota", s.CPUCFSQuota, "Enable CPU CFS quota enforcement for containers that specify CPU limits")
  fs.BoolVar(&s.EnableControllerAttachDetach, "enable-controller-attach-detach", s.EnableControllerAttachDetach, "Enables the Attach/Detach controller to manage attachment/detachment of volumes scheduled to this node, and disables kubelet from executing any attach/detach operations")
  fs.BoolVar(&s.ConfigIPTablesUtil, "config-iptables-util", s.ConfigIPTablesUtil, "If true, kubelet will ensure iptables utility rules are present on host.")

propose "--make-iptables-util-chains" as a clearer name


cmd/kubelet/app/options/options.go, line 161 [r2] (raw file):

  fs.BoolVar(&s.EnableControllerAttachDetach, "enable-controller-attach-detach", s.EnableControllerAttachDetach, "Enables the Attach/Detach controller to manage attachment/detachment of volumes scheduled to this node, and disables kubelet from executing any attach/detach operations")
  fs.BoolVar(&s.ConfigIPTablesUtil, "config-iptables-util", s.ConfigIPTablesUtil, "If true, kubelet will ensure iptables utility rules are present on host.")
  fs.Int32Var(s.IPTablesMasqueradeBit, "iptables-masquerade-bit", util.Int32PtrDerefOr(s.IPTablesMasqueradeBit, 14), "The bit of the fwmark space to mark packets for SNAT. Must be within the range [0, 31]. Please match this parameter with corresponding parameter in Kube-Proxy.")

nit: kube-proxy is not capitalized


pkg/apis/componentconfig/types.go, line 408 [r1] (raw file):

Previously, girishkalele (Girish Kalele) wrote…

the other int32 declarations in this file are not "*int32" ?

This has to be a pointer to distinguish user-provided 0 from not-specified

pkg/apis/componentconfig/types.go, line 403 [r2] (raw file):

  ProtectKernelDefaults bool `json:"protectKernelDefaults"`
  // If true, Kubelet ensures a set of iptables rules are present on host.
  // These rules will serve as utility for various components, e.g. KubeProxy.

kube-proxy


pkg/apis/componentconfig/v1alpha1/defaults.go, line 344 [r1] (raw file):

Previously, girishkalele (Girish Kalele) wrote…

Bit 14 matches 0x4000, Bit 15 will be 0x8000. Do we have a range of bits for our use ? Are there other consumers of these mark bits ?

yes, other network tools like Calico can use bits. It is practically impossible to coordinate a priori. @caseydavenport - do you know if/which bit you guys use?

pkg/apis/componentconfig/v1alpha1/defaults.go, line 348 [r2] (raw file):

  }
  if obj.IPTablesDropBit == nil {
      temp := int32(15)

Should 14 and 15 be const values? also hardcoded in the flag defs.


pkg/kubelet/kubelet_network.go, line 348 [r2] (raw file):

      return
  }
  if _, err := kl.iptClient.EnsureRule(utiliptables.Append, utiliptables.TableNAT, KubeMarkDropChain,

weird linewrap, prefer just one line


pkg/kubelet/kubelet_network.go, line 354 [r2] (raw file):

  }
  if _, err := kl.iptClient.EnsureRule(utiliptables.Prepend, utiliptables.TableFilter, utiliptables.ChainOutput,
      "-m", "comment", "--comment", "kubernetes firewall for dropping marked packets", "-m", "mark", "--mark", dropMark, "-j", "DROP"); err != nil {

define something like dropArgs := []string{ "-m", "comment", "--comment", "kubernetes firewall for dropping marked packets", "-m", "mark", "--mark", dropMark, "-j", "DROP"}, then use dropArgs... for the two ensure.. ?

Alternately, make both INPUT and OUTPUT jump to KUBE_FIREWALL (new) and make that chain hold the actual rule. This is parallel to KUBE_POSTROUTING


pkg/proxy/iptables/proxier.go, line 68 [r2] (raw file):

  // the mark-for-masquerade chain
  // TODO: let kubelet manage this chain. Other component should just assume it exists and use it.

remove this TODO?


pkg/proxy/iptables/proxier.go, line 73 [r2] (raw file):

  // the mark we apply to traffic needing SNAT
  // TODO(thockin): Remove this for v1.3 or v1.4.
  oldIptablesMasqueradeMark = "0x4d415351"

Can you remove this, maybe in a followup?


pkg/proxy/iptables/proxier.go, line 581 [r2] (raw file):

// serviceFirewallChainName takes the servicePortChainName and substitue "KUBE-SVC" with "KUBE-FW"
func serviceFirewallChainName(svcChain utiliptables.Chain) utiliptables.Chain {
  return utiliptables.Chain(strings.Replace(string(svcChain), "KUBE-SVC", "KUBE-FW", 1))

I don't like KUBE-SVC being hardcoded here. Can this and servicePortChainName call the same backend func with different prefixes?

func serviceFirewallChainName(s proxy.ServicePortName, protocol string) utiliptables.Chain {
    return fmt.Sprintf("KUBE-FW-%s", portProtoHash(s, protocol))
}

pkg/proxy/iptables/proxier.go, line 877 [r2] (raw file):

              // We have to SNAT packets from external IPs.
              writeLine(natRules, append(args, "-j", string(KubeMarkMasqChain))...)
              // allow all sources

comment inside the if


pkg/proxy/iptables/proxier.go, line 890 [r2] (raw file):

                  writeLine(natRules, append(args, "-j", string(fwChain))...)

                  args = []string{

This could add a lot of rules to the linear search. Can we instead do something like:

a) if (proto, destip, destport) are the ingress jump to new chain
b) add all rules for processing VIP-related stuff there

That keeps the KUBE-SERVICES chain simpler, and nobody could cause trouble by adding a lot of source ranges

Also, what about nodePorts? Other LB's depend on NodePort behavior, not VIP behavior.


Comments from Reviewable

@freehan
Copy link
Contributor Author

freehan commented Aug 15, 2016

pkg/proxy/iptables/proxier.go, line 890 [r2] (raw file):

Previously, thockin (Tim Hockin) wrote…

This could add a lot of rules to the linear search. Can we instead do something like:

a) if (proto, destip, destport) are the ingress jump to new chain
b) add all rules for processing VIP-related stuff there

That keeps the KUBE-SERVICES chain simpler, and nobody could cause trouble by adding a lot of source ranges

Also, what about nodePorts? Other LB's depend on NodePort behavior, not VIP behavior.

Yeah. The current implementation is very close to what you described.
  1. if (proto, destip, destport) matches the ingress, jump to KUBE-FW chain (if sourceRange not specified, jump directly to KUBE-SVC chain)
  2. in KUBE-FW chain, match each source range and jump to KUBE-SVC chain
  3. if no match, jump to KUBE-MARK-DROP chain and return

Comments from Reviewable

@freehan
Copy link
Contributor Author

freehan commented Aug 15, 2016

pkg/proxy/iptables/proxier.go, line 68 [r2] (raw file):

Previously, thockin (Tim Hockin) wrote…

remove this TODO?

FIXED

Comments from Reviewable

@freehan
Copy link
Contributor Author

freehan commented Aug 15, 2016

pkg/proxy/iptables/proxier.go, line 73 [r2] (raw file):

Previously, thockin (Tim Hockin) wrote…

Can you remove this, maybe in a followup?

FIXED

Comments from Reviewable

@freehan
Copy link
Contributor Author

freehan commented Aug 15, 2016

pkg/kubelet/kubelet_network.go, line 354 [r2] (raw file):

Previously, thockin (Tim Hockin) wrote…

define something like dropArgs := []string{ "-m", "comment", "--comment", "kubernetes firewall for dropping marked packets", "-m", "mark", "--mark", dropMark, "-j", "DROP"}, then use dropArgs... for the two ensure.. ?

Alternately, make both INPUT and OUTPUT jump to KUBE_FIREWALL (new) and make that chain hold the actual rule. This is parallel to KUBE_POSTROUTING

The KUBE_FIREWALL approach sounds cleaner.

Comments from Reviewable

@freehan
Copy link
Contributor Author

freehan commented Aug 15, 2016

pkg/proxy/iptables/proxier.go, line 581 [r2] (raw file):

Previously, thockin (Tim Hockin) wrote…

I don't like KUBE-SVC being hardcoded here. Can this and servicePortChainName call the same backend func with different prefixes?

func serviceFirewallChainName(s proxy.ServicePortName, protocol string) utiliptables.Chain {
    return fmt.Sprintf("KUBE-FW-%s", portProtoHash(s, protocol))
}
FIXED

Comments from Reviewable

@freehan
Copy link
Contributor Author

freehan commented Aug 15, 2016

pkg/proxy/iptables/proxier.go, line 890 [r2] (raw file):

Previously, freehan (Minhan Xia) wrote…

Yeah. The current implementation is very close to what you described.

  1. if (proto, destip, destport) matches the ingress, jump to KUBE-FW chain (if sourceRange not specified, jump directly to KUBE-SVC chain)
  2. in KUBE-FW chain, match each source range and jump to KUBE-SVC chain
  3. if no match, jump to KUBE-MARK-DROP chain and return
IIUC, for AWS, security group was created per ELB. Hence it does not need firewall on node.

Comments from Reviewable

@thockin
Copy link
Member

thockin commented Aug 15, 2016

This might be true for AWS but we can't assume that is true in general.

On Mon, Aug 15, 2016 at 11:03 AM, Minhan Xia [email protected]
wrote:

pkg/proxy/iptables/proxier.go, line 890 [r2]
https://reviewable.kubernetes.io:443/reviews/kubernetes/kubernetes/30486#-KPDzg8ZuBp0M-2OHADI:-KPEFmVYwwSuzrDK5ZM3:b27qxxi
(raw file

args = []string{
):
Previously, freehan (Minhan Xia) wrote…

Yeah. The current implementation is very close to what you described.

  1. if (proto, destip, destport) matches the ingress, jump to KUBE-FW
    chain (if sourceRange not specified, jump directly to KUBE-SVC chain)
  2. in KUBE-FW chain, match each source range and jump to KUBE-SVC chain
  3. if no match, jump to KUBE-MARK-DROP chain and return

IIUC, for AWS, security group was created per ELB. Hence it does not need
firewall on node.


Comments from Reviewable
https://reviewable.kubernetes.io:443/reviews/kubernetes/kubernetes/30486


You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
#30486 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFVgVCvTIfpYz1qt_vTWO5_v7UdgfcOKks5qgKoMgaJpZM4Jiq3q
.

@thockin
Copy link
Member

thockin commented Aug 15, 2016

OK for now.

On Mon, Aug 15, 2016 at 4:16 PM, Minhan Xia [email protected]
wrote:

pkg/kubelet/kubelet_network.go, line 354 [r2]
https://reviewable.kubernetes.io:443/reviews/kubernetes/kubernetes/30486#-KPDucO9RQ1z3lDx7LPK:-KPFSM4i5SRKVBCAU6TX:brgcp59
(raw file

"-m", "comment", "--comment", "kubernetes firewall for dropping marked packets", "-m", "mark", "--mark", dropMark, "-j", "DROP"); err != nil {
):
Previously, thockin (Tim Hockin) wrote…

are you going to do the KUBE_FIREWALL idea or should I finish the review
as-is?

Since the KUBE_FIREWALL chain will only has one rule, so I refactor a bit
and leave the most as-is.


Comments from Reviewable
https://reviewable.kubernetes.io:443/reviews/kubernetes/kubernetes/30486


You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
#30486 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFVgVEV7v9jW843lNXoEsp0WdfjVppkyks5qgPNYgaJpZM4Jiq3q
.

@thockin
Copy link
Member

thockin commented Aug 15, 2016

LOL, your call. Ping me when ready again

On Mon, Aug 15, 2016 at 4:21 PM, Minhan Xia [email protected]
wrote:

pkg/kubelet/kubelet_network.go, line 354 [r2]
https://reviewable.kubernetes.io:443/reviews/kubernetes/kubernetes/30486#-KPDucO9RQ1z3lDx7LPK:-KPFVLtLda6A5hzjLmp8:b-vlrc4h
(raw file

"-m", "comment", "--comment", "kubernetes firewall for dropping marked packets", "-m", "mark", "--mark", dropMark, "-j", "DROP"); err != nil {
):
Previously, freehan (Minhan Xia) wrote…

Since the KUBE_FIREWALL chain will only has one rule, so I refactor a bit
and leave the most as-is.

Okay. I take that back. I will do the KUBE_FIREWALL idea.


Comments from Reviewable
https://reviewable.kubernetes.io:443/reviews/kubernetes/kubernetes/30486


You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
#30486 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFVgVNcsqHIZae50jK3RkfRXQh7SfOMxks5qgPR8gaJpZM4Jiq3q
.

@k8s-bot
Copy link

k8s-bot commented Aug 16, 2016

GCE e2e build/test passed for commit 643fc38.

@thockin
Copy link
Member

thockin commented Aug 16, 2016

Reviewed 2 of 14 files at r2, 11 of 13 files at r3, 2 of 2 files at r4.
Review status: all files reviewed at latest revision, 17 unresolved discussions.


pkg/kubelet/kubelet_network.go, line 46 [r4] (raw file):

  KubePostroutingChain utiliptables.Chain = "KUBE-POSTROUTING"

  // kubernetes postrouting rules

Comment is wrong. If this LGTM overall I might approve and you can fix in followup


pkg/proxy/iptables/proxier.go, line 588 [r4] (raw file):

// then encoding to base32 and truncating with the prefix "KUBE-FW-".
func serviceFirewallChainName(s proxy.ServicePortName, protocol string) utiliptables.Chain {
  return utiliptables.Chain("KUBE-FW-" + portProtoHash(s, protocol))

As in @girishkalele PR, I propose XLB as a middle ground on naming


pkg/proxy/iptables/proxier.go, line 892 [r4] (raw file):

              // The service firewall rules are created based on ServiceSpec.loadBalancerSourceRanges field.
              // This currently works for loadbalancers that preserves source ips.
              // For loadbalancers which direct traffic to service NodePort, the firewall rules will not apply.

put the definition of args here, closer to usage


pkg/proxy/iptables/proxier.go, line 898 [r4] (raw file):

                  "-A", string(fwChain),
                  "-m", "comment", "--comment", fmt.Sprintf(`"%s loadbalancer IP"`, svcName.String()),
                  "-m", protocol, "-p", protocol,

you already checked protocol port and destination before jumping into this chain. This seems redundant.


Comments from Reviewable

@thockin
Copy link
Member

thockin commented Aug 16, 2016

I'm going to LGTM. Nits can be fixed up in followup

@thockin thockin added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2016
@thockin thockin modified the milestones: v1.3, v1.4 Aug 16, 2016
@thockin thockin added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Aug 16, 2016
@thockin
Copy link
Member

thockin commented Aug 16, 2016

bumping prio because security and cherrypick candidate. I'm worried that it may be on the large side for a cherrypick.

@freehan this needs an e2e ASAP

@thockin
Copy link
Member

thockin commented Aug 16, 2016

@fabioy

@yujuhong
Copy link
Contributor

@yujuhong Can you take a look at the kubelet part?

There is not much change to the kubelet side (other than adding more flags). LGTM.

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Aug 16, 2016

GCE e2e build/test passed for commit 643fc38.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit d412d57 into kubernetes:master Aug 16, 2016
@lavalamp
Copy link
Member

I think this broke gce e2e slow. revert.

@thockin
Copy link
Member

thockin commented Aug 17, 2016

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:709
  should test kube-proxy [Slow] [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubeproxy.go:107

  Response was:map[errors:[Get http://146.148.126.105:32080/hostName: dial tcp 146.148.126.105:32080: getsockopt: no route to host Get http://146.148.126.105:32080/hostName: dial tcp 146.148.126.105:32080: getsockopt: no route to host Get http://146.148.126.105:32080/hostName: dial tcp 146.148.126.105:32080: getsockopt: no route to host Get http://146.148.126.105:32080/hostName: dial tcp 146.148.126.105:32080: getsockopt: no route to host Get http://146.148.126.105:32080/hostName: dial tcp 146.148.126.105:32080: getsockopt: no route to host Get http://146.148.126.105:32080/hostName: dial tcp 146.148.126.105:32080: getsockopt: no route to host Get http://146.148.126.105:32080/hostName: dial tcp 146.148.126.105:32080: getsockopt: no route to host Get http://146.148.126.105:32080/hostName: dial tcp 146.148.126.105:32080: getsockopt: no route to host Get http://146.148.126.105:32080/hostName: dial tcp 146.148.126.105:32080: getsockopt: no route to host Get http://146.148.126.105:32080/hostName: dial tcp 146.148.126.105:32080: getsockopt: no route to host]]
  Expected
      <int>: 0
  to be ==
      <int>: 2

  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubeproxy.go:247

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

loadBalancerSourceRanges does not work on GKE
10 participants