kube-proxy scheduled before node is initialised by cloud-controller #1027
Comments
if this cannot be fixed by adjusting the kubeadm kube-proxy manifest and config map, or by adjusting the kube-proxy config that the kubeadm config embeds, then it's hard to qualify this as a kubeadm issue.
Thanks. I'll have a play with the DS and if I can't do anything there I'll kick it upstairs.
Running the kube-proxy regardless of taints via the
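For reference, the blanket toleration that makes kube-proxy run regardless of taints looks roughly like the following excerpt from the DaemonSet pod spec (an illustrative sketch; the exact manifest varies by kubeadm version):

# Illustrative excerpt from the kube-proxy DaemonSet pod spec: an empty-key
# toleration with operator Exists matches every taint, including
# node.cloudprovider.kubernetes.io/uninitialized, so the pod is scheduled
# before the cloud-controller-manager has initialised the node.
tolerations:
- operator: Exists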
@NeilW we are really back and forth on this one and to be honest i don't know what's best here in terms of rules, except that i know for sure that we need to expose those hardcoded addon configurations to the users and let them adjust the values they want. /assign @timothysc
It looks like an architectural issue that's falling through the gap.
It's unfortunate that
/cc @kubernetes/sig-scheduling-bugs
I believe this issue happened at the time that kube-proxy (which is a DaemonSet) was being scheduled by the DaemonSet controller. I wonder if the same issue exists now (k8s 1.12+) when daemon pods are scheduled by the default scheduler.
Same problem.
The patch I use to work around the issue is:
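A sketch of that kind of DaemonSet patch (an assumption about its shape, not necessarily the exact patch used): replace kube-proxy's tolerations with explicit ones so it no longer tolerates node.cloudprovider.kubernetes.io/uninitialized, for example via a JSON merge patch:

# Replace the tolerations list on the kube-proxy DaemonSet so it keeps running
# on control-plane nodes but no longer tolerates the uninitialized taint.
kubectl -n kube-system patch daemonset kube-proxy --type merge -p '{
  "spec": {
    "template": {
      "spec": {
        "tolerations": [
          {"key": "CriticalAddonsOnly", "operator": "Exists"},
          {"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"}
        ]
      }
    }
  }
}'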
Has anyone considered adding a "daemonTaints" field to kubeadm's
so one of the reasons we didn't expose full control of the addon manifests in the latest iteration of the config is that we then lose some of the control when we do upgrades. a field such as the one proposed falls into the bucket of customization that we currently simply do not allow, and users need to patch.
this idea here is mostly outdated (and not approved) but it spawned a bit of a discussion about the problems at hand: #1091
Thanks for the background. Instead of "daemonTaints," I should have said "daemonTolerations," but you get the idea.
@timothysc I'm interested in helping with this issue or taking it, as I understand it is related to the broader add-on issue as @neolit123 suggested.
here are some key points in this thread: this isn't really a kubeadm bug, because in kubernetes/kubernetes#65931 we arguably fixed a bigger problem, but with that PR we introduced this problem, which then sort of transitions into a feature request in the scheduler as outlined in #1027 (comment). i don't think we can do much in this ticket for this cycle in terms of code. if anyone wants to help, this is the place.
/kind documentation
i will send a website PR to add a troubleshooting note. |
sent docs PR to document the workaround: moving this to the Next milestone. |
Unfortunately it is. If I remove the toleration patch I get
when building a new cluster with Terraform.
docs PR merged, but there is not much we can do here on the kubeadm side to accommodate this use case.
@neolit123: Closing this issue.
BUG REPORT
Versions
kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration: Brightbox
OS: Ubuntu 18.04 LTS
Kernel (uname -a): Linux srv-d35vu 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
What happened?
kube-proxy is scheduled on new worker nodes before the cloud-controller-manager has initialised the node addresses. This causes kube-proxy to fail to pick up the node's IP address properly, with knock-on effects on the proxy function managing load balancers.
What you expected to happen?
kube-proxy should probably block on the uninitialised taint via the config (or respond to the event of the address update if that is possible).
I'm not sure if kubeadm uses its own DaemonSet specification or just picks up an upstream one.
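The taint referred to here is the one applied when the kubelet starts with an external cloud provider; one way to check whether it is still on a node (node name is a placeholder):

# Show taints on the new worker; while the cloud-controller-manager has not
# initialised the node, the external cloud provider taint should still be present:
kubectl describe node <worker-node-name> | grep Taints
# Taints: node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule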
How to reproduce it (as minimally and precisely as possible)?
Run kubeadm init with 'cloud-provider: external' set and join a worker to the cluster. kube-proxy will be scheduled and run on all nodes even with the uninitialised taint in place.
Anything else we need to know?
Deleting the pod and causing a reload picks up the node IP on the worker.
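For example (assuming kubeadm's standard k8s-app=kube-proxy label; the pod name below is a placeholder):

# Find the kube-proxy pod running on the affected worker:
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide
# Delete it; the DaemonSet controller recreates it and the new pod picks up
# the node IP set by the cloud-controller-manager:
kubectl -n kube-system delete pod kube-proxy-xxxxx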
kubeadm.conf is
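A minimal sketch of a config of that shape, assuming the kubeadm v1alpha2 API used by 1.11 (illustrative rather than the exact file used):

# Illustrative kubeadm configuration for v1.11 with an external cloud provider
# (field names assume the kubeadm.k8s.io/v1alpha2 API).
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.1
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: external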