Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Calico add-on #38169

Merged
merged 1 commit into from
May 20, 2017

Conversation

caseydavenport
Copy link
Member

@caseydavenport caseydavenport commented Dec 6, 2016

What this PR does / why we need it:

Updates Calico to the latest version using self-hosted install as a DaemonSet, removes Calico's dependency on etcd.

  • Remove last bits of Calico salt
  • Failing on the master since no kube-proxy to access API.
  • Fix outgoing NAT
  • Tweak to work on both debian / GCI (not just GCI)
  • Add the portmap plugin for host port support

Maybe:

  • Add integration test

Which issue this PR fixes:

#32625

Try it out

Clone the PR, then:

make quick-release
export NETWORK_POLICY_PROVIDER=calico
export NODE_OS_DISTRIBUTION=gci
export MASTER_SIZE=n1-standard-4
./cluster/kube-up.sh 

Release note:

The Calico version included in kube-up for GCE has been updated to v2.2.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 6, 2016
@k8s-reviewable
Copy link

This change is Reviewable

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Dec 6, 2016
@caseydavenport
Copy link
Member Author

Quoting #32625 (comment)

To avoid this in the future, I'd recommend adding a separate test suite that tests calico integration.

@vishh would you mind pointing me to the right place to add this?

@eparis
Copy link
Contributor

eparis commented Dec 6, 2016

@kubernetes/sig-network

@thockin
Copy link
Member

thockin commented Dec 8, 2016

nice!

@caseydavenport caseydavenport force-pushed the calico-daemonset branch 2 times, most recently from dbc960a to e2bf14f Compare December 15, 2016 02:09
@caseydavenport
Copy link
Member Author

Looks like the PR that removed the legacy "configure-cbr0" mode also removed some handy outgoing NAT behavior that this PR was depending on (or rather it's now kubenet only).

Calico can do its own NAT (and usually doesn't rely on the above), but I haven't added support for that feature yet to the "etcdless" mode used in this PR, so that's the next step to move this forward.

@roberthbailey
Copy link
Contributor

Is this still WIP?

@caseydavenport
Copy link
Member Author

@roberthbailey Yes, sorry, I forgot about this guy. Just a few tweaks I need to make - will do it this week.

@caseydavenport caseydavenport changed the title [WIP] Update Calico add-on Update Calico add-on Feb 8, 2017
@caseydavenport
Copy link
Member Author

It's still a bit icky.

I can't mount /opt/cni/bin (the location for CNI binaries on debian, etc) when running on GCI, which means that I can only make the manifest work on one or the other.

How do we feel about making GCI and debian both look in /home/kubernetes/bin for CNI plugins so that at least they're consistent?

@freehan
Copy link
Contributor

freehan commented Feb 10, 2017

How do we feel about making GCI and debian both look in /home/kubernetes/bin for CNI plugins so that at least they're consistent?

Probably better to put it in /home/kubernetes/bin/cni. There are a bunch of stuff in the directory already.

@caseydavenport
Copy link
Member Author

@freehan I've left it as /home/kubernetes/bin for the moment, since that is what the default GCI scripts configure right now and I'm worried about breaking other CNI providers that might expect that location. https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L637

Do you think it's OK to change?

@caseydavenport
Copy link
Member Author

This PR currently wants this: #41943

@caseydavenport
Copy link
Member Author

@k8s-bot gce etcd3 e2e test this

@freehan
Copy link
Contributor

freehan commented Feb 24, 2017

/lgtm

kubernetes.io/cluster-service: "true"
k8s-app: calico-node
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this runs in a cluster w/o alpha features enabled, do these annotations get ignored?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

- name: CALICO_NETWORKING_BACKEND
value: "none"
- name: CALICO_IPV4POOL_CIDR
value: "10.244.0.0/16"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this CIDR range do? Should it be configurable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This CIDR range is used by Calico to perform outgoing NAT for Pods to access the internet, etc and is required as of k8s v1.5 since the kubelet stopped doing NAT for CNI plugins.

It probably should be configurable if the cluster CIDR is configurable (they should have the same value). I think that would mean making this file into a template.

@roberthbailey
Copy link
Contributor

@caseydavenport -- are you still working on this PR? The labels indicate that you need to set a release note before it will merge.

@caseydavenport
Copy link
Member Author

@roberthbailey sorry for the delay in response, have been out of the office for a few weeks.

I've already got a release note in the description, but I'd like to rebase and retest this before merging.

@dnardo
Copy link
Contributor

dnardo commented May 8, 2017

@caseydavenport Any chance we can get this in this week?

@caseydavenport
Copy link
Member Author

@dnardo just spun another cluster up today with this and noticed that something has changed in the last rebase that is causing the master components to fail.

I need to investigate the cause of that - will post back with more information. I'm really hoping to get this merged this week provided I can sort out this new issue.

@caseydavenport
Copy link
Member Author

Interestingly enough, it looks like the problem I'm seeing is that the kubelet is trying to evict the controller manager upon installation of Calico on the node.

May 09 21:36:04 kubernetes-master kubelet[1587]: I0509 21:36:04.886477    1587 preemption.go:101] preemption: attempting to evict pods [&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:kube-controller-manager-kubernetes-master

Not sure yet what the right fix is - might just be to stop running Calico on the master somehow, or adjust resource requests, or.. something else?

@caseydavenport
Copy link
Member Author

caseydavenport commented May 10, 2017

@dnardo Yeah, can confirm that this works just swell when I set MASTER_SIZE=n1-standard-4.

@caseydavenport caseydavenport force-pushed the calico-daemonset branch 3 times, most recently from ff012f5 to c100e5a Compare May 12, 2017 22:41
@caseydavenport
Copy link
Member Author

@dnardo @roberthbailey I think this is ready to go in now.

@dnardo
Copy link
Contributor

dnardo commented May 12, 2017

lgtm

@@ -122,7 +122,7 @@ ENABLE_CLUSTER_MONITORING="${KUBE_ENABLE_CLUSTER_MONITORING:-influxdb}"
# of fluentd running on a node, kubelet need to mark node on which
# fluentd is not running as a manifest pod with appropriate label.
# TODO(piosz): remove this in 1.8
NODE_LABELS="${KUBE_NODE_LABELS:-beta.kubernetes.io/fluentd-ds-ready=true}"
NODE_LABELS="${KUBE_NODE_LABELS:-beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/calico-ds-ready=true}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to make this conditional on NETWORK_POLICY_PROVIDER=calico ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be beta.kubernetes.io/calico-ds-ready=true ? Shouldn't it be something like: beta.calicoproject.org/ds-ready=true ?

I guess my question comes down to 'who defined this as 'beta' and 'why does it belong in the kubernetes.io org?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eparis I was mostly modeling this off the annotation used for the fluentd DaemonSet for consistency. The designation as beta has not had a formal agreement by anyone. projectcalico.org/ds-ready=true sounds fine to me.

@dnardo yes, I think you're right. Will update.

@caseydavenport
Copy link
Member Author

@k8s-bot pull-kubernetes-federation-e2e-gce test this
@k8s-bot cvm gke e2e test this

@caseydavenport
Copy link
Member Author

/release-note

@k8s-github-robot k8s-github-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels May 15, 2017
@caseydavenport
Copy link
Member Author

@dnardo @roberthbailey @eparis I think I've made all the markups - please take a look. Thanks!


# Replace the cluster cidr.
local -r calico_file="${dst_dir}/calico-policy-controller/calico-node.yaml"
sed -i -e "s@__CLUSTER_CIDR__@${CLUSTER_IP_RANGE:-'10.244.0.0/16'}@g" "${calico_file}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the default value here. It doesn't match the value in config-default.sh (https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/config-default.sh#L89) and I'm wondering if this script should just fail if the CIDR isn't set rather than falling back to a value that will likely result in a non-functional cluster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's wrong - it should just use the value in config-default.sh.

# container programs network policy and routes on each
# host.
- name: calico-node
image: calico/node:v1.2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this pulls from dockerhub. Most (all?) of the other cluster addons pull from gcr.io, so it'd be nice to use that here as well for consistency.

@roberthbailey
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caseydavenport, freehan, roberthbailey

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 19, 2017
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented May 20, 2017

@caseydavenport: The following test(s) failed:

Test name Commit Details Rerun command
Jenkins GKE smoke e2e 0622a6292bf527b9c93a524a790ff19686929b85 link @k8s-bot cvm gke e2e test this
pull-kubernetes-federation-e2e-gce 63744a8 link @k8s-bot pull-kubernetes-federation-e2e-gce test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit a9d0403 into kubernetes:master May 20, 2017
@caseydavenport caseydavenport deleted the calico-daemonset branch May 20, 2017 02:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants