
Calico ClusterRole is missing permissions, causing calicoctl to error #11683

Closed as not planned
@bvierra

Description

What happened?

On a new install with Calico as the CNI, log into a Kubernetes node and run calicoctl.sh ipam check. The command fails with a permissions error:

root@k8s-worker-1:/home/ansible# calicoctl.sh ipam check
Checking IPAM for inconsistencies...

Loading all IPAM blocks...
Found 5 IPAM blocks.
 IPAM block 10.233.110.128/26 affinity=host:k8s-worker-1:
 IPAM block 10.233.113.0/26 affinity=host:k8s-control-1:
 IPAM block 10.233.66.0/26 affinity=host:k8s-worker-4:
 IPAM block 10.233.85.64/26 affinity=host:k8s-control-2:
 IPAM block 10.233.93.192/26 affinity=host:k8s-control-3:
IPAM blocks record 5 allocations.

Loading all IPAM pools...
  10.233.64.0/18
Found 1 active IP pools.

Loading all nodes.
failed to list nodes: connection is unauthorized: nodes is forbidden: User "system:serviceaccount:kube-system:calico-cni-plugin" cannot list resource "nodes" in API group "" at the cluster scope

The fix was to apply the following change:

diff --git a/roles/network_plugin/calico/templates/calico-cr.yml.j2 b/roles/network_plugin/calico/templates/calico-cr.yml.j2
index 7ddec1698..5e6651761 100644
--- a/roles/network_plugin/calico/templates/calico-cr.yml.j2
+++ b/roles/network_plugin/calico/templates/calico-cr.yml.j2
@@ -11,6 +11,7 @@ rules:
       - namespaces
     verbs:
       - get
+      - list
   - apiGroups: [""]
     resources:
       - pods/status

Note that I added list for pods, nodes, and namespaces because all three needed it. When I first split nodes out into a separate rule with list and get, the next run failed on namespaces instead.
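The error occurs because Kubernetes RBAC grants permissions per verb: get on nodes does not imply list, and listing all nodes requires list. The minimal Python sketch below models that matching logic under simplifying assumptions (only verbs, API groups and resources are checked; no resourceNames, non-resource URLs or role aggregation). The rule contents are a hypothetical reconstruction based on the note above, not the actual calico-cr.yml.j2 template, and PolicyRule and is_allowed are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """Simplified, hypothetical stand-in for a Kubernetes RBAC PolicyRule."""
    api_groups: tuple[str, ...]
    resources: tuple[str, ...]
    verbs: tuple[str, ...]

    def allows(self, verb: str, api_group: str, resource: str) -> bool:
        # A rule matches only if the verb, API group and resource all match;
        # "*" acts as a wildcard, as it does in RBAC.
        return (
            (verb in self.verbs or "*" in self.verbs)
            and (api_group in self.api_groups or "*" in self.api_groups)
            and (resource in self.resources or "*" in self.resources)
        )


def is_allowed(rules: list[PolicyRule], verb: str, api_group: str, resource: str) -> bool:
    # RBAC is purely additive: a request is allowed if ANY rule allows it.
    return any(rule.allows(verb, api_group, resource) for rule in rules)


# Assumed shape of the rule before and after the patch. The core API group is "".
BEFORE = [PolicyRule(("",), ("pods", "nodes", "namespaces"), ("get",))]
AFTER = [PolicyRule(("",), ("pods", "nodes", "namespaces"), ("get", "list"))]

if __name__ == "__main__":
    for name, rules in (("before", BEFORE), ("after", AFTER)):
        print(name, "list nodes:", is_allowed(rules, "list", "", "nodes"))
```

Running it prints False for the "before" rules and True for the "after" rules, which mirrors the "cannot list resource nodes" error disappearing once list is added.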

What did you expect to happen?

I expected the command to complete without errors and produce output similar to the following:

root@k8s-worker-1:/home/ansible# calicoctl.sh ipam check
Checking IPAM for inconsistencies...

Loading all IPAM blocks...
Found 5 IPAM blocks.
 IPAM block 10.233.110.128/26 affinity=host:k8s-worker-1:
 IPAM block 10.233.113.0/26 affinity=host:k8s-control-1:
 IPAM block 10.233.66.0/26 affinity=host:k8s-worker-4:
 IPAM block 10.233.85.64/26 affinity=host:k8s-control-2:
 IPAM block 10.233.93.192/26 affinity=host:k8s-control-3:
IPAM blocks record 5 allocations.

Loading all IPAM pools...
  10.233.64.0/18
Found 1 active IP pools.

Loading all nodes.
Found 0 node tunnel IPs.

Loading all workload endpoints.
Found 5 workload IPs.
Workloads and nodes are using 5 IPs.

Loading all handles
Looking for top (up to 20) nodes by allocations...
  k8s-worker-4 has 1 allocations
  k8s-control-2 has 1 allocations
  k8s-control-3 has 1 allocations
  k8s-worker-1 has 1 allocations
  k8s-control-1 has 1 allocations
Node with most allocations has 1; median is 1

Scanning for IPs that are allocated but not actually in use...
Found 0 IPs that are allocated in IPAM but not actually in use.
Scanning for IPs that are in use by a workload or node but not allocated in IPAM...
Found 0 in-use IPs that are not in active IP pools.
Found 0 in-use IPs that are in active IP pools but have no corresponding IPAM allocation.

Scanning for IPAM handles with no matching IPs...
Found 0 handles with no matching IPs (and 5 handles with matches).
Scanning for IPs with missing handle...
Found 0 handles mentioned in blocks with no matching handle resource.
Check complete; found 0 problems.

How can we reproduce it (as minimally and precisely as possible)?

Install a cluster with Calico as the network plugin (my Calico variables are included below). Log into a Kubernetes node and run calicoctl.sh ipam check; it fails as shown above. Apply the diff above, run the same command again, and it succeeds.

My group_vars/k8s_cluster/k8s-net-calico.yml is as follows; you probably do not need all of the BGP settings to reproduce the issue.

---
calico_cni_name: k8s-pod-network
peer_with_router: true
nat_outgoing: true
nat_outgoing_ipv6: false
calico_pool_name: "default-pool"
calico_pool_blocksize: 26
calico_pool_cidr: 10.233.64.0/18
calico_cni_pool: true
global_as_num: "64513"
calico_mtu: 1500
calico_veth_mtu: 1500
calico_advertise_cluster_ips: true
calico_datastore: "kdd"
typha_enabled: true
calico_network_backend: 'bird'
calico_ipip_mode: 'Never'
calico_vxlan_mode: 'Never'
calico_apiserver_enabled: true
peers:
  - router_id: "10.10.130.1"
    as: "64512"
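For context, the pool settings above determine the IPAM blocks that calicoctl.sh ipam check reports. The following Python sketch uses only the standard ipaddress module. With calico_pool_cidr 10.233.64.0/18 and calico_pool_blocksize 26, the pool splits into 2^(26-18) = 256 blocks of 2^(32-26) = 64 addresses each, and every block listed in the output above should fall inside the pool. The block list is copied from that output; the helper names are illustrative, not part of Calico or Kubespray.

```python
import ipaddress

# Values from group_vars/k8s_cluster/k8s-net-calico.yml above.
POOL_CIDR = "10.233.64.0/18"  # calico_pool_cidr
BLOCK_SIZE = 26               # calico_pool_blocksize

# Blocks reported by calicoctl.sh ipam check in this issue.
REPORTED_BLOCKS = [
    "10.233.110.128/26",
    "10.233.113.0/26",
    "10.233.66.0/26",
    "10.233.85.64/26",
    "10.233.93.192/26",
]


def pool_capacity(pool_cidr: str, block_size: int) -> tuple[int, int]:
    """Return (number of blocks in the pool, addresses per block)."""
    pool = ipaddress.ip_network(pool_cidr)
    blocks = 2 ** (block_size - pool.prefixlen)
    per_block = 2 ** (pool.max_prefixlen - block_size)
    return blocks, per_block


def blocks_outside_pool(pool_cidr: str, blocks: list[str]) -> list[str]:
    """Return every reported block that is not a subnet of the pool."""
    pool = ipaddress.ip_network(pool_cidr)
    return [b for b in blocks if not ipaddress.ip_network(b).subnet_of(pool)]


if __name__ == "__main__":
    n_blocks, per_block = pool_capacity(POOL_CIDR, BLOCK_SIZE)
    print(f"{n_blocks} blocks of {per_block} addresses")
    print("blocks outside pool:", blocks_outside_pool(POOL_CIDR, REPORTED_BLOCKS))
```

With these values it prints "256 blocks of 64 addresses" and an empty list, consistent with the single active pool shown in the output.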

OS

Linux 6.8.0-48-generic x86_64
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

Version of Ansible

ansible [core 2.16.12]
  config file = None
  configured module search path = ['/home/bvierra/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/bvierra/p/homelab/.direnv/python-3.12.3/lib/python3.12/site-packages/ansible
  ansible collection location = /home/bvierra/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/bvierra/p/homelab/.direnv/python-3.12.3/bin/ansible
  python version = 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0] (/home/bvierra/p/homelab/.direnv/python-3.12.3/bin/python)
  jinja version = 3.1.4
  libyaml = True

Version of Python

Python 3.12.3

Version of Kubespray (commit)

bb7b4e0

Network plugin used

calico

Full inventory with variables

I can add it if needed, but it is large and contains details I would have to redact.

Command used to invoke ansible

ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml

Output of ansible run

as above

Anything else we need to know

No response

Metadata

Assignees

No one assigned

Labels

kind/bug, lifecycle/rotten
