Dynamic provisioning stopped working after upgrade #119913

Closed
arthurala opened this issue Aug 11, 2023 · 11 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/storage: Categorizes an issue or PR as relevant to SIG Storage.

Comments

@arthurala

What happened?

Deployments that require a PVC stopped working. I thought it was the chart, but then I tested with a simple PVC YAML and it also failed to spin up a PV.
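The exact PVC manifest used for the test isn't shown here, but a minimal claim against the default nfs-client class would look roughly like this (a sketch; the name test001-pvc is borrowed from the provisioner log further down, and the access mode and size are assumptions):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test001-pvc          # hypothetical name, matching the claim seen in the provisioner log
  namespace: default
spec:
  accessModes:
    - ReadWriteMany          # typical for NFS-backed storage; the actual mode isn't stated in this report
  storageClassName: nfs-client
  resources:
    requests:
      storage: 1Gi           # assumed size, for illustration only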

Did a describe on the PVC and it shows a status of Pending.
Checked the events, which showed:
MESSAGE:
waiting for a volume to be created, either by external provisioner "cluster.local/nfs-client-provisioner" or manually created by system administrator

Output from kubectl get sc -o yaml:

apiVersion: v1
items:
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
    creationTimestamp: "2020-06-04T19:24:24Z"
    labels:
      app: nfs-client-provisioner
      chart: nfs-client-provisioner-1.2.8
      heritage: Tiller
      io.cattle.field/appId: nfs-client-provisioner
      release: nfs-client-provisioner
    name: nfs-client
    resourceVersion: "442718644"
    uid: b607cb9e-f49c-4407-827c-4014dddba287
  mountOptions:
  - nfsvers=4.1
  parameters:
    archiveOnDelete: "true"
  provisioner: cluster.local/nfs-client-provisioner
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

We do NOT know when this issue started because most of the deployments don't require a PVC, so we aren't sure whether this began after the Rancher and Kubernetes upgrade.

What did you expect to happen?

The default behavior of dynamic provisioning: spin up a PV for a PVC if one doesn't already exist.

How can we reproduce it (as minimally and precisely as possible)?

unknown

Anything else we need to know?

Rancher: 2.6.12
Docker: 19.3.11

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.8"

Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.14"

Cloud provider

NA

OS version

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"

$ uname -a
Linux 3.10.0-1160.88.1.el7.x86_64 #1 SMP Tue Mar 7 15:41:52 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@arthurala arthurala added the kind/bug Categorizes issue or PR as related to a bug. label Aug 11, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 11, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@arthurala
Author

/sig k8s-infra

@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 11, 2023
@arthurala
Author

Does anyone have insight into this issue, or has anyone experienced the same problem?

@arthurala
Author

Provisioner log:
I0811 18:51:19.203915 1 leaderelection.go:185] attempting to acquire leader lease nfs-client-provisioner/cluster.local-nfs-client-provisioner...
E0811 18:51:36.614372 1 event.go:259] Could not construct reference to: '&v1.Endpoints{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"cluster.local-nfs-client-provisioner", GenerateName:"", Namespace:"nfs-client-provisioner", SelfLink:"", UID:"420983e9-525f-4f19-8178-4333c400aaf2", ResourceVersion:"442658895", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63726895467, loc:(*time.Location)(0x1956800)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{"holderIdentity":"nfs-client-provisioner-57f49ccf5b-jf8vd_0ee2b427-3878-11ee-8695-bab7f20f017d","leaseDurationSeconds":15,"acquireTime":"2023-08-11T18:51:36Z","renewTime":"2023-08-11T18:51:36Z","leaderTransitions":84}"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Subsets:[]v1.EndpointSubset(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' 'nfs-client-provisioner-57f49ccf5b-jf8vd_0ee2b427-3878-11ee-8695-bab7f20f017d became leader'
I0811 18:51:36.614457 1 leaderelection.go:194] successfully acquired lease nfs-client-provisioner/cluster.local-nfs-client-provisioner
I0811 18:51:36.614507 1 controller.go:631] Starting provisioner controller cluster.local/nfs-client-provisioner_nfs-client-provisioner-57f49ccf5b-jf8vd_0ee2b427-3878-11ee-8695-bab7f20f017d!
I0811 18:51:36.714726 1 controller.go:680] Started provisioner controller cluster.local/nfs-client-provisioner_nfs-client-provisioner-57f49ccf5b-jf8vd_0ee2b427-3878-11ee-8695-bab7f20f017d!
I0811 18:51:36.715466 1 controller.go:987] provision "report/data-0" class "nfs-client": started
E0811 18:51:36.719407 1 controller.go:1004] provision "report/data-0" class "nfs-client": unexpected error getting claim reference: selfLink was empty, can't make reference
I0811 18:54:03.103016 1 controller.go:987] provision "default/test001-pvc" class "nfs-client": started
E0811 18:54:03.111321 1 controller.go:1004] provision "default/test001-pvc" class "nfs-client": unexpected error getting claim reference: selfLink was empty, can't make reference
I0811 19:06:36.619605 1 controller.go:987] provision "default/test001-pvc" class "nfs-client": started

@pranav-pandey0804

@arthurala
Hey, thanks for reporting this. It looks like the issue is with dynamic provisioning: the PVCs are stuck in Pending with an event saying they are waiting for a volume to be created, either by the external provisioner "cluster.local/nfs-client-provisioner" or manually by an administrator. I noticed that the StorageClass (nfs-client) is set as the default class. Have there been any recent changes to the NFS server or the provisioner?
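
If nothing has changed on the provisioner side, it may also be worth verifying that the NFS export is still reachable from a node (a sketch; <nfs-server> and <export-path> are placeholders for whatever the chart was configured with):

$ showmount -e <nfs-server>
$ sudo mkdir -p /mnt/nfs-test
$ sudo mount -t nfs -o nfsvers=4.1 <nfs-server>:<export-path> /mnt/nfs-test   # nfsvers matches the StorageClass mountOptions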

@pranav-pandey0804

Also, it could be useful to check if there are any issues with the external provisioner cluster.local/nfs-client-provisioner and its leader election. The log snippet shows an unexpected error during provision, possibly linked to selfLink being empty.
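
For context, "selfLink was empty, can't make reference" is the usual symptom of the legacy nfs-client-provisioner depending on metadata.selfLink, which the API server stopped populating in Kubernetes 1.20. On a 1.21 cluster, a commonly cited workaround is to re-enable it with the RemoveSelfLink=false feature gate on kube-apiserver (only possible up to 1.23, after which the gate is gone); for an RKE/Rancher-managed cluster that would be roughly the following cluster-config snippet (a sketch, not verified against this environment):

# Sketch for an RKE-provisioned cluster (edit via Rancher's cluster YAML);
# kube-apiserver only accepts RemoveSelfLink=false up to v1.23.
services:
  kube-api:
    extra_args:
      feature-gates: "RemoveSelfLink=false"

The longer-term fix is usually to move to the maintained nfs-subdir-external-provisioner chart, which no longer relies on selfLink.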

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2024
@BenTheElder
Member

/remove-sig k8s-infra

https://github.com/kubernetes/community/tree/master/sig-k8s-infra/README.md

/sig storage

This is in the NFS provisioner.

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. labels Mar 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) May 5, 2024