Dynamic provisioning stopped working after upgrade #119913

Closed
arthurala opened this issue Aug 11, 2023 · 11 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/storage: Categorizes an issue or PR as relevant to SIG Storage.

Comments

@arthurala

What happened?

Deployments that require a PVC stopped working. I thought it was the chart, but then I tested with a simple PVC YAML and it also failed to spin up a PV.
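The exact PVC manifest used for the test isn't shown here, but a minimal claim against the default nfs-client class would look roughly like this (a sketch; the name test001-pvc is borrowed from the provisioner log further down, and the access mode and size are assumptions):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test001-pvc          # hypothetical name, matching the claim seen in the provisioner log
  namespace: default
spec:
  accessModes:
    - ReadWriteMany          # typical for NFS-backed storage; the actual mode isn't stated in this report
  storageClassName: nfs-client
  resources:
    requests:
      storage: 1Gi           # assumed size, for illustration only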

Did a describe on the PVC and it shows a status of Pending.
Checked the events, which showed:
MESSAGE:
waiting for a volume to be created, either by external provisioner "cluster.local/nfs-client-provisioner" or manually created by system administrator

Output from kubectl get sc -o yaml:

apiVersion: v1
items:
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
    creationTimestamp: "2020-06-04T19:24:24Z"
    labels:
      app: nfs-client-provisioner
      chart: nfs-client-provisioner-1.2.8
      heritage: Tiller
      io.cattle.field/appId: nfs-client-provisioner
      release: nfs-client-provisioner
    name: nfs-client
    resourceVersion: "442718644"
    uid: b607cb9e-f49c-4407-827c-4014dddba287
  mountOptions:
  - nfsvers=4.1
  parameters:
    archiveOnDelete: "true"
  provisioner: cluster.local/nfs-client-provisioner
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

We do NOT know when this issue started because most of the deployments don't require a PVC, so we aren't sure whether this began after the Rancher and Kubernetes upgrade.

What did you expect to happen?

The default behavior of dynamic provisioning: spin up a PV for a PVC if one doesn't already exist.

How can we reproduce it (as minimally and precisely as possible)?

unknown

Anything else we need to know?

Rancher: 2.6.12
Docker: 19.3.11

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.8"

Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.14"

Cloud provider

NA

OS version

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"

$ uname -a
Linux 3.10.0-1160.88.1.el7.x86_64 #1 SMP Tue Mar 7 15:41:52 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@arthurala arthurala added the kind/bug Categorizes issue or PR as related to a bug. label Aug 11, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 11, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@arthurala
Author

/sig k8s-infra

@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 11, 2023
@arthurala
Author

Does anyone have insight into this issue, or has anyone experienced the same problem?

@arthurala
Author

Provisioner log:
I0811 18:51:19.203915 1 leaderelection.go:185] attempting to acquire leader lease nfs-client-provisioner/cluster.local-nfs-client-provisioner...
E0811 18:51:36.614372 1 event.go:259] Could not construct reference to: '&v1.Endpoints{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"cluster.local-nfs-client-provisioner", GenerateName:"", Namespace:"nfs-client-provisioner", SelfLink:"", UID:"420983e9-525f-4f19-8178-4333c400aaf2", ResourceVersion:"442658895", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63726895467, loc:(*time.Location)(0x1956800)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{"holderIdentity":"nfs-client-provisioner-57f49ccf5b-jf8vd_0ee2b427-3878-11ee-8695-bab7f20f017d","leaseDurationSeconds":15,"acquireTime":"2023-08-11T18:51:36Z","renewTime":"2023-08-11T18:51:36Z","leaderTransitions":84}"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Subsets:[]v1.EndpointSubset(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' 'nfs-client-provisioner-57f49ccf5b-jf8vd_0ee2b427-3878-11ee-8695-bab7f20f017d became leader'
I0811 18:51:36.614457 1 leaderelection.go:194] successfully acquired lease nfs-client-provisioner/cluster.local-nfs-client-provisioner
I0811 18:51:36.614507 1 controller.go:631] Starting provisioner controller cluster.local/nfs-client-provisioner_nfs-client-provisioner-57f49ccf5b-jf8vd_0ee2b427-3878-11ee-8695-bab7f20f017d!
I0811 18:51:36.714726 1 controller.go:680] Started provisioner controller cluster.local/nfs-client-provisioner_nfs-client-provisioner-57f49ccf5b-jf8vd_0ee2b427-3878-11ee-8695-bab7f20f017d!
I0811 18:51:36.715466 1 controller.go:987] provision "report/data-0" class "nfs-client": started
E0811 18:51:36.719407 1 controller.go:1004] provision "report/data-0" class "nfs-client": unexpected error getting claim reference: selfLink was empty, can't make reference
I0811 18:54:03.103016 1 controller.go:987] provision "default/test001-pvc" class "nfs-client": started
E0811 18:54:03.111321 1 controller.go:1004] provision "default/test001-pvc" class "nfs-client": unexpected error getting claim reference: selfLink was empty, can't make reference
I0811 19:06:36.619605 1 controller.go:987] provision "default/test001-pvc" class "nfs-client": started

@pranav-pandey0804

@arthurala
Hey, thanks for reporting this. It looks like the issue is with dynamic provisioning: the PVCs are stuck in Pending with an event saying they are waiting for a volume to be created, either by the external provisioner "cluster.local/nfs-client-provisioner" or manually by an administrator. I noticed that the StorageClass (nfs-client) is set as the default class. Have there been any recent changes to the NFS server or the provisioner?
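
If nothing has changed on the provisioner side, it may also be worth verifying that the NFS export is still reachable from a node (a sketch; <nfs-server> and <export-path> are placeholders for whatever the chart was configured with):

$ showmount -e <nfs-server>
$ sudo mkdir -p /mnt/nfs-test
$ sudo mount -t nfs -o nfsvers=4.1 <nfs-server>:<export-path> /mnt/nfs-test   # nfsvers matches the StorageClass mountOptions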

@pranav-pandey0804

Also, it could be useful to check if there are any issues with the external provisioner cluster.local/nfs-client-provisioner and its leader election. The log snippet shows an unexpected error during provision, possibly linked to selfLink being empty.
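
For context, "selfLink was empty, can't make reference" is the usual symptom of the legacy nfs-client-provisioner depending on metadata.selfLink, which the API server stopped populating in Kubernetes 1.20. On a 1.21 cluster, a commonly cited workaround is to re-enable it with the RemoveSelfLink=false feature gate on kube-apiserver (only possible up to 1.23, after which the gate is gone); for an RKE/Rancher-managed cluster that would be roughly the following cluster-config snippet (a sketch, not verified against this environment):

# Sketch for an RKE-provisioned cluster (edit via Rancher's cluster YAML);
# kube-apiserver only accepts RemoveSelfLink=false up to v1.23.
services:
  kube-api:
    extra_args:
      feature-gates: "RemoveSelfLink=false"

The longer-term fix is usually to move to the maintained nfs-subdir-external-provisioner chart, which no longer relies on selfLink.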

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2024
@BenTheElder
Member

/remove-sig k8s-infra

https://github.com/kubernetes/community/tree/master/sig-k8s-infra/README.md

/sig storage

This is in the NFS provisioner.

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. labels Mar 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) May 5, 2024