
Proposal for external dynamic provisioners #30285


Merged

Conversation

jsafrane
Member

@jsafrane jsafrane commented Aug 9, 2016

@kubernetes/sig-storage




## Design

This design represents the minimally viable changes required to provision based on storage class configuration. Additional incremental features may be added as a separate effort.

We propose that:

1. For the base implementation, storage class and volume selectors are mutually exclusive for now.

2. An API object will be incubated in extensions/v1beta1 named `storage` to hold a `StorageClass`
API resource. Each StorageClass object contains parameters required by the provisioner to provision volumes of that class. These parameters are opaque to the user.
Contributor

opaque to the user or to the system? The user (admin?) would know what 'gp2' is but the system wouldn't.

Contributor

user as in the user creating the PVC.

@k8s-github-robot k8s-github-robot added kind/design Categorizes issue or PR as related to design. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Aug 9, 2016
0. The Kubernetes administrator can configure the name of a default StorageClass. This
StorageClass instance is then used when a user requests a dynamically
provisioned volume but does not specify a StorageClass. In other words,
`claim.Spec.Class == ""`
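A minimal sketch of the default-class fallback described above; the helper name and class names are illustrative, not the real controller code:

```go
package main

import "fmt"

// resolveClass is a hypothetical helper: it returns the admin-configured
// default StorageClass name whenever the claim leaves claim.Spec.Class
// empty, and the claim's own class otherwise.
func resolveClass(claimClass, defaultClass string) string {
	if claimClass == "" { // claim.Spec.Class == ""
		return defaultClass
	}
	return claimClass
}

func main() {
	fmt.Println(resolveClass("", "standard"))     // standard (default applies)
	fmt.Println(resolveClass("fast", "standard")) // fast (explicit class wins)
}
```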
Contributor

Should this potentially be expanded to something like Params?

Assume that StorageClass is the correct place to configure whether or not a volume is encrypted. We need a way to pass an encryption key to the provisioner.

A new SecretType can be added to manage these keys for the user. The PVC would therefore need both claim.Spec.Class and claim.Spec.Secret as params to the provisioner.

All GCE volumes are encrypted by default, so an empty Secret means GCE creates its own keypair. AWS is optionally encrypted, so an empty key might mean "no encryption" or it might mean "generate a keypair".

Contributor

The use case for the secret is along the lines of control-API credentials used during creation of the asset (cloud credentials, an admin RBD secret, login to a fibre channel provisioning API, etc.). A secret passed as a parameter to the provisioner and used to encrypt the volume (or similar) would result in every volume created by that provisioner using the same secret.

a better use case is the provisioner creating a unique secret for every provisioned asset.

Contributor

In my suggestion above, I was saying each PVC would supply a Secret so that each volume is encrypted with a specific key the user controls.

For OpenShift Online, it would be desirable for each user to supply the keys they want to use and we'd use that key to encrypt the volume. Storing the keys in a Secret makes sense, though we need to work out how that key gets into AWS/GCE (keys are referenced by name in the EBS API).

Member Author

The original docs/proposals/volume-provisioning.md specifies that the provisioned PV can be configured by PVC labels. The concrete implementation is not written yet, but I expect something like
`claim.Labels["kubernetes.io/aws-ebs/zone"] = "us-east-1d"`. The provisioner then evaluates both claim.Class and claim.Labels. It would work the same for a hypothetical encryption key ID or anything else that the provisioner implements.
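As a rough illustration of this idea (the label-key prefix and parameter names are made up, not part of the proposal), a provisioner could merge the opaque class parameters with claim labels like so:

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveParams is a sketch: it combines StorageClass parameters with
// claim labels, where labels under a provisioner-specific prefix override
// the class defaults. The prefix is an invented convention.
func effectiveParams(classParams, claimLabels map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range classParams {
		out[k] = v
	}
	for k, v := range claimLabels {
		if strings.HasPrefix(k, "kubernetes.io/aws-ebs/") {
			out[strings.TrimPrefix(k, "kubernetes.io/aws-ebs/")] = v
		}
	}
	return out
}

func main() {
	p := effectiveParams(
		map[string]string{"type": "gp2", "zone": "us-east-1a"},
		map[string]string{"kubernetes.io/aws-ebs/zone": "us-east-1d"},
	)
	fmt.Println(p["type"], p["zone"]) // gp2 us-east-1d
}
```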

Contributor

I talked with our Ops team and they said a key per namespace was not necessary. End user supplied keys was too much.

Instead, 1 key per StorageClass where admins manually create the keys is sufficient.

@bgrant0607 bgrant0607 assigned saad-ali and matchstick and unassigned bgrant0607 Aug 9, 2016
@bgrant0607
Member

cc @thockin

as internal provisioners do, and the user gets the same experience.

* The provisioner must implement also a deleter. It must be able to delete
storage assets it created.
Contributor

why?

Member Author

as proposed below, the same plugin / external provisioner would be used to delete volumes.

We could add a separate annotation that provisioner could use to select another deleter, but do we need it?

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 18, 2016
@k8s-bot

k8s-bot commented Aug 30, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.


1. If the create operation for the `api.PersistentVolume` fails, it is
retried
5. If the deleter is not known to Kubernetes, it does not do anything.
Contributor

I thought the new annotation was the only change needed to implement this proposal; this is another one, since right now Kubernetes sets the PV to Failed if it can't find a deleter.

Just noting because I know I said annotation's the only change a couple times

Member Author

@wongma7, good catch. I filed PR #32565 to not report any error when an external deleter is requested.

@jsafrane jsafrane force-pushed the external-provisioner-proposal branch from cb44799 to 5082776 Compare September 19, 2016 10:59
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 19, 2016
@jsafrane jsafrane force-pushed the external-provisioner-proposal branch 3 times, most recently from d82402d to 96c643e Compare September 19, 2016 11:41
@jsafrane
Member Author

  • rebased
  • reformulated the protocol between external provisioners and Kubernetes to be very strict
  • added an example PVC+PV

@k8s-bot

k8s-bot commented Sep 19, 2016

GCE e2e build/test passed for commit d82402d06f4e0a859df9859cdbadfbd199a4b96e.

`api.PersistentVolume`, fills its `Class` attribute with `claim.Spec.Class`
and makes it already bound to the claim

1. If the create operation for the `api.PersistentVolume` fails, it is
Contributor

is the numbering here and below correct?

Member Author

markdown expands it to different levels, i.e. this "1." is rendered as "a.", see "rich diff" at https://github.com/kubernetes/kubernetes/pull/30285/files?short_path=3cb6e02#diff-3cb6e02c12e0f637a83317ce28eb2c8c

Please report any problems with wrong cross-references, I might have missed some.. And yes, it's hard to review from looking at the .md file, better review rendered markdown or rich diff.


7. If `Provision` returns an error, the controller generates an event on the
Contributor

should the provisioner be blacklisted if it keeps thrashing?

Member Author

I would leave such an optimization to the future, when we see it's needed. The first implementation would IMO work well without it.

@jsafrane
Member Author

@thockin ping?

@jsafrane jsafrane added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Sep 27, 2016
@jsafrane jsafrane force-pushed the external-provisioner-proposal branch from 5813ff2 to 979b890 Compare October 4, 2016 11:28
@k8s-github-robot k8s-github-robot added kind/design Categorizes issue or PR as related to design. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 4, 2016
@k8s-ci-robot
Contributor

Jenkins unit/integration failed for commit 979b890. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

Kubernetes was not blocked in any way during the provisioning and could
either bind the claim to another PV that was created by the user, or the
claim may even have been deleted by the user. In both cases, Kubernetes will mark
the PV to be deleted using the protocol below.

There are a couple of other race conditions here, I think, as hinted at by the paragraph below. Specifically,

  1. The claim can be changed to point to a storage class that uses a different provisioner.
  2. The claim can be resized to request more storage than it originally did.
  3. The claim can be resized to request less storage than it originally did.

Item 3 is a non-issue (the claim will bind to the PV), but items 1 and 2 will likely cause problems. I'm unsure what the desired semantics for item 1 should be, as there'll be nothing to stop the binding from occurring, even though the user no longer wants to use the external provisioner. In the case of item 2, the binding won't happen, and the incorrectly-sized volume should probably be deleted and the binding retried.

Member Author

  1. This is indeed possible in rare cases; however, if two provisioners create a PV for one PVC, the PV controller will bind the PVC to one of them and will mark the "losing" one for deletion.

2-3) We do not support PVC resizing (yet); PVCs are immutable after creation.


* `pv.Spec.PersistentVolumeReclaimPolicy` SHOULD be set to `Delete` unless the
user manually configures another reclaim policy.


Should there be an explicit mechanism for doing this that all external provisioners should support, like an annotation in the PVC? Should this also apply to internal provisioners?

Member Author

We're not there yet, maybe in the future.

`pv.Status.Phase == Released && pv.Annotations['pv.kubernetes.io/provisioned-by'] == <deleter name>`,
the deleter:

* It MUST delete the storage asset.
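The deletion guard quoted above, plus a reclaim-policy check so `Retain` volumes are left alone, could look roughly like this (simplified stand-in types, not the real API objects):

```go
package main

import "fmt"

// pv is a simplified stand-in for api.PersistentVolume, carrying only the
// fields the deleter's guard needs.
type pv struct {
	Phase         string
	ReclaimPolicy string
	Annotations   map[string]string
}

// shouldDelete sketches the condition: the PV is Released, its reclaim
// policy is Delete, and the provisioned-by annotation names this deleter.
func shouldDelete(v pv, deleterName string) bool {
	return v.Phase == "Released" &&
		v.ReclaimPolicy == "Delete" &&
		v.Annotations["pv.kubernetes.io/provisioned-by"] == deleterName
}

func main() {
	v := pv{"Released", "Delete", map[string]string{"pv.kubernetes.io/provisioned-by": "example.com/nfs"}}
	fmt.Println(shouldDelete(v, "example.com/nfs")) // true
	v.ReclaimPolicy = "Retain"
	fmt.Println(shouldDelete(v, "example.com/nfs")) // false: keep the asset
}
```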
@cdragga cdragga Oct 14, 2016

This seems to contradict line 260, which implies that other reclaim options should be possible. In cases where the policy is set to "Retain", a useful approach would be to retain the storage asset until the PV itself was deleted, rather than the PVC. "Recycle" should probably have similar semantics to its current behavior, although the recycling process could be left to the plugin itself, possibly.

Member Author

good point, the deleter should check the reclaim policy first. I'll add a note about it.

@jsafrane jsafrane force-pushed the external-provisioner-proposal branch from 979b890 to 6bdf5a8 Compare October 19, 2016 10:05
@k8s-ci-robot
Contributor

Jenkins GCI GKE smoke e2e failed for commit 6bdf5a8. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

deleter SHOULD retry to delete the volume periodically.

* The deleter SHOULD NOT use any information from the StorageClass instance
referenced by the PV.
Contributor

@jsafrane are you referring to the deleter of the external provisioner here? If yes, we may require the storage class parameters at the time of the deleter() invocation, wouldn't we?

Member Author

I'd like to move away from requiring the StorageClass to be present at the time of deletion. The only reason we need it there is for internal provisioners and their handling of secrets. External provisioners can have secrets configured in a different way, e.g. as a Secret instance attached to the pod where the provisioner runs. So, let's say that the class may be missing at the time when a deleter is invoked. I'll add some text about that.

Contributor

@jsafrane thanks, but then, if there is more than one storage class, the external provisioner needs to keep track of the parameters of each SC internally and fetch and use them whenever a delete request comes in. If I understand this assumption correctly, it will be difficult to maintain the pointers to SCs and PVs when the deleter is invoked in the external provisioner, won't it?

Member Author

The provisioner can still pass parameters to the deleter via PV annotations. Just don't pass pointers to secrets this way.
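A tiny sketch of this pattern, with annotation keys invented for illustration: the provisioner records at provision time whatever the deleter will later need, so the StorageClass need not exist at deletion time.

```go
package main

import "fmt"

// recordDeleteParams sketches stashing deleter inputs on the PV as
// annotations at provision time. The example.com/* keys are hypothetical.
func recordDeleteParams(annotations map[string]string, server, path string) {
	annotations["example.com/nfs-server"] = server
	annotations["example.com/nfs-path"] = path
	// Per the discussion above: do NOT store secret references or secret
	// data this way.
}

func main() {
	ann := map[string]string{}
	recordDeleteParams(ann, "nfs.example.com", "/exports/pvc-123")
	// Later, the deleter reads the same annotations back from the PV:
	fmt.Println(ann["example.com/nfs-server"], ann["example.com/nfs-path"])
}
```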

Contributor

@jsafrane thanks for the clarification! That makes sense :)

claim and goes back to step 1., i.e. it will retry provisioning periodically
The external provisioner runs in a separate process which watches claims, be
it an external storage appliance, a daemon or a Kubernetes pod. For every
claim creation or update, it implements these steps:
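The claim-watching loop described here might be sketched as follows; the channel stands in for a real API-server watch, and the per-claim provisioning steps (elided in this excerpt) are reduced to a callback:

```go
package main

import "fmt"

// claim is a simplified stand-in for a PersistentVolumeClaim.
type claim struct {
	Name  string
	Class string
}

// runLoop sketches the external provisioner's control loop: consume claim
// add/update events and act only on claims requesting this provisioner's
// StorageClass.
func runLoop(events <-chan claim, myClass string, provision func(claim)) {
	for c := range events {
		if c.Class != myClass {
			continue // another provisioner is responsible
		}
		provision(c)
	}
}

func main() {
	events := make(chan claim, 2)
	events <- claim{"pvc-a", "gold"}
	events <- claim{"pvc-b", "silver"}
	close(events)
	runLoop(events, "gold", func(c claim) { fmt.Println("provisioning", c.Name) })
	// prints: provisioning pvc-a
}
```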
Contributor

@jsafrane if it's running as a pod, do we need to make sure the pod has certain privileges to serve the claims? I mean, access to the controller or the API server or something of that sort?

Contributor

The pod's privileges are up to the storage admin & external provisioner author and shouldn't need accounting for in the binding controller.

Contributor

Thanks @childsb .

@childsb childsb self-assigned this Oct 25, 2016
Contributor

@childsb childsb left a comment

We should probably call out some lessons learned on security -

  • Do not store security sensitive properties on the PVC.

What happens when a PV is released belonging to a StorageClass that's been deleted?


## Design

This design represents the minimally viable changes required to provision based on storage class configuration. Additional incremental features may be added as a separate effort.

We propose that:

1. For the base implementation, storage class and volume selectors are mutually exclusive for now.
Contributor

I think we could relax or undefine the behavior for label selectors and storage classes for out of tree provisioners. The only thing that needs to be enforced is that PVC.size < PV.size, PV.storageClass = PVC.storageClass and PVC.selector matches PV.labels.

None of the in-tree provisioners support labels+storageClass at this time, but there's no real way we can enforce out-of-tree behavior for storageClass and label selectors for provisioners.

Possible update: "For the in-tree storage provisioners, a PVC with labelSelectors is ignored for provisioning. For out of tree provisioners, the PV created by the provisioner must be labeled such that it matches the PVC selector."
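The three invariants listed in this comment could be checked roughly like this (simplified types; selector matching is modeled as exact label equality, and "large enough" means the PV's size is at least the request):

```go
package main

import "fmt"

// matches sketches the only checks the comment says must be enforced:
// capacity, storage class equality, and PVC selector vs. PV labels.
func matches(pvSize, pvcSize int64, pvClass, pvcClass string, pvLabels, selector map[string]string) bool {
	if pvSize < pvcSize || pvClass != pvcClass {
		return false
	}
	for k, v := range selector {
		if pvLabels[k] != v {
			return false // PV labels do not satisfy the PVC selector
		}
	}
	return true
}

func main() {
	ok := matches(10, 5, "gold", "gold",
		map[string]string{"zone": "us-east-1d"},
		map[string]string{"zone": "us-east-1d"})
	fmt.Println(ok) // true
}
```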

Member Author

done, thanks for a nice sentence

@jsafrane
Member Author

Do not store security sensitive properties on the PVC.

Added "Security considerations" paragraph.

@jsafrane jsafrane force-pushed the external-provisioner-proposal branch from 6bdf5a8 to f30834e Compare October 26, 2016 10:46
to achieve, it SHOULD report an error and it MUST NOT provision a volume.
All errors found during parsing or provisioning SHOULD be sent as events
on the claim, and the provisioner SHOULD retry periodically with step i.

Contributor

The specification says "for in-tree provisioners a PVC with labelSelectors is ignored for provisioning" and "claim.Spec.Selector and claim.Spec.Class are mutually exclusive. User can either match existing volumes with Selector XOR match existing volumes with Class and get dynamic provisioning by using Class." However, external provisioners "MUST parse arguments in the StorageClass and claim.Spec.Selector and provisions appropriate storage asset that matches both the parameters and the selector." Why is there a distinction between in-tree and external provisioners? Can external provisioners ignore PVCs with labelSelectors?

Member Author

You are right, the text is different for internal and external provisioners; however, the result is the same: if there is a selector in a PVC, the provisioner must provision something that matches both the class and the selector. In-tree provisioners are stupid and don't parse selectors right now (this will change in the future, 1.6?) and therefore refuse to provision anything with any selector. I would advise external provisioners to do the same - either refuse to provision a claim with any selector, or explicitly select a few labels that are allowed in a PVC selector and document those as an additional way to configure a volume. Therefore "MUST parse claim.Spec.Selector" may be just `if pvc.Spec.Selector != nil { return Error("can't parse selector!") }`

I'll try to capture this idea in the proposal...


Contributor

Thanks for the clarification! Is using the selector as a means of passing arguments to provisioners an interim capability (say, to support dynamic provisioning for releases before 1.4, where the StorageClass abstraction didn't exist) or a capability planned for the long term?

Member Author

For internal provisioners (GCE/AWS/Cinder) it's a feature we'd like to introduce eventually, maybe in 1.6 (patches welcome!); external provisioners can go wild and parse selectors as they want, and Kubernetes will not prevent them from doing so in any release.

@jsafrane jsafrane force-pushed the external-provisioner-proposal branch from f30834e to 9519753 Compare October 27, 2016 08:28
@childsb
Contributor

childsb commented Oct 31, 2016

As mentioned in last week's storage SIG, I'm marking this for merge as reviewed by the community. Thanks everyone.

@childsb childsb added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 31, 2016
@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 3e3ff44 into kubernetes:master Oct 31, 2016
@jsafrane
Member Author

jsafrane commented Nov 1, 2016

implementation of the proposal: #35957

@thockin
Member

thockin commented Nov 7, 2016

LGTM

xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016
…r-proposal

Automatic merge from submit-queue

Proposal for external dynamic provisioners

@kubernetes/sig-storage
k8s-github-robot pushed a commit that referenced this pull request Feb 2, 2017
Automatic merge from submit-queue (batch tested with PRs 40574, 40806, 40308, 40771, 39440)

Add bootstrap cluster role for external pv provisioners

The set of permissions an external provisioner #30285 running as a pod will need. Technically in order to dynamically provision PVs one doesn't need to "update" PVCs or "watch" events but the controller https://github.com/kubernetes-incubator/nfs-provisioner/tree/master/controller we are recommending people use does those things to: set lock annotations on PVCs and watch `ProvisioningSucceeded`/`ProvisioningFailed` events.

Some external provisioners may need additional permissions, for example nfs-provisioner requires "get" access to Services and Endpoints when run "statefully." I think in that case we would recommend creating a new ClusterRole specific to that provisioner, using this as a base?

(This was to be a part of my redo/fix of the external e2e test #39545 but I'm submitting it as a separate PR for now due to some issues I had with running nfs-provisioner on gce.)

@kubernetes/sig-auth-misc ?
tamalsaha pushed a commit to kmodules/authorizer that referenced this pull request Nov 17, 2021