Proposal for external dynamic provisioners #30285
Conversation
## Design
This design represents the minimally viable changes required to provision based on storage class configuration. Additional incremental features may be added as a separate effort.
We propose that:
1. For the base implementation storage class and volume selectors are mutually exclusive for now.
2. An API object will be incubated in extensions/v1beta1 named `storage` to hold the `StorageClass` API resource. Each StorageClass object contains parameters required by the provisioner to provision volumes of that class. These parameters are opaque to the user.
opaque to the user or to the system? The user (admin?) would know what 'gp2' is but the system wouldn't.
user as in the user creating the PVC.
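For illustration, here is a minimal Go sketch of what such a StorageClass object could look like. The field names and the API group/version are placeholders taken from the proposal text, not the final API; the provisioner name and parameter keys are examples only.

```go
package main

import "fmt"

// StorageClass sketches the proposed API object: an opaque Parameters map
// that only the named provisioner interprets. Field names and the API
// group/version ("storage" in extensions/v1beta1) are placeholders.
type StorageClass struct {
	Name        string            // object name, e.g. "gold"
	Provisioner string            // e.g. "kubernetes.io/aws-ebs" or an external provisioner name
	Parameters  map[string]string // opaque to the user creating the PVC
}

func main() {
	gold := StorageClass{
		Name:        "gold",
		Provisioner: "kubernetes.io/aws-ebs",
		Parameters:  map[string]string{"type": "gp2", "zone": "us-east-1d"},
	}
	fmt.Printf("class %q provisioned by %q with params %v\n", gold.Name, gold.Provisioner, gold.Parameters)
}
```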
0. The Kubernetes administrator can configure the name of a default StorageClass. This StorageClass instance is then used when a user requests a dynamically provisioned volume but does not specify a StorageClass. In other words, `claim.Spec.Class == ""`
Should this potentially be expanded to something like `Params`?
Assume that StorageClass is the correct place to configure whether or not a volume is encrypted. We need a way to pass an encryption key to the provisioner.
A new `SecretType` can be added to manage these keys for the user. The PVC would therefore need both `claim.Spec.Class` and `claim.Spec.Secret` as params to the provisioner.
All GCE volumes are encrypted by default, so an empty Secret means GCE creates its own keypair. AWS is optionally encrypted, so an empty key might mean "no encryption" or it might mean "generate a keypair".
The use case for secret is along the lines of control API credentials used during the creation of the asset (cloud credentials, admin RBD secret, login to a fibre channel provisioning API, etc.). A secret passed as a parameter to the provisioner and used to encrypt the volume (or similar) would result in every volume created by that provisioner using the same secret.
A better use case is the provisioner creating a unique secret for every provisioned asset.
In my suggestion above, I was saying each PVC would supply a Secret so that each volume is encrypted with a specific key the user controls.
For OpenShift Online, it would be desirable for each user to supply the keys they want to use and we'd use that key to encrypt the volume. Storing the keys in a Secret makes sense, though we need to work out how that key gets into AWS/GCE (keys are referenced by name in the EBS API).
The original docs/proposals/volume-provisioning.md specifies that the provisioned PV can be configured by PVC labels. A concrete implementation is not written yet, but I expect something like `claim.Labels["kubernetes.io/aws-ebs/zone"] = "us-east-1d"`. The provisioner then evaluates both `claim.Class` and `claim.Labels`. It would work the same for a hypothetical encryption key ID or anything else that the provisioner implements.
I talked with our Ops team and they said a key per namespace was not necessary; end-user-supplied keys were too much.
Instead, 1 key per StorageClass where admins manually create the keys is sufficient.
cc @thockin
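A minimal sketch of the default-class lookup described above: when the claim does not name a class, the administrator-configured default (if any) is used. The types and function names here are placeholders, not part of the proposal.

```go
package main

import "fmt"

// ClaimSpec carries only the field relevant for this sketch; an empty Class
// means "use the administrator-configured default StorageClass, if any".
type ClaimSpec struct {
	Class string
}

// resolveClass returns the class to provision with, or false if neither the
// claim nor the administrator specified one (no dynamic provisioning then).
func resolveClass(claim ClaimSpec, defaultClass string) (string, bool) {
	if claim.Class != "" {
		return claim.Class, true
	}
	if defaultClass != "" {
		return defaultClass, true
	}
	return "", false
}

func main() {
	class, ok := resolveClass(ClaimSpec{Class: ""}, "gold")
	fmt.Println(class, ok) // gold true
}
```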
as internal provisioners do and the user gets the same experience.
* The provisioner must also implement a deleter. It must be able to delete storage assets it created.
why?
as proposed below, the same plugin / external provisioner would be used to delete volumes.
We could add a separate annotation that provisioner could use to select another deleter, but do we need it?
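One way an external provisioner could express this pairing is a single interface that covers both provisioning and deletion, reflecting the requirement above. All type and method names below are illustrative, not a defined plugin API.

```go
package externalprovisioner

// Claim and Volume are placeholder types mirroring the proposal's vocabulary.
type Claim struct {
	Namespace, Name, Class string
}

type Volume struct {
	Name        string
	Annotations map[string]string
}

// Provisioner pairs the two responsibilities: whatever provisions a volume
// must also be able to delete the storage assets it created.
type Provisioner interface {
	// Provision creates a new storage asset for the claim using the opaque
	// StorageClass parameters and returns the PV object to create in the API.
	Provision(claim Claim, parameters map[string]string) (*Volume, error)
	// Delete removes a storage asset previously created by Provision; it is
	// invoked once the PV is released and its reclaim policy is Delete.
	Delete(volume *Volume) error
}
```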
1. If the create operation for the `api.PersistentVolume` fails, it is retried
5. If the deleter is not known to Kubernetes, it does not do anything.
I thought the new annotation was the only change needed to implement this proposal; this is another one, since right now Kubernetes sets the PV to Failed if it can't find a deleter.
Just noting because I know I said annotation's the only change a couple times
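A sketch of the controller-side change discussed here: when a released PV carries a `pv.kubernetes.io/provisioned-by` annotation naming a deleter the controller does not know, it leaves the PV alone instead of marking it Failed. The in-tree plugin lookup and the type are stubbed placeholders.

```go
package main

import "fmt"

const provisionedByAnnotation = "pv.kubernetes.io/provisioned-by"

type PV struct {
	Name        string
	Annotations map[string]string
}

// knownInternalDeleter stands in for the in-tree plugin lookup.
func knownInternalDeleter(name string) bool {
	return name == "kubernetes.io/aws-ebs" || name == "kubernetes.io/gce-pd"
}

func reclaimReleasedPV(pv *PV) {
	deleter := pv.Annotations[provisionedByAnnotation]
	if !knownInternalDeleter(deleter) {
		fmt.Printf("PV %s: unknown deleter %q, leaving it for an external deleter\n", pv.Name, deleter)
		return // do nothing; in particular, do not set the PV to Failed
	}
	fmt.Printf("PV %s: deleting with in-tree deleter %q\n", pv.Name, deleter)
}

func main() {
	reclaimReleasedPV(&PV{
		Name:        "pv-1",
		Annotations: map[string]string{provisionedByAnnotation: "example.com/nfs"},
	})
}
```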
`api.PersistentVolume`, fills its `Class` attribute with `claim.Spec.Class` and makes it already bound to the claim
1. If the create operation for the `api.PersistentVolume` fails, it is retried
is the numbering here and below correct?
markdown expands it to different levels, i.e. this "1." is rendered as "a.", see "rich diff" at https://github.com/kubernetes/kubernetes/pull/30285/files?short_path=3cb6e02#diff-3cb6e02c12e0f637a83317ce28eb2c8c
Please report any problems with wrong cross-references, I might have missed some. And yes, it's hard to review by looking at the .md file; better to review the rendered markdown or the rich diff.
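A sketch of the "create a pre-bound PV" step quoted above: the external provisioner copies the claim's class into the new PV and points the PV's claim reference back at the claim so the PV controller binds the two. Field names mirror the proposal's wording, not the final Kubernetes API.

```go
package main

import "fmt"

type Claim struct {
	Namespace, Name, UID, Class string
}

type ClaimRef struct {
	Namespace, Name, UID string
}

type PersistentVolume struct {
	Name        string
	Class       string
	ClaimRef    *ClaimRef
	Annotations map[string]string
}

func newPreBoundPV(claim Claim, provisionerName, volumeName string) PersistentVolume {
	return PersistentVolume{
		Name:  volumeName,
		Class: claim.Class, // same class as the claim that triggered provisioning
		ClaimRef: &ClaimRef{ // pre-bind the PV to that claim
			Namespace: claim.Namespace,
			Name:      claim.Name,
			UID:       claim.UID,
		},
		Annotations: map[string]string{
			"pv.kubernetes.io/provisioned-by": provisionerName,
		},
	}
}

func main() {
	pv := newPreBoundPV(Claim{Namespace: "default", Name: "db-data", UID: "1234", Class: "gold"},
		"example.com/nfs", "pv-db-data")
	fmt.Printf("%+v\n", pv)
}
```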
7. If `Provision` returns an error, the controller generates an event on the
should the provisioner be blacklisted if it keeps thrashing?
I would leave such optimizations to the future, when we see they're needed. The first implementation would IMO work well without them.
@thockin ping?
Kubernetes was not blocked in any way during the provisioning and could either bind the claim to another PV that was created by the user, or the claim may even have been deleted by the user. In both cases, Kubernetes will mark the PV to be deleted using the protocol below.
There are a couple of other race conditions here, I think, as hinted at by the paragraph below. Specifically,
- The claim can be changed to point to a storage class that uses a different provisioner.
- The claim can be resized to request more storage than it originally did.
- The claim can be resized to request less storage than it originally did.
Item 3 is a non-issue (the claim will still bind to the PV), but items 1 and 2 will likely cause problems. I'm unsure what the desired semantics for item 1 should be, as there'll be nothing to stop the binding from occurring, even though the user no longer wants to use the external provisioner. In the case of item 2, the binding won't happen, and the incorrectly-sized volume should probably be deleted and the binding retried.
1) this is indeed possible in rare cases, however if two provisioners create a PV for one PVC, the PV controller will bind the PVC to one of them and it will mark the losing one for deletion.
2-3) we do not support PVC resizing (yet), PVCs are immutable after creation.
* `pv.Spec.PersistentVolumeReclaimPolicy` SHOULD be set to `Delete` unless the user manually configures another reclaim policy.
Should there be an explicit mechanism for doing this that all external provisioners should support, like an annotation in the PVC? Should this also apply to internal provisioners?
We're not there yet, maybe in the future.
`pv.Status.Phase == Released && pv.Annotations['pv.kubernetes.io/provisioned-by'] == <deleter name>`, the deleter:
* It MUST delete the storage asset.
This seems to contradict line 260, which implies that other reclaim options should be possible. In cases where the policy is set to "Retain", a useful approach would be to retain the storage asset until the PV itself was deleted, rather than the PVC. "Recycle" should probably have similar semantics to its current behavior, although the recycling process could be left to the plugin itself, possibly.
Good point, the deleter should check the reclaim policy first. I'll add a note about it.
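A sketch of the deleter-side checks agreed on here: the storage asset is removed only if this deleter provisioned the PV, the PV has been released, and the reclaim policy is Delete; Retain and Recycle volumes are left alone. All names are illustrative placeholders.

```go
package main

import "fmt"

type PV struct {
	Annotations   map[string]string
	Phase         string // e.g. "Released"
	ReclaimPolicy string // "Delete", "Retain" or "Recycle"
}

// shouldDelete implements the three checks: released, provisioned by this
// deleter, and reclaim policy Delete.
func shouldDelete(pv PV, deleterName string) bool {
	return pv.Phase == "Released" &&
		pv.Annotations["pv.kubernetes.io/provisioned-by"] == deleterName &&
		pv.ReclaimPolicy == "Delete"
}

func main() {
	pv := PV{
		Annotations:   map[string]string{"pv.kubernetes.io/provisioned-by": "example.com/nfs"},
		Phase:         "Released",
		ReclaimPolicy: "Retain",
	}
	fmt.Println(shouldDelete(pv, "example.com/nfs")) // false: policy is Retain
}
```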
deleter SHOULD retry to delete the volume periodically.
* The deleter SHOULD NOT use any information from the StorageClass instance referenced by the PV.
@jsafrane are you referring to the deleter of the external provisioner here? If yes, we may require the storage class parameters at the time of the `deleter()` invocation, don't we?
I'd like to move away from requiring the StorageClass to be present at the time of deletion. The only reason we need it there is for internal provisioners and their handling of secrets. External provisioners can have secrets configured in a different way, e.g. as a Secret instance attached to the pod where the provisioner runs. So, let's say that the class may be missing at the time when a deleter is invoked. I'll add some text about that.
@jsafrane thanks, but then, if there is more than one storage class, the external provisioner needs to keep track of the parameters of each SC internally and fetch and use them whenever a delete request comes in. If I understand correctly, under this assumption it will be difficult to maintain the pointers to SCs and PVs when the deleter is invoked in the external provisioner, won't it?
The provisioner can still pass parameters to the deleter via PV annotations. Just don't pass references to secrets this way.
@jsafrane thanks for the clarification! That makes sense :)
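A sketch of the annotation hand-off suggested in this thread: the provisioner records the non-secret parameters it used as annotations on the PV, so its deleter can recover them later even if the StorageClass no longer exists. The annotation prefix below is a hypothetical example, not defined by the proposal.

```go
package main

import (
	"fmt"
	"strings"
)

const paramAnnotationPrefix = "example.com/provisioner-param-"

// recordParams stores the non-secret provisioning parameters on the PV's
// annotations at provision time. Never store secret references this way.
func recordParams(annotations, params map[string]string) {
	for k, v := range params {
		annotations[paramAnnotationPrefix+k] = v
	}
}

// readParams recovers those parameters at delete time, without needing the
// StorageClass to still exist.
func readParams(annotations map[string]string) map[string]string {
	params := map[string]string{}
	for k, v := range annotations {
		if strings.HasPrefix(k, paramAnnotationPrefix) {
			params[strings.TrimPrefix(k, paramAnnotationPrefix)] = v
		}
	}
	return params
}

func main() {
	ann := map[string]string{}
	recordParams(ann, map[string]string{"pool": "rbd", "monitors": "10.0.0.1:6789"})
	fmt.Println(readParams(ann)) // map[monitors:10.0.0.1:6789 pool:rbd]
}
```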
claim and goes back to step 1., i.e. it will retry provisioning periodically
The external provisioner runs in a separate process which watches claims, be it an external storage appliance, a daemon or a Kubernetes pod. For every claim creation or update, it implements these steps:
@jsafrane if it's running as a pod, do we need to make sure the pod has certain privileges to serve the claims? I mean access to the controller or the API server, or something of that sort?
The pod's privileges are up to the storage admin & external provisioner author and shouldn't need accounting for in the binding controller.
Thanks @childsb .
We should probably call out some lessons learned on security:
- Do not store security-sensitive properties on the PVC.

What happens when a PV is released that belongs to a StorageClass that's been deleted?
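An outline of the external provisioner process described above, simplified to periodic polling rather than a real watch. `Claim`, `listPendingClaims`, and `provision` are placeholders; a real implementation would use a Kubernetes client, an informer, and an event recorder instead.

```go
package main

import (
	"fmt"
	"time"
)

type Claim struct {
	Namespace, Name, Class string
}

func listPendingClaims() []Claim { return nil } // placeholder for a list/watch

func provision(c Claim) error { return nil } // placeholder for the real provisioning work

// runProvisioner watches claims, provisions those whose class this
// provisioner owns, and retries failures on the next pass.
func runProvisioner(myClass string) {
	for {
		for _, claim := range listPendingClaims() {
			if claim.Class != myClass {
				continue // some other provisioner owns this class
			}
			if err := provision(claim); err != nil {
				// Report the error as an event on the claim and retry later.
				fmt.Printf("failed to provision %s/%s: %v\n", claim.Namespace, claim.Name, err)
			}
		}
		time.Sleep(10 * time.Second)
	}
}

func main() { runProvisioner("gold") }
```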
## Design
This design represents the minimally viable changes required to provision based on storage class configuration. Additional incremental features may be added as a separate effort.
We propose that:
1. For the base implementation storage class and volume selectors are mutually exclusive for now.
I think we could relax or undefine the behavior for label selectors and storage classes for out of tree provisioners. The only thing that needs to be enforced is that PVC.size < PV.size, PV.storageClass = PVC.storageClass and PVC.selector matches PV.labels.
None of the in-tree provisioners support labels+storageClass at this time, but there's no real way we can enforce out-of-tree behavior for storageClass and label selectors for provisioners.
Possible update: "For the in-tree storage provisioners, a PVC with labelSelectors is ignored for provisioning. For out of tree provisioners, the PV created by the provisioner must be labeled such that it matches the PVC selector."
Done, thanks for a nice sentence.
Added a "Security considerations" paragraph.
to achieve, it SHOULD report an error and it MUST NOT provision a volume. All errors found during parsing or provisioning SHOULD be sent as events on the claim and the provisioner SHOULD retry periodically with step i.
The specification says "for in-tree provisioners a PVC with labelSelectors is ignored for provisioning" and "claim.Spec.Selector and claim.Spec.Class are mutually exclusive. User can either match existing volumes with Selector XOR match existing volumes with Class and get dynamic provisioning by using Class." However, external provisioners "MUST parse arguments in the StorageClass and claim.Spec.Selector and provisions appropriate storage asset that matches both the parameters and the selector." Why is there a distinction between in-tree and external provisioners? Can external provisioners ignore PVCs with labelSelectors?
You are right, the text is different for internal and external provisioners, however the result is the same: if there is a selector in a PVC, the provisioner must provision something that matches both the class and the selector. In-tree provisioners are stupid and don't parse selectors right now (this will change in the future, 1.6?) and therefore refuse to provision anything with any selector. I would advise external provisioners to do the same: either refuse to provision a claim with any selector, or explicitly select a few labels that are allowed in a PVC selector and document those as an additional way to configure a volume. Therefore "MUST parse `claim.Spec.Selector`" may be just `if pvc.Spec.Selector != nil { return Error("can't parse selector!") }`
I'll try to capture this idea in the proposal...
I added a new item here: https://github.com/kubernetes/kubernetes/pull/30285/files#diff-3cb6e02c12e0f637a83317ce28eb2c8cR80
Thanks for the clarification! Is using the selector as a means of passing arguments to provisioners an interim capability (say, to support dynamic provisioning for releases before 1.4, where the StorageClass abstraction didn't exist) or a capability planned for the long term?
For internal provisioners (GCE/AWS/Cinder) it's a feature we'd like to introduce eventually, maybe in 1.6 (patches welcome!); external provisioners can go wild and parse selectors as they want, Kubernetes will not prevent them from doing so in any release.
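An expanded, self-contained form of the inline snippet above: an external provisioner that does not interpret selectors simply rejects any claim carrying one, mirroring the current in-tree behaviour. The types are placeholders for the real claim spec.

```go
package main

import (
	"errors"
	"fmt"
)

type Selector struct {
	MatchLabels map[string]string
}

type ClaimSpec struct {
	Class    string
	Selector *Selector
}

// checkClaim refuses claims with selectors instead of silently ignoring them.
func checkClaim(spec ClaimSpec) error {
	if spec.Selector != nil {
		return errors.New("claim.Spec.Selector is not supported by this provisioner")
	}
	return nil
}

func main() {
	fmt.Println(checkClaim(ClaimSpec{Class: "gold", Selector: &Selector{}}))
}
```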
As mentioned in last week's storage SIG, I'm marking this for merge as reviewed by the community. Thanks everyone.
Automatic merge from submit-queue
Implementation of the proposal: #35957
LGTM
Automatic merge from submit-queue (batch tested with PRs 40574, 40806, 40308, 40771, 39440) Add bootstrap cluster role for external pv provisioners The set of permissions an external provisioner #30285 running as a pod will need. Technically in order to dynamically provision PVs one doesn't need to "update" PVCs or "watch" events but the controller https://github.com/kubernetes-incubator/nfs-provisioner/tree/master/controller we are recommending people use does those things to: set lock annotations on PVCs and watch `ProvisioningSucceeded`/`ProvisioningFailed` events. Some external provisioners may need additional permissions, for example nfs-provisioner requires "get" access to Services and Endpoints when run "statefully." I think in that case we would recommend creating a new ClusterRole specific to that provisioner, using this as a base? (This was to be a part of my redo/fix of the external e2e test #39545 but I'm submitting it as a separate PR for now due to some issues I had with running nfs-provisioner on gce.) @kubernetes/sig-auth-misc ?
@kubernetes/sig-storage