Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: propagate TransformFromStorage errors from List #69399

Merged
merged 1 commit into from
Oct 4, 2018

Conversation

mikedanese
Copy link
Member

@mikedanese mikedanese commented Oct 3, 2018

Like we do everywhere else we use TranformFromStorage. The current
behavior is causing all service account tokens to be regenerated,
invalidating old service account tokens and unrecoverably breaking apps
that are using InClusterConfig or exported service account tokens.

If we are going to break stuff, let's just break the Lists so that
misconfiguration of encryption config or checkpoint corruption are
obvious.

Please let me know what you think.

List operations against the API now return internal server errors instead of partially complete lists when a value cannot be transformed from storage. The updated behavior is consistent with all other operations that require transforming data from storage such as watch and get.

/sig api-machinery
/sig auth

@kubernetes/sig-api-machinery-pr-reviews

Like we do everywhere else we use TranformFromStorage. The current
behavior is causing all service account tokens to be regenerated,
invalidating old service account tokens and unrecoverably breaking apps
that are using InClusterConfig or exported service account tokens.

If we are going to break stuff, let's just break the Lists so that
misconfiguration of encryption config or checkpoint corruption are
obvious.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/auth Categorizes an issue or PR as relevant to SIG Auth. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 3, 2018
@smarterclayton
Copy link
Contributor

If we merge this, will working but degraded clusters suddenly regress when upgraded?

@smarterclayton
Copy link
Contributor

Original justification was the concern about partial degradation of a list due to an external factor not under our control (encryption service). But in practice, if your encryption config is broken, your cluster probably should grind to a halt.

@mikedanese mikedanese added the kind/bug Categorizes issue or PR as related to a bug. label Oct 3, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label Oct 3, 2018
@mikedanese
Copy link
Member Author

If we merge this, will working but degraded clusters suddenly regress when upgraded?

Yes.

Original justification was the concern about partial degradation of a list due to an external factor not under our control (encryption service).

Speaking more generally than Secrets, this behavior breaks what I would expect to be the semantics of ListWatch. If an error returned by TransformFromStorage is transient, two fully synced informer caches could have different state.

@lavalamp
Copy link
Member

lavalamp commented Oct 3, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 3, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, mikedanese

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2018
@lavalamp
Copy link
Member

lavalamp commented Oct 3, 2018

I'd LGTM a cherrypick of this if I saw one.

@mikedanese
Copy link
Member Author

Failures caused by accidental bump of go to 1.11. That should be reverted now.

/retest

@mikedanese
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit 6e88271 into kubernetes:master Oct 4, 2018
@mikedanese mikedanese deleted the consistent branch October 4, 2018 06:47
@smarterclayton
Copy link
Contributor

That's a really good point. I think for that reason I agree.

However, we need to figure out how we're going to roll this out to potentially broken users - i.e. how do they mitigate (probably by deleting keys in etcd).

@paphillon
Copy link

@smarterclayton - So we did encounter this issue today while upgrading to 1.13.x, luckily we caught it really early and rolled back to 1.12. I believe when the cluster was created there might have been some errors and a re-setup was done but the etcd was not cleared. This caused some servicetokens from kube-system and default namespace that are autogenerated were encrypted with the old key.

Right now the plan is to delete these keys (servicetokens) from the etcd and try again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants