storage: propagate TransformFromStorage errors from List #69399
Conversation
Like we do everywhere else we use TransformFromStorage. The current behavior is causing all service account tokens to be regenerated, invalidating old service account tokens and unrecoverably breaking apps that are using InClusterConfig or exported service account tokens. If we are going to break stuff, let's just break the Lists so that misconfiguration of the encryption config or checkpoint corruption is obvious.
Please let me know what you think.
/sig api-machinery
/sig auth
@kubernetes/sig-api-machinery-pr-reviews
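For illustration, here is a minimal, self-contained Go sketch of the behavioral difference being proposed. The identifiers and error strings are invented for the example (this is not the actual apiserver code): the point is only that a transformer failure now fails the whole List instead of the undecryptable item being silently dropped.

```go
package main

import (
	"errors"
	"fmt"
)

// transformFromStorage stands in for the storage transformer (for example an
// encryption-at-rest provider); it fails for a value that was written with a
// key the current encryption config no longer knows about.
func transformFromStorage(raw string) (string, error) {
	if raw == "written-with-old-key" {
		return "", errors.New("no matching encryption key found")
	}
	return "decoded:" + raw, nil
}

// listPropagatingErrors mirrors the behavior proposed here: a transform
// failure fails the whole List, instead of the item being silently skipped,
// which is what made consumers think the tokens were gone and regenerate them.
func listPropagatingErrors(raws []string) ([]string, error) {
	out := make([]string, 0, len(raws))
	for _, raw := range raws {
		v, err := transformFromStorage(raw)
		if err != nil {
			return nil, fmt.Errorf("unable to transform item %q: %v", raw, err)
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	items := []string{"token-a", "written-with-old-key", "token-b"}
	if _, err := listPropagatingErrors(items); err != nil {
		// The misconfiguration now surfaces as a loud List failure rather
		// than as "missing" objects quietly being recreated.
		fmt.Println("List failed:", err)
	}
}
```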
If we merge this, will working but degraded clusters suddenly regress when upgraded?
The original justification was a concern about partial degradation of a list due to an external factor not under our control (the encryption service). But in practice, if your encryption config is broken, your cluster probably should grind to a halt.
Yes.
Speaking more generally than Secrets, this behavior breaks what I would expect to be the semantics of ListWatch. If an error returned by TransformFromStorage is transient, two fully synced informer caches could have different state.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: lavalamp, mikedanese. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
I'd LGTM a cherrypick of this if I saw one.
Failures were caused by an accidental bump of Go to 1.11. That should be reverted now. /retest
/retest
That's a really good point. I think for that reason I agree. However, we need to figure out how we're going to roll this out to potentially broken users - i.e. how do they mitigate (probably by deleting keys in etcd).
@smarterclayton - We did encounter this issue today while upgrading to 1.13.x; luckily we caught it very early and rolled back to 1.12. I believe there were some errors when the cluster was first created and it was set up again, but etcd was not cleared, so some autogenerated service account tokens in the kube-system and default namespaces were still encrypted with the old key. Right now the plan is to delete these keys (service account tokens) from etcd and try again.
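For reference, a minimal sketch of that kind of cleanup using the etcd v3 Go client. Everything specific here is an assumption: the endpoint, the default "/registry/secrets/..." key prefix, and the placeholder secret name are illustrative, a real cluster also needs the etcd client TLS certificates, and deleting registry keys directly is only sensible when the apiserver can no longer decrypt them (the token controller recreates the deleted token Secrets afterwards).

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/clientv3"
)

func main() {
	// Hypothetical endpoint; real clusters typically also need TLS config
	// for the etcd client, and the prefix depends on --etcd-prefix
	// (default "/registry").
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// List the secret keys first so you can confirm exactly which
	// undecryptable service account tokens you are about to remove.
	resp, err := cli.Get(ctx, "/registry/secrets/kube-system/",
		clientv3.WithPrefix(), clientv3.WithKeysOnly())
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Println(string(kv.Key))
	}

	// Delete an individual token secret (placeholder name); the token
	// controller will recreate it, encrypted with the current config.
	if _, err := cli.Delete(ctx, "/registry/secrets/kube-system/default-token-xxxxx"); err != nil {
		panic(err)
	}
}
```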