Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Storageversion API for CRDs - update in just one place #1

Closed
wants to merge 2 commits into from

Conversation

richabanker
Copy link
Owner

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@richabanker richabanker force-pushed the sv-single-update branch 4 times, most recently from 975751f to ecf65bf Compare March 13, 2024 03:38
@richabanker richabanker changed the base branch from master to storage-version-api-beta March 14, 2024 20:37
@richabanker richabanker force-pushed the sv-single-update branch 2 times, most recently from d7091f2 to a70533b Compare March 19, 2024 04:30
err := apierrors.NewServiceUnavailable(err.Error())
responsewriters.ErrorNegotiated(err, Codecs, schema.GroupVersion{Group: requestInfo.APIGroup, Version: requestInfo.APIVersion}, w, req)
return nil
}
return handlers.UpdateResource(storage, requestScope, r.admission)
case "patch":
if err := crdInfo.waitForStorageVersionUpdate(req.Context()); err != nil {
if err := crdInfo.waitForStorageVersionUpdate(req.Context(), crd, latestSVUpdateInfo); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if we can dedup the cases here. Any code change that needs to be duplicated into each case seems fragile long-term.

}

func (r *crdHandler) updateLatestStorageVersionUpdateInfo(crdUID types.UID, svUpdateInfo *storageVersionUpdateInfo) {
r.latestStorageVersionUpdateInfo.Store(crdUID, svUpdateInfo)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC we are only tracking the latest teardown. If there are multiple versions being torn down, do we block new CR writes until all the versions have been torn down?

Copy link
Owner Author

@richabanker richabanker Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we will be blocking CR writes on only the latest SV update to finish meaning we will wait for the latest teardown to finish.
If multiple CRD updates are received over some period of time, v1 -> v2 -> v3 and a CR request happens after v3 update was requested, CR would be blocked on v3 update to finish, v3 update would be blocked on v2 teardown to finish (provided that v2 update was processed fully) and v2 update(historically) was blocked on v1 teardown to finish (provided that v1 update was processed fully).

If multiple rapid CRD updates come in, and the map gets updated from v1 -> v3 directly, the new CR would be blocked on just v1 teardown to finish.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess at any given time, because we only process the latest SV update, there can only be a single teardown in progress?

// timeout is the time when the API server will unblock and allow CR
// write requests even if the storage version update hasn't been
// processed.
timeout time.Time

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we won't serve CR requests if the SV update fails. WDYT?

Copy link
Owner Author

@richabanker richabanker Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes that's what the code is doing, i.e. failing the CR request if timeout occurs. Updated the comment to reflect the same.

@richabanker richabanker force-pushed the sv-single-update branch 3 times, most recently from ff3ee7c to ab525a8 Compare March 19, 2024 20:18
Signed-off-by: Andrew Sy Kim <[email protected]>
Co-authored-by: Haowei Cai (Roy) <[email protected]>
@richabanker richabanker force-pushed the sv-single-update branch 6 times, most recently from 5873fae to 16d7158 Compare March 19, 2024 23:13
Comment on lines +49 to +53
var (
// map from crd.UID -> storageVersionUpdateInfo to keep
// track of the latest storageversion update for a CRD.
latestStorageVersionUpdateInfo sync.Map
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this needs to be global and not in Manager?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made it a global in Manager in kubernetes#123999. Wonder what the preference is between a sync.Map global v/s lock + map fields in the Manager struct?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx. sync.Map is specialized and typically is a good choice if each key is only ever written once but read many times. It's a bit of judgement call on when exactly to use it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updated the main PR to use a single sync.Map as a struct field for the manager.

Comment on lines +421 to +425
if err := waitForStorageVersionUpdate(req.Context(), r, crd, requestInfo, w, req); err != nil {
err := apierrors.NewServiceUnavailable(err.Error())
responsewriters.ErrorNegotiated(err, Codecs, schema.GroupVersion{Group: requestInfo.APIGroup, Version: requestInfo.APIVersion}, w, req)
return nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do something to reduce the copy-paste verbosity of these checks?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack'd

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the main PR with a helper to make this more readable.

Comment on lines +582 to 583
// removeStorageLocked removes the cached storage with the given uid as key from the storage map. This function
// updates r.customStorage with the cleaned-up storageMap and tears down the old storage.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document the returned channel?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, thanks! Will incorporate this in the main PR

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the main PR

close(errCh)
}
return
case <-time.After(1 * time.Minute):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this 1 * time.Minute also be defined as a constant like storageVersionUpdateTimeout is?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point. Will incorporate this in the main PR

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the main PR

@richabanker
Copy link
Owner Author

Can we continue the review of these changes in the PR kubernetes#123999 instead of here? I'll close this PR since this was made against my fork of k/k. I'll incorporate the feedback in the aforementioned PR, and close this one to avoid confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants