Skip to content

Conversation

@denkensk
Copy link
Member

@denkensk denkensk commented Jan 4, 2019

What type of PR is this?

/kind bug
/sig scheduling

What this PR does / why we need it:
The scheduler places unschedulable pods in "unschedulabe" queue and retries them only when certain events happen that could potentially make them schedulable. This logic works well in almost all scenarios, but inevitable race condition in large distributed systems, could potentially cause some events to be seen before pods are added to the unschedulable queue. If this happens, pods may be left in the unschedulable queue and not be retried. Such scenarios should be rare and even if they occur, usually there are other events that trigger a retry and cover them. However, if such scenarios happen in smaller and low churn clusters, other events may not be seen for a while and pods may be stuck in the unschedulable queue for a long time.

Which issue(s) this PR fixes:
Fixes #72122

Special notes for your reviewer:

add goroutine to move unschedulable pods to activeq if they are not retried for more than 1 minute

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 4, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @denkensk. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 4, 2019
@resouer
Copy link
Contributor

resouer commented Jan 4, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 4, 2019
@denkensk denkensk force-pushed the add-goroutine-move-unschedulablepods-to-activeq branch 2 times, most recently from ab21df1 to 1714a9f Compare January 4, 2019 09:07
@denkensk denkensk changed the title [WIP] add goroutine to move unschedulablepods to activeq regularly add goroutine to move unschedulablepods to activeq regularly Jan 4, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 4, 2019
@denkensk denkensk changed the title add goroutine to move unschedulablepods to activeq regularly [WIP]add goroutine to move unschedulablepods to activeq regularly Jan 4, 2019
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 4, 2019
@denkensk denkensk changed the title [WIP]add goroutine to move unschedulablepods to activeq regularly add goroutine to move unschedulablepods to activeq regularly Jan 4, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 4, 2019
@denkensk denkensk changed the title add goroutine to move unschedulablepods to activeq regularly Move unschedulable pods to the active queue if they are not retried for more than 1 minute Jan 4, 2019
@resouer
Copy link
Contributor

resouer commented Jan 4, 2019

/assign

I will try to do a round of review.

@resouer
Copy link
Contributor

resouer commented Jan 4, 2019

/kind bug
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jan 4, 2019
@denkensk denkensk force-pushed the add-goroutine-move-unschedulablepods-to-activeq branch 3 times, most recently from 95c0e4a to 40a9a5c Compare January 16, 2019 03:52
@denkensk denkensk force-pushed the add-goroutine-move-unschedulablepods-to-activeq branch from 40a9a5c to de8cfdc Compare January 16, 2019 04:08
Copy link
Member

@bsalamat bsalamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Namespace: "ns1",
UID: types.UID("tp-mid"),
},
Spec: v1.PodSpec{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Priority and NominatedNodeName are not really necessary here.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 16, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, denkensk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2019
@k8s-ci-robot k8s-ci-robot merged commit d857790 into kubernetes:master Jan 17, 2019
@Huang-Wei
Copy link
Member

@denkensk if this change needs to be back ported into other releases, I guess PR #73022 should also be included?

@denkensk
Copy link
Member Author

denkensk commented Jan 18, 2019

@denkensk if this change needs to be back ported into other releases, I guess PR #73022 should also be included?

@Huang-Wei Yes, PR #73022 should also be included. I'm not sure if this change needs to be back ported into other releases.

@Huang-Wei
Copy link
Member

Thanks.

I'm not sure if this change needs to be back ported into other releases.

It seems so, see #72925 (comment). But let's hold for now.

@bsalamat
Copy link
Member

@denkensk if this change needs to be back ported into other releases, I guess PR #73022 should also be included?

@denkensk @Huang-Wei Yes, this PR should be backported to 1.11+. Could you please send cherrypick PRs?

@denkensk
Copy link
Member Author

denkensk commented Jan 25, 2019

@denkensk @Huang-Wei Yes, this PR should be backported to 1.11+. Could you please send cherrypick PRs?

Yes, I will send cherrypick PRs.

k8s-ci-robot added a commit that referenced this pull request Jan 30, 2019
…58-upstream-release-1.13

Automated cherry pick of #72558: add goroutine to move unschedulablepods to activeq regularly 1.13
k8s-ci-robot added a commit that referenced this pull request Feb 2, 2019
…58-upstream-release-1.11

Automated cherry pick of #72558: add goroutine to move unschedulablepods to activeq regularly 1.11
k8s-ci-robot added a commit that referenced this pull request Feb 7, 2019
…58-upstream-release-1.12

Automated cherry pick of #72558: add goroutine to move unschedulablepods to activeq regularly 1.12
@denkensk denkensk deleted the add-goroutine-move-unschedulablepods-to-activeq branch February 15, 2019 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move unschedulable pods to the active queue if they are not retried for more than 1 minute

6 participants