Scheduler gets stuck (tries to schedule a pod endlessly but never succeeds) #1751
@ddysher I saw this too while following the guestbook example, at the php/frontend step. In my case I had 2 minions, while the replication controller was configured with 3 replicas. Two pods were scheduled, each on a different minion; the third pod had an empty Host (visible in 'list pods'), and all three were stuck in "waiting" status.
@abonas This is different from your use case. If you have more replicas than minions, the scheduler will certainly get stuck (probably with the error message 'failed to find a fit for pod'). AFAICT, that isn't supported yet.
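To illustrate why extra replicas end up unplaced, here is a minimal Go sketch. It is not the scheduler's actual code: the `Minion`, `Pod`, `findFit` and `scheduleAll` names are hypothetical, and it assumes (purely for illustration) that every frontend replica claims the same host port, so a minion can run at most one of them. With 3 replicas and 2 minions, the third pod gets no fit.

```go
package main

import (
	"errors"
	"fmt"
)

// Minion is a hypothetical, simplified node: a name plus the host ports
// already claimed by pods running on it.
type Minion struct {
	Name      string
	UsedPorts map[int]bool
}

// Pod is a hypothetical, simplified pod that requests one host port.
type Pod struct {
	Name     string
	HostPort int
}

// ErrNoFit mirrors the error text mentioned in the thread.
var ErrNoFit = errors.New("failed to find a fit for pod")

// findFit returns the first minion whose requested host port is still free.
// Only one constraint (host-port conflicts) is modeled here.
func findFit(pod Pod, minions []*Minion) (*Minion, error) {
	for _, m := range minions {
		if !m.UsedPorts[pod.HostPort] {
			return m, nil
		}
	}
	return nil, ErrNoFit
}

// scheduleAll places pods one at a time and reports which ones found no fit.
func scheduleAll(pods []Pod, minions []*Minion) (placed map[string]string, unplaced []string) {
	placed = map[string]string{}
	for _, p := range pods {
		m, err := findFit(p, minions)
		if err != nil {
			unplaced = append(unplaced, p.Name)
			continue
		}
		m.UsedPorts[p.HostPort] = true // claim the port on the chosen minion
		placed[p.Name] = m.Name
	}
	return placed, unplaced
}

func main() {
	minions := []*Minion{
		{Name: "minion-1", UsedPorts: map[int]bool{}},
		{Name: "minion-2", UsedPorts: map[int]bool{}},
	}
	pods := []Pod{{"frontend-1", 8000}, {"frontend-2", 8000}, {"frontend-3", 8000}}
	placed, unplaced := scheduleAll(pods, minions)
	fmt.Println("placed:", placed)
	fmt.Println("unplaced:", unplaced)
}
```

In this model, no amount of retrying helps: the third pod only fits once a minion is added or a port is freed.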
@lavalamp, @ddysher, thanks for your replies.
@abonas #2 should be true today. #1 is less feasible than it seems: minions can come and go, and can become over- or under-loaded between the time a pod is created and the time it reaches the scheduler. So even if we check up front, we still have to check again later, and sometimes the first check will pass but the second will not.
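A minimal Go sketch of why an up-front check cannot replace the check at scheduling time. Everything here is hypothetical (the `Cluster` type, its `FreeSlots` map, and the `fits`/`bind` methods are illustrative, not Kubernetes APIs): the check at creation passes, a minion disappears before the pod reaches the scheduler, and the second check fails.

```go
package main

import "fmt"

// Cluster is a hypothetical snapshot of minions and how many more pods each
// can accept. It changes over time as minions come and go.
type Cluster struct {
	FreeSlots map[string]int // minion name -> remaining pod slots
}

// fits reports whether any minion currently has room for one more pod.
func (c *Cluster) fits() bool {
	for _, free := range c.FreeSlots {
		if free > 0 {
			return true
		}
	}
	return false
}

// bind re-checks the cluster at scheduling time and claims a slot if one
// exists. An earlier check is only a hint; its answer may be stale by now.
func (c *Cluster) bind() (string, bool) {
	for name, free := range c.FreeSlots {
		if free > 0 {
			c.FreeSlots[name] = free - 1
			return name, true
		}
	}
	return "", false
}

func main() {
	c := &Cluster{FreeSlots: map[string]int{"minion-1": 1}}

	// Up-front check when the pod is created: there is room.
	fmt.Println("fits at creation:", c.fits())

	// Before the pod reaches the scheduler, minion-1 goes away.
	delete(c.FreeSlots, "minion-1")

	// The scheduler has to check again, and this time the check fails.
	_, ok := c.bind()
	fmt.Println("bound at scheduling time:", ok)
}
```

The design point is that only the check performed at bind time is authoritative; any earlier check can at most reject pods early, never guarantee they will fit.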
When creating a replication controller (e.g. the php-redis frontend from the guestbook example), the scheduler gets stuck and never recovers. This is a race; I have to run several times to see the error (using a script that deletes the controller and pods, then creates a new controller, leaving enough time in between for k8s to react). I've created a PR with an attempted fix.