-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix worker leak in provisioner #11141
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we were able to find the actual error in prodstack, where it was trying to list information about availability zones and we got a 500 back, I'm comfortable that this is the correct fix.
|
Shouldn't be hard to add a test for this. Just get the |
@@ -168,6 +162,13 @@ func (task *provisionerTask) Wait() error { | |||
|
|||
func (task *provisionerTask) loop() error { | |||
|
|||
// Get existing machine distributions. | |||
err := task.populateAvailabilityZoneMachines() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering whether there was anything that expected availabilityZoneMachines
to be populated by the time the worker is returned, but from going through the code it looks like the only things that use it are called from loop, so seems all good.
stoppage when an error is returned by populateAvailabilityZoneMachines. Always add a test for in-theatre bugs previously uncovered.
stoppage when an error is returned by populateAvailabilityZoneMachines. Always add a test for in-theatre bugs previously uncovered.
stoppage when an error is returned by populateAvailabilityZoneMachines. Always add a test for in-theatre bugs previously uncovered.
#11146 ### Checklist - [x] Checked if it requires a [pylibjuju](https://github.com/juju/python-libjuju) change? - [x] Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR? - [x] Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed? - [x] Do comments answer the question of why design decisions were made? ---- ## Description of change This patch simply adds a unit test to verify the change made in #11141. It should be a golden rule that where we fix a bug from the field for functionality without test coverage, that coverage gap is closed. ## QA steps Unit tests fail sans #11141, but pass with it applied. ## Documentation changes None. ## Bug reference N/A
#11151 ### Checklist - [x] Checked if it requires a [pylibjuju](https://github.com/juju/python-libjuju) change? - [x] Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR? - [x] Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed? - [x] Do comments answer the question of why design decisions were made? ---- ## Description of change This pulls the latest changes in the 2.7 branch into develop. Brings in: - #11139 from jameinel/2.7-detach-volume-state-changing-1860542 - #11141 from howbazaar/2.7-fix-provisioner-leak - #11144 from babbageclunk/2.7-fix-destroy-test - #11137 from ycliuhw/feature/vSphere-disk-resizing - #11146 from manadart/2.7-provisioner-worker-test - #11147 from manadart/relocate-systemd-files - #11150 from manadart/2.7-controller-integration-test - #11149 from manadart/2.7-systemd-upgrades (No conflicts when merging.)
During analysis of the goroutine leak on prodstack, we identified that there was a leak of provisioner tasks. Analysing the code, we found that the creation of the availability zone map for the task was done after the task loop was started, but if an error occurred, the task was never killed.
This fix also looks like it will address the error case where a new machine was started but it was not able to log in due to a second active task having reset its password.
QA steps
Run the tests.
Documentation changes
Internal change only, nothing to document.