Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix worker leak in provisioner #11141

Merged
merged 1 commit into from
Jan 22, 2020
Merged

Conversation

howbazaar
Copy link
Contributor

During analysis of the goroutine leak on prodstack, we identified that there was a leak of provisioner tasks. Analysing the code, we found that the creation of the availability zone map for the task was done after the task loop was started, but if an error occurred, the task was never killed.

This fix also looks like it will address the error case where a new machine was started but it was not able to log in due to a second active task having reset its password.

QA steps

Run the tests.

Documentation changes

Internal change only, nothing to document.

Copy link
Member

@jameinel jameinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we were able to find the actual error in prodstack, where it was trying to list information about availability zones and we got a 500 back, I'm comfortable that this is the correct fix.

@howbazaar
Copy link
Contributor Author

$$merge$$

@jujubot jujubot merged commit bb37a5d into juju:2.7 Jan 22, 2020
@manadart
Copy link
Member

Shouldn't be hard to add a test for this. Just get the ZonedEnviron mock to return an error.

@@ -168,6 +162,13 @@ func (task *provisionerTask) Wait() error {

func (task *provisionerTask) loop() error {

// Get existing machine distributions.
err := task.populateAvailabilityZoneMachines()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering whether there was anything that expected availabilityZoneMachines to be populated by the time the worker is returned, but from going through the code it looks like the only things that use it are called from loop, so seems all good.

manadart added a commit to manadart/juju that referenced this pull request Jan 23, 2020
stoppage when an error is returned by populateAvailabilityZoneMachines.

Always add a test for in-theatre bugs previously uncovered.
manadart added a commit to manadart/juju that referenced this pull request Jan 23, 2020
stoppage when an error is returned by populateAvailabilityZoneMachines.

Always add a test for in-theatre bugs previously uncovered.
manadart added a commit to manadart/juju that referenced this pull request Jan 23, 2020
stoppage when an error is returned by populateAvailabilityZoneMachines.

Always add a test for in-theatre bugs previously uncovered.
jujubot added a commit that referenced this pull request Jan 23, 2020
#11146

### Checklist

 - [x] Checked if it requires a [pylibjuju](https://github.com/juju/python-libjuju) change?
 - [x] Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR?
 - [x] Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed?
 - [x] Do comments answer the question of why design decisions were made?

----

## Description of change

This patch simply adds a unit test to verify the change made in #11141.

It should be a golden rule that where we fix a bug from the field for functionality without test coverage, that coverage gap is closed.

## QA steps

Unit tests fail sans #11141, but pass with it applied.

## Documentation changes

None.

## Bug reference

N/A
@babbageclunk babbageclunk mentioned this pull request Jan 26, 2020
4 tasks
jujubot added a commit that referenced this pull request Jan 26, 2020
#11151

### Checklist

 - [x] Checked if it requires a [pylibjuju](https://github.com/juju/python-libjuju) change?
 - [x] Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR?
 - [x] Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed?
 - [x] Do comments answer the question of why design decisions were made?

----

## Description of change

This pulls the latest changes in the 2.7 branch into develop.
Brings in:
- #11139 from jameinel/2.7-detach-volume-state-changing-1860542
- #11141 from howbazaar/2.7-fix-provisioner-leak
- #11144 from babbageclunk/2.7-fix-destroy-test
- #11137 from ycliuhw/feature/vSphere-disk-resizing
- #11146 from manadart/2.7-provisioner-worker-test
- #11147 from manadart/relocate-systemd-files
- #11150 from manadart/2.7-controller-integration-test
- #11149 from manadart/2.7-systemd-upgrades

(No conflicts when merging.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants