Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JUJU-521] 1959351 fix slow juju ssh leader #13670

Merged
merged 7 commits into from
Jan 31, 2022

Conversation

hmlanigan
Copy link
Member

@hmlanigan hmlanigan commented Jan 28, 2022

Find the leader for juju ssh application/leader via a direction api call rather than gathering the data from status to improve result time. Fall back to the current method if the new api call is not available.

Including a little bit of cleanup due to compiler warnings etc.

QA steps

Deploy a bunch of units and time juju ssh to the leader of the application. It should be faster with the newer client and controller.

Ensure that the newer client can still run juju ssh to the leader of the application, though slower.

Test with debug hooks and in a k8s config as well.

Bug reference

https://bugs.launchpad.net/juju/+bug/1959351

To improve the run time for juju ssh <app>/leader, add a direct call to
get the application's leader unit.  Bumped the facade revision.
@hmlanigan
Copy link
Member Author

Need to remove the back stop call to status when merged into develop.

@hmlanigan hmlanigan changed the title 1959351 fix slow juju ssh leader [JUJU-521] 1959351 fix slow juju ssh leader Jan 28, 2022
if err != nil {
return result, err
}
leaders, err := facade.leadershipReader.Leaders()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may return the wrong result. As the backing to this is the raft store, you could be reading stale data before quorum is reached. As I don't believe we guarantee immediate consistency, as Juju is eventually consistent, then another query to the Leader API in the future should return a non-stale leader if it is wrong.

To be sure about the consistency of a leader you would have to insert a barrier to the raft leases and watch for your write to appear before reading the leaders. We currently don't support that, but if we get bugs about this, then that is the actual fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be an existing bug as well. The current implementation uses the results of FullStatus, which get the leader the same way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leader is never guaranteed anyway, you could find the leader and then by the time you finish SSH to the unit it is no longer the leader. It is certainly better (much smaller window) with Heather's change.

@@ -29,10 +30,6 @@ import (
jujussh "github.com/juju/juju/network/ssh"
)

//go:generate go run github.com/golang/mock/mockgen -package mocks -destination mocks/ssh_container_mock.go github.com/juju/juju/cmd/juju/commands CloudCredentialAPI,ApplicationAPI,ModelAPI,CharmsAPI
//go:generate go run github.com/golang/mock/mockgen -package mocks -destination mocks/context_mock.go github.com/juju/juju/cmd/juju/commands Context
//go:generate go run github.com/golang/mock/mockgen -package mocks -destination mocks/k8s_exec_mock.go github.com/juju/juju/caas/kubernetes/provider/exec Executor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

Which implements Leader on the apiserver side. Required to make
juju ssh <app>/leader faster, rather than getting the data from status.
It executes a small subset of FullStatus, thus is okay.
Use a specific API call to get the leader unit name, fall back to using FullStatus
if the new sshclient facade version is not available.  FullStatus can be very slow
on large models, juju ssh should not be if possible.

Juju debughooks time to ssh will also improved.

Moved mockgen lines to package_test.go, per convention, easier to find in one place.

Fixed some unhandled error and unused parameter compiler warnings.
@hmlanigan hmlanigan force-pushed the 1959351-fix-slow-juju-ssh-leader branch from d2025ec to bf41648 Compare January 31, 2022 15:45
@hmlanigan
Copy link
Member Author

Force push added apiserver/facades/schema.json to the apiserver changes commit.

@hmlanigan
Copy link
Member Author

$$merge$$

@jujubot jujubot merged commit e327bb2 into juju:2.9 Jan 31, 2022
@hmlanigan hmlanigan deleted the 1959351-fix-slow-juju-ssh-leader branch January 31, 2022 19:19
@wallyworld wallyworld mentioned this pull request Feb 1, 2022
jujubot added a commit that referenced this pull request Feb 1, 2022
#13673

Merge 2.9

#13546 [JUJU-299] Store unit CharmURL as a string reference
#13640 add message query to juju wait-for
#13646 [JUJU-484] Stop polling external cmr controller when it is removed
#13634 [JUJU-463] Implement NetworkInterfaces method for azure provider
#13624 [JUJU-416] Consistantly use juju/retry to handle retries 3 (state/*)
#13648 [JUJU-464] Implement NetworkInterfaces method for openstack provider
#13636 [JUJU-479] Revert "unit charm url"
#13650 [JUJU-485] Add per controller and per app limits for downloading resources
#13649 [JUJU-481] Remove fake NIC generation fallback logic from instancepoller worker
#13681 Comment out the snap interfaces till approved
#13670 [JUJU-521] 1959351 fix slow juju ssh leader
#13671 [JUJU-527] Updating juju/os to support macos Monterey
#13678 [Juju-541] Document use of the @ symbol when setting a config value in help message

Lots of trivial conflicts due ot import path changes and the forward port of the charm download changes which first landed in this branch and were backported.

## QA steps

See PRs



[JUJU-299]: https://warthogs.atlassian.net/browse/JUJU-299?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[JUJU-484]: https://warthogs.atlassian.net/browse/JUJU-484?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[JUJU-463]: https://warthogs.atlassian.net/browse/JUJU-463?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[JUJU-416]: https://warthogs.atlassian.net/browse/JUJU-416?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[JUJU-464]: https://warthogs.atlassian.net/browse/JUJU-464?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[JUJU-479]: https://warthogs.atlassian.net/browse/JUJU-479?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[JUJU-485]: https://warthogs.atlassian.net/browse/JUJU-485?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[JUJU-481]: https://warthogs.atlassian.net/browse/JUJU-481?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

[JUJU-521]: https://warthogs.atlassian.net/browse/JUJU-521?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[JUJU-527]: https://warthogs.atlassian.net/browse/JUJU-527?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants