Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge 2.9 #12909

Merged
merged 208 commits into from
Apr 21, 2021
Merged

Merge 2.9 #12909

merged 208 commits into from
Apr 21, 2021

Conversation

wallyworld
Copy link
Member

Merge 2.9

#12827 Unsubscribe from hub when closing state pool
#12829 Correct default bootstrap-timeout value displayed in help.
#12840 Constraint tags can be used for pod affinity
#12842 Fix upgrade series agent version handling
#12794 Add disk provisioning customization
#12845 Restore space support for manual machines
#12839 Support merging of netplan configs
#12853 Add display type for network-get results
#12854 Fix for LP1921557 sni in Juju login.
#12850 Use Base in Charmhub packge and its response structures.
#12858 Ensure assess-upgrade-series does not report started prematuremly
#12860 Removed logging from core annotations.
#12861 Fixes bug where empty error can happen in storage
#12865 Update Pebble version to include new files API
#12866 Workaround for k8s dashboard URL with k8s client proxy
#12862 Fix/lp 1923051
#12867 Fix/lp 1923561
#12870 Use channel logic in charm library
#12873 Add support for setting pod affinity topology key
#12874 Use Patch instead of Update for ensuring ingress resources
#12831 Integration fixes
#12879 Ensure refresh uses version
#12864 bug: fix for bootstrap fail on vsphere 7 + multi network
#12883 Initial work to allow juju trust for sidecar charms
#12884 Fix ssh with sidecar charms and containers.
#12886 Charmhub bases
#12881 Use charm pkg updates
#12889 Ignore projected volume mounts when looking up juju storage
#12890 Fix passing empty string container name to unit params
#12893 Add CLA checker GH action and remove codecov push action
#12897 Use production charmhub endpoint
#12887 Resource validation error
#12888 Ensure we validate the model target
#12898 Remove usage of systems package from CAAS application provisioner
#12899 CAAS bundle deployments
#12900 Bump up Pebble version to include user/group in list-files
#12901 charm Format helper
#12902 charm Iskubernetes helper
#12903 Display ... for really long k8s app versions in status
#12904 Filter out more full registry paths for app version in status
#12905 Fix k8s bundle deploys with v2 charms
#12906 Register resource-get for containeragent binary

Conflicts mostly due to charm v8 vs v9 imports.
The other one was due to changes to dashboard CLI.

# Conflicts:
#       api/common/charms/common.go
#       api/common/charms/common_test.go
#       apiserver/facades/client/application/application.go
#       apiserver/facades/client/application/charmstore.go
#       apiserver/facades/client/application/update_series_mocks_test.go
#       apiserver/facades/client/charms/client.go
#       apiserver/facades/client/charms/convertions.go
#       apiserver/facades/client/machinemanager/types_mock_test.go
#       apiserver/facades/controller/caasoperatorprovisioner/provisioner.go
#       cmd/juju/application/deployer/bundlehandler_test.go
#       cmd/juju/application/refresh_test.go
#       cmd/juju/application/refresher/refresher_mock_test.go
#       cmd/juju/dashboard/dashboard.go
#       core/charm/strategies_mock_test.go
#       core/model/model.go
#       core/model/model_test.go
#       go.mod
#       go.sum
#       resource/resourceadapters/charmhub.go
#       scripts/win-installer/setup.iss
#       service/agentconf_test.go
#       snap/snapcraft.yaml
#       state/charm.go
#       state/migration_export.go
#       state/state.go
#       version/version.go
#       worker/caasfirewallerembedded/applicationworker.go
#       worker/caasfirewallerembedded/applicationworker_test.go

QA steps

See PRs

achilleasa and others added 30 commits March 26, 2021 11:35
…b-when-closing-state-pool

juju#12827

This PR fixes a goroutine leak in our tests. When the state pool opens, it subscribes to the hub (each subscriber runs in a separate goroutine). However, the subscribers were not terminated when the state pool closed thus causing a goroutine leak when running tests.

## QA steps

Just run the tests. The CI should do this for you.
DefaultBootstrapSSHTimeout is defined as 1200.
juju#12829

A quick usability fix. Correct default bootstrap-timeout value displayed in help. DefaultBootstrapSSHTimeout is defined as 1200.

## QA steps
```console
$ juju help bootstrap | grep bootstrap-timeout
 bootstrap-timeout: 1200 # default: 20 minutes <- here is the change.
bootstrap-timeout:
 juju bootstrap --config bootstrap-timeout=1200 azure joe-eastus
```
This commit fixes LP1701429 by introducing support for merging
configuration values when parsing netplan files.

The netplan yaml files are first lexicographically sorted and the very
first file serves as the base configuration. Subsequent files get merged
into the base configuration following the set of rules described in the
man page of the "netplan-generate" command
(http://manpages.ubuntu.com/manpages/groovy/man8/netplan-generate.8.html).

Each netplan yaml file is unmarshaled into a map[interface{}]interface{}
(and not a map[string]interface{} as the former is used by the goyaml
package for nested maps) and a recursive merge function visits all map
entries and applies the appropriate merge rules based on their type
(scalar, map or slice).

The final merged configuration is marshalled back into YAML and then
unmarshaled (in strict mode) to the Netplan struct used by juju.
juju#12840

k8s constraint tags have been extended to support pod affinity as well as node affinity
Use the "pod." or "anti-pod." prefix for pod affinity and pod anti affinity.
No prefix or "node." is for node affinity

## QA steps


```
juju deploy somecharm --constraints="tags=node.foo=a|b|c,^bar=d|e|f,^foo=g|h,pod.foo=1|2|3,^pod.bar=4|5|6,anti-pod.afoo=x|y|z,^anti-pod.abar=7|8|9"

kubectl get -o json statefulset.apps/somecharm | jq .spec.template.spec.affinity
{
 "nodeAffinity": {
 "requiredDuringSchedulingIgnoredDuringExecution": {
 "nodeSelectorTerms": [
 {
 "matchExpressions": [
 {
 "key": "bar",
 "operator": "NotIn",
 "values": [
 "d",
 "e",
 "f"
 ]
 },
 {
 "key": "foo",
 "operator": "NotIn",
 "values": [
 "g",
 "h"
 ]
 },
 {
 "key": "foo",
 "operator": "In",
 "values": [
 "a",
 "b",
 "c"
 ]
 }
 ]
 }
 ]
 }
 },
 "podAffinity": {
 "requiredDuringSchedulingIgnoredDuringExecution": [
 {
 "labelSelector": {
 "matchExpressions": [
 {
 "key": "bar",
 "operator": "NotIn",
 "values": [
 "4",
 "5",
 "6"
 ]
 },
 {
 "key": "foo",
 "operator": "In",
 "values": [
 "1",
 "2",
 "3"
 ]
 }
 ]
 },
 "topologyKey": ""
 }
 ]
 },
 "podAntiAffinity": {
 "requiredDuringSchedulingIgnoredDuringExecution": [
 {
 "labelSelector": {
 "matchExpressions": [
 {
 "key": "abar",
 "operator": "NotIn",
 "values": [
 "7",
 "8",
 "9"
 ]
 },
 {
 "key": "afoo",
 "operator": "In",
 "values": [
 "x",
 "y",
 "z"
 ]
 }
 ]
 },
 "topologyKey": ""
 }
 ]
 }
}
```
Add the ability to specify if the disks of the newly created instance
should be thin, thick or thickEagerZero.
  * Fixed the default disk provisioning type if the model does not
hold a valid option
  * Added tests
This helper allows us to create network CIDRs from either a network IP
and a netmask (e.g. an IPNet value) or from a host IP and a netmask.

The latter allows us to convert host addresses in CIDR notation into
network CIDRs, e.g. 10.1.2.42/24 into 10.1.2.0/24.
…ails

When using netlink as our address source, it reports interface addresses
as IPNet values which represent the address in CIDR notation, e.g.
10.1.2.99/24.

Prior to this change, on manual machines, the networkingcommon package
used the address in CIDR notation value as the subnet's CIDR. API calls to
SetMachineAddresses (machiner) first convert the input arguments into a
SpaceAddresses list. However, that particular code path eventually calls
core/network/SubnetInfos.GetByUnderlayCIDR with the incorrect subnet
CIDR which fails validation thus causing an error to be returned back to
the machiner worker which in turn causes it to restart.
juju#12843

Merge 2.8

juju#12827 Unsubscribe from hub when closing state pool
juju#12829 Correct default bootstrap-timeout value displayed in help.
juju#12840 Constraint tags can be used for pod affinity

## QA steps

See PRs
…ing-customization

juju#12794

This change allows operators to set a new model-level config option which dictates how template VM disks should be cloned when creating a new machine. Current values are:

 prdesc thin - Sparse provisioning, only written blocks will take up disk space on the datastore
 prdesc thick - The entire size of the virtual disk will be deducted from the datastore, but unwritten blocks will not be zeroed out. This adds 2 potential pitfalls. See comments in  regarding .
 prdesc thickEagerZero (default) - The entire size of the virtual disk is deducted from the datastore, and unwritten blocks are zeroed out. Improves latency when committing to disk, as no extra step needs to be taken before writing data.

## Checklist

 - [ ] Requires a [pylibjuju](https://github.com/juju/python-libjuju) change
 - [ ] Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR
 - [ ] Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed
 - [ ] Comments answer the question of why design decisions were made

## QA steps



Verify your vsphere deployment that the root disk of the newly provisioned VM is thinly provisioned.

NOTE: The VM template used must *not* have disks provisioned as flat. Disks of the source template must be thin or thick. Flat disks cannot be cloned as thin or thick.

## Documentation changes

No changes required to CLI or API.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1807957
juju#12842

When doing an upgrade-series, there's no need to copy across agent binaries to the new series as everything is now "ubuntu".
So this operation is removed.

When, when upgrading from 2.8, the controller agent itself still is recorded as running the original "series" agent. This is because it is the old 2.8 agent that write out the new agent symlink prior to exiting. However, this breaks upgrade series on the upgraded controller so add a check to the upgrade worker so that older agent binaries are copied. The code to do this is a cut down copy of what was deleted from upgrade series.

## QA steps

Using Juju 2.8, bootstrap controller, switch to controller model, eg:
juju bootstrap lxd --bootstrap-series bionic
juju switch controller

Using Juju 2.9, this branch (gh pr checkout 12842):
juju deploy cs:ubuntu --series bionic --to 0
juju upgrade-controller --build-agent
juju status
juju ssh 0 "ls -l /var/lib/juju/tools"
(both machine-0 and ubuntu-0 should point to the ubuntu tools)

juju switch default
juju upgrade-model
juju deploy cs:ubuntu --series bionic
juju upgrade-series 0 prepare focal
juju upgrade-series 0 complete
check logs for errors
juju status
juju ssh 0 "ls -l /var/lib/juju/tools"
(both machine-0 and ubuntu-0 should point to the ubuntu tools)
…tionality-for-manual-machines

juju#12845

This PR fixes a networking-related bug which prevents bootstraps of manual machines on 2.9 and also enhances the fidelity of address-related information that gets captured by the machiner worker.

At some point along its execution, the machiner worker retrieves a list of observed network interfaces and sends it to the controller. When using netlink as our address source, it reports interface addresses (on manual machines) as `net.IPNet` values which represent the interface addresses in _CIDR notation-, e.g. `10.1.2.99/24`.

The controller, uses this information to populate the known subnet details as a workaround given that the manual provider does not support native subnet discovery. However, storing a non-network CIDR is stored as the subnet CIDR causes issues when the machiner contacts the controller to report the machine addresses visible to the worker.

In particular, the machiner worker periodically makes an API call to `SetMachineAddresses`. The facade implementation first attempts to convert the input arguments into a`SpaceAddresses list`. However, that particular code path eventually calls
`core/network/SubnetInfos.GetByUnderlayCIDR` with the incorrect subnet CIDR which fails validation thus causing an error to be returned back to the machiner worker. The error then causes the machiner to get stuck in a restart loop.

To fix this issue, a new helper called `NetworkCIDRFromIPAndNetmask` has been added to the `core/network` package. This allows us to properly convert host addresses in CIDR notation into network CIDRS (e.g. 10.1.2.3/24 -> 10.1.2.0/24) and avoid the validation errors.

## QA steps

Deploy a bionic MAAS machine with NICs in two spaces. In the example below, the machine is accessible at 10.0.0.75

```console
$ juju bootstrap manual/[email protected] manual-test --debug --no-gui

$ juju switch controller

# You should get a list of spaces/subnets
$ juju subnets
juju subnets
subnets:
 10.0.0.0/24:
 type: ipv4
 status: in-use
 space: alpha
 zones: []
 10.42.0.0/24:
 type: ipv4
 status: in-use
 space: alpha
 zones: []

$ juju add-space space1 10.42.0.0/24
added space "space1" with subnets 10.42.0.0/24

$ juju spaces
Name Space ID Subnets
alpha 0 10.0.0.0/24
space1 1 10.42.0.0/24

$ juju deploy cs:wordpress --to 0 --bind space1 --series bionic

# The charm should deploy as expected
```

## Bug reference
https://bugs.launchpad.net/juju/+bug/1922098
According to the netplan reference https://netplan.io/reference/, for
the link-local field:

"If this field is not defined, the default is to enable only
IPv6 link-local addresses. If the field is defined but configured as an
empty set, IPv6 link-local addresses are disabled as well as IPv4 link-
local addresses."

This commit changes the type of the LinkLocal field in netplan.Interface
to a *[]string so we can distinguish between a missing link-local
attribute and a present link-local attribute with an empty list as its
value.
…netplan-configs

juju#12839

This PR fixes LP1701429 by introducing support for merging
configuration values when parsing netplan files.

The netplan yaml files are first lexicographically sorted and the very
first file serves as the base configuration. Subsequent files get merged
into the base configuration following the set of rules described in the
man page of the "netplan-generate" command
(http://manpages.ubuntu.com/manpages/groovy/man8/netplan-generate.8.html).

Each netplan yaml file is unmarshaled into a `map[interface{}]interface{}`
(and **not** a `map[string]interface{}` as the former is used by the goyaml
package for nested maps) and a recursive merge function visits all map
entries and applies the appropriate merge rules based on their type
(scalar, map or slice).

The final merged configuration is marshalled back into YAML and then
unmarshaled (in strict mode) to the Netplan struct used by juju.

## QA steps

Try the steps from https://bugs.launchpad.net/juju/+bug/1919144/comments/2

Or this: 

You need a virtual maas with a dual-spaced machine (in this example, the machine gets an eth0 and eth1 NIC)

```console
$ juju bootstrap vmaas test --constraints "spaces=space1,space2"
$ juju switch controller

# ssh into the controller and set up a netplan file as follows:
$ juju ssh 0
$ cat /etc/netplan/99-juju-post.yaml
network:
 ethernets:
 eth0:
 link-local: []
 eth1:
 link-local: [ipv4]
 version: 2

# For reference, the seeded cloud-init yaml should look like this:
network:
 ethernets:
 eth0:
 addresses:
 - 10.0.0.79/24
 gateway4: 10.0.0.1
 match:
 macaddress: 52:54:00:a4:ee:93
 mtu: 1500
 nameservers:
 addresses:
 - 10.0.0.1
 search:
 - maas
 set-name: eth0
 eth1:
 dhcp4: true
 match:
 macaddress: 52:54:00:5b:59:87
 mtu: 1500
 set-name: eth1
 version: 2

# Back to the client, run the following:
$ juju deploy ubuntu --bind space2 --to lxd:0

# SSH back to the machine and ensure that the original configuration has been backed up 
# and a new "99-juju.yaml" file has been created with the bridge setup:
$ juju ssh 0
$ cat /etc/netplan/99-juju.yaml
network:
 version: 2
 ethernets:
 eth0:
 match:
 macaddress: 52:54:00:a4:ee:93
 set-name: eth0
 addresses:
 - 10.0.0.79/24
 gateway4: 10.0.0.1
 nameservers:
 search: [maas]
 addresses: [10.0.0.1]
 mtu: 1500
 link-local: [] <- the empty list for this link-local attr. is preserved; the empty list has special meaning for netplan
 eth1:
 match:
 macaddress: 52:54:00:5b:59:87
 set-name: eth1
 mtu: 1500
 bridges:
 br-eth1:
 interfaces: [eth1]
 dhcp4: true
 mtu: 1500
 link-local:
 - ipv4 <----- link local block should be relocated from eth1 here

```

NOTE: As per the bridging [implementation](https://github.com/juju/juju/blob/2.8/network/netplan/netplan.go#L280-L291), the bridge **steals all** `netplan.Interface` values from the interface that gets bridged **except** the MTU. The `eth1` fields above are all that's remaining in `netplan.Ethernet` (which embeds `netplan.Interface`). As a result of this implementation detail, the `link-local` field gets moved to the bridge.

## Bug reference
https://bugs.launchpad.net/juju/+bug/1701429
results to stdout.

A conversion method copies params.NetworkInfoResult into the display
type.

We should never be writing raw DTOs as output - they should be
restricted to use as API arguments/results.
results. The main fields now have YAML tags corresponding with those
from the JSON.
juju#12852

This PR forward ports the changes from juju#12839 into 2.9
juju#12853

To-date, the JSON and YAML representations of `params.NetworkInfo` have been inconsistent due to the omission of YAML serialisation tags.

This wouldn't normally be an issue but for the fact that we use the wire type for display purposes, which we should never do.

Here, we add dedicated display types and a conversion method to produce them from `params.NetworkInfoResult`. YAML tags are added to make them consistent with the JSON, but we also include field copies with the legacy YAML tags so as not to break old agents.

For the next major Juju version, the compatibility fields can be removed.

## QA steps

- Bootstrap to LXD.
- `juju deploy ubuntu`
- `juju run --unit ubuntu/0 "network-get juju-info --format json"|jq .` should render as before:
```
{
 "bind-addresses": [
 {
 "mac-address": "00:16:3e:fe:cd:8c",
 "interface-name": "eth0",
 "addresses": [
 {
 "hostname": "",
 "value": "10.62.163.32",
 "cidr": "10.62.163.0/24"
 }
 ],
 }
 ],
 "egress-subnets": [
 "10.62.163.32/32"
 ],
 "ingress-addresses": [
 "10.62.163.32"
 ]
}
```
- `juju run --unit ubuntu/0 "network-get juju-info --format yaml` should include both old and new field names:
```
bind-addresses:
- mac-address: 00:16:3e:fe:cd:8c
 interface-name: eth0
 addresses:
 - hostname: ""
 address: 10.62.163.32
 cidr: 10.62.163.0/24
 macaddress: 00:16:3e:fe:cd:8c
 interfacename: eth0
egress-subnets:
- 10.62.163.32/32
ingress-addresses:
- 10.62.163.32
```

## Documentation changes

None.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1915418
wallyworld and others added 14 commits April 20, 2021 11:18
juju#12903

k8s charms use oci images from the juju registry, which results in a useless app version string in status.
For such cass, just print "..."; the full value is displayed in status --format yaml.

## QA steps

deploy a k8s charm from the store
juju status
juju status --format yaml
juju#12902

Add a helper to determine if a charm is kubernetes. Avoids having redundant calculation logic all over.

## QA steps

no change to unit test results
juju#12904

juju#12903 landed to filter out long registry paths from app version in status.
This PR adds a couple more filter terms.

## QA steps

Deploy kubeflow and run juju status
pipelines-db app should have version "..."
juju#12905

k8s v2 charms need to be deployed with a series that matches that of the `base` in the manifest.
Bundles which contained v2 charms were incorrectly being processed and having the deploy series set to "kubernetes".
There was a unit test for this, but it was incorrectly set up such that the test charm used was not a v2 charm, hence the test passed.

Also some drive by fixes for excess logging and an unused function parameter.

A side effect of this change is that sidecar k8s apps now show up in status as OS/series "ubuntu/focal" or whatever, not "kubernetes/kubernetes". This strictly represents the underlying container where the workload is running, but there's now no visible indication in status that it is a k8s app. The second commit addresses this by ensuring all k8s changes have "kubernetes" as the series.

## QA steps

bootstrap k8s
add a model pointing to staging charmhub
deploy this bundle with both a v1 k8s charm and a v2 k8s charm

```
bundle: kubernetes
applications:
 mariadb:
 charm: cs:~juju/mariadb-k8s
 scale: 1
 snappass:
 charm: facundo-snappass-test
 scale: 1
```

also deploy the charms directly
```
juju deploy facundo-snappass-test
juju deploy cs:~juju/mariadb-k8s
```
juju#12906

*Register resource-get for containeragent binary;*

## Checklist

 - [ ] ~Requires a [pylibjuju](https://github.com/juju/python-libjuju) change~
 - [ ] ~Added [integration tests](https://github.com/juju/juju/tree/develop/tests) for the PR~
 - [ ] ~Added or updated [doc.go](https://discourse.jujucharms.com/t/readme-in-packages/451) related to packages changed~
 - [x] Comments answer the question of why design decisions were made

## QA steps

charm code:
```python
 def _on_snappass_pebble_ready(self, event):
 logger.info("_on_snappass_pebble_ready")
 container = self.unit.containers["redis"]
 
 test_file = self.model.resources.fetch("test-file")
 target_dir = test_file.parent
 container.make_dir(str(target_dir), make_parents=True)
 container.push(str(test_file), test_file.read_bytes())
 logger.info("files at %s are %s", str(target_dir), container.list_files(str(target_dir)))

 if self._is_running(container, "redis"):
 logger.info("redis already started")
 self._start_snappass()

```

```console
$ yml2json metadata.yaml | jq '.resources["test-file"]'
{
 "type": "file",
 "description": "test resource",
 "filename": "test.txt"
}

$ cat ./test-file.txt
111111111111111111111111111111111

$ juju deploy ./snappass-test.charm --resource snappass-image=benhoyt/snappass-test --resource redis-image=redis --resource test-file=./test-file.txt

$ mkubectl exec pod/snappass-test-0 -n t1 -it -c charm -- cat /var/lib/juju/agents/unit-snappass-test-0/resources/test-file/test.txt
111111111111111111111111111111111

$ mkubectl exec pod/snappass-test-0 -n t1 -it -c redis -- cat /var/lib/juju/agents/unit-snappass-test-0/resources/test-file/test.txt
111111111111111111111111111111111


# or deploy from charmhub
$ juju add-model t1

$ juju deploy kelvin-snappass-test
Located charm "kelvin-snappass-test" in charm-hub, revision 1
Deploying "kelvin-snappass-test" from charm-hub charm "kelvin-snappass-test", revision 1 in channel stable

$ mkubectl exec pod/kelvin-snappass-test-0 -n t1 -it -c redis -- cat /var/lib/juju/agents/unit-kelvin-snappass-test-0/resources/test-file/test.txt
111111111111111111111111111111111

```


## Documentation changes

No

## Bug reference

https://bugs.launchpad.net/juju/+bug/1922936
@wallyworld wallyworld force-pushed the merge-2.9-20210421 branch 8 times, most recently from fd4bd1f to e36c60c Compare April 21, 2021 10:08
@SimonRichardson
Copy link
Member

$$merge$$

1 similar comment
@SimonRichardson
Copy link
Member

$$merge$$

@SimonRichardson
Copy link
Member

Failed for a different reason

WorkerSuite.TestErrRefresh

$$merge$$

@SimonRichardson
Copy link
Member

It's been a long day!

$$merge$$

@jujubot jujubot merged commit 5879b6b into juju:develop Apr 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.