Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.9] Restore space support for manual machines #12845

Conversation

achilleasa
Copy link
Contributor

@achilleasa achilleasa commented Apr 1, 2021

This PR fixes a networking-related bug which prevents bootstraps of manual machines on 2.9 and also enhances the fidelity of address-related information that gets captured by the machiner worker.

At some point along its execution, the machiner worker retrieves a list of observed network interfaces and sends it to the controller. When using netlink as our address source, it reports interface addresses (on manual machines) as net.IPNet values which represent the interface addresses in _CIDR notation-, e.g. 10.1.2.99/24.

The controller, uses this information to populate the known subnet details as a workaround given that the manual provider does not support native subnet discovery. However, storing a non-network CIDR is stored as the subnet CIDR causes issues when the machiner contacts the controller to report the machine addresses visible to the worker.

In particular, the machiner worker periodically makes an API call to SetMachineAddresses. The facade implementation first attempts to convert the input arguments into aSpaceAddresses list. However, that particular code path eventually calls
core/network/SubnetInfos.GetByUnderlayCIDR with the incorrect subnet CIDR which fails validation thus causing an error to be returned back to the machiner worker. The error then causes the machiner to get stuck in a restart loop.

To fix this issue, a new helper called NetworkCIDRFromIPAndNetmask has been added to the core/network package. This allows us to properly convert host addresses in CIDR notation into network CIDRS (e.g. 10.1.2.3/24 -> 10.1.2.0/24) and avoid the validation errors.

QA steps

Deploy a bionic MAAS machine with NICs in two spaces. In the example below, the machine is accessible at 10.0.0.75

$ juju bootstrap manual/[email protected] manual-test --debug --no-gui

$ juju switch controller

# You should get a list of spaces/subnets
$ juju subnets
juju subnets
subnets:
  10.0.0.0/24:
    type: ipv4
    status: in-use
    space: alpha
    zones: []
  10.42.0.0/24:
    type: ipv4
    status: in-use
    space: alpha
    zones: []

$ juju add-space space1 10.42.0.0/24
added space "space1" with subnets 10.42.0.0/24

$ juju spaces
Name    Space ID  Subnets
alpha   0         10.0.0.0/24
space1  1         10.42.0.0/24

$ juju deploy cs:wordpress --to 0 --bind space1 --series bionic

# The charm should deploy as expected

Bug reference

https://bugs.launchpad.net/juju/+bug/1922098

This helper allows us to create network CIDRs from either a network IP
and a netmask (e.g. an IPNet value) or from a host IP and a netmask.

The latter allows us to convert host addresses in CIDR notation into
network CIDRs, e.g. 10.1.2.42/24 into 10.1.2.0/24.
@achilleasa achilleasa added the 2.9 label Apr 1, 2021
@achilleasa achilleasa changed the title [2.9] Restore space support on manual machines [2.9] Restore space support for manual machines Apr 1, 2021
…ails

When using netlink as our address source, it reports interface addresses
as IPNet values which represent the address in CIDR notation, e.g.
10.1.2.99/24.

Prior to this change, on manual machines, the networkingcommon package
used the address in CIDR notation value as the subnet's CIDR. API calls to
SetMachineAddresses (machiner) first convert the input arguments into a
SpaceAddresses list. However, that particular code path eventually calls
core/network/SubnetInfos.GetByUnderlayCIDR with the incorrect subnet
CIDR which fails validation thus causing an error to be returned back to
the machiner worker which in turn causes it to restart.
@achilleasa achilleasa force-pushed the 2.9-restore-spaces-functionality-for-manual-machines branch from 448cee7 to 7bc69b5 Compare April 1, 2021 15:46
Copy link
Member

@manadart manadart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good; will run the QA next week.

Copy link
Member

@manadart manadart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well.

@achilleasa
Copy link
Contributor Author

$$merge$$

1 similar comment
@hpidcock
Copy link
Member

hpidcock commented Apr 6, 2021

$$merge$$

@jujubot jujubot merged commit 257ceca into juju:2.9 Apr 6, 2021
@achilleasa achilleasa deleted the 2.9-restore-spaces-functionality-for-manual-machines branch April 6, 2021 11:30
@wallyworld wallyworld mentioned this pull request Apr 21, 2021
jujubot added a commit that referenced this pull request Apr 21, 2021
#12909

Merge 2.9

#12827 Unsubscribe from hub when closing state pool
#12829 Correct default bootstrap-timeout value displayed in help.
#12840 Constraint tags can be used for pod affinity
#12842 Fix upgrade series agent version handling
#12794 Add disk provisioning customization
#12845 Restore space support for manual machines
#12839 Support merging of netplan configs
#12853 Add display type for network-get results
#12854 Fix for LP1921557 sni in Juju login.
#12850 Use Base in Charmhub packge and its response structures.
#12858 Ensure assess-upgrade-series does not report started prematuremly
#12860 Removed logging from core annotations.
#12861 Fixes bug where empty error can happen in storage
#12865 Update Pebble version to include new files API
#12866 Workaround for k8s dashboard URL with k8s client proxy
#12862 Fix/lp 1923051
#12867 Fix/lp 1923561
#12870 Use channel logic in charm library
#12873 Add support for setting pod affinity topology key
#12874 Use Patch instead of Update for ensuring ingress resources
#12831 Integration fixes
#12879 Ensure refresh uses version
#12864 bug: fix for bootstrap fail on vsphere 7 + multi network
#12883 Initial work to allow juju trust for sidecar charms
#12884 Fix ssh with sidecar charms and containers.
#12886 Charmhub bases
#12881 Use charm pkg updates
#12889 Ignore projected volume mounts when looking up juju storage
#12890 Fix passing empty string container name to unit params
#12893 Add CLA checker GH action and remove codecov push action
#12897 Use production charmhub endpoint
#12887 Resource validation error
#12888 Ensure we validate the model target
#12898 Remove usage of systems package from CAAS application provisioner
#12899 CAAS bundle deployments
#12900 Bump up Pebble version to include user/group in list-files
#12901 charm Format helper
#12902 charm Iskubernetes helper
#12903 Display ... for really long k8s app versions in status
#12904 Filter out more full registry paths for app version in status
#12905 Fix k8s bundle deploys with v2 charms
#12906 Register resource-get for containeragent binary

Conflicts mostly due to charm v8 vs v9 imports.
The other one was due to changes to dashboard CLI.
```
# Conflicts:
# api/common/charms/common.go
# api/common/charms/common_test.go
# apiserver/facades/client/application/application.go
# apiserver/facades/client/application/charmstore.go
# apiserver/facades/client/application/update_series_mocks_test.go
# apiserver/facades/client/charms/client.go
# apiserver/facades/client/charms/convertions.go
# apiserver/facades/client/machinemanager/types_mock_test.go
# apiserver/facades/controller/caasoperatorprovisioner/provisioner.go
# cmd/juju/application/deployer/bundlehandler_test.go
# cmd/juju/application/refresh_test.go
# cmd/juju/application/refresher/refresher_mock_test.go
# cmd/juju/dashboard/dashboard.go
# core/charm/strategies_mock_test.go
# core/model/model.go
# core/model/model_test.go
# go.mod
# go.sum
# resource/resourceadapters/charmhub.go
# scripts/win-installer/setup.iss
# service/agentconf_test.go
# snap/snapcraft.yaml
# state/charm.go
# state/migration_export.go
# state/state.go
# version/version.go
# worker/caasfirewallerembedded/applicationworker.go
# worker/caasfirewallerembedded/applicationworker_test.go
```

## QA steps

See PRs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants