-
Notifications
You must be signed in to change notification settings - Fork 40.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Azure internal load balancer #43510
Conversation
Hi @karataliu. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@karataliu Hi! just a QQ ensureLoadBalancerDeleted() does not do anything when delete is happening on ilb? is this expected? wouldn't that leave dangling load balancing rules on the lib? |
Few questions, haven't looked at code yet:
|
There's a number of edge-cases here to consider too. For example, what happens if you create two services (one ILB, one ELB) w/ frontend port = 80 and backend port = 80? When they're on the same LB object, Azure will allow this if DSR is enabled. If they're on separate LB objects, even with DSR enabled on both, it's very possible that we will reject it. Can you confirm? |
Thanks for the review. For 'ensureLoadBalancerDeleted', please check the new file in the diff panel. At Line272 it figures out the loadbalancer name (based on internal or not), and then at Line281~Line302 it performs the same update logic taken from original code. Basically the call to 'reconcileLoadBalancer' will calculate the rule to be updated, and later code will apply the update. |
Thanks for the review. Q: Is the value of the annotation a bool, or is the desired private subnet IP? Seems like you describe it as both? A: The annotation value is a boolean only. We'll then look up 'loadBalancerIP' for the desired private subnet IP. Please check Note#3. Q: What happens if the user pre-supposes the private ip and it is already taken? Is the failure surfaced well to the user? A: This is a similar case to that of proposing a private IP which does not fall into the subnet IP range.
Which comes from
The controller will keep retrying. And if later at some time the assigned IP is freed, the service creation will eventually complete successfully. Q: What happens if you create two services (one ILB, one ELB) w/ frontend port = 80 and backend port = 80? A: Azure will allow ILB and ELB based on identical backend IPConfiguration to reuse the same backend port, as long as 'EnableFloatingIP' (DSR) option is set to true. |
service.Spec.Ports = []v1.ServicePort{} | ||
az.ensureLoadBalancerDeleted(clusterName, service, isInternal) | ||
|
||
sg, existsSg, err := az.getSecurityGroup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the following be moved to the overloaded func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've got two Azure LoadBalancer Resource (internal, external), but only one Network Security Group (bound to vnet).
'ensureLoadBalancerDeleted' is for reconciliation of rules under certain Azure LoadBalancer resource.
If the service requests an external LoadBalancer:
Upon creation, controller calls it (isInternal=true) on internal load balancer resource to ensure the related internal rules deleted (in case the service was switched from internal load balancer type).
Then controller calls 'reconcileSecurityGroup' (wantLb=true) to ensure new rules under Network Security Group.
Upon deletion, controller calls it (isInternal=false) on external load balancer resource to ensure the related external rules deleted.
Then controller calls 'reconcileSecurityGroup' (wantLb=false) to cleanup related rules under Network Security Group.
For internal LoadBalancer, it'll be. creation: 'isInternal=false', 'wantLb=true'. Deletion: 'isInternal=true', 'wantLb=false'.
That is, the parameter value of 'isInternal' on 'ensureLoadBalancerDeleted', is orthogonal to 'wantLb' on 'reconcileSecurityGroup'. In this case, they may be not well suited to combine.
I think we may do some refactor later to wrap the similar code logic around 'reconcileSecurityGroup' calls, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about passing in another flag, say, "cleanupSecurityGroup" which defaults to true? Another option is to pull the IP removal part out the original func also, and rename the overloaded one to something like "deleteLoadBalancerOnly" to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, the name 'ensureLoadBalancerDeleted' is somewhat confusing.
Will rename 'ensureLoadBalancerDeleted' to 'deleteLoadBalancerResource', and create a separate func for 'deletePublicIpResource'.
Current PR branch is conflicting with a newly merged change. Will update after rebasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added 'cleanupLoadBalancer' and 'cleanupPublicIP' in new commit.
@@ -105,7 +127,50 @@ func (az *Cloud) EnsureLoadBalancer(clusterName string, service *v1.Service, nod | |||
} | |||
} | |||
|
|||
lb, lbNeedsUpdate, err := az.reconcileLoadBalancer(lb, pip, clusterName, service, nodes) | |||
var lbIP *string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to move the added portion to a new func? This one is getting so much longer with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the function gets a bit long. But we'll also need to pass down the external parameters to the spited func.
I've got a try for the change: karataliu@3848fb4 , and could be evaluated later.
|
||
sourceRanges, err := serviceapi.GetLoadBalancerSourceRanges(service) | ||
if err != nil { | ||
return sg, false, err | ||
} | ||
var sourceAddressPrefixes []string | ||
if sourceRanges == nil || serviceapi.IsAllowAll(sourceRanges) { | ||
sourceAddressPrefixes = []string{"Internet"} | ||
if !requiresInternalLoadBalancer(service) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So sourceAddressPrefixes is nil for the internal case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and it would function like an empty array.
Traffics from vnet is allowed by default rule, thus there is no need to add an explicit one.
/unassign @gmarek |
@colemickens, @khenidak Can you help take a look? Thanks. |
A couple of nits, but this LGTM. Can you add an e2e test for this? Thanks! |
@brendandburns As for e2e test, I did a search around and find the following two related test cases: They are both testing cloud Load Balancer, and against certain cloud providers (gce, gke, aws). The change in this PR is limited to Azure provider only. But seems for now 'azure' as an E2E cloud provider is not ready. Do we need to add that support first? If so, shall we include it in the same PR? Could you please kindly suggest? Related: #22715 |
Regarding the failed test (Jenkins GCE e2e), the test history indicates it passed at first round. The newly added commit only changes log level, which doesn't seem to cause the break. Any idea how to get it rerun, or to deal with such a situation? Thanks. |
Update the links on previous comment: Adding E2E tests might require some extra work in test-infra project. Any suggestions on how to proceed with this PR? Thank you. |
@brendandburns Do you want to gate this on adding e2e tests, given there's no groundwork for e2e for Azure right now? (Not sure how much that entails) @karataliu We have the e2e tests running in Azure right now, without doing work in test-infra. Do you have to do work in test-infra just to add a new golang-based e2e test that we can then execute in our Jenkins system on Azure? |
@colemickens I just did some search around and found there's cloud testing related code lies in test-infra. It is fine to use other approaches. Thanks. |
@karataliu From my time in that repo, I believe that's more about describing the jobs that will run in various Jenkins setups to run the actual e2e tests. But the e2e tests all just live in the main Kubernetes code base. I don't think there's any changes needed, but it might be hard for you to run the tests without having access to a Jenkins box that is already running the tests in/against Azure. If that becomes a road-block, we can share details of the Jenkins jobs that we use so that others can create similar jobs to run the e2e ginkgo tests. |
fwiw, the tests are here in the main repo: https://github.com/kubernetes/kubernetes/tree/master/test/e2e I'm ok with merging this as is, and then adding a test as a follow on, but I'd really like to make sure we get some coverage on this code. /lgtm |
@k8s-bot cvm gce e2e test this |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: brendandburns, karataliu
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue (batch tested with PRs 44645, 44639, 43510) |
Thanks for the update. I'll keep a note on adding test back for the change. Thank you. |
var fipConfigurationProperties *network.FrontendIPConfigurationPropertiesFormat | ||
|
||
if isInternal { | ||
subnet, existsSubnet, err := az.getSubnet(az.VnetName, az.SubnetName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey guys, just a quick question regarding this.
Does the subnet need to be in the same resource group (in Azure) as the cluster?
This should not be the case given we can create the kubernetes cluster with an existing VNet in different resource groups.
Getting the following error (the subnet (internal-network/kube-machines) is in a different resource group, it is the same subnet that the agent machines are on):
Error creating load balancer (will retry): Failed to create load balancer for service default/rabbitmq: ensure(default/rabbitmq): lb(XXXXXX-internal) - failed to get subnet: internal-network/kube-machines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cloudprovider really isn't setup well to handle multi-resource-group things right now unfortunately. There's a larger task to revamp and use resource IDs everywhere instead of resource "names" that are assumed to be under a single RG.
Will this be added in 1.7? |
@dcieslak19973 this should land in 1.7 and is in the process of being backported to 1.6 |
…dbalancer Automatic merge from submit-queue Cherry pick azure internal loadbalancer Backports fixes for Azure load balancer: - [Add support for Azure internal load balancer](#43510) - [Add support for bring-your-own ip address for Services on Azure](#42034) **Release note**: ```release-note Add support for Azure internal load balancer. ```
Automatic merge from submit-queue Add E2E tests for Azure internal loadbalancer support, fix an issue for public IP resource deletion. **What this PR does / why we need it**: - Add E2E tests for Azure internal loadbalancer support: #43510 - Fix an issue that public IP resource not get deleted when switching from external loadbalancer to internal static loadbalancer. **Special notes for your reviewer**: 1. Add new Azure resource tag to Public IP resources to indicate kubernetes managed resources. Currently we determine whether the public IP resource should be deleted by looking at LoadBalancerIp property on spec. In the scenario 'Switching from external loadbalancer to internal loadbalancer with static IP', that value might have been updated for internal loadbalancer. So here we're to add an explicit tag for kubernetes managed resources. 2. Merge cleanupPublicIP logic into cleanupLoadBalancer **Release note**: NONE CC @brendandburns @colemickens
It was discovered that the Azure Internal Load Balancer doesn't open up the NSG associated with the Subnet where the Internal LB is created, unlike the Public LB. |
Automatic merge from submit-queue Add E2E tests for Azure internal loadbalancer support, fix an issue for public IP resource deletion. **What this PR does / why we need it**: - Add E2E tests for Azure internal loadbalancer support: kubernetes/kubernetes#43510 - Fix an issue that public IP resource not get deleted when switching from external loadbalancer to internal static loadbalancer. **Special notes for your reviewer**: 1. Add new Azure resource tag to Public IP resources to indicate kubernetes managed resources. Currently we determine whether the public IP resource should be deleted by looking at LoadBalancerIp property on spec. In the scenario 'Switching from external loadbalancer to internal loadbalancer with static IP', that value might have been updated for internal loadbalancer. So here we're to add an explicit tag for kubernetes managed resources. 2. Merge cleanupPublicIP logic into cleanupLoadBalancer **Release note**: NONE CC @brendandburns @colemickens
Which issue this PR fixes
Fixes #38901
What this PR does / why we need it:
This PR is to add support for Azure internal load balancer
Currently when exposing a serivce with LoadBalancer type, Azure provider would assume that it requires a public load balancer.
Thus it will request a public IP address resource, and expose the service via that public IP.
In this case we're not able to apply private IP addresses (within the cluster virtual network) for the service.
Special notes for your reviewer:
Clarification:
a. 'LoadBalancer' refers to an option for 'type' field under ServiceSpec. See https://kubernetes.io/docs/resources-reference/v1.5/#servicespec-v1
b. 'Azure LoadBalancer' refers a type of Azure resource. See https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-overview
For a single Azure LoadBalancer, all frontend ip should reference either a subnet or publicIpAddress, which means that it could be either an Internet facing load balancer or an internal one.
For current provider, it would create an Azure LoadBalancer with generated '${loadBalancerName}' for all services with 'LoadBalancer' type.
This PR introduces name '${loadBalancerName}-internal' for a separate Azure Load Balancer resource, used by all the service that requires internal load balancers.
This PR introduces a new annotation for the internal load balancer type behaviour:
a. When the annotaion value is set to 'false' or not set, it falls back to the original behaviour, assuming that user is requesting a public load balancer;
b. When the annotaion value is set to 'true', the following rule applies depending on 'loadBalancerIP' field on ServiceSpec:
Users may change the load balancer type by applying the annotation to the service at runtime.
In this case, the load balancer rule would need to be 'switched' between the internal one and external one.
For example, it we have a service with internal load balancer, and then user removes the annotation, making it to a public one. Before we creating rules in the public Azure LoadBalancer, we'll need to clean up rules in the internal Azure LoadBalancer.
Release note: