Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GCE] Support internal load balancers #46663

Merged
merged 8 commits into from
Jun 5, 2017

Conversation

nicksardo
Copy link
Contributor

@nicksardo nicksardo commented May 31, 2017

What this PR does / why we need it:
Allows users to expose K8s services externally of the K8s cluster but within their GCP network.

Fixes #33483

Important User Notes:

  • This is a beta feature. ILB could be enabled differently in the future.
  • Requires nodes having version 1.7.0+ (ILB requires health checking and a health check endpoint on kube-proxy has just been exposed)
  • This cannot be used for intra-cluster communication. Do not call the load balancer IP from a K8s node/pod.
  • There is no reservation system for private IPs. You can specify a RFC 1918 address in loadBalancerIP field, but it could be lost to another VM or LB if service settings are modified.
  • If you're running an ingress, your existing loadbalancer backend service must be using BalancingMode type RATE - not UTILIZATION.
    • Option 1: With a 1.5.8+ or 1.6.4+ version master, delete all your ingresses, and re-create them.
    • Option 2: Migrate to a new cluster running 1.7.0. Considering ILB requires nodes with 1.7.0, this isn't a bad idea.
    • Option 3: Possible migration opportunity, but use at your own risk. More to come later.

Reviewer Notes:
Several files were renamed, so github thinks ~2k lines have changed. Review commits one-by-one to see the actual changes.

Release note:

Support creation of GCP Internal Load Balancers from Service objects

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 31, 2017
@nicksardo
Copy link
Contributor Author

/assign @bowei @MrHohn @thockin

@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 31, 2017
// ServiceAnnotationLoadBalancerInternal is the annotation used on a service with type LoadBalancer
// to signify that an Internal LB should be assembled instead of an External LB.
// Value should be set to "true" (string) to enable.
ServiceAnnotationLoadBalancerInternal = "cloud.google.com/load-balancer-internal"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth making this be cloud.google.com/load-balancer-type: (internal | external) and leave room for the other LB types that we don't currently represent at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue against having an option named "external" as one would be confused as to how it's different from not having the annotation at all. However, if we have different types of "external" LBs supported in the future, having this annotation would be a nice spot to surface them.

I'll make the change tomorrow; the only supported option will be "internal" or no annotation set. Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, we are trying to keep all API related annotations in https://github.com/kubernetes/kubernetes/blob/master/pkg/api/annotation_key_constants.go and https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/annotation_key_constants.go. Is there any concern for moving it to the same place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum...I think the concern here is that ILB is GCE specific. May be we could move this out when other cloud providers would like to support ILB as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other cloud providers already have their "internal" lb's created and use cloud-specific annotations. There is an issue open discussing consolidating the annotations - let's not jump the gun on that effort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

// Delete forwarding rule if any settings are different
fwdRuleDeleted := false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move this closer to where it is recreated, to minimize disruption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll move instance group and health check creation above here. Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of code - not so much tests... Are there any parts of this that could be serviced by unit tests?

Overall looks great. would like an extra set of eyes, if possible.

mc := newTargetPoolMetricContext("delete", region)
op, err := gce.service.TargetPools.Delete(gce.projectID, region, name).Do()

func (gce *GCECloud) DeleteExternalTargetPoolAndChecks(name, region string, hcNames ...string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why some of these are no public, but this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was public before - I just renamed it and merged it with the logic of the non-public func. https://github.com/kubernetes/kubernetes/pull/46663/files/b836aa75b780046cb290d3ab01f303ffb21ba316..10e59b429409a59c26f985f8a441626e169e37a0#diff-4cae6102def67324f04d6988b5aab0f6L527

Unfortunately, it's public because it's being used in resource cleanup in test/e2e/framework/utils.go.


func init() {
var err error
lbSrcRngsFlag.ipn, err = netsets.ParseIPNets([]string{"130.211.0.0/22", "35.191.0.0/16", "209.85.152.0/22", "209.85.204.0/22"}...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment please

@thockin
Copy link
Member

thockin commented May 31, 2017

/approve

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 31, 2017
return defaultZone
}
return zone
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI to reviewers: this logic was taken from the ingress controller.

@nicksardo
Copy link
Contributor Author

Lots of code - not so much tests... Are there any parts of this that could be serviced by unit tests?

Agreed, the lack of tests disturb me as well. I don't think we can run any meaningful unit tests without mocking a significant portion of the compute API. This PR is so large partly because I'm separating compute API calls from the load balancer logic in order to support mocking. I'd love to spend a week or two on that effort but would like to hear from other folks who work on the GCE cloudprovider.

For the immediate concern of this PR, I think the only unit tests worth considering are a few around resource naming. Would love to hear any other ideas...

@thockin
Copy link
Member

thockin commented May 31, 2017 via email

Copy link
Member

@MrHohn MrHohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't gone through all commits...Leave some comments ahead.

// ServiceAnnotationLoadBalancerInternal is the annotation used on a service with type LoadBalancer
// to signify that an Internal LB should be assembled instead of an External LB.
// Value should be set to "true" (string) to enable.
ServiceAnnotationLoadBalancerInternal = "cloud.google.com/load-balancer-internal"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, we are trying to keep all API related annotations in https://github.com/kubernetes/kubernetes/blob/master/pkg/api/annotation_key_constants.go and https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/annotation_key_constants.go. Is there any concern for moving it to the same place?

// ServiceAnnotationLoadBalancerInternal is the annotation used on a service with type LoadBalancer
// to signify that an Internal LB should be assembled instead of an External LB.
// Value should be set to "true" (string) to enable.
ServiceAnnotationLoadBalancerInternal = "cloud.google.com/load-balancer-internal"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum...I think the concern here is that ILB is GCE specific. May be we could move this out when other cloud providers would like to support ILB as well.

@@ -35,13 +35,46 @@ import (
"k8s.io/kubernetes/pkg/cloudprovider"
)

const (
zoneKey = "failure-domain.beta.kubernetes.io/zone"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed changing this in first round of changes. Will add this later.


// MakeHealthCheckFirewallName returns the firewall name used by the GCE load
// balancers (l4) for performing health checks.
func MakeHealthCheckFirewallName(clusterID, hcName string, isNodesHealthCheck bool) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to make the function naming consistent (we have getXXX and makeXXX now) between ELB and ILB. I'm guessing you made changes in later commits but leave this comment just in case :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, we also have a mix of exported and private funcs... Those that are exported are used in test/e2e/framework/util.go. Reckon we should export all these funcs regardless of external use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed all to make... and kept the same export status as before.

balanceModeCONNECTION lbBalancingMode = "CONNECTION"
)

type lbBalancingMode string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a link to the public documentations? Obviously I already forgot what those two modes mean :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ports, protocol := getPortsAndProtocol(svc.Spec.Ports)
scheme := schemeInternal
loadBalancerName := cloudprovider.GetLoadBalancerName(svc)
specialHCPath, specialHCPort := v1_service.GetServiceHealthCheckPathPort(svc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling these specialXXX seems not specific enough...Probably localXXX or localTrafficXXX?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think checking either one of Path or Port should be enough? Just saw you are only checking Port ==0 in another function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return nil, err
}

// Get health check settings assuming it's an OnlyLocal service
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment looks a bit confusing, which exact settings is it referring to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return err
}

func (gce *GCECloud) ensureInternalLoadBalancerDeleted(clusterName, clusterID string, svc *v1.Service) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't see the sharedResourceLock is used, did it turn out that ILB management does not need locking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reminding me. Will add the lock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but needs testing.

return ports, protocol
}

func (gce *GCECloud) getBSLink(name string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably just name it getBackendServiceLink?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this? Java?!?!

Haha. I'll rename it, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

v, err := gce.service.GlobalAddresses.Get(gce.projectID, name).Do()
return v, mc.Observe(err)
}

func (gce *GCECloud) ReserveRegionAddress(addr *compute.Address, region string) (*compute.Address, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// DeleteRegionForwardingRule deletes the RegionalForwardingRule by name & region.
func (gce *GCECloud) DeleteRegionForwardingRule(name, region string) error {
mc := newForwardingRuleMetricContext("delete", "")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set region in each metric context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return a
}
return b
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take this out... Was being used in naming code that didn't make the final cut.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


func (gce *GCECloud) ListZonesInRegion(region string) ([]*compute.Zone, error) {
mc := newZonesMetricContext("list", region)
filter := fmt.Sprintf("region eq https://www.googleapis.com/compute/v1/projects/%v/regions/%v", gce.projectID, region)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate region link creation into own func in gce_util.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return gce.GetRegionAddress(addr.Name, region)
}

// DeleteGlobalStaticIP deletes a global static IP by name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match comment with func name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return v, mc.Observe(err)
}

func (gce *GCECloud) ReserveRegionAddress(addr *compute.Address, region string) (*compute.Address, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


func (gce *GCECloud) updateInternalLoadBalancer(clusterName, clusterID string, service *v1.Service, nodes []*v1.Node) error {
igName := getInstanceGroupName(clusterID)
_, err := gce.ensureInstanceGroups(igName, nodes)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update backend service if a new group is created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return err
}

if hcInUse {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment explaining this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if err != nil && !isNotFound(err) {
return err
}
// Create expected firewall
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unnecessary comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if err != nil {
return err
}
err = gce.ensureFirewall(fwName, fwDesc, sourceRanges.StringSlice(), ports, protocol, nodes)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use loadBalancerName and remove fwName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


func (gce *GCECloud) ensureHealthCheck(name, path string, port int32) (*compute.HealthCheck, error) {
glog.V(2).Infof("ensureHealthCheck(%v, %v, %v): checking existing health check", name, path, port)
wantedHC := newInternalLBHealthCheck(name, path, port)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename all wantedXXX to expectedXXX

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

wantedHC := newInternalLBHealthCheck(name, path, port)

hc, err := gce.GetHealthCheck(name)
if hc == nil || err != nil && isNotFound(err) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied logic from loadbalancer_external.

Separate into two code blocks. One checking for non-NotFound errors; another for handling hc == nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for _, igLink := range igLinks {
backends = append(backends, &compute.Backend{
Group: igLink,
BalancingMode: string(balancingMode),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment saying that balancing mode settings like MaxConnections and MaxRate cannot be used with ILB (currently).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


wantedBS := &compute.BackendService{
Name: name,
Region: gce.region,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove Region. This is output only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mikedanese
Copy link
Member

@k8s-bot test this

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2017
@nicksardo
Copy link
Contributor Author

@k8s-bot pull-kubernetes-federation-e2e-gce test this

@nicksardo
Copy link
Contributor Author

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

@nicksardo
Copy link
Contributor Author

@mikedanese Can you re-lgtm this? I fixed it in the last commit. Thanks

@MrHohn
Copy link
Member

MrHohn commented Jun 4, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MrHohn, bowei, nicksardo, thockin

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@MrHohn
Copy link
Member

MrHohn commented Jun 4, 2017

@k8s-bot pull-kubernetes-federation-e2e-gce test this

@nicksardo
Copy link
Contributor Author

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

Federation e2e seems to be failing consistently. Will re-test that when there's hope.

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jun 5, 2017

@nicksardo: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-federation-e2e-gce 025f178 link @k8s-bot pull-kubernetes-federation-e2e-gce test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@nicksardo
Copy link
Contributor Author

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this
@k8s-bot pull-kubernetes-federation-e2e-gce test this

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 46550, 46663, 46816, 46820, 46460)

@k8s-github-robot k8s-github-robot merged commit 4faf7f1 into kubernetes:master Jun 5, 2017
@myaghini
Copy link

myaghini commented Jun 6, 2017

Is there any documentation how to use this new feature?

@bowei
Copy link
Member

bowei commented Jun 6, 2017

We will be documenting it as part of the 1.7 release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.