Skip to content

Conversation

@robscott
Copy link
Member

What type of PR is this?
/kind bug

What this PR does / why we need it:
This should ensure the EndpointSlice controller does not try to modify shared objects.

Which issue(s) this PR fixes:
Fixes #85364

Does this PR introduce a user-facing change?:

Fix bug where EndpointSlice controller would attempt to modify shared objects.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Enhancement Issue: kubernetes/enhancements#752

/sig network
/priority critical-urgent
/cc @liggitt @freehan

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/network Categorizes an issue or PR as relevant to SIG Network. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Nov 15, 2019
@liggitt
Copy link
Member

liggitt commented Nov 16, 2019

picked into #85350 for testing with the cache mutation detector on

@liggitt
Copy link
Member

liggitt commented Nov 16, 2019

still seeing crashes with this change:

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/85350/pull-kubernetes-e2e-kind/1195497928295190529/artifacts/logs/kind-control-plane/pods/kube-system_kube-controller-manager-kind-control-plane_9f50debdb43e5bbd38bd6249029d9c61/kube-controller-manager/0.log

2019-11-16T00:31:19.606326793Z stdout F CACHE *v1beta1.EndpointSlice[3] ALTERED!
2019-11-16T00:31:19.606368305Z stdout F (*v1beta1.EndpointSlice)(0xc0013c6f20)({                                                                             (*v1beta1.EndpointSlice)(0xc0013c7080)({
2019-11-16T00:31:19.60638766Z stdout F  TypeMeta: (v1.TypeMeta) {                                                                                            TypeMeta: (v1.TypeMeta) {
2019-11-16T00:31:19.606403147Z stdout F   Kind: (string) "",                                                                                                   Kind: (string) "",
2019-11-16T00:31:19.606409744Z stdout F   APIVersion: (string) ""                                                                                              APIVersion: (string) ""
2019-11-16T00:31:19.606417476Z stdout F  },                                                                                                                   },
2019-11-16T00:31:19.606426795Z stdout F  ObjectMeta: (v1.ObjectMeta) {                                                                                        ObjectMeta: (v1.ObjectMeta) {
2019-11-16T00:31:19.606433799Z stdout F   Name: (string) (len=14) "kube-dns-l8dhk",                                                                            Name: (string) (len=14) "kube-dns-l8dhk",
2019-11-16T00:31:19.606442041Z stdout F   GenerateName: (string) (len=9) "kube-dns-",                                                                          GenerateName: (string) (len=9) "kube-dns-",
2019-11-16T00:31:19.606448691Z stdout F   Namespace: (string) (len=11) "kube-system",                                                                          Namespace: (string) (len=11) "kube-system",
2019-11-16T00:31:19.606457069Z stdout F   SelfLink: (string) (len=83) "/apis/discovery.k8s.io/v1beta1/namespaces/kube-system/endpointslices/kube-dns-l8dhk",   SelfLink: (string) (len=83) "/apis/discovery.k8s.io/v1beta1/namespaces/kube-system/endpointslices/kube-dns-l8dhk",
2019-11-16T00:31:19.606467652Z stdout F   UID: (types.UID) (len=36) "3fc20327-fa49-4202-a64f-d8eeff8db65e",                                                    UID: (types.UID) (len=36) "3fc20327-fa49-4202-a64f-d8eeff8db65e",
2019-11-16T00:31:19.606480428Z stdout F   ResourceVersion: (string) (len=3) "527",                                                                             ResourceVersion: (string) (len=3) "527",
2019-11-16T00:31:19.606487833Z stdout F   Generation: (int64) 2,                                                                                               Generation: (int64) 2,
2019-11-16T00:31:19.606494861Z stdout F   CreationTimestamp: (v1.Time) {                                                                                       CreationTimestamp: (v1.Time) {
2019-11-16T00:31:19.606501804Z stdout F    Time: (time.Time) {                                                                                                  Time: (time.Time) {
2019-11-16T00:31:19.606508056Z stdout F     wall: (uint64) 0,                                                                                                    wall: (uint64) 0,
2019-11-16T00:31:19.606524209Z stdout F     ext: (int64) 63709461061,                                                                                            ext: (int64) 63709461061,
2019-11-16T00:31:19.606530656Z stdout F     loc: (*time.Location)(0x6b3b1c0)({                                                                                   loc: (*time.Location)(0x6b3b1c0)({
2019-11-16T00:31:19.606536616Z stdout F      name: (string) (len=3) "UTC",                                                                                        name: (string) (len=3) "UTC",
2019-11-16T00:31:19.606542149Z stdout F      zone: ([]time.zone) <nil>,                                                                                           zone: ([]time.zone) <nil>,
2019-11-16T00:31:19.606548239Z stdout F      tx: ([]time.zoneTrans) <nil>,                                                                                        tx: ([]time.zoneTrans) <nil>,
2019-11-16T00:31:19.606553605Z stdout F      cacheStart: (int64) 0,                                                                                               cacheStart: (int64) 0,
2019-11-16T00:31:19.606559055Z stdout F      cacheEnd: (int64) 0,                                                                                                 cacheEnd: (int64) 0,
2019-11-16T00:31:19.606564711Z stdout F      cacheZone: (*time.zone)(<nil>)                                                                                       cacheZone: (*time.zone)(<nil>)
2019-11-16T00:31:19.606570711Z stdout F     })                                                                                                                   })
2019-11-16T00:31:19.606577998Z stdout F    }                                                                                                                    }
2019-11-16T00:31:19.606584141Z stdout F   },                                                                                                                   },
2019-11-16T00:31:19.606603028Z stdout F   DeletionTimestamp: (*v1.Time)(<nil>),                                                                                DeletionTimestamp: (*v1.Time)(<nil>),
2019-11-16T00:31:19.606609402Z stdout F   DeletionGracePeriodSeconds: (*int64)(<nil>),                                                                         DeletionGracePeriodSeconds: (*int64)(<nil>),
2019-11-16T00:31:19.606615469Z stdout F   Labels: (map[string]string) (len=2) {                                                                                Labels: (map[string]string) (len=2) {
2019-11-16T00:31:19.606621968Z stdout F    (string) (len=38) "endpointslice.kubernetes.io/managed-by": (string) (len=31) "endpointslice-controller.k8s.io",     (string) (len=26) "kubernetes.io/service-name": (string) (len=8) "kube-dns",
2019-11-16T00:31:19.606628983Z stdout F    (string) (len=26) "kubernetes.io/service-name": (string) (len=8) "kube-dns"                                          (string) (len=38) "endpointslice.kubernetes.io/managed-by": (string) (len=31) "endpointslice-controller.k8s.io"
2019-11-16T00:31:19.606635292Z stdout F   },                                                                                                                   },
2019-11-16T00:31:19.606641283Z stdout F   Annotations: (map[string]string) {                                                                                   Annotations: (map[string]string) (len=1) {
2019-11-16T00:31:19.60664745Z stdout F   },                                                                                                                    (string) (len=48) "endpoints.kubernetes.io/last-change-trigger-time": (string) (len=20) "2019-11-16T00:31:16Z"
2019-11-16T00:31:19.606653013Z stdout F   OwnerReferences: ([]v1.OwnerReference) (len=1 cap=1) {                                                               },
2019-11-16T00:31:19.606658241Z stdout F    (v1.OwnerReference) {                                                                                               OwnerReferences: ([]v1.OwnerReference) (len=1 cap=1) {
2019-11-16T00:31:19.606669521Z stdout F     APIVersion: (string) (len=2) "v1",                                                                                  (v1.OwnerReference) {
2019-11-16T00:31:19.606677021Z stdout F     Kind: (string) (len=7) "Service",                                                                                    APIVersion: (string) (len=2) "v1",
2019-11-16T00:31:19.606683014Z stdout F     Name: (string) (len=8) "kube-dns",                                                                                   Kind: (string) (len=7) "Service",
2019-11-16T00:31:19.606689159Z stdout F     UID: (types.UID) (len=36) "d5a0a0f1-dd3f-405b-bd7a-59f096a98115",                                                    Name: (string) (len=8) "kube-dns",
2019-11-16T00:31:19.606694995Z stdout F     Controller: (*bool)(0xc0013bd9fa)(true),                                                                             UID: (types.UID) (len=36) "d5a0a0f1-dd3f-405b-bd7a-59f096a98115",
2019-11-16T00:31:19.606700576Z stdout F     BlockOwnerDeletion: (*bool)(0xc0013bd9fb)(true)                                                                      Controller: (*bool)(0xc0013bda54)(true),
2019-11-16T00:31:19.606706019Z stdout F    }                                                                                                                     BlockOwnerDeletion: (*bool)(0xc0013bda55)(true)
2019-11-16T00:31:19.606711884Z stdout F   },                                                                                                                    }
2019-11-16T00:31:19.606718107Z stdout F   Finalizers: ([]string) <nil>,                                                                                        },
2019-11-16T00:31:19.606724658Z stdout F   ClusterName: (string) "",                                                                                            Finalizers: ([]string) <nil>,
2019-11-16T00:31:19.606730504Z stdout F   ManagedFields: ([]v1.ManagedFieldsEntry) <nil>                                                                       ClusterName: (string) "",
2019-11-16T00:31:19.606737714Z stdout F  },                                                                                                                    ManagedFields: ([]v1.ManagedFieldsEntry) <nil>
2019-11-16T00:31:19.606743821Z stdout F  AddressType: (v1beta1.AddressType) (len=4) "IPv4",                                                                   },
2019-11-16T00:31:19.606749852Z stdout F  Endpoints: ([]v1beta1.Endpoint) (len=2 cap=2) {                                                                      AddressType: (v1beta1.AddressType) (len=4) "IPv4",
2019-11-16T00:31:19.606760204Z stdout F   (v1beta1.Endpoint) {                                                                                                Endpoints: ([]v1beta1.Endpoint) (len=1 cap=1) {
2019-11-16T00:31:19.606776489Z stdout F    Addresses: ([]string) (len=1 cap=1) {                                                                               (v1beta1.Endpoint) {
2019-11-16T00:31:19.606782765Z stdout F     (string) (len=10) "10.244.0.3"                                                                                      Addresses: ([]string) (len=1 cap=1) {
2019-11-16T00:31:19.60678887Z stdout F    },                                                                                                                    (string) (len=10) "10.244.0.3"
2019-11-16T00:31:19.606794604Z stdout F    Conditions: (v1beta1.EndpointConditions) {                                                                           },
2019-11-16T00:31:19.606800989Z stdout F     Ready: (*bool)(0xc0013bda0a)(false)                                                                                 Conditions: (v1beta1.EndpointConditions) {
2019-11-16T00:31:19.606811443Z stdout F    },                                                                                                                    Ready: (*bool)(0xc0013bda56)(false)
2019-11-16T00:31:19.606817136Z stdout F    Hostname: (*string)(<nil>),                                                                                          },
2019-11-16T00:31:19.606828439Z stdout F    TargetRef: (*v1.ObjectReference)(0xc0004e51f0)({                                                                     Hostname: (*string)(<nil>),
2019-11-16T00:31:19.606834211Z stdout F     Kind: (string) (len=3) "Pod",                                                                                       TargetRef: (*v1.ObjectReference)(0xc0004e5260)({
2019-11-16T00:31:19.606858736Z stdout F     Namespace: (string) (len=11) "kube-system",                                                                          Kind: (string) (len=3) "Pod",
2019-11-16T00:31:19.606868259Z stdout F     Name: (string) (len=24) "coredns-6955765f44-8gf82",                                                                  Namespace: (string) (len=11) "kube-system",
2019-11-16T00:31:19.606873887Z stdout F     UID: (types.UID) (len=36) "adda3c83-d927-4726-9c6d-dad9277f1eb7",                                                    Name: (string) (len=24) "coredns-6955765f44-8gf82",
2019-11-16T00:31:19.606879735Z stdout F     APIVersion: (string) "",                                                                                             UID: (types.UID) (len=36) "adda3c83-d927-4726-9c6d-dad9277f1eb7",
2019-11-16T00:31:19.606885636Z stdout F     ResourceVersion: (string) (len=3) "525",                                                                             APIVersion: (string) "",
2019-11-16T00:31:19.606890953Z stdout F     FieldPath: (string) ""                                                                                               ResourceVersion: (string) (len=3) "525",
2019-11-16T00:31:19.606904573Z stdout F    }),                                                                                                                   FieldPath: (string) ""
2019-11-16T00:31:19.606911072Z stdout F    Topology: (map[string]string) (len=1) {                                                                              }),
2019-11-16T00:31:19.606916593Z stdout F     (string) (len=22) "kubernetes.io/hostname": (string) (len=18) "kind-control-plane"                                  Topology: (map[string]string) (len=1) {
2019-11-16T00:31:19.606922382Z stdout F    }                                                                                                                     (string) (len=22) "kubernetes.io/hostname": (string) (len=18) "kind-control-plane"
2019-11-16T00:31:19.606929498Z stdout F   },                                                                                                                    }
2019-11-16T00:31:19.606935812Z stdout F   (v1beta1.Endpoint) {                                                                                                 }
2019-11-16T00:31:19.606942763Z stdout F    Addresses: ([]string) (len=1 cap=1) {                                                                              },
2019-11-16T00:31:19.606949916Z stdout F     (string) (len=10) "10.244.0.2"                                                                                    Ports: ([]v1beta1.EndpointPort) (len=3 cap=3) {
2019-11-16T00:31:19.60695714Z stdout F    },                                                                                                                  (v1beta1.EndpointPort) {
2019-11-16T00:31:19.606963453Z stdout F    Conditions: (v1beta1.EndpointConditions) {                                                                           Name: (*string)(0xc000a4c800)((len=7) "dns-tcp"),
2019-11-16T00:31:19.606969995Z stdout F     Ready: (*bool)(0xc0016bcc70)(false)                                                                                 Protocol: (*v1.Protocol)(0xc000a4c810)((len=3) "TCP"),
2019-11-16T00:31:19.606978895Z stdout F    },                                                                                                                   Port: (*int32)(0xc0013bda58)(53),
2019-11-16T00:31:19.606984201Z stdout F    Hostname: (*string)(<nil>),                                                                                          AppProtocol: (*string)(<nil>)
2019-11-16T00:31:19.606995364Z stdout F    TargetRef: (*v1.ObjectReference)(0xc000464ee0)({                                                                    },
2019-11-16T00:31:19.607001803Z stdout F     Kind: (string) (len=3) "Pod",                                                                                      (v1beta1.EndpointPort) {
2019-11-16T00:31:19.60700834Z stdout F     Namespace: (string) (len=11) "kube-system",                                                                         Name: (*string)(0xc000a4c820)((len=7) "metrics"),
2019-11-16T00:31:19.607013998Z stdout F     Name: (string) (len=24) "coredns-6955765f44-xtvhs",                                                                 Protocol: (*v1.Protocol)(0xc000a4c830)((len=3) "TCP"),
2019-11-16T00:31:19.607019661Z stdout F     UID: (types.UID) (len=36) "e00b7e8e-25c9-482d-b188-40ef48fdc418",                                                   Port: (*int32)(0xc0013bda5c)(9153),
2019-11-16T00:31:19.607025327Z stdout F     APIVersion: (string) "",                                                                                            AppProtocol: (*string)(<nil>)
2019-11-16T00:31:19.607030833Z stdout F     ResourceVersion: (string) (len=3) "528",                                                                           },
2019-11-16T00:31:19.607036576Z stdout F     FieldPath: (string) ""                                                                                             (v1beta1.EndpointPort) {
2019-11-16T00:31:19.607041914Z stdout F    }),                                                                                                                  Name: (*string)(0xc000a4c840)((len=3) "dns"),
2019-11-16T00:31:19.607047383Z stdout F    Topology: (map[string]string) (len=1) {                                                                              Protocol: (*v1.Protocol)(0xc000a4c850)((len=3) "UDP"),
2019-11-16T00:31:19.607052904Z stdout F     (string) (len=22) "kubernetes.io/hostname": (string) (len=18) "kind-control-plane"                                  Port: (*int32)(0xc0013bda60)(53),
2019-11-16T00:31:19.607058292Z stdout F    }                                                                                                                    AppProtocol: (*string)(<nil>)
2019-11-16T00:31:19.607063769Z stdout F   }                                                                                                                    }
2019-11-16T00:31:19.607069689Z stdout F  },                                                                                                                   }
2019-11-16T00:31:19.607075292Z stdout F  Ports: ([]v1beta1.EndpointPort) (len=3 cap=4) {                                                                     })
2019-11-16T00:31:19.607080867Z stdout F   (v1beta1.EndpointPort) {                                                                                           
2019-11-16T00:31:19.607086479Z stdout F    Name: (*string)(0xc000a4c780)((len=7) "dns-tcp"),                                                                 
2019-11-16T00:31:19.607091459Z stdout F    Protocol: (*v1.Protocol)(0xc000a4c790)((len=3) "TCP"),                                                            
2019-11-16T00:31:19.607096964Z stdout F    Port: (*int32)(0xc0013bda2c)(53),                                                                                 
2019-11-16T00:31:19.607101835Z stdout F    AppProtocol: (*string)(<nil>)                                                                                     
2019-11-16T00:31:19.607108619Z stdout F   },                                                                                                                 
2019-11-16T00:31:19.607114255Z stdout F   (v1beta1.EndpointPort) {                                                                                           
2019-11-16T00:31:19.607120097Z stdout F    Name: (*string)(0xc000a4c7a0)((len=7) "metrics"),                                                                 
2019-11-16T00:31:19.607131104Z stdout F    Protocol: (*v1.Protocol)(0xc000a4c7b0)((len=3) "TCP"),                                                            
2019-11-16T00:31:19.607136552Z stdout F    Port: (*int32)(0xc0013bda3c)(9153),                                                                               
2019-11-16T00:31:19.607142328Z stdout F    AppProtocol: (*string)(<nil>)                                                                                     
2019-11-16T00:31:19.607147588Z stdout F   },                                                                                                                 
2019-11-16T00:31:19.607153054Z stdout F   (v1beta1.EndpointPort) {                                                                                           
2019-11-16T00:31:19.607169788Z stdout F    Name: (*string)(0xc000a4c7c0)((len=3) "dns"),                                                                     
2019-11-16T00:31:19.607176893Z stdout F    Protocol: (*v1.Protocol)(0xc000a4c7d0)((len=3) "UDP"),                                                            
2019-11-16T00:31:19.607182489Z stdout F    Port: (*int32)(0xc0013bda48)(53),                                                                                 
2019-11-16T00:31:19.607188681Z stdout F    AppProtocol: (*string)(<nil>)                                                                                     
2019-11-16T00:31:19.60719769Z stdout F   }                                                                                                                  
2019-11-16T00:31:19.607204215Z stdout F  }                                                                                                                   
2019-11-16T00:31:19.607209838Z stdout F })                                                                                                                   
2019-11-16T00:31:19.607214853Z stdout F                                                                                                                      
2019-11-16T00:31:19.607220092Z stdout F 
2019-11-16T00:31:19.609708967Z stderr F panic: cache *v1beta1.EndpointSlice modified

@liggitt
Copy link
Member

liggitt commented Nov 16, 2019

annotations were still changed:

2019-11-16T00:31:19.606641283Z stdout F   Annotations: (map[string]string) {                                                                                   Annotations: (map[string]string) (len=1) {
2019-11-16T00:31:19.60664745Z stdout F   }, 

as were endpoints and ports

@liggitt
Copy link
Member

liggitt commented Nov 16, 2019

I think reconcileByPortMapping can append to existing Endpoints arrays, which could have belonged to a pre-existing shared endpointslice:

		// Iterate through slices and fill them up with desired endpoints.
		for _, slice := range slices {
			for desiredSet.Len() > 0 && len(slice.Endpoints) < int(r.maxEndpointsPerSlice) {
				endpoint, _ := desiredSet.PopAny()
				slice.Endpoints = append(slice.Endpoints, *endpoint)
			}
		}
...
		// Fill the slice up with remaining endpoints.
		for desiredSet.Len() > 0 && len(sliceToFill.Endpoints) < int(r.maxEndpointsPerSlice) {
			endpoint, _ := desiredSet.PopAny()
			sliceToFill.Endpoints = append(sliceToFill.Endpoints, *endpoint)
		}

basically, any time we're assigning or appending to anything, we need to be sure we own it

@liggitt
Copy link
Member

liggitt commented Nov 16, 2019

I'd recommend adding a mini cache-mutation detector step in the endpointslice unit tests that you can snapshot all the endpointslices in the store before each sync, then check if they've been mutated after running a sync

@robscott robscott force-pushed the endpointslice-deepcopy branch from 5f46d26 to 8d467a3 Compare November 16, 2019 01:00
@robscott robscott force-pushed the endpointslice-deepcopy branch from 8d467a3 to a8c81ea Compare November 16, 2019 01:04
@liggitt
Copy link
Member

liggitt commented Nov 16, 2019

I'm running #85350 with the commit that disables endpoint slices now, fyi.

If you want to pick in my three commits from that PR into this one to get mutation detection in e2e, that might be good. I do think you should add the checks to the unit tests, since those might hit scenarios the e2es would not.

We also want to be careful we don't get too clever about being efficient and make this uninspectable. The more places we stick broken-apart pieces, then later append to them, the greater the chance of us missing something.

@robscott
Copy link
Member Author

robscott commented Nov 16, 2019

@liggitt No idea how to write a cache-mutation detector, it does sound incredibly useful though, any examples of something like that? It would certainly be trivial to deep copy all EndpointSlices at the start of reconcile, so if it's important this gets in soon, I can always do that, but obviously that's not good for performance at scale. I've been focusing my efforts on anytime the list of slices to update is added to, as those slices will change. The last update I pushed catches 1 more place that results in an addition to the list of slices to update, but can't say for sure that's actually everything. Would love to get this into unit tests.

@liggitt
Copy link
Member

liggitt commented Nov 16, 2019

simplest way would be to do something like https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/mutation_detector.go#L97-L132 (though you don't need a background goroutine in a deterministic unit test)

can construct one, add all existing endpointslices in the fake client before sync, then call check after sync.

type cacheCheck struct {
  objects []cacheObject
}
type cacheObject struct {
  original runtime.Object
  deepCopy runtime.Object
}
func (c *cacheCheck) Add(o runtime.Object) {
  c.objects = append(c.objects, cacheObject{
    original: o,
    deepCopy: o.DeepCopyObject(),
  })
}
func (c *cacheCheck) Check(t *testing.T) {
  for _, o := range c.objects {
    if !reflect.DeepEqual(o.original, o.deepCopy) {
      t.Errorf(... error showing diff ...)
    }
  }
}

@robscott robscott force-pushed the endpointslice-deepcopy branch from a8c81ea to 57d3bfb Compare November 23, 2019 01:04
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 23, 2019
@robscott
Copy link
Member Author

Hey @liggitt, thanks for the example implementation! I've added a version of that to the existing EndpointSlice controller tests. It was able to catch the problems I'd already caught in this PR, but did not find any additional issues. Would appreciate another round of review whenever you have some time.

@robscott robscott force-pushed the endpointslice-deepcopy branch from 57d3bfb to f81e9a4 Compare November 23, 2019 01:13
@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Nov 23, 2019
@robscott
Copy link
Member Author

/retest

@robscott
Copy link
Member Author

/test pull-kubernetes-e2e-gce-alpha-features
/test pull-kubernetes-e2e-gce-100-performance

@robscott
Copy link
Member Author

/retest

@liggitt
Copy link
Member

liggitt commented Nov 24, 2019

It is expected the performance tests will fail with the mutation cache checker force enabled.

Now that the feature is disabled by default, you aren't exercising endpointslice in the e2e tests. I'd recommend dropping my three commits out of this PR at this point.

@robscott robscott force-pushed the endpointslice-deepcopy branch from ddb5aca to 7681b35 Compare November 24, 2019 04:43
@robscott
Copy link
Member Author

Thanks for clarifying, somehow I'd thought the alpha test suite I ran on this PR after adding your commits would cover that. Pulled those commits back out.

@liggitt
Copy link
Member

liggitt commented Nov 24, 2019

what does our line/branch coverage look like from the unit tests? I still have a really hard time following the objects around the sync method's call graph to be sure there are no latent mutations.

@robscott robscott force-pushed the endpointslice-deepcopy branch from 7681b35 to 4229b99 Compare November 24, 2019 04:58
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 24, 2019
@robscott
Copy link
Member Author

  1. EndpointSlices are listed here in the syncService here: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/endpointslice/endpointslice_controller.go#L296
  2. They are passed into reconciler.reconcile(), the only place that can modify EndpointSlices in this flow: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/endpointslice/endpointslice_controller.go#L312

This PR adds tests to both endpointslice_controller_test.go to cover the overarching syncService call in the controller, and then significantly more to the reconciler_test.go, all covering that reconciler.reconcile() call, with a fairly wide variety of inputs. There's certainly potential that I've missed some strange combination of inputs in the tests that could somehow result in a mutation, but I can't think of anything right now. At the very least, this PR solves some very real potential for EndpointSlice mutation and adds tests to ensure that no mutations should be added in the future.

@liggitt
Copy link
Member

liggitt commented Nov 24, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 24, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 24, 2019
@k8s-ci-robot k8s-ci-robot merged commit 9905a33 into kubernetes:master Nov 24, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Nov 24, 2019
k8s-ci-robot added a commit that referenced this pull request Nov 25, 2019
…pstream-release-1.17

Automated cherry pick of #85368: Deep copying EndpointSlices in reconciler before modifying
k8s-ci-robot added a commit that referenced this pull request Feb 5, 2020
#85368-upstream-release-1.17

Automated cherry pick of #85703: Fixing Potential Race Condition in EndpointSlice Controller. #85368: Deep copying EndpointSlices in reconciler before modifying
@robscott robscott deleted the endpointslice-deepcopy branch March 11, 2021 04:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/network Categorizes an issue or PR as relevant to SIG Network. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EndpointSlice controller modifies annotations of shared object

3 participants