-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure subsets are updated when metadata changes #3840
Conversation
This is a follow-up for envoyproxy#3810. Now that updates can be fired without hosts being added or removed, we need to handle this case and check if subsets need to be added or removed when only metadata of existing endpoints has changed. Fixes envoyproxy#3839 Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
source/common/upstream/subset_lb.cc
Outdated
if (!host_set->hosts().size()) { | ||
continue; | ||
} | ||
update(host_set->priority(), host_set->hosts(), {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this is correct, because you haven't removed the copy of the hosts that are stored in the hierarchy with the old metadata. So a given host will still be mapped to it's previous subset. Not clear how you fix that, except to write some code to manually remove all the subsets and rebuild them all, which seems like a pain.
I thought I read a comment that said having updateDynamicHostList return true caused a full rebuild of the load balancer? If that's true, it seems like this shouldn't matter, but I guess it actually does just attempt to update the hosts. Given that it actually comes through as empty update, this is an argument for having metadata changes look like host removal and addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, what #3810 will trigger is a call to HostSetImpl::updateHosts()
:
https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/eds.cc#L203
which eventually ends up calling SubsetLoadBalancer::update()
via runUpdateCallbacks()
.
I think we'll refilter hosts in SubsetLoadBalancer::HostSubsetImpl::update()
:
https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/subset_lb.cc#L454
which should give us an updated subset... no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zuercher hold on, if hosts are still mapped to their old subset how come the assertions in my test case pass (e.g.: number of active & removed subsets)? Unless that's unconvering a new bug, that shouldn't be the case... :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zuercher the rebuild only happens for dumb LBs; for subset LB, the above update()
is invoked and it gets to do the more performant thing with hosts_added/hosts_removed
.
I do think this would be a lot simpler to model as a remove/add; @rgs1 what was the concern with this? Forcing a drain and stats reset? Could we just remove/add from the PoV of the cluster membership (i.e. in the hosts_added/hosts_removed
parameters) while not actually nuking the underlying host object in https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/upstream_impl.cc#L682?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the concern within #3810? a few things: draining, stats reset, making updateDynamicHostList()
too complex...
i can't come up with a super strong argument now, but i suspect we'll have more friction in the future if we conflate metadata updates with adds/removals....
i'll think about it a bit more, but i am also a bit tight on time budget, so happy to explore other alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we implement this as a pure host add? It looks like the delta in your PR is just doing this; you're basically treating hosts_added
as hosts_added_or_modified
in the recursion. Should we rename the field to indicate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, actually — I think we should move this into addMemberUpdateCb()
:
https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/subset_lb.cc#L41
which will make things more obvious. I'll push something in a bit.
Also, ensure the updated versions don't get matched anymore. Signed-off-by: Raul Gutierrez Segales <[email protected]>
Add more transitions (which simulate prod better). Signed-off-by: Raul Gutierrez Segales <[email protected]>
[ci skip] Signed-off-by: Raul Gutierrez Segales <[email protected]>
source/common/upstream/subset_lb.cc
Outdated
// It's possible that metadata changed, without hosts being added nor removed. | ||
// If so we need to add any new subsets, remove unused ones, and regroup hosts into | ||
// the right subsets. | ||
if (!hosts_added.size() && !hosts_removed.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that this logic doesn't work if the endpoints being updated had some additions/removals with other endpoints remaining the same but with only metadata changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow what combination of changes you are referring to.. Are you saying:
- metadata changes for some of the existing endpoints
- new endpoints are added (they must be different from the existing one, given how we operate now)
- some endpoints are removed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easiest way to verify is another test-case, but I believe the question here is:
If there were additions/removals for some endpoints, and metadata-only changes for others
- this block would not be taken, since the size did change
- the hosts added/removed would be taken care of down below
- the metadata-only changes are not addressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now I got what you were asking thanks to @derekargueta. If there's an add/removal and metadata changes for existing endpoints we won't miss it. Because as mentioned above, we refilter hosts in SubsetLoadBalancer::HostSubsetImpl::update()
...
Let me throw in another unit test and add a comment about this.
source/common/upstream/subset_lb.cc
Outdated
if (!host_set->hosts().size()) { | ||
continue; | ||
} | ||
update(host_set->priority(), host_set->hosts(), {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zuercher the rebuild only happens for dumb LBs; for subset LB, the above update()
is invoked and it gets to do the more performant thing with hosts_added/hosts_removed
.
I do think this would be a lot simpler to model as a remove/add; @rgs1 what was the concern with this? Forcing a drain and stats reset? Could we just remove/add from the PoV of the cluster membership (i.e. in the hosts_added/hosts_removed
parameters) while not actually nuking the underlying host object in https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/upstream_impl.cc#L682?
* add a new unit test to handle metadata changes and adds/removals at the same time. * add a comment to explain why/how these updates actually worked. Signed-off-by: Raul Gutierrez Segales <[email protected]>
source/common/upstream/subset_lb.cc
Outdated
// | ||
// Note, note, note: if metadata for existing endpoints changed _and_ hosts were also added | ||
// or removed, we don't need to hit this path. That's fine, given that findOrCreateSubset() | ||
// will be called from processSubsets because it'll be trigger by either hosts_added or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/trigger/triggered/
Signed-off-by: Raul Gutierrez Segales <[email protected]>
* add a helper, refreshSubsets(), to build/sort subsets based on the current hosts * call refreshSubsets() from addMemberUpdateCb() if no hosts were added nor removed Signed-off-by: Raul Gutierrez Segales <[email protected]>
We were ignoring the priority given to the callback and rebuilding everything — now we only refresh that priority. Signed-off-by: Raul Gutierrez Segales <[email protected]>
This tests that after the following transitions: * init state * hosts added * metadata updated without adds/removals refreshSubsets() still does the right thing. That is, that `original_priority_set_` holds the current/updated list of hosts. Signed-off-by: Raul Gutierrez Segales <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks, this looks quite clean. I'd like to get @zuercher 's take on this before we merge.
source/common/upstream/subset_lb.cc
Outdated
void SubsetLoadBalancer::refreshSubsets(uint32_t priority) { | ||
const auto& host_sets = original_priority_set_.hostSetsPerPriority(); | ||
|
||
if (priority >= host_sets.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should just be an ASSERT
, since we shouldn't be asked to do a refresh at a non-existent priority level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes — updated.
Manually creating the metadata for each step is too verbose, so moved that to a helper. This should improve readability. Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern was that we aren't sending removed hosts to the underlying load balancers. That said, I just looked and it seems like only the Original DST LB actually uses hosts_added or hosts_removed from the callback (and that LB isn't allowed with subsets anyway).
We might want to consider whether PrioritySet's MemberUpdateCb needs some rethinking since it's sort of double duty, but I'm ok leaving that for another day.
* origin/master: Ensure subsets are updated when metadata changes (envoyproxy#3840) proxy_protocol: add support for HAProxy Proxy Protocol v2 (binary) (envoyproxy#3668) Signed-off-by: Snow Pettersen <[email protected]>
This is a follow-up for #3810. Now that updates can be fired without
hosts being added or removed, we need to handle this case and check
if subsets need to be added or removed when only metadata of existing
endpoints has changed.
Fixes #3839
Signed-off-by: Raul Gutierrez Segales [email protected]