Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure subsets are updated when metadata changes #3840

Merged
merged 13 commits into from
Jul 12, 2018

Conversation

rgs1
Copy link
Member

@rgs1 rgs1 commented Jul 11, 2018

This is a follow-up for #3810. Now that updates can be fired without
hosts being added or removed, we need to handle this case and check
if subsets need to be added or removed when only metadata of existing
endpoints has changed.

Fixes #3839

Signed-off-by: Raul Gutierrez Segales [email protected]

Raul Gutierrez Segales added 2 commits July 10, 2018 23:53
This is a follow-up for envoyproxy#3810. Now that updates can be fired without
hosts being added or removed, we need to handle this case and check
if subsets need to be added or removed when only metadata of existing
endpoints has changed.

Fixes envoyproxy#3839

Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
if (!host_set->hosts().size()) {
continue;
}
update(host_set->priority(), host_set->hosts(), {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is correct, because you haven't removed the copy of the hosts that are stored in the hierarchy with the old metadata. So a given host will still be mapped to it's previous subset. Not clear how you fix that, except to write some code to manually remove all the subsets and rebuild them all, which seems like a pain.

I thought I read a comment that said having updateDynamicHostList return true caused a full rebuild of the load balancer? If that's true, it seems like this shouldn't matter, but I guess it actually does just attempt to update the hosts. Given that it actually comes through as empty update, this is an argument for having metadata changes look like host removal and addition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what #3810 will trigger is a call to HostSetImpl::updateHosts():

https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/eds.cc#L203

which eventually ends up calling SubsetLoadBalancer::update() via runUpdateCallbacks().

I think we'll refilter hosts in SubsetLoadBalancer::HostSubsetImpl::update():

https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/subset_lb.cc#L454

which should give us an updated subset... no?

Copy link
Member Author

@rgs1 rgs1 Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zuercher hold on, if hosts are still mapped to their old subset how come the assertions in my test case pass (e.g.: number of active & removed subsets)? Unless that's unconvering a new bug, that shouldn't be the case... :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zuercher the rebuild only happens for dumb LBs; for subset LB, the above update() is invoked and it gets to do the more performant thing with hosts_added/hosts_removed.

I do think this would be a lot simpler to model as a remove/add; @rgs1 what was the concern with this? Forcing a drain and stats reset? Could we just remove/add from the PoV of the cluster membership (i.e. in the hosts_added/hosts_removed parameters) while not actually nuking the underlying host object in https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/upstream_impl.cc#L682?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the concern within #3810? a few things: draining, stats reset, making updateDynamicHostList() too complex...

i can't come up with a super strong argument now, but i suspect we'll have more friction in the future if we conflate metadata updates with adds/removals....

i'll think about it a bit more, but i am also a bit tight on time budget, so happy to explore other alternatives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we implement this as a pure host add? It looks like the delta in your PR is just doing this; you're basically treating hosts_added as hosts_added_or_modified in the recursion. Should we rename the field to indicate this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, actually — I think we should move this into addMemberUpdateCb():

https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/subset_lb.cc#L41

which will make things more obvious. I'll push something in a bit.

Raul Gutierrez Segales added 3 commits July 11, 2018 10:44
Also, ensure the updated versions don't get matched anymore.

Signed-off-by: Raul Gutierrez Segales <[email protected]>
Add more transitions (which simulate prod better).

Signed-off-by: Raul Gutierrez Segales <[email protected]>
[ci skip]

Signed-off-by: Raul Gutierrez Segales <[email protected]>
// It's possible that metadata changed, without hosts being added nor removed.
// If so we need to add any new subsets, remove unused ones, and regroup hosts into
// the right subsets.
if (!hosts_added.size() && !hosts_removed.size()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that this logic doesn't work if the endpoints being updated had some additions/removals with other endpoints remaining the same but with only metadata changes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow what combination of changes you are referring to.. Are you saying:

  • metadata changes for some of the existing endpoints
  • new endpoints are added (they must be different from the existing one, given how we operate now)
  • some endpoints are removed

?

Copy link
Member

@derekargueta derekargueta Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easiest way to verify is another test-case, but I believe the question here is:
If there were additions/removals for some endpoints, and metadata-only changes for others

  1. this block would not be taken, since the size did change
  2. the hosts added/removed would be taken care of down below
  3. the metadata-only changes are not addressed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, now I got what you were asking thanks to @derekargueta. If there's an add/removal and metadata changes for existing endpoints we won't miss it. Because as mentioned above, we refilter hosts in SubsetLoadBalancer::HostSubsetImpl::update()...

Let me throw in another unit test and add a comment about this.

if (!host_set->hosts().size()) {
continue;
}
update(host_set->priority(), host_set->hosts(), {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zuercher the rebuild only happens for dumb LBs; for subset LB, the above update() is invoked and it gets to do the more performant thing with hosts_added/hosts_removed.

I do think this would be a lot simpler to model as a remove/add; @rgs1 what was the concern with this? Forcing a drain and stats reset? Could we just remove/add from the PoV of the cluster membership (i.e. in the hosts_added/hosts_removed parameters) while not actually nuking the underlying host object in https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/upstream_impl.cc#L682?

@htuch htuch self-assigned this Jul 11, 2018
* add a new unit test to handle metadata changes and adds/removals
  at the same time.
* add a comment to explain why/how these updates actually worked.

Signed-off-by: Raul Gutierrez Segales <[email protected]>
//
// Note, note, note: if metadata for existing endpoints changed _and_ hosts were also added
// or removed, we don't need to hit this path. That's fine, given that findOrCreateSubset()
// will be called from processSubsets because it'll be trigger by either hosts_added or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/trigger/triggered/

Raul Gutierrez Segales added 5 commits July 11, 2018 17:00
Signed-off-by: Raul Gutierrez Segales <[email protected]>
* add a helper, refreshSubsets(), to build/sort subsets
  based on the current hosts
* call refreshSubsets() from addMemberUpdateCb() if no
  hosts were added nor removed

Signed-off-by: Raul Gutierrez Segales <[email protected]>
We were ignoring the priority given to the callback and
rebuilding everything — now we only refresh that priority.

Signed-off-by: Raul Gutierrez Segales <[email protected]>
This tests that after the following transitions:

* init state
* hosts added
* metadata updated without adds/removals

refreshSubsets() still does the right thing. That is, that
`original_priority_set_` holds the current/updated list of
hosts.

Signed-off-by: Raul Gutierrez Segales <[email protected]>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks, this looks quite clean. I'd like to get @zuercher 's take on this before we merge.

void SubsetLoadBalancer::refreshSubsets(uint32_t priority) {
const auto& host_sets = original_priority_set_.hostSetsPerPriority();

if (priority >= host_sets.size()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should just be an ASSERT, since we shouldn't be asked to do a refresh at a non-existent priority level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes — updated.

Raul Gutierrez Segales added 2 commits July 12, 2018 12:06
Manually creating the metadata for each step is too verbose, so
moved that to a helper. This should improve readability.

Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern was that we aren't sending removed hosts to the underlying load balancers. That said, I just looked and it seems like only the Original DST LB actually uses hosts_added or hosts_removed from the callback (and that LB isn't allowed with subsets anyway).

We might want to consider whether PrioritySet's MemberUpdateCb needs some rethinking since it's sort of double duty, but I'm ok leaving that for another day.

@htuch htuch merged commit e718500 into envoyproxy:master Jul 12, 2018
snowp added a commit to snowp/envoy that referenced this pull request Jul 12, 2018
* origin/master:
  Ensure subsets are updated when metadata changes (envoyproxy#3840)
  proxy_protocol: add support for HAProxy Proxy Protocol v2 (binary) (envoyproxy#3668)

Signed-off-by: Snow Pettersen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants