Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak MessagingService Shutdown #588

Draft
wants to merge 10 commits into
base: palantir-cassandra-2.2.18
Choose a base branch
from
Draft

Conversation

zpear
Copy link
Contributor

@zpear zpear commented Nov 26, 2024

A couple different, but sort of related changes in this PR:

  • On MessagingService shutdown we now close the incoming socket before shutting down the callback expiring map. Although in the previous ordering the callback map won't allow new callbacks to be registered, I think other C* nodes may continue successfully sending messages since at the networking layer they're still accepted. This might help a little bit, in conjunction with the next change, to avoid waiting until timeouts for requests we know will fail -- but probably not a ton.
  • If one C* node sees another as up, but cannot connect for whatever reason, it will loop trying to reconnect in OutboundTcpConnection for minRpcTimeout, which can be the same as read/write timeouts in practice. Once this is exceeded, all the messages on the backlog are cleared. If these were needed to satisfy a read/write, this node as the coordinator just waits until the failure callback is invoked via read/write timeout (which can be a long time).
    • If we make this separately configurable and assume recoverable network problems are short-lived, and non-recoverable ones take substantially longer than read/write timeouts, we can make this connectivity timeout small and invoke the failure callbacks much faster than the write timeout, which might improve the client experience.
    • Normally you'd hope this node sees the other as unreachable in this scenario as then it will not attempt to send a request. However, we've seen this not be the case relatively frequently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant