-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gRPC: fix cases where gRPC call could be finished twice #2146
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM with a question.
return; | ||
|
||
if (!completions_.empty() && !is_grpc_call_finished_) { | ||
// Important: since the stream always has a pending read operation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is hard to parse because:
- it conflates pre- and post-conditions
- doesn't clearly indicate normal and error operation
- uses ambiguous phrasing (e.g. "cancellation" could seemingly apply to either the context or the call)
Consider phrasing something like:
Important: during normal operation, the stream always has a pending read operation so
Shutdown
would hang indefinitely if we didn't cancel the context_. However, if the stream has already failed, avoid canceling the context to avoid overwriting the status captured during theOnOperationFailed
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I like this wording.
// never been started. | ||
return; | ||
|
||
if (!completions_.empty() && !is_grpc_call_finished_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't entirely understand why both conditions must hold here. It seems as if is_grpc_call_finished_
should be sufficient, no?
In particular this suggests that it's possible for e.g. completions_
to be empty but is_grpc_call_finished_
to be false and we wouldn't want to call FinishGrpcCall
, but that seems wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this suggests that it's possible for e.g. completions_ to be empty but is_grpc_call_finished_ to be false and we wouldn't want to call FinishGrpcCall, but that seems wrong.
The stream is in this state before it is started. Finishing a non-started stream would also fail a gRPC assertion. I don't think we should ever run into the case when Stream
tries to Finish
a GrpcStream
before it is started. I'm mainly doing this because it's an easy check to make.
If you think protecting against this case isn't worthwhile, I'll simplify the check. Otherwise, I'll add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Looking at the prior state more closely there was a comment there that explained this and it made sense. It's possible you may want to merge something of that comment with what you have below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// never been started. | ||
return; | ||
|
||
if (!completions_.empty() && !is_grpc_call_finished_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Looking at the prior state more closely there was a comment there that explained this and it made sense. It's possible you may want to merge something of that comment with what you have below.
* master: (26 commits) Functions Interop (#2113) Add a travis cron job for CocoaPod symbol collision testing (#2154) Save schema version on downgrade, add test to verify (#2153) Silence Storage Unit Test `nil` warning. (#2150) Update versions for Release 5.14.0 (#2145) gRPC: fix cases where gRPC call could be finished twice (#2146) Fix Swizzler test warnings (#2144) Update Auth CHANGELOG.md (#2128) Make fuzz tests optional until they pass (#2143) Add support of Game Center sign in (#2127) Add test for deprecated FDLURLComponents init API. (#2133) fix a typo in integration test (#2131) Make fuzzing less verbose to avoid exceeding Travis log limit (#2126) Move to `domainURIPrefix` for FIRDynamicLinkComponents (#2119) Carthage instructions for new gRPCCertificates.bundle location (#2132) Fix pod lib lint GoogleUtilities.podspec --use-libraries regression (#2130) Avoid using default FIROptions directly. (#2124) Changelog entry for LRU GC (#2122) Revert "Add Firebase Source to Header Search Path" (#2123) Custom fdl domain (#2121) ...
gRPC C++ wrappers assert that performing any operations on a call results in an "ok" error code. In certain cases, it was possible for Firestore to call
grpc::ClientAsyncReaderWriter::Finish
twice, which causes that assertion to fail, crashing the app. I haven't succeeded in reproducing the issue, but I presume the following would be consistent with bug reports:Stream
stops due to idleness. Normally, a failed operation would cause the stream to stop, and also when an operation fails,Stream
doesn't callFinishImmediately
on theGrpcStream
. However, handling operation failure is split into two async steps: the first one is to finish the underlying gRPC call with a completion, and the second is to invoke the completion which will notify the parentStream
. Because the second step gets added to the back of the worker queue, theoretically the following configuration is possible:Stop due to idleness is considered a graceful stop, and so will call
GrpcStream::FinishImmediately
.GrpcStream::Shutdown
relies oncompletions_.empty()
check to avoid double finish. However, in this case, the check is insufficient, becauseGrpcStream
still has one completion -- the "finish" operation enqueued inOnOperationFailed
that won't be removed untilFinishAndNotify
.GrpcStream::OnOperationFailed
is called twice in quick succession, so that the secondOnOperationFailed
is executed beforeFinishAndNotify
enqueued by the firstOnOperationFailed
is. The failure is similar to case 1.These sound like pretty rare circumstances, which is consistent with how difficult the issue is to reproduce. There might be more cases; the best fix seems to be ensuring that gRPC call is never finished twice.
This PR:
is_grpc_call_finished_
, to track whether the underlying call has been finished without trying to interpret the current state ofGrpcStream
to infer that information;GrpcStream
, in case more related crashes are discovered;