[PLAT-5035] Capture user-reported error threads to disk to avoid deadlock #833
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Goal
This change is necessary to fix a deadlock that (occasionally) occurs when capturing thread traces for user-reported errors.
The notifier suspends all other threads before capturing a thread trace, but these may be holding locks in malloc, the Objective-C runtime, or other subsystems. Therefore we must only call async-signal-safe functions while other threads are suspended.
bsg_kscrw_i_captureThreadTrace
was calling malloc to create a buffer for the JSON output, and this was observed to deadlock when reporting multiple errors on several threads simultaneously.Design
This fix works by creating a temporary file to capture the JSON output using KSJSONEncoder's "regular" file output mechanism.
Using a file minimises the amount of memory required to perform the serialisation, and avoids needing to guess at an amount of memory to pre-allocate.
Changeset
bsg_kscrw_i_captureThreadTrace
now encodes JSON to file output, which is then parsed usingNSJSONSerialization
.The mechanism to capture KSJSONEncoder's in memory has been removed, as it is no longer used and not async-signal-safe.
Testing
This has been tested by manually running the
ManyConcurrentNotifyScenario
scenario in the test app on a real device and stepping through the new code in a debugger to ensure it's working as expected.I have verified that errors are still reported on the dashboard and display the expected information on the "threads" tab.
The code that has been modified is covered by unit tests and e2e tests.