Firestore Android gets stuck permanently after a large volume of mutations #5417
Comments
Hi @fanwgwg, thank you for reporting this issue. I am unable to download the DB via the link provided. Could you please check whether there are any abnormal transactions trying to mutate the same document at the same time, or anything else that caught your eye? I wonder if this was caused by your persistence settings combined with a large volume of mutations. But without reproducible code, it is hard to tell what the root cause is.
@milaGGL The link should be valid, as I'm able to download it in Incognito mode (unless you meant that Google Drive cannot virus-scan it because the file is too big, as shown in the screenshot). As for persistence settings, our app does not override any of Firestore's persistence settings, so everything about offline persistence should be the default. As for abnormal transactions, we did not identify any. That is also why I dumped the Firestore database when the "stuck" state happened: so Firestore engineers could inspect it further in case there is anything mysterious. On our side, we never reproduced the same issue again, so it's difficult for us to tell if there's anything abnormal. Regarding this:
I'm unable to tell if this has happened, as we never encountered the same issue during our one year of development with Firestore Android, so this must be a rare case or a race condition. However, I recall that all mutations to the Firestore Android SDK are enqueued sequentially: Line 46 in 406c057
Please correct me if I'm misunderstanding this.
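As a rough illustration of "enqueued sequentially": a single-threaded executor runs submitted tasks one at a time, in submission order. This is a plain `java.util.concurrent` stand-in, not the SDK's internal queue; `SerialQueueSketch` and `runInOrder` are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerialQueueSketch {
    // Runs each hypothetical "mutation" on one worker thread and records the order it ran in.
    static List<String> runInOrder(List<String> mutations) {
        // One thread plus a FIFO task queue: tasks execute sequentially in submission order.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<String> applied = Collections.synchronizedList(new ArrayList<>());
        for (String m : mutations) {
            worker.execute(() -> applied.add(m));
        }
        worker.shutdown(); // already-submitted tasks still run to completion
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(applied);
    }

    public static void main(String[] args) {
        System.out.println(runInOrder(List.of("set a", "update b", "delete c")));
    }
}
```

Because there is only one worker thread, two tasks can never touch the same document concurrently; the flip side is that every task depends on the one ahead of it finishing.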
@fanwgwg, yes, that is correct. Quick question: have there been any updates to your app recently, like upgrading to a newer version of the SDK?
@milaGGL We've been following the latest versions and Firebase BOM releases; we typically update to the latest version within a week of its release. The latest Firebase Android BOM update was 32.3.1 on September 15, 2023, and we updated to it on September 16. When this issue happened (around September 20), we were already using the 32.3.1 release.
@milaGGL I've just encountered another case of the "stuck" state today. This time I enabled Firestore logging, and here's the log output from app start until Firestore got stuck: Google Drive Link. One thing I found is that each time the "stuck" state happens, the log ends with these two lines. After these two lines, no matter how long you wait (hours or days), Firestore no longer prints any logs. Any further Task enqueued to Firestore never completes and never fails; it just stays in a waiting state.
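A minimal sketch of why every later Task just waits: if the only worker thread is busy in a loop that never returns, everything queued behind it stays pending without completing or failing, while that thread burns a full core. The executor and the `laterTaskIsStuck` helper are hypothetical stand-ins, not Firestore code; the spinning task is released at the end only so the demo terminates.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class HeadOfLineBlockingSketch {
    // Returns true if a task queued behind a never-ending task is still pending after waitMillis.
    static boolean laterTaskIsStuck(long waitMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(); // single worker thread
        AtomicBoolean release = new AtomicBoolean(false);
        // Stand-in for a loop that never exits: keeps the only worker thread at 100% CPU.
        worker.execute(() -> {
            while (!release.get()) {
                Thread.onSpinWait();
            }
        });
        // A later task (e.g. a write) queued behind the spinning one.
        Future<String> write = worker.submit(() -> "done");
        boolean stuck;
        try {
            write.get(waitMillis, TimeUnit.MILLISECONDS);
            stuck = false;
        } catch (TimeoutException e) {
            stuck = true; // neither completed nor failed: still waiting in the queue
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            release.set(true); // demo only: a real infinite loop is never released
            worker.shutdown();
        }
        return stuck;
    }

    public static void main(String[] args) {
        System.out.println("later task stuck: " + laterTaskIsStuck(200));
    }
}
```

This matches the symptoms reported in the thread (no further logs, a worker thread pinned at full CPU, Tasks that never resolve) without saying anything about which loop is spinning.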
@milaGGL I think I've found something that might be very close to the cause: there is an infinite loop running in this method: Line 159 in f6b5ecb
I've attached a screenshot of debugging at the breakpoints inside the method; as you can see, the
@fanwgwg, this is amazing! I wonder if
Yes. Unfortunately, I don't have a way to consistently reproduce the issue, except that it only happens under a large volume of mutations. Even with large volumes of mutations, which we test on a daily basis, I've only encountered it twice so far. In case anyone else encounters the same issue, our mitigation is to disable persistence, because our app doesn't use the persistence feature, i.e.,
@milaGGL @ehsannas Friendly ping: is there any update on this issue? While disabling persistence works for now, we don't like it as a mitigation because it might prevent us from developing features that rely on offline persistence, which is one of Firestore's core features. Just thinking out loud based on my superficial understanding of the SDK, it seems that:
@fanwgwg Thanks for the detailed investigation, I think you are right that there is a bug in the SDK. Your operation put the SDK's GC under pressure. It tries to go through all orphaned documents in batches (of size REMOVE_ORPHANED_DOCUMENTS_BATCH_SIZE), and it stops when it processes a batch whose size differs from REMOVE_ORPHANED_DOCUMENTS_BATCH_SIZE, which means it has processed all the orphaned documents with sequence numbers in scope. Typically, the number of orphaned documents does not exceed REMOVE_ORPHANED_DOCUMENTS_BATCH_SIZE, so the loop exits after one iteration. Your operation made the loop run multiple iterations. Unfortunately, from the second iteration onwards, the SDK did not "resume" the query from where iteration 1 left off; instead it issued the same query as in iteration 1, hence an infinite loop. We will get this fixed ASAP.
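The failure mode described above can be sketched with an in-memory sorted map standing in for the document cache. Assumptions (not taken from the SDK): `BATCH_SIZE` plays the role of REMOVE_ORPHANED_DOCUMENTS_BATCH_SIZE, and some documents in the scanned range are skipped rather than removed (for example, because they still have pending writes). If a full batch of skipped documents sits at the front, re-issuing the same query returns that same batch forever; resuming after the last key seen lets the loop reach the end. All class and method names are hypothetical, and the iteration guard in `buggyRemove` exists only so the demo halts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class OrphanGcSketch {
    // Hypothetical stand-in for REMOVE_ORPHANED_DOCUMENTS_BATCH_SIZE.
    static final int BATCH_SIZE = 3;

    // Hypothetical cache: document key -> true if removable, false if it must be kept.
    // Returns up to BATCH_SIZE keys, starting after startAfter (or from the beginning if null).
    static List<String> queryBatch(NavigableMap<String, Boolean> docs, String startAfter) {
        NavigableMap<String, Boolean> view =
                startAfter == null ? docs : docs.tailMap(startAfter, false);
        List<String> batch = new ArrayList<>();
        for (String key : view.keySet()) {
            if (batch.size() == BATCH_SIZE) break;
            batch.add(key);
        }
        return batch;
    }

    // Buggy shape: every iteration re-issues the query from the beginning.
    static int buggyRemove(NavigableMap<String, Boolean> docs, int maxIterations) {
        int iterations = 0;
        while (iterations < maxIterations) { // guard added only so the demo terminates
            iterations++;
            List<String> batch = queryBatch(docs, null);
            for (String key : batch) {
                if (docs.get(key)) docs.remove(key);
            }
            if (batch.size() != BATCH_SIZE) break; // a short batch means "done"
        }
        return iterations;
    }

    // Fixed shape: each iteration resumes after the last key of the previous batch.
    static int fixedRemove(NavigableMap<String, Boolean> docs) {
        int iterations = 0;
        String lastKey = null;
        while (true) {
            iterations++;
            List<String> batch = queryBatch(docs, lastKey);
            for (String key : batch) {
                if (docs.get(key)) docs.remove(key);
            }
            if (!batch.isEmpty()) lastKey = batch.get(batch.size() - 1);
            if (batch.size() != BATCH_SIZE) break;
        }
        return iterations;
    }

    // Three documents that must be kept sort first, followed by four removable orphans.
    static NavigableMap<String, Boolean> sampleCache() {
        NavigableMap<String, Boolean> docs = new TreeMap<>();
        for (String k : new String[] {"doc0", "doc1", "doc2"}) docs.put(k, false);
        for (String k : new String[] {"doc3", "doc4", "doc5", "doc6"}) docs.put(k, true);
        return docs;
    }

    public static void main(String[] args) {
        System.out.println("buggy iterations: " + buggyRemove(sampleCache(), 50));
        NavigableMap<String, Boolean> docs = sampleCache();
        System.out.println("fixed iterations: " + fixedRemove(docs) + ", remaining: " + docs.keySet());
    }
}
```

With this sample data, the buggy loop keeps getting back `doc0`, `doc1`, `doc2` (a full batch, none removed) until it hits the guard, while the resuming loop finishes in three iterations and leaves only the three kept documents.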
@wu-hui Thanks for the quick response! Looking forward to the fix!
[REQUIRED] Step 2: Describe your environment
[REQUIRED] Step 3: Describe the problem
During development (our app has been in production with Firestore for nearly a year, so we're very familiar with the Firestore Android SDK), we spotted a rare issue we had never experienced before: Firestore getting stuck permanently after a large volume of mutations on the client side. This is our first time experiencing the issue, and after days of debugging we couldn't find any reason why it was getting stuck. Here's what happened:
- `Task` won't complete if there is no internet connection, because it can only complete when a write is committed on the server.
- `Task` never completed, failed, or was cancelled.
- `FirestoreWorker` has been running at 100% of the CPU time (of its own thread).

As of now, we have already reinstalled our app (yesterday) to recover from this issue. However, while debugging, we dumped the Firestore database, because I thought it might help Firestore engineers investigate. Here's the link to the DB. The DB itself is gigantic, over 1 GB. Our app does not override any of Firestore's persistence settings, so everything about offline persistence should be the default.
We filed a support case with Firebase Support last week (case number 10250473); however, they said they were unable to spot any issue on the server side and suggested that I contact the client SDK team directly.
Steps to reproduce:
Since then, we have not encountered the issue again. Likewise, during our ~1 year of development with the Firestore Android SDK, that was the only time we encountered it. We believe it could be a rare case; however, when the issue does happen, it is SERIOUS, because the only way out of it is to uninstall and reinstall the app. Therefore, we're kindly asking the Firestore team to take a look at this issue.
Relevant Code:
Explained above