-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AppCheck] Firestore stops working after leaving the website idle overnight - AppCheck HTTP error 403 (appCheck/fetch-status-error) + infinite loop / high CPU usage #6373
Comments
I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight. |
Hello @anisabboud. I am working with the AppCheck team to investigate. I'll reply back once I have any updates. The 403 error that you're seeing means that Firestore SDK has correctly identified that the token is expired and that it needs to get a new token. It has attempted to get a new AppCheck token, and AppCheck was not able to provide one (hence the error is coming from @firebase/app-check). So any Firestore operation after that will fail if AppCheck enforcement is ON. Interestingly, there was an issue with AppCheck and Flutter that saw the same error (Googlers can see b/230454903). The root cause might be the same. |
@anisabboud Would it be possible for you to provide a minimal app to reproduce this issue, that we could use for debugging? |
Hi @dconeybe, I don't currently have time to spin up a test app, but you could reproduce this issue on your own apps using Firestore listeners and AppCheck enforced. Just leave the tab open, put the computer to sleep, and check the console the next day. Try that a few days in a row. The problem is that it's cumbersome to reproduce - I need to wait for the next day every time... The app continued to function, but the console was outputting the message above in an infinite loop (~10,000 times within 5 minutes). |
I've re-assigned this issue to my colleague @hsubox76 who has far more expertise in the realm of AppCheck. If this turns out to be a Firestore issue, feel free to assign it back to me. |
I'm still not sure about the first error, but the second error doesn't seem to be related to the SDK itself, it seems like all the errors in the stack trace (in the last post) happen because the machine is offline, or at least the browser believes it is offline. If these happened after the computer woke up, then it seems like it either didn't reconnect to the internet properly, or the browser didn't recognize it. As for the first error (the 403), it will be difficult to find the source of the problem if we can't reproduce it. All I can guess from my knowledge is that 403 errors like that generally happen because an invalid recaptcha token was sent to the app check exchange endpoint. A possible cause of an invalid recaptcha token might be that something about the sleep/wake process screws up the recaptcha process and causes it to yield an invalid recaptcha token, which is then sent to the app check endpoint, which returns the 403. |
Hi @hsubox76 - thank you for the insights. |
Just chiming in here that we are experiencing the same error across a broad swath of our users. A refresh solves the issue presumably related to the reCAPTCHA expiration. For reference, we are on reCAPTCHA Enterprise. The issue appears to arise after an overnight stale session. |
@hsubox76 - Is there anything else you need to help identify and resolve this issue? Its difficult to replicate on our end given the time it takes for the token to expire. |
I'm not sure if we can prevent the 403 itself, if the problem is that reCAPTCHA doesn't work after a long sleep, but it shouldn't be making that many repeated requests. It looks like something is wrong with the throttling, which should throw here if it had previously gotten a 403 less than 1 day ago:
We need to look into why the throttle is failing. |
Thank you for the update @hsubox76 . If we cannot ensure functionality of reCAPTCHA after a long sleep, do you have any suggestions on catching this error to fire a re-load on the client side? |
I had similar issue once, with the infinite errors, but it was fixed here |
Confirmed that we have the
|
Someone is looking into why throttling isn't working. It would be great to catch if ReCAPTCHA has gone wrong after a long sleep, and reload ReCAPTCHA if so. Unfortunately, due to the nature of ReCAPTCHA, it won't tell you on the client side if it's got a bad token or why (because otherwise bad actors could see that and do trial and error to see how to get around it), it can only be validated on the back end. So I'm not sure if there's a way to detect if ReCAPTCHA has gone stale. |
Do we have a timeline for when this might be deployed? As it stands, |
This is merged and should be released this week, usually Thursday. This will only fix the throttling issue and prevent too many requests to the endpoint which all return 403. There may still be a lot of errors in the console, but they'll say that a request was throttled, instead of saying that it made a request and got a 403. This is important as it will prevent request quota problems. We don't have a way yet to automatically refresh it and make it work without the user manually refreshing. |
Thank you for the update. |
No, that would cause it to not work at all in the first place. Thanks for filing a separate issue, will address it there. |
Thank you looking into this issue, and for the update. I updated to the latest version of the Firebase SDK Today I left my laptop idle for 1.5 hours. When I came back to it, the fans were running at full speed, indicating extreme CPU usage. FYI: In the video above, AuthService basically listens to the auth state via subscribing to |
No, the problems started to occur after I enabled it. This is what I can see after leaving the computer sleeping overnight and waking it up the following morning. Multiple failed requests to Multiple failed requests to Multiple failed requests to These happened when the computer was sleeping. Then when it woke up it did the following: A successful A successful I even managed to update documents in Firestore without having to refresh this time, which is.. weird. So it seems like it already does what I proposed in the first place but it sometimes fails to update the Firestore in the cloud after waking up despite successful writes. |
I'm pretty sure this change (firebase 9.9.1) is causing infinite setTimeouts on reflect.app - we've just had to revert back to 9.9.0. |
Is it possible to provide a minimal reproduction of this infinite setTimeouts issue? I'm assuming this is a new issue introduced by the "throttle fix" in 9.9.1 and not the original sleep/wake issue, and if so, can you create a new issue and provide the minimal reproduction, or at least a sample of the code that leads to these errors, or logs? |
I can't give you a minimal reproduction, but I can give you a version of the app with the issue if that helps? All we did was upgrade firebase to 9.9.4. I tried each version until 9.0.0 (which fixed the issue). |
Created a new issue and responded here: #6606 For those having trouble with Firestore sleep/wake, which specific Firestore methods are not working after wake? I know @skog-newglue mentioned writes, are these |
@hsubox76 For me it's |
@hsubox76 Side note, I noticed today that it was firing A LOT of requests upon waking up. We're talking like 800+ requests to /token and /channel, so when I tried a write I couldn't even see the request in the network log. And what do you know, it hadn't updated the data. So could the problem be that it's not making the request in the first place and therefore I get the impression that I've had a successful write? Just thinking out loud here. I'm probably gonna try to add a console log and try again in a few days to see if the request is actually being sent. |
So if no one is able to provide a minimal repro, I understand, but it will be really hard to find and diagnose a bug without being able to reproduce it. I've created an app with only this code, nothing else (those 2 buttons at the end are in the index.html):
The buttons are in the index.html like so:
When you push the "gobutton" it initializes app check and starts an onSnapshot firestore listener on a certain doc. When you push the "writebutton" it writes a random number to the "randomNumber" field of that doc. I think this is a pretty good foundation for a test app, if someone wants to take this code and build an app, and modify it as needed so that it reproduces the same kinds of errors you are seeing, and let me know what steps to take to reproduce it, that would be really helpful. |
Hi, I've been running into this infinite loop problem as well. I haven't had any of the write issues because my app doesn't write directly to firestore, but I did notice something that might help your repro: I was testing my app while logged into multiple accounts at once, and had a number of tabs/incognito tabs open in Chrome/Firefox. I noticed that I consistently got the infinite loop when waking the computer from sleep overnight on the signed in tabs, but never on the tab that is signed out. In my app, the only meaningful difference between the signed-in and signed-out states are that I only attach snapshot listeners when the user is signed in. Could this indicate that there's some bug in the interaction between firestore and AppCheck (as opposed to being an AppCheck only bug)? I don't have the time to try it myself but I think your test app should try signing in the user before attaching the snapshot listener (i.e after your My app sets everything up almost identically to your test app, so I think if you add the sign-in you might be able to repro what I'm seeing. I also have LogRocket set up and it probably has recorded the infinite loop happening, but I assume sharing more console logs is probably not as helpful as getting a working repro. I hope this helps and let me know if there's anything else I can provide, this bug is blocking my app from going into production so I'm happy to help fix it |
Was able to reproduce it by shortening the token TTL - the errors seem to happen when the token is about to expire (or has expired) and tries to auto refresh. Working on a fix in #6617. This seems likely to address both the rapid requests problem and the "stops working until I refresh" problem. |
Sorry for the auto-close. We have a fix that should be released next week, 10/6. If you are able to test it out now, a staging version has been published (do not use in production). To get the staging version you can npm or yarn install |
Well it seems like no users were able to try the staging release but the production release is now out (9.11.0) so hopefully this fix works! I had to set up a somewhat contrived situation for the repro so I'm still not 100% sure the fix will work in users' real life situations, so let me know. I'll keep this issue open until I hear a few reports that this has been fixed in production apps for real. |
@hsubox76 first of all I want to say thank you very much for taking this issue seriously and for your perseverance in finding a fix. It's much appreciated. I updated to Firebase v9.11.0 three days ago and opened 4 tabs to test: The tabs with v9.10 triggered an infinite loop, whereas the tabs with v9.11 didn't trigger an infinite loop (so I believe the infinite loop + high CPU usage issue is probably resolved), but the throttling introduced in a different related commit might have triggered a different side-effect. I will describe everything below. v9.10Tab 1 with v9.10 - infinite loop after one day:Tab 2 with v9.10 - infinite loop after two days:Tab 1 vs Tab 2I.e., both tabs running v9.10 got stuck into an infinite loop eventually, but not on the same day, and the console & network tabs were not the same in both instances. The first tab kept showing v9.11The tabs running v9.11 did not get into an infinite loop, but did get into a LONG loop, showing hundreds of console warnings and network tab requests, including appcheck throttling and therefore Firestore errors which eventually rendered the app dysfunctional since it no longer could load data. I will explain the console behavior I observed step-by-step: Tab 3 running v9.11.0Step 1: A bunch of POST requests fail (could be while the computer is being put to sleep).
Step 2: After the
|
@anisabboud, do you have an ad blocker running which could be blocking those POST requests before and after sleep? I found others posting about the same issue and it was caused by an ad blocker. |
I would suggest creating a separate issue for your Auth issues as Auth does not use App Check and while Firestore is related to Auth, it can't cause Auth sign-ins to happen. This way the Auth team can focus on that issue specifically while we can look at any possibly lingering Firestore/App Check issues here. |
@MarkDuckworth I do have an ad blocker. I believe it was turned on in one tab and off in the other (different deployments of the same app), so I'm not sure it's the culprit. But will try disabling it anyway and retrying the experiment over the next couple of days. (Side note: I use windows hibernate instead of sleep.) @hsubox76 thank you for the clarification on the difference between Auth & Firestore - will try open a separate issue if I'm able to isolate the auth issue further.. |
The issue I mentioned 4 days ago happens also without AdBlock. Basically that's the flow that happened on the dead tab:
However, it seems that the infinite loop isn't happening lately, which is a huge relief. I'm still seeing ~1K unverified requests per day in the graph, which might be related to the issue I reported (the token expires, the token refresh fails, and then all the Firestore requests are considered invalid and fail). |
Does the error you are talking about happen after you return to the tab, or while you were away? A 503 from what I can tell indicates the server is overloaded. A 503 code causes a retry with an exponential backoff starting at about ~1s and increasing with each retry. Does this happen? You only listed one warning message. Are there any more about further attempts after that? If not, perhaps it succeeded on the second try, after 1s. Are there any successful firebaseappcheck.googleapis.com POSTs after that in the network tab? If that's the case, Firestore should be able to resume any operations after that as normal. Which Firestore operations are you using? onSnapshot? Or one time operations? Is it failing while App Check is retrying or after it has succeeded (if it did eventually succeed)? |
Q: Does the error happen after you return to the tab, or while you were away? Q: Did you see exponential backoff warnings?
Q: Which Firestore operations are you using? Q: Are there any successful firebaseappcheck.googleapis.com POSTs after that in the network tab? Overall the frequency of the issue was greatly reduced by the v9.11/v9.12 updates.
So far the issue has been correlated with the The AppCheck dashboard is also looking much better after your fix - currently only ~1K requests per day are labeled as "Unverified", compared to millions of requests caused by the infinite loop earlier. |
Yes, 403 errors are throttled for 1 day because it is likely there's something wrong with the attestation that won't be fixed with a simple retry, such as a bad API key or an attestation failure due to failing ReCAPTCHA. You can see the code for the throttling here (this also applies to 404):
Other errors, such as 503 server errors, could mean the server is overloaded, so it's ok to retry quickly, beginning with 1s. If the errors are happening while the computer is asleep, that seems reasonable, as that's when the internet connection might not be working. If it doesn't work upon returning, we can focus on and address that. It seems like the main issue in this thread is fixed, if you find the issues with listener errors after waking from sleep to be a big problem, let's start a separate issue focused on that. This issue has become very long and difficult to parse as it has many related but different issues thrown together, and it seems like the biggest problem is fixed. If there's a remaining issue (listener errors after waking from sleep), let's start a separate issue that only focuses on that. (I know the original reported issue was sort of about that but this really became the "infinite loop issue" thread at some point and that's what most of the discussion is about.) |
AppCheck problem
AppCheck works fine and enforced.
However, after leaving the webapp idle for a while (e.g., if I leave the tab open and go to sleep and put the laptop to sleep, then come to work the next day), Firestore stops working (as if it gets disconnected and doesn't reconnect or refresh AppCheck token).
Console error observed in production:
@firebase/app-check: FirebaseError: AppCheck: Fetch server returned an HTTP error status. HTTP status: 403. (appCheck/fetch-status-error).
Console error observed on localhost:
zone.js:1061 Unhandled Promise rejection: cancelled ; Zone: <root> ; Task: Promise.then ; Value: cancelled undefined
Environment
The text was updated successfully, but these errors were encountered: