[AppCheck] Firestore stops working after leaving the website idle overnight - AppCheck HTTP error 403 (appCheck/fetch-status-error) + infinite loop / high CPU usage #6373

Closed
anisabboud opened this issue Jun 22, 2022 · 60 comments · Fixed by #6617

@anisabboud

AppCheck problem

AppCheck works fine and is enforced.
However, after leaving the webapp idle for a while (e.g., if I leave the tab open and go to sleep and put the laptop to sleep, then come to work the next day), Firestore stops working (as if it gets disconnected and doesn't reconnect or refresh AppCheck token).

Console error observed in production: @firebase/app-check: FirebaseError: AppCheck: Fetch server returned an HTTP error status. HTTP status: 403. (appCheck/fetch-status-error).

Console error observed on localhost: zone.js:1061 Unhandled Promise rejection: cancelled ; Zone: <root> ; Task: Promise.then ; Value: cancelled undefined

Environment

  • Angular version: 14.0.2 (latest)
  • Firebase SDK version: 9.8.3 (latest)
  • AngularFire version: 7.4.1 (latest)
  • Firebase Product: AppCheck
@google-oss-bot
Contributor

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

@dconeybe
Contributor

Hello @anisabboud. I am working with the AppCheck team to investigate. I'll reply back once I have any updates.

The 403 error that you're seeing means that Firestore SDK has correctly identified that the token is expired and that it needs to get a new token. It has attempted to get a new AppCheck token, and AppCheck was not able to provide one (hence the error is coming from @firebase/app-check). So any Firestore operation after that will fail if AppCheck enforcement is ON. Interestingly, there was an issue with AppCheck and Flutter that saw the same error (Googlers can see b/230454903). The root cause might be the same.

@dconeybe
Contributor

@anisabboud Would it be possible for you to provide a minimal app to reproduce this issue, that we could use for debugging?

@anisabboud
Author

Hi @dconeybe, I don't currently have time to spin up a test app, but you could reproduce this issue on your own apps using Firestore listeners and AppCheck enforced. Just leave the tab open, put the computer to sleep, and check the console the next day. Try that a few days in a row.

The problem is that it's cumbersome to reproduce - I need to wait for the next day every time...
Today for instance, I encountered a different but related error when doing this - appCheck/fetch-network-error:
@firebase/app-check: FirebaseError: AppCheck: Fetch failed to connect to a network. Check Internet connection. Original error: Failed to fetch. (appCheck/fetch-network-error).

The app continued to function, but the console was outputting the message above in an infinite loop (~10,000 times within 5 minutes).

Attaching a partial log showing the infinite loop:
image

@dconeybe dconeybe assigned hsubox76 and unassigned dconeybe Jun 27, 2022
@dconeybe
Contributor

I've re-assigned this issue to my colleague @hsubox76 who has far more expertise in the realm of AppCheck. If this turns out to be a Firestore issue, feel free to assign it back to me.

@hsubox76
Contributor

I'm still not sure about the first error, but the second error doesn't seem to be related to the SDK itself. All the errors in the stack trace (in the last post) seem to happen because the machine is offline, or at least the browser believes it is offline. If these happened after the computer woke up, then either it didn't reconnect to the internet properly or the browser didn't recognize the reconnection.

As for the first error (the 403), it will be difficult to find the source of the problem if we can't reproduce it. All I can guess from my knowledge is that 403 errors like that generally happen because an invalid recaptcha token was sent to the app check exchange endpoint. A possible cause of an invalid recaptcha token might be that something about the sleep/wake process screws up the recaptcha process and causes it to yield an invalid recaptcha token, which is then sent to the app check endpoint, which returns the 403.

@anisabboud
Author

Hi @hsubox76 - thank you for the insights.
Note that the infinite loop of errors mentioned in my last message continues indefinitely (for more than 10 minutes, producing tens of thousands of errors until the console crashes), whereas the internet connection comes back a couple of seconds after waking the computer up.
I.e., the loop should stop when the internet comes back, but it doesn't, which seems like a bug.

image

@clive-h-townsend

Just chiming in here that we are experiencing the same error across a broad swath of our users. A refresh solves the issue, which is presumably related to reCAPTCHA expiration. For reference, we are on reCAPTCHA Enterprise. The issue appears to arise after a session has gone stale overnight.

@anisabboud
Author

Posting console screenshots from today, of the errors mentioned in the first post:

Production

Console error observed in production: @firebase/app-check: FirebaseError: AppCheck: Fetch server returned an HTTP error status. HTTP status: 403. (appCheck/fetch-status-error).

In production, app continues to function, but there's an infinite loop of AppCheck errors in the console, causing high CPU usage in Chrome:
image

localhost

Console error observed on localhost: zone.js:1061 Unhandled Promise rejection: cancelled ; Zone: <root> ; Task: Promise.then ; Value: cancelled undefined

On localhost, no data is coming from Firestore (as if it's disconnected), but there is no infinite loop of errors in the console - just a few:
image

@clive-h-townsend

@hsubox76 - Is there anything else you need to help identify and resolve this issue? It's difficult to replicate on our end given the time it takes for the token to expire.

@anisabboud
Author

Screenshot of the network tab (infinite loop 1000+ requests):

image

@hsubox76
Contributor

I'm not sure if we can prevent the 403 itself, if the problem is that reCAPTCHA doesn't work after a long sleep, but it shouldn't be making that many repeated requests. It looks like something is wrong with the throttling, which should throw here if it had previously gotten a 403 less than 1 day ago:

throwIfThrottled(this._throttleData);

We need to look into why the throttle is failing.
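
As a rough illustration of the kind of guard described above (not the SDK's actual implementation), a throttle check can compare the current time against a stored "allow requests after" timestamp and throw before any network call is made. The helper names `recordFailure` and `assertNotThrottled`, and the fields `httpStatus` and `allowRequestsAfter`, are hypothetical and chosen only for this sketch.

```javascript
// Hypothetical throttle guard; function and field names are illustrative,
// not taken from the Firebase SDK source.
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Record a failed token exchange. Per the comment above, a 403 should block
// further attempts for one day.
function recordFailure(httpStatus, nowMs) {
  if (httpStatus === 403) {
    return { httpStatus, allowRequestsAfter: nowMs + ONE_DAY_MS };
  }
  // Other statuses are out of scope for this sketch: nothing is recorded.
  return null;
}

// Throw before hitting the network if an earlier failure is still in effect.
function assertNotThrottled(throttleData, nowMs) {
  if (throttleData && nowMs < throttleData.allowRequestsAfter) {
    const waitMs = throttleData.allowRequestsAfter - nowMs;
    throw new Error(
      `Requests throttled due to ${throttleData.httpStatus} error; retry in ${waitMs} ms`
    );
  }
}
```

A guard like this only helps if the recorded state survives between attempts. Whether lost state is what went wrong here is not established in this thread.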

@clive-h-townsend

Thank you for the update @hsubox76 . If we cannot ensure functionality of reCAPTCHA after a long sleep, do you have any suggestions on catching this error to fire a re-load on the client side?

@orbachar

I had a similar issue once, with the infinite errors, but it was fixed here:

#5842

@clive-h-townsend

Confirmed that we have isTokenAutoRefreshEnabled set to true in our App Check configuration.

initializeAppCheck(app, {
    provider: new ReCaptchaEnterpriseProvider(XXXXXXXXXXX),
    isTokenAutoRefreshEnabled: true,
})

@hsubox76
Contributor

Someone is looking into why throttling isn't working.

It would be great to catch if ReCAPTCHA has gone wrong after a long sleep, and reload ReCAPTCHA if so. Unfortunately, due to the nature of ReCAPTCHA, it won't tell you on the client side if it's got a bad token or why (because otherwise bad actors could see that and do trial and error to see how to get around it), it can only be validated on the back end. So I'm not sure if there's a way to detect if ReCAPTCHA has gone stale.

@clive-h-townsend

Do we have a timeline for when this might be deployed? As it stands, appCheck is causing more harm than good.

@hsubox76
Contributor

hsubox76 commented Jul 19, 2022

This is merged and should be released this week, usually Thursday.

This will only fix the throttling issue and prevent too many requests to the endpoint which all return 403. There may still be a lot of errors in the console, but they'll say that a request was throttled, instead of saying that it made a request and got a 403. This is important as it will prevent request quota problems.

We don't have a way yet to automatically refresh it and make it work without the user manually refreshing.
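
Until an automatic recovery exists, one possible stopgap is for the app to classify the App Check errors it catches and prompt a reload for the ones that did not resolve by themselves. The sketch below is only an illustration: the error codes are the ones quoted in console messages in this thread, while `classifyAppCheckError` and the action names are hypothetical. How the error object reaches the function is left to the application.

```javascript
// Error codes below are copied from console messages quoted in this thread.
// The mapping from code to action is an assumption made for illustration.
const RELOAD_CODES = new Set([
  "appCheck/fetch-status-error", // exchange endpoint returned an HTTP error (e.g. 403)
  "appCheck/throttled",          // further attempts are being refused for a while
]);
const TRANSIENT_CODES = new Set([
  "appCheck/fetch-network-error", // browser was (or believed it was) offline
]);

// Decide what a hypothetical app-level handler should do with a caught error.
function classifyAppCheckError(error) {
  const code = error && typeof error.code === "string" ? error.code : "";
  if (RELOAD_CODES.has(code)) return "prompt-reload";
  if (TRANSIENT_CODES.has(code)) return "wait";
  return "ignore";
}
```

Prompting the user rather than reloading automatically avoids a reload loop if the error persists after the page is refreshed.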

@clive-h-townsend

Thank you for the update.

@wliumelb

#6471
could this be related? @hsubox76

@hsubox76
Contributor

No, that would cause it to not work at all in the first place. Thanks for filing a separate issue, will address it there.

@anisabboud
Author

Thank you for looking into this issue, and for the update.

I updated to the latest version of the Firebase SDK 9.9.1, but still encountered some aspect of this issue / infinite loop:

Today I left my laptop idle for 1.5 hours. When I came back to it, the fans were running at full speed, indicating extreme CPU usage.
I looked at the network tab of the incognito development tab I had opened, and noticed the same behavior I mentioned in this previous comment. See GIF video below showing infinite network requests:

appcheck-loop

FYI: In the video above, AuthService basically listens to the auth state via subscribing to user(auth) (AngularFire/RxFire syntax).
image
So something is causing the auth state to be updated in an infinite loop after leaving the app idle for a while.

@chriswoodie

@hsubox76

> If App Check is not enabled, you don't have any of these issues with Firestore?

No, the problems started to occur after I enabled it.

This is what I can see after leaving the computer sleeping overnight and waking it up the following morning.

Multiple failed requests to POST https://securetoken.googleapis.com/v1/token?key=<my-key> with net::ERR_INTERNET_DISCONNECTED status.

Multiple failed requests to GET https://firestore.googleapis.com/google.firestore.v1.Firestore/Listen/..... with net::ERR_QUIC_PROTOCOL_ERROR 200 and net::ERR_NAME_NOT_RESOLVED.

Multiple failed requests to POST https://firestore.googleapis.com/google.firestore.v1.Firestore/Listen/..... with net::ERR_NAME_NOT_RESOLVED.

These happened when the computer was sleeping.

Then when it woke up it did the following:

A successful POST https://securetoken.googleapis.com/v1/token?key=<my-key>.

A successful POST https://www.google.com/recaptcha/api2/reload?k=<my-key> and after it a successful POST https://content-firebaseappcheck.googleapis.com/v1beta/projects/.....

I even managed to update documents in Firestore without having to refresh this time, which is.. weird.

So it seems like it already does what I proposed in the first place but it sometimes fails to update the Firestore in the cloud after waking up despite successful writes.

@maccman

maccman commented Sep 14, 2022

I'm pretty sure this change (firebase 9.9.1) is causing infinite setTimeouts on reflect.app - we've just had to revert back to 9.9.0.

@hsubox76
Contributor

Is it possible to provide a minimal reproduction of this infinite setTimeouts issue? I'm assuming this is a new issue introduced by the "throttle fix" in 9.9.1 and not the original sleep/wake issue, and if so, can you create a new issue and provide the minimal reproduction, or at least a sample of the code that leads to these errors, or logs?

@maccman

maccman commented Sep 15, 2022

> Is it possible to provide a minimal reproduction of this infinite setTimeouts issue? I'm assuming this is a new issue introduced by the "throttle fix" in 9.9.1 and not the original sleep/wake issue, and if so, can you create a new issue and provide the minimal reproduction, or at least a sample of the code that leads to these errors, or logs?

I can't give you a minimal reproduction, but I can give you a version of the app with the issue if that helps?

All we did was upgrade firebase to 9.9.4. I tried each version until 9.0.0 (which fixed the issue).

@hsubox76
Contributor

Created a new issue and responded here: #6606

For those having trouble with Firestore sleep/wake, which specific Firestore methods are not working after wake? I know @skog-newglue mentioned writes, are these update, set, or transaction writes? Is get also not working? Or onSnapshot? Do they all report success with no errors?

@chriswoodie

@hsubox76 For me it's setDoc and possibly addDoc as well.

@chriswoodie

@hsubox76 Side note, I noticed today that it was firing A LOT of requests upon waking up. We're talking like 800+ requests to /token and /channel, so when I tried a write I couldn't even see the request in the network log. And what do you know, it hadn't updated the data.

So could the problem be that it's not making the request in the first place and therefore I get the impression that I've had a successful write? Just thinking out loud here. I'm probably gonna try to add a console log and try again in a few days to see if the request is actually being sent.

@hsubox76
Contributor

So if no one is able to provide a minimal repro, I understand, but it will be really hard to find and diagnose a bug without being able to reproduce it. I've created an app with only this code, nothing else (those 2 buttons at the end are in the index.html):

import { initializeApp } from "firebase/app";
import {
  initializeAppCheck,
  ReCaptchaV3Provider,
  onTokenChanged,
} from "firebase/app-check";
import { getFirestore, onSnapshot, doc, updateDoc } from 'firebase/firestore';

const app = initializeApp({
  /** project config */
});

async function main() {
  const appCheck = initializeAppCheck(app, {
    provider: new ReCaptchaV3Provider(
      /** recaptcha site key */
    ),
    isTokenAutoRefreshEnabled: true
  });

  onTokenChanged(appCheck, (newToken) => console.log("new token", newToken));

  const firestore = getFirestore(app);
  const docRef = doc(firestore, 'chtest/doc1');
  onSnapshot(docRef, (snap) => { console.log(snap.data())})
}

async function write() {
  const firestore = getFirestore(app);
  const docRef = doc(firestore, 'chtest/doc1');
  try {
    await updateDoc(docRef, { randomNumber: Math.round(Math.random() * 1000).toString() })
  } catch(e) {
    console.log(e);
  }
}

document.getElementById("gobutton").addEventListener("click", main);
document.getElementById("writebutton").addEventListener("click", write);

The buttons are in the index.html like so:

  <button id="gobutton">go</button>
  <button id="writebutton">write</button>

When you push the "gobutton" it initializes app check and starts an onSnapshot firestore listener on a certain doc. When you push the "writebutton" it writes a random number to the "randomNumber" field of that doc.
I pushed this test app to Firebase hosting and went to the url. I pushed "go" to initialize the app and "write" a few times, to write a few random numbers.
Then I put the machine to sleep overnight.
After returning in the morning there were a number of repeated Firestore errors such as [2022-09-16T15:54:52.747Z] @firebase/firestore: Firestore (9.10.0): Connection WebChannel transport errored: ce {type: 'c', target: Y, g: Y, defaultPrevented: false, status: 1} and GET https://firestore.googleapis.com/google.firestore.v1.Firestore/Listen/channel?gsessionid=flqtyYCt_guAuazxiScW1jeABUZ8MOIRW125sDNeGPY&VER=8&database=projects%2Fappcheck-testing%2Fdatabases%2F(default)&RID=rpc&SID=FAOPCVbx-faguJtrJYxdhQ&CI=0&AID=8&TYPE=xmlhttp&zx=e3rogsaw1mnv&t=1 400
I did not see any app check errors.
When I pushed the "write" button again, however, it did a successful write, and I could see in the console that it propagated to the backend. Also, onSnapshot fired and logged the new value.
I am using Chrome on MacOS.

I think this is a pretty good foundation for a test app, if someone wants to take this code and build an app, and modify it as needed so that it reproduces the same kinds of errors you are seeing, and let me know what steps to take to reproduce it, that would be really helpful.

@ghost

ghost commented Sep 19, 2022

Hi, I've been running into this infinite loop problem as well. I haven't had any of the write issues because my app doesn't write directly to firestore, but I did notice something that might help your repro:

I was testing my app while logged into multiple accounts at once, and had a number of tabs/incognito tabs open in Chrome/Firefox. I noticed that I consistently got the infinite loop when waking the computer from sleep overnight on the signed in tabs, but never on the tab that is signed out.

In my app, the only meaningful difference between the signed-in and signed-out states are that I only attach snapshot listeners when the user is signed in. Could this indicate that there's some bug in the interaction between firestore and AppCheck (as opposed to being an AppCheck only bug)?

I don't have the time to try it myself but I think your test app should try signing in the user before attaching the snapshot listener (i.e after your onTokenChanged call, add await signInWithEmailAndPassword(getAuth(), "[email protected]", "password"))

My app sets everything up almost identically to your test app, so I think if you add the sign-in you might be able to repro what I'm seeing. I also have LogRocket set up and it probably has recorded the infinite loop happening, but I assume sharing more console logs is probably not as helpful as getting a working repro.

I hope this helps and let me know if there's anything else I can provide, this bug is blocking my app from going into production so I'm happy to help fix it

@hsubox76
Contributor

Was able to reproduce it by shortening the token TTL - the errors seem to happen when the token is about to expire (or has expired) and tries to auto refresh. Working on a fix in #6617. This seems likely to address both the rapid requests problem and the "stops working until I refresh" problem.

@hsubox76
Contributor

Sorry for the auto-close. We have a fix that should be released next week, 10/6. If you are able to test it out now, a staging version has been published (do not use in production). To get the staging version you can npm or yarn install [email protected]. If anyone is able to try this out and let me know if it works, that would be great.

@hsubox76
Contributor

hsubox76 commented Oct 6, 2022

Well it seems like no users were able to try the staging release but the production release is now out (9.11.0) so hopefully this fix works! I had to set up a somewhat contrived situation for the repro so I'm still not 100% sure the fix will work in users' real life situations, so let me know. I'll keep this issue open until I hear a few reports that this has been fixed in production apps for real.

@anisabboud
Author

@hsubox76 first of all I want to say thank you very much for taking this issue seriously and for your perseverance in finding a fix. It's much appreciated.

I updated to Firebase v9.11.0 three days ago and opened 4 tabs to test:
Two tabs with v9.10 and two tabs with v9.11, and left them open a couple of days.

The tabs with v9.10 triggered an infinite loop, whereas the tabs with v9.11 didn't trigger an infinite loop (so I believe the infinite loop + high CPU usage issue is probably resolved), but the throttling introduced in a different related commit might have triggered a different side-effect. I will describe everything below.

v9.10

Tab 1 with v9.10 - infinite loop after one day:

v9.10 infinite loop - console tab & network tab

Tab 2 with v9.10 - infinite loop after two days:

v9.10 infinite loop - console tab & network tab

Tab 1 vs Tab 2

I.e., both tabs running v9.10 eventually got stuck in an infinite loop, but not on the same day, and the console & network tabs were not the same in both instances. The first tab kept showing the "AppCheck: Requests throttled" warning, whereas the second tab kept showing "AppCheck: ReCAPTCHA error", as you can see in the screenshots above.

v9.11

The tabs running v9.11 did not get into an infinite loop, but did get into a LONG loop, showing hundreds of console warnings and network tab requests, including appcheck throttling and therefore Firestore errors which eventually rendered the app dysfunctional since it no longer could load data. I will explain the console behavior I observed step-by-step:

Tab 3 running v9.11.0

Step 1: A bunch of POST requests fail (could be while the computer is being put to sleep).

2022-10-10 Firebase v9 11 step 1 - channel POST multiple errors
Just to clarify the AuthService console messages: These are logged when Firebase notifies the app that the AuthState changed - see https://firebase.google.com/docs/auth/web/manage-users

Step 2: After the ERR_INTERNET_DISCONNECTED errors, a warning is logged by Firestore that the Connection WebChannel transport errored:

2022-10-10 Firebase v9 11 step 2 - ERR_INTERNET_DISCONNECTED

Step 3: A token request also fails, followed by the AppCheck warning "Requests throttled" for 1d (I didn't see exponential backoff). This then triggers Step 4: permission errors in Firestore, since the AppCheck token has expired and hasn't been renewed.

2022-10-10 Firebase v9 11 step 3 - token ERR_INTERNET_DISCONNECTED then throttling 1d then permission errors

Step 5: Following a bunch of permission errors, a bunch of AuthState updates triggered from Firebase AuthState - hundreds of times, but not infinite loop.

2022-10-10 Firebase v9 11 step 4 - hundreds of sign in attempts after permission errors

Step 6: The app is basically dysfunctional at this point - not showing any content besides the toolbar, since all data fetching failed.

Tab 4 running v9.11

Tab 4 demonstrated similar behavior to Tab 3. I.e.,

  1. A bunch of AuthState changes mixed with ERR_INTERNET_DISCONNECTED / ERR_NETWORK_IO_SUSPENDED errors.
    image
  2. Firestore warning: @firebase/firestore: Firestore (9.11.0): Connection WebChannel transport errored surrounded by more AuthState changes and ERR_INTERNET_DISCONNECTED / ERR_NETWORK_IO_SUSPENDED errors.
    image
  3. token request fails a few times, followed by appcheck throttling warnings: @firebase/app-check: AppCheck: Requests throttled due to 403 error. Attempts allowed again after 01d:00m:00s (appCheck/throttled).
    image
    image
  4. Firestore permission errors (since AppCheck token is no longer valid): FirebaseError: Missing or insufficient permissions.
    image
  5. Hundreds of AuthState changes with no console errors.
    image
  6. Dysfunctional app since Firestore is disconnected at this point.

Network tab

Digging into the network tab, most of the requests look like this:
network tab two requests preview

Summary

  • I haven't encountered the infinite loop with the new v9.11.0 version. Thank you!!
  • However, AppCheck still doesn't recover perfectly & successfully from idle, rendering Firestore inaccessible and the app dysfunctional.
  • Also, hundreds of AuthState changes are triggered by Firebase Auth for no apparent reason.

@MarkDuckworth
Contributor

@anisabboud, do you have an ad blocker running which could be blocking those POST requests before and after sleep? I found others posting about the same issue and it was caused by an ad blocker.

@hsubox76
Contributor

I would suggest creating a separate issue for your Auth issues as Auth does not use App Check and while Firestore is related to Auth, it can't cause Auth sign-ins to happen. This way the Auth team can focus on that issue specifically while we can look at any possibly lingering Firestore/App Check issues here.

@anisabboud
Author

@MarkDuckworth I do have an ad blocker. I believe it was turned on in one tab and off in the other (different deployments of the same app), so I'm not sure it's the culprit. But will try disabling it anyway and retrying the experiment over the next couple of days. (Side note: I use windows hibernate instead of sleep.)

@hsubox76 thank you for the clarification on the difference between Auth & Firestore - will try to open a separate issue if I'm able to isolate the auth issue further.

@anisabboud
Author

The issue I mentioned 4 days ago happens also without AdBlock.
I left two tabs open for a couple of days. One of them survived so far, but the other one ran into the issue I mentioned.

Basically that's the flow that happened on the dead tab:

  1. Error POST https://content-firebaseappcheck.googleapis.com/v1/projects/brainko1/apps/1:...:...:exchangeRecaptchaV3Token?key=...
  2. Console warning: @firebase/app-check: AppCheck: Requests throttled due to 503 error. Attempts allowed again after 00m:01s (appCheck/throttled).
    • This is different from the warning that I encountered 4 days ago - Requests throttled due to 403 error. Attempts allowed again after 01d:00m:00s, but the result is the same ↓
  3. A bunch of Firestore errors on data the app is listening to (since the token refresh failed): FirebaseError: Missing or insufficient permissions.
  4. Dysfunctional app since no data was able to load.

However, it seems that the infinite loop isn't happening lately, which is a huge relief.
Looking at the AppCheck dashboard https://console.firebase.google.com/u/0/project/_/appcheck/products,
you can notice how the number of "Unverified: invalid requests" dropped from ~10M (due to infinite loop!) to ~1K over the past few days after the update from Firebase v9.10 to v9.11.
image

I'm still seeing ~1K unverified requests per day in the graph, which might be related to the issue I reported (the token expires, the token refresh fails, and then all the Firestore requests are considered invalid and fail).

@hsubox76
Contributor

Does the error you are talking about happen after you return to the tab, or while you were away?

A 503 from what I can tell indicates the server is overloaded. A 503 code causes a retry with an exponential backoff starting at about ~1s and increasing with each retry. Does this happen? You only listed one warning message. Are there any more about further attempts after that? If not, perhaps it succeeded on the second try, after 1s. Are there any successful firebaseappcheck.googleapis.com POSTs after that in the network tab?

If that's the case, Firestore should be able to resume any operations after that as normal. Which Firestore operations are you using? onSnapshot? Or one time operations? Is it failing while App Check is retrying or after it has succeeded (if it did eventually succeed)?

@anisabboud
Author

anisabboud commented Oct 18, 2022

Q: Does the error happen after you return to the tab, or while you were away?
A: So far I've encountered the issue personally only 3 times since updating to Firebase v9.11/9.12, each time while I was away from the tab.

Q: Did you see exponential backoff warnings?
A: Saw only 1d / 1s warnings:

  • In the first instance (8 days ago - Oct 10 - running v9.11), there were a few 403 errors followed by AppCheck throttled warnings, but all of them said "Attempts allowed again after 01d" (1 day). I.e., they did not start from '1 sec'.
  • In the second instance (4 days ago - Oct 14 - running v9.12), there was a single 503 error, with a throttled warning "1s", so your suggestion seems logical - that perhaps the following requests did not fail after 1s, but somehow the first throttling/failure caused Firestore to return insufficient permissions errors that broke the listeners.

Q: Which Firestore operations are you using?
A: Mostly multiple active listeners on documents & collections/queries. In a few instances one time operations.

Q: Are there any successful firebaseappcheck.googleapis.com POSTs after that in the network tab?
A: Will check once I run into the issue again. I've had two tabs open over the past few days but so far they both survived!


Overall the frequency of the issue was greatly reduced by the v9.11/v9.12 updates.
This is an example of a successful wake-from-sleep scenario from last night:

  1. I hibernate the laptop.
  2. A couple of Firestore channel requests fail with ERR_NETWORK_IO_SUSPENDED followed by ERR_INTERNET_DISCONNECTED.
  3. Three token requests fail with ERR_INTERNET_DISCONNECTED.
  4. Internet came back and no further errors - app functions fine.
    2022-10-18 normal wake up from sleep with no issues
    Sometimes a warning Firestore (9.12.1): Connection WebChannel transport errored is logged, but it doesn't cause a crash.

So far the issue has been correlated with the appCheck/throttled warning.
Perhaps Firestore should refrain from sending requests while the appCheck token is invalid/being refreshed, to avoid breaking listeners with permission errors? This could also eliminate the "Unverified" requests in the AppCheck dashboard.
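
To make the suggestion above concrete, here is a minimal sketch of such a gate: operations are held while the token is marked invalid and released once it becomes valid again. This only illustrates the idea and is not how Firestore or App Check behave today. `TokenGate`, `run`, and `setValid` are hypothetical names.

```javascript
// Hypothetical gate that defers operations while the token is invalid.
// Names and behavior are assumptions for illustration only.
class TokenGate {
  constructor() {
    this.valid = false; // no valid token until told otherwise
    this.pending = [];  // operations waiting for a valid token
  }

  // Run the operation now if the token is valid, otherwise queue it.
  run(operation) {
    if (this.valid) {
      operation();
    } else {
      this.pending.push(operation);
    }
  }

  // Update token validity; when it becomes valid, flush queued operations in order.
  setValid(valid) {
    this.valid = valid;
    if (valid) {
      const queued = this.pending;
      this.pending = [];
      queued.forEach((operation) => operation());
    }
  }
}
```

With a gate like this, requests issued during a refresh would wait instead of being sent with an expired token, at the cost of added latency while the refresh is in progress.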


The AppCheck dashboard is also looking much better after your fix - currently only ~1K requests per day are labeled as "Unverified", compared to millions of requests caused by the infinite loop earlier.
image

@hsubox76
Contributor

Yes, 403 errors are throttled for 1 day because it is likely there's something wrong with the attestation that won't be fixed with a simple retry, such as a bad API key or an attestation failure due to failing ReCAPTCHA. You can see the code for the throttling here (this also applies to 404):

Other errors, such as 503 server errors, could mean the server is overloaded, so it's ok to retry quickly, beginning with 1s.
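
The policy described above can be sketched as a small function that picks a retry delay from the HTTP status. This is an illustration under stated assumptions, not the SDK's code: the 1-day and 1 s values come from this thread, while `retryDelayMs`, the doubling factor, and the cap are hypothetical.

```javascript
// Illustrative retry-delay policy. Only the 1-day and ~1 s values are quoted
// in this thread; the doubling factor and the cap are assumptions.
const ONE_DAY_MS = 24 * 60 * 60 * 1000;
const BASE_BACKOFF_MS = 1000;          // "beginning with 1s"
const MAX_BACKOFF_MS = 16 * 60 * 1000; // assumed upper bound

function retryDelayMs(httpStatus, previousAttempts) {
  // 403 and 404 are unlikely to be fixed by a quick retry: wait a full day.
  if (httpStatus === 403 || httpStatus === 404) {
    return ONE_DAY_MS;
  }
  // Other errors (e.g. 503): double the delay on each attempt, up to the cap.
  const delay = BASE_BACKOFF_MS * Math.pow(2, previousAttempts);
  return Math.min(delay, MAX_BACKOFF_MS);
}
```

For example, `retryDelayMs(503, 0)` yields 1000 ms and `retryDelayMs(403, 0)` yields one day, which lines up with the "00m:01s" and "01d:00m:00s" warnings reported earlier. A production implementation would typically also add random jitter to the delay.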

If the errors are happening while the computer is asleep, that seems reasonable, as that's when the internet connection might not be working. If it doesn't work upon returning, we can focus on and address that. It seems like the main issue in this thread is fixed. If you find the listener errors after waking from sleep to be a big problem, let's start a separate issue that focuses only on that. This issue has become very long and difficult to parse, as it has many related but different issues thrown together. (I know the original reported issue was sort of about that, but this really became the "infinite loop issue" thread at some point, and that's what most of the discussion is about.)
