Skip to content

Conversation

@rasa
Copy link
Member

@rasa rasa commented Nov 1, 2021

No description provided.

@rasa rasa requested a review from a team as a code owner November 1, 2021 17:39
@rasa rasa mentioned this pull request Nov 1, 2021
@rashil2000
Copy link
Member

Sorry for asking but how will this ensure that we don't run into IP limits?

@rasa
Copy link
Member Author

rasa commented Nov 1, 2021

Sorry for asking but how will this ensure that we don't run into IP limits?

It doesn't. But we don't have access to Excavator's logs any more at https://scoop.r15.ch

@rashil2000
Copy link
Member

According to GitHub's docs, we can make 1000 requests per hour. This will be fine for the Main bucket, but for Extras, we will have to split the Action to run on 1000 manifests in one run and on 246 manifests in a second run (total 1246 manifests in Extras as of now).

image

@pratikpc
Copy link
Contributor

pratikpc commented Nov 1, 2021

Could we run GitHub Actions twice a day?

Once for odd files, another for even files?

@pratikpc
Copy link
Contributor

pratikpc commented Nov 1, 2021

@rasa @rashil2000 any idea how they count requests?

For example would checkver be 1 request, 32 bit download be 1 request and 64 bit download be one request?

In that case, we might face issues on Main repo as well.

@rashil2000
Copy link
Member

Not all manifests will use GitHub for checkver and downloading, right? I'm not sure if we can count something like this.

@pratikpc
Copy link
Contributor

pratikpc commented Nov 1, 2021

@rashil2000 depending on how they count even if 500 use them, we might come close to the limit

I'm not sure if we can count something like this.

Would a grep like command work to count the number of lines present in all files with github.com or checkver: "github" and ignore those starting with the word homepage?

@rashil2000
Copy link
Member

We will have to create an elaborate and ugly bash script then 😥 and use that instead of kiddong/Scoop-GitHubActions

@se35710
Copy link
Contributor

se35710 commented Nov 1, 2021

Another option is to ignore all Github checkver and updates for the main excavator, and implement a rate limited excavator for Github API dependent manifests that start a new run every other or third hour depending on number of manifests.

@se35710
Copy link
Contributor

se35710 commented Nov 1, 2021

Or add a checkver interval option, with say, hourly, daily, weekly, monthly options, and use that as a selector for the excavator.

@r15ch13
Copy link
Member

r15ch13 commented Nov 1, 2021

It doesn't. But we don't have access to Excavator's logs any more at scoop.r15.ch

That was an error on my side. If the DNS record gets updated the logs should be available again.

@rasa
Copy link
Member Author

rasa commented Nov 1, 2021

Logs are back up at https://scoop.r15.ch/ :)

@niheaven
Copy link
Member

niheaven commented Nov 2, 2021

remote:   [email protected]:ScoopInstaller/Extras.git        
remote: error: GH006: Protected branch update failed for refs/heads/master.        
remote: error: At least 1 approving review is required by reviewers with write access.        
To github.com:lukesampson/scoop-extras.git
 ! [remote rejected]     master -> master (protected branch hook declined)
error: failed to push some refs to '[email protected]:lukesampson/scoop-extras.git'
Pushing updates ...
hub push origin master
remote: error: GH006: Protected branch update failed for refs/heads/master.        
remote: error: At least 1 approving review is required by reviewers with write access.        
To github.com:ScoopInstaller/Main.git
 ! [remote rejected]     master -> master (protected branch hook declined)
error: failed to push some refs to '[email protected]:ScoopInstaller/Main.git'

So what's wrong? (and also too many 429...)

@Valinor
Copy link
Contributor

Valinor commented Nov 2, 2021

Only a little remark :
uses: kidonng/Scoop-GithubActions@stable
Scoop-GithubActions should be clone in ScoopInstaller organisation.
Last update for this repo is 2020.

@rashil2000
Copy link
Member

It looks like Excavator was indeed trying to push changes, but ran into some branch protection restrictions that were recently changed by @rasa I think

@r15ch13
Copy link
Member

r15ch13 commented Nov 2, 2021

It looks like Excavator was indeed trying to push changes, but ran into some branch protection restrictions that were recently changed by @rasa I think

Excavator won't work if the master branch is protected. It could be changed to create PRs though. (which would result in a lot of spam PRs 😄)
Or my account can be whitelisted (Excavator could also be changed to use the ScoopBot Account)

@rasa
Copy link
Member Author

rasa commented Nov 2, 2021

I'll turn off branch protection, for now. Once we work out the kinks, I will re-enable it.

@rashil2000
Copy link
Member

Excavator could also be changed to use the ScoopBot Account

I think this will be better in the long run

@rashil2000
Copy link
Member

@r15ch13 can we, for the time being, prioritize giving another account Excavator access (ScoopBot)?

We are facing multiple problems (lptstr/winfetch#112, ScoopInstaller/Extras#7258, ScoopInstaller/Extras#7263, ScoopInstaller/Extras#7265, #2847) with updates - mostly where download URLs are static but version numbers change at the source, so hash mismatch occurs. 😢

@rashil2000 rashil2000 marked this pull request as draft December 1, 2021 08:57
@rashil2000
Copy link
Member

rashil2000 commented Dec 1, 2021

I'm currently working on adding a parameter to the Excavator workflow, let's say X, which will check every X'th manifest depending on the current hour as the starting index. I've updated the cron to run 6 times in a day. If X is 3, it will go something like this:

(assuming a bucket has around 1000 manifests)

00:20 - every third manifest, starting at index 0
04:20 - every third manifest, starting at index 1
08:20 - every third manifest, starting at index 2
12:20 - every third manifest, starting at index 0
16:20 - every third manifest, starting at index 1
20:20 - every third manifest, starting at index 2

This means that each manifest will get checked twice a day for updates. Running 300 GitHub API calls every hour (3 calls per manifest - total around 900) ensures that we don't face rate limiting.

Is this approach fine?

@rashil2000
Copy link
Member

@niheaven is there a way to pass multiple app names to the checkver.ps1 and auto-pr.ps1 utilities?

@niheaven
Copy link
Member

niheaven commented Dec 1, 2021

auto-pr just direct push or make PR for every updated manifest, and checkver supports * and ? wildcards.

You can carefully split manifests by, e.g., names (a*, b*, etc.), and then check them separately.

Each part should have similar number of manifests.

@rashil2000
Copy link
Member

You can carefully split manifests by, e.g., names (a*, b*, etc.), and then check them separately.

Each part should have similar number of manifests.

This won't be possible alphabetically, because some letters (like q, z) will have very less manifests.

I think we can add an additional parameter to checkver.ps1, where it accepts an array of appnames?

That way I can filter manifests based on index and pass them to auto-pr.ps1.

@rasa
Copy link
Member Author

rasa commented Dec 2, 2021

I think tweaking the Exacavtor to

  1. Use authenticated requests, and
  2. pause based on the X-RateLimit-* HTTP headers
    is the best course of action.

See ScoopInstaller/Scoop#3912 and https://github.com/rasa/scoop-directory/blob/9ed3a43e0ee0b408d423f99be28c7a301f53675f/maintenance/github-crawler.py#L170 for example.

See also ScoopInstaller/Excavator#5 (comment)

@Milo123459
Copy link
Contributor

I completely agree. That'll be the best idea.

@rashil2000 rashil2000 changed the title Add excavator workflow chore(ci): Add excavator workflow Dec 4, 2021
@rashil2000
Copy link
Member

Current situation of using GitHub Actions for Excavator workflow

I experimented a lot with this today. Logs can be seen here - https://github.com/rashil2000/Extras/actions/workflows/excavator.yml

Scoop's checkver now uses authenticated GitHub API requests. But there are other problems:

@rasa
Copy link
Member Author

rasa commented Dec 4, 2021

We need to pause based on X-RateLimit-* headers here. @rasa can you help with this?

@rashil2000 Yes.

@rashil2000
Copy link
Member

I think there was some problem with GitHub yesterday...

Without making any changes, I let the Excavator workflow run at its scheduled times (6 times per day), for a total of 6 runs in my fork of Extras and 3 in my fork of Main. Check the logs here - https://github.com/rashil2000/Extras/actions/workflows/excavator.yml and https://github.com/rashil2000/Main/actions/workflows/excavator.yml.

There are no (403) Rate-Limiting errors now.

This PR is now ready for review.

@rashil2000 rashil2000 marked this pull request as ready for review December 5, 2021 17:38
@rashil2000
Copy link
Member

@r15ch13 once this is merged, you can disable your personal pipelines :)

@rashil2000
Copy link
Member

3 more runs on Extras and 3 more on Main overnight, still no errors. Merging this now.

@r15ch13 please disable ScoopInstaller/Excavator to prevent any possible conflicts

@r15ch13
Copy link
Member

r15ch13 commented Dec 6, 2021

okay, I have disabled Excavator for Main, Extras and Versions 👍🏽

@se35710
Copy link
Contributor

se35710 commented Dec 6, 2021

@r15ch13 If the Excavator is still running for Java, you can disable it there too.

@rashil2000
Copy link
Member

I have one more observation to state.

  • On Windows runners, there are no SSL errors, but we get heavily rate-limited. We need to pause based on X-RateLimit-* headers here.

These are not rate limiting errors. The 403 errors I saw are actually because of the GitHub API's hard limit of 1000 requests per hour. You can observe this if you try to run the excavator workflow more than once within an hour span.

The only way I can think around this is the manifest-splitting solution I mentioned earlier - as the number of manifests in the buckets increases, we will have to split checkver process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants