Sparse checkout for git pulls #15824
base: main
CodSpeed Performance Report: merging #15824 will not alter performance.
Thanks for the PR @tetracionist! This is looking pretty good, but I have a couple of questions about this implementation after a first pass.
src/prefect/runner/storage.py (outdated)
```python
        Uses sparse-checkout on repository
        """

        cmd = ["git", "sparse-checkout", "init"]
```
It looks like the `init` subcommand has been deprecated, and the `set` subcommand is the preferred approach based on the git docs. It seems like we would update this method to only call `git sparse-checkout set` with the provided directories, but let me know if that would cause issues.
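A minimal sketch of the suggested change, assuming a hypothetical helper `build_sparse_checkout_set_cmd` (not existing Prefect code). Per the git docs, `set` enables the required sparse-checkout config itself if it is not already enabled, so no separate `init` call is needed; `--cone` / `--no-cone` select the mode.

```python
from typing import List, Sequence


def build_sparse_checkout_set_cmd(
    directories: Sequence[str], cone_mode: bool = True
) -> List[str]:
    """Build a `git sparse-checkout set` command (hypothetical helper).

    `set` turns sparse checkout on by itself when needed, so the
    deprecated `init` subcommand is not called beforehand.
    """
    if not directories:
        # Treat an empty list as a caller error instead of silently
        # producing a checkout with none of the requested directories.
        raise ValueError("at least one directory is required")
    mode_flag = "--cone" if cone_mode else "--no-cone"
    return ["git", "sparse-checkout", "set", mode_flag, *directories]
```

The command would be run inside the cloned repository, for example `subprocess.run(build_sparse_checkout_set_cmd(dirs), cwd=destination, check=True)`.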
I think this should be okay; I will try using `git sparse-checkout set`.
src/prefect/runner/storage.py (outdated)
```python
        # Limit git history and set path to clone to
        cmd += ["--depth", "1", str(self.destination)]
        # For sparse-checkout it is recommended to use --filter=blob:none to reduce disk-space
        if self._sparse_checkout_mode:
```
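For reference, a sketch of the clone step the comment describes, assuming a hypothetical `build_clone_cmd` helper rather than the code under review. `--filter=blob:none` makes a partial clone that fetches file contents only when a checkout needs them, and `git clone --sparse` starts the working tree with only top-level files, leaving a later `git sparse-checkout set` call to choose the rest.

```python
from typing import List, Optional, Sequence


def build_clone_cmd(
    url: str,
    destination: str,
    sparse_directories: Optional[Sequence[str]] = None,
    branch: Optional[str] = None,
    depth: int = 1,
) -> List[str]:
    """Build a shallow `git clone` command (hypothetical helper)."""
    cmd = ["git", "clone", url]
    if branch:
        cmd += ["--branch", branch]
    if sparse_directories:
        # Partial clone: skip blobs until checkout needs them, and start
        # with a sparse working tree that holds only top-level files.
        cmd += ["--filter=blob:none", "--sparse"]
    # Limit history to `depth` commits and clone into `destination`.
    cmd += ["--depth", str(depth), destination]
    return cmd
```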
It looks like if sparse checkout mode is used then you won't be able to include submodules in the checkout. Is it possible to have both?
I put this in the `else` block because I was unsure what submodules did. Are they a way of adding code from another repo into the cloned repo?
I think this would be easy with cone mode but might be trickier without.
Would it be useful to also apply sparse checkout to the submodule?
Yeah, submodules allow you to reference another git repository from a parent repository. I'm not sure how sparse checkout would interact with submodules, but I think it'd be useful to apply sparse checkout to submodules, also.
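One possible direction, sketched under assumptions: the git-sparse-checkout docs note that changing sparse patterns does not by itself initialize or deinitialize submodules; those are populated through `git submodule` commands. A hypothetical `build_submodule_update_cmd` helper could therefore pass the sparse directories as pathspecs to `git submodule update --init`, assuming this limits initialization to submodules located under those directories (worth verifying on a real repository). Applying sparse checkout inside a submodule would be a separate step, for example running `git -C <submodule path> sparse-checkout set ...`.

```python
from typing import List, Optional, Sequence


def build_submodule_update_cmd(
    directories: Optional[Sequence[str]] = None, depth: int = 1
) -> List[str]:
    """Build a shallow `git submodule update` command (hypothetical helper)."""
    cmd = ["git", "submodule", "update", "--init", "--depth", str(depth)]
    if directories:
        # Assumption: pathspecs after "--" restrict initialization to
        # submodules that live under the sparse-checkout directories.
        cmd += ["--", *directories]
    return cmd
```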
This pull request is stale because it has been open 14 days with no activity. To keep this pull request open, remove the stale label or comment.
Added support for sparse-checkout when cloning a GitHub repo (#15185).
This is ideal for teams with large repositories who don't want to clone every folder.
When directories are specified in prefect.yaml or in the GitHubRepository storage class, sparse checkout is used to fetch only those directories, plus the files in the repository root and in each parent directory of the listed directories (cone mode). Non-cone mode is also supported, although the official git docs do not recommend it (see https://git-scm.com/docs/git-sparse-checkout).
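To make the cone-mode rule concrete, here is a small pure-Python approximation (a hypothetical `in_cone` function, not how git implements it) of which file paths a cone-mode checkout keeps, following the git docs: every file in the repository root, everything under each listed directory, and files sitting directly in any ancestor of a listed directory.

```python
from pathlib import PurePosixPath
from typing import Sequence


def in_cone(path: str, directories: Sequence[str]) -> bool:
    """Approximate whether a repo-relative file path is kept in cone mode."""
    p = PurePosixPath(path)
    parent = p.parent
    # Files in the repository root are always included.
    if parent == PurePosixPath("."):
        return True
    for d in directories:
        cone = PurePosixPath(d.strip("/"))
        # Everything at any depth under a listed directory is included.
        if cone in p.parents:
            return True
        # Files directly inside an ancestor of a listed directory are included.
        if parent in cone.parents:
            return True
    return False
```

With the directories from the example configuration, `src/integrations/README.md` is kept (its folder is a parent of a listed directory) while `src/integrations/prefect-aws/setup.py` is not.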
Example usage in a prefect.yaml:

```yaml
pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/tetracionist/prefect.git
      branch: sparse-checkout-for-git-pulls
      access_token:
      directories: [src/integrations/prefect-azure, src/integrations/prefect-dask]
      cone_mode: True # True by default; only needed when setting it to False
```