Next Steps
These next sections highlight features and additional information that you may find useful to make the most out of the Git repositories on the Hugging Face Hub.
How to programmatically manage repositories
Hugging Face supports accessing repos with Python via the huggingface_hub library. The operations that we’ve explored, such as downloading repositories and uploading files, are available through the library, as well as other useful functions!
If you prefer to use git directly, please read the sections below.
Learning more about Git
A good place to visit if you want to continue learning about Git is this Git tutorial. For even more background on Git, you can take a look at GitHub’s Git Guides.
How to use branches
To use Git repos collaboratively and to work on features without releasing premature code, you can use branches. Branches allow you to separate your “work in progress” code from your “production-ready” code, with the additional benefit of letting multiple people work on a project without frequently conflicting with each other’s contributions. You can isolate each experiment in its own branch, and even adopt team-wide practices for managing branches.
To learn about Git branching, you can try out the Learn Git Branching interactive tutorial.
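As a minimal sketch of the workflow described above, the following script builds a throwaway repository in a temporary directory, keeps the “production-ready” state on main, and isolates work in progress on a separate branch before merging it back. The repository path, branch name (experiment), file names, and the demo identity are all hypothetical and chosen only for illustration.

```shell
set -eu

# Throwaway repository in a temporary directory (hypothetical path).
repo="$(mktemp -d)/branch-demo"
mkdir -p "$repo"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the initial branch "main" regardless of local defaults
git config user.name "Demo User"           # placeholder identity, local to this repo
git config user.email "demo@example.com"

# "Production-ready" state lives on main.
echo "stable" > README.md
git add README.md
git commit -q -m "Initial commit"

# Isolate work in progress on its own branch.
git checkout -q -b experiment
echo "work in progress" > notes.txt
git add notes.txt
git commit -q -m "Start experiment"

# Back on main, the experiment's files are not visible yet.
git checkout -q main
files_on_main_before_merge="$(ls)"

# Once the work is ready, merge it into main.
git merge -q --no-ff -m "Merge experiment" experiment
```

Until the merge, main only contains README.md, so anyone cloning the default branch never sees the unfinished notes.txt.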
Using tags
Git allows you to tag commits so that you can easily note milestones in your project. As such, you can use tags to mark commits in your Hub repos! To learn about using tags, you can visit this DevConnected post.
Beyond making it easy to identify important commits in your repo’s history, using Git tags also allows you to do A/B testing, clone a repository at a specific tag, and more! The huggingface_hub library also supports working with tags, such as downloading files from a specific tagged commit.
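Here is a small sketch of tagging with plain Git, run entirely against a throwaway local repository. It marks a milestone with an annotated tag, keeps committing, and then clones the repository at the tagged commit. The tag name (v1.0), file name, and directory names are hypothetical, and a local path stands in for a Hub repo URL.

```shell
set -eu

work="$(mktemp -d)"
repo="$work/tag-demo"
mkdir -p "$repo"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the initial branch "main" regardless of local defaults
git config user.name "Demo User"           # placeholder identity, local to this repo
git config user.email "demo@example.com"

# First milestone: commit and mark it with an annotated tag.
echo "v1 weights" > model.txt
git add model.txt
git commit -q -m "First release"
git tag -a v1.0 -m "First milestone"

# Development continues on main after the tag.
echo "v2 weights" > model.txt
git commit -q -am "Second iteration"

# Read the tagged version of a file without checking it out (handy for A/B comparisons).
tagged_in_place="$(git show v1.0:model.txt)"

# Clone the repository at the tagged commit (this leaves the clone on a detached HEAD).
git clone -q -c advice.detachedHead=false --branch v1.0 "$repo" "$work/at-v1.0"
tagged_in_clone="$(cat "$work/at-v1.0/model.txt")"
```

Both variables hold the contents of model.txt as of the v1.0 tag, while the main branch of the original repository has already moved on.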
How to duplicate or fork a repo (including LFS pointers)
If you’d like to copy a repository, there are two options, depending on whether you want to preserve the Git history.
Duplicating without Git history
In many scenarios, if you want your own copy of a particular codebase you might not be concerned about the previous Git history. In this case, you can quickly duplicate a repo with the handy Repo Duplicator! You’ll have to create a User Access Token, which you can read more about in the security documentation.
Duplicating with the Git history (Fork)
A duplicate of a repository with the commit history preserved is called a fork. You may choose to fork one of your own repos, but it is also common to fork other people’s projects if you would like to tinker with them.
Note that you will need to install Git LFS and the huggingface_hub CLI to follow this process. When you want to fork or rebase a repository with LFS files, you cannot use the usual Git approach that you might be familiar with, since you need to be careful not to break the LFS pointers. Forking can take time depending on your bandwidth, because you will have to fetch and re-upload all the LFS files in your fork.
For example, say you have an upstream repository, upstream, and you just created your own repository on the Hub, called myfork in this example.
- Create a destination repository (e.g. myfork) in https://huggingface.co
- Clone your fork repository:
git clone git@hf.co:me/myfork
- Fetch non-LFS files:
cd myfork
git lfs install --skip-smudge --local # affects only this clone
git remote add upstream git@hf.co:friend/upstream
git fetch upstream
- Fetch large files. This can take some time depending on your download bandwidth:
git lfs fetch --all upstream # this can take time depending on your download bandwidth
4.a. If you want to completely override the fork history (which should only have an initial commit), run:
git reset --hard upstream/main
4.b. If you want to rebase instead of overriding, run the following command and resolve any conflicts:
git rebase upstream/main
- Prepare your LFS files to push:
git lfs install --force --local # this reinstalls the LFS hooks
huggingface-cli lfs-enable-largefiles . # needed if some files are bigger than 5GB
- And finally push:
git push --force origin main # this can take time depending on your upload bandwidth
Now you have your own fork or rebased repo in the Hub!
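If the difference between steps 4.a and 4.b is unclear, it can be tried offline first. The sketch below uses small local repositories with no LFS files as stand-ins: upstream plays the role of friend/upstream, while fork-a and fork-b play the role of a freshly created myfork. The make_repo helper, directory names, and file contents are hypothetical and exist only for this illustration; the sketch does not touch the Hub or exercise LFS.

```shell
set -eu

work="$(mktemp -d)"

# Hypothetical helper: create a repo at $1 with one commit (message $2) adding file $3.
make_repo() {
  mkdir -p "$1"
  git -C "$1" init -q
  git -C "$1" symbolic-ref HEAD refs/heads/main
  git -C "$1" config user.name "Demo User"
  git -C "$1" config user.email "demo@example.com"
  echo "$2" > "$1/$3"
  git -C "$1" add "$3"
  git -C "$1" commit -q -m "$2"
}

make_repo "$work/upstream" "upstream work" model.txt    # stands in for friend/upstream
make_repo "$work/fork-a" "initial commit" README.md     # fresh fork, used for 4.a
make_repo "$work/fork-b" "initial commit" README.md     # fresh fork, used for 4.b

# Same as the "git remote add upstream" and "git fetch upstream" steps above.
for fork in fork-a fork-b; do
  git -C "$work/$fork" remote add upstream "$work/upstream"
  git -C "$work/$fork" fetch -q upstream
done

# 4.a: replace the fork's history with upstream's; the fork's initial commit is discarded.
git -C "$work/fork-a" reset -q --hard upstream/main

# 4.b: replay the fork's own commit on top of upstream's history.
git -C "$work/fork-b" rebase -q upstream/main
```

After this runs, fork-a points at exactly the same commit as upstream and no longer has its README.md, whereas fork-b contains upstream's commit followed by the fork's own commit, so both files are present.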