Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL substitution via insteadOf broken when NewEndpointFromCloneURL is used #5528

Open
shadowncs opened this issue Oct 1, 2023 · 6 comments
Milestone

Comments

@shadowncs
Copy link

Describe the bug
If one does not define lfs.url, git-lfs falls back to trying a URL based on the local clone. That is as designed and documented.
However, the URL substitution using insteadOf only applies to clone URL, not to the generated derrived URL.

To Reproduce

  1. in .gitconfig we replace originalserver with a different server, but only for LFS URLs:
[url "https://newserver/reponame.git/info/lfs"]
     insteadOf = https://originalserver/reponame.git/info/lfs
  1. git local clone from remote repo at https://originalserver/reponame
  2. no lfs.url defined anywhere, no .lfsconfig defined
  3. when doing git-lfs pull, git-lfs connects to the originalserver, not to the newserver

Expected behavior
When doing git-lfs pull, it replaces https://originalserver/reponame.git/info/lfs URL with https://newserver/reponame.git/info/lfs and connects to the newserver instead.

System environment
Linux Fedora

Output of git lfs env

git lfs env
git-lfs/3.3.0 (GitHub; linux amd64; go 1.19.6)
git version 2.41.0

Endpoint=https://originalserver/reponame.git/info/lfs (auth=none)
LocalWorkingDir=/home/shadowncs/tmp/repro
LocalGitDir=/home/shadowncs/tmp/repro/.git
LocalGitStorageDir=/home/shadowncs/tmp/repro/.git
LocalMediaDir=/home/shadowncs/tmp/repro/.git/lfs/objects
LocalReferenceDirs=
TempDir=/home/shadowncs/tmp/repro/.git/lfs/tmp
ConcurrentTransfers=8
TusTransfers=false
BasicTransfersOnly=false
SkipDownloadErrors=false
FetchRecentAlways=false
FetchRecentRefsDays=7
FetchRecentCommitsDays=0
FetchRecentRefsIncludeRemotes=true
PruneOffsetDays=3
PruneVerifyRemoteAlways=false
PruneRemoteName=origin
LfsStorageDir=/home/shadowncs/tmp/repro/.git/lfs
AccessDownload=none
AccessUpload=basic
DownloadTransfers=basic,lfs-standalone-file,ssh
UploadTransfers=basic,lfs-standalone-file,ssh
GIT_EXEC_PATH=/usr/libexec/git-core
git config filter.lfs.process = "git-lfs filter-process --skip"
git config filter.lfs.smudge = "git-lfs smudge --skip -- %f"
git config filter.lfs.clean = "git-lfs clean -- %f"

Additional context
This happens during LFS URL being initialized from clone URL here.
NewEndpoint runs the URL alias replace here, on the git clone remote URL. In our example, that URL is https://originalserver/reponame
After the git clone remote URL is (unsuccessfully) aliased, NewEndpointFromCloneURL() appends '/info/lfs' and concludes.
The above has been confirmed via println on a local git-lfs source copy and build.

Ideally we want to first append the /info/lfs part, building the full URL from the clone URL, before puttint it thru aliasing engine.

@chrisd8088
Copy link
Member

Hey, thanks for the report and analysis. I haven't been able to craft a reproduction test case yet, with or without setting lfs.transfer.enablehrefrewrite to true, but assuming that's possible, this does sound like something we should address.

@chrisd8088
Copy link
Member

So on the one hand I can see this behaviour with a fairly simple modification of one of our basic config tests, something like:

$ git init test
$ cd test
$ git remote add origin https://example.com/test
$ git config --local url.https://example.com/other.insteadOf https://example.com/test.git/info/lfs
$ git lfs env | grep Endpoint=
Endpoint=https://example.com/test.git/info/lfs (auth=none)

Whereas if we add a line here which looks like the following:

	ep.Url = e.ReplaceUrlAlias(operation, ep.Url)

then we get different results, more in line with what you were suggesting:

$ git lfs env | grep Endpoint=
Endpoint=https://example.com/other (auth=none)

That said, upon reflection I have a few concerns about introducing this change or something like it. The first is that since at least 2016 the Git LFS API documentation has stated that:

By default, Git LFS will append .git/info/lfs to the end of a Git remote url to build the LFS server URL it will use

So at a minimum we'd have to add to the specification to indicate that we might further rewrite the constructed URL according to any url.*.insteadOf rules in the configuration.

That leads to my second concern, which is when and how to perform this rewriting. Git itself generally only performs one URL rewriting pass (so a chain of url.*.insteadOf settings only results in the first one in the chain being applied). So I don't think we want to do a re-rewrite of the type I used above for the example. Admittedly, that was just a quick hack for test purposes.

At the moment, as was noted in the issue description, we perform the rewriting early, on whatever string we received from git config for the current remote (e.g., git config remote.origin.url). Among other checks, if the rewritten URL starts with file://, we return it as-is, without appending .git/info/lfs. The change proposed in this issue, to append .git/info/lfs before doing any rewriting, would also have to strip off any such suffix if the rewritten URL turned out to be a file:// one. While that's possible, it suggests there may be other subtleties to consider and ways in which this change might inadvertently affect users with existing configurations that make use of non-HTTPS or non-default Git LFS server URLs.

Third, there's already a way to achieve the desired result and override the URL which Git LFS would otherwise construct by configuring a remote.*.lfsurl value:

$ git config --local remote.origin.lfsurl https://example.com/other
$ git remote get-url origin
https://example.com/test
$ git lfs env | grep Endpoint=
Endpoint=https://example.com/other (auth=none)

Moreover, that remote.*.lfsurl value will be rewritten by the longest matching url.*.insteadOf configuration, if one exists.

Given the above, I'm inclined to think we should not alter the existing behaviour, but advise users who want to customize the server URLs constructed by Git LFS to use remote.*.lfsurl configuration settings.

Would that approach suffice for your use case?

@shadowncs
Copy link
Author

Hey Chris, thank you for looking into this.

Going by git-config definition for insteadOf, developers would expect that Any URL that starts with this value will be rewritten to start, instead, with <base>. That includes, IMO, internally generated URLs.

So either

  • docs should include git-lfs remote-derived URLs as partial exception (will not apply rewrite if substituted string includes any part of auto-appended .git/info/lfs). As you suggest, documenting the work-around is highly appreciated if you choose this path.
  • git-lfs should update its way of rewriting URLs.

Personally I prefer option 2 but my experience with LFS is limited and not sure if there are unexpected effects in doing so.
I'll let you decide which option is best, but I do want to discuss a bit the 2nd option before dismissing it.

I also do not recommend doing the URL substitution more than once, that is guaranteed to break some of the setups out there, which match other rules after substitution.

What I think may work, is to change the order of operations in NewEndpointFromCloneURL such that we first append .git/info/lfs if needed to rawUrl, then call NewEndpoint(). This way, substitutions run on the fully generated URL.
The file:// case needs an extra check after substitution (as you note), to remove .git/info/lfs if it was appended before substitution and substitution generated a file:// URL, and if appended suffix is still present after substitution. Doing that would have file:// behave just like before, that is, no .git/info/lfs added, whether substitutions were applied or not.

Best regards,

@chrisd8088
Copy link
Member

Thanks for the thoughtful reply! May I ask, just for context, if you're able to use remote.*.lfsurl to solve your particular configuration problem?

What I think may work, is to change the order of operations in NewEndpointFromCloneURL such that we first append .git/info/lfs if needed to rawUrl, then call NewEndpoint(). This way, substitutions run on the fully generated URL.
The file:// case needs an extra check after substitution (as you note), to remove .git/info/lfs if it was appended before substitution and substitution generated a file:// URL, and if appended suffix is still present after substitution. Doing that would have file:// behave just like before, that is, no .git/info/lfs added, whether substitutions were applied or not.

That's what I was considering too, but I confess I'm concerned there may be other edge cases which have to be considered where a change would break existing users' configurations. (We'll also have to handle adding and removing trailing / characters appropriately, both before and after rewriting; that shouldn't be too hard, though.)

I think what I'll do is mark this as an enhancement for the next major release, where we can break backwards compatibility with the behaviour of prior releases. In the meantime, though, I'll add something to our documentation to clarify that at present we append the .git/info/lfs suffix after rewriting URLs.

@chrisd8088 chrisd8088 added this to the v4.0.0 milestone Oct 6, 2023
@shadowncs
Copy link
Author

shadowncs commented Oct 6, 2023

The current work-around which we're using is to have a repo root .lfsconfig with the original pre-substitution LFS URL, which then goes cleanly thru substitutions as expected, being fully defined and not derrived.

[lfs]
	url = https://example.com/test/info/lfs

While it may seem redundant - after all this URL is also auto-derrived inside git-lfs - the fact this is present in the .lfsconfig makes it go thru substitutions and matches successfully. I believe this is better since it allows us to change URLs for all repos from a single central place, the global .gitconfig. This is an AOSP build, in case you're wondering, so hundreds and hundreds of git repos.

I also tested the remote.origin.lfsurl you suggested, setting it to the final desired URL. That works too, but it's not appropriate for our use-case since many repos would need to be changed every time we switch e.g. to a different server.

I do understand what you say about side effects - this is the kind of change that's really hard to unit test 100% of in-the-wild cases. So seems appropriate for next major release, in the mean time if you can update git-lfs documentation to reflect this corner-case, maybe it would save some soul a day or 2 of debugging.

Thanks again for looking at this and for all your work here!

@chrisd8088
Copy link
Member

chrisd8088 commented Oct 11, 2023

I'm working on some documentation updates, first of all, and thank you for the additional context regarding your use case. If I understand correctly, you'd like to specify a placeholder Git LFS server URL in each repository's .lfsconfig file, but one which defines the repository's real name, and then remap those URLs to an actual server using rewriting rules in your Git configuration.

I thought I should ask if you've tried using the remote.*.lfsurl configuration combined with url.*.insteadOf rewriting rules. I noted in #5528 (comment) above that the regular URL rewriting configurations should be applied to any URL derived from a remote.*.lfsurl configuration, even if it's in an .lfsconfig file.

So for instance, if I put the following into .lfsconfig:

[remote "origin"]
	lfsurl = https://fakeserver/reponame.git/info/lfs

and this into a Git configuration file like ~/.gitconfig:

[url "https://realserver/"]
	insteadOf = https://fakeserver/

then the Git LFS server URL is constructed as https://realserver/reponame.git/info/lfs:

$ git lfs env | grep Endpoint=
Endpoint=https://realserver/reponame.git/info/lfs (auth=none)

In other words, I don't think you need to set remote.*.lfsurl to the final desired URL; you should be able to set it to a per-repository placeholder and then rewrite it as necessary. Would that help with your use case?

chrisd8088 added a commit to chrisd8088/git-lfs that referenced this issue Oct 16, 2023
Our Server Discovery documentation describes how we form Git LFS
server API endpoint URLs based on Git remotes and other Git and Git LFS
configurations.  This documentation was introduced in commit
21e1695 of PR git-lfs#1641 and has only
been lightly modified since then.

As discusssed in issue git-lfs#5528, we append "/info/lfs" to Git remote URLs,
but only after rewriting those URLs according to any "url.{name}.insteadOf"
configurations, which is somewhat counter to what is implied by Git's
own documentation of these settings.  We may change this behaviour
in a future Git LFS major release.  For the present, though, we simply
add a fuller description of how we convert Git remote URLs into Git LFS
API server URLs to our documentation, specifically touching on the
sequence in which we rewrite, convert, and append the "/info/lfs"
path segments to the remote URLs.

In PR git-lfs#3748 we added support for Git LFS endpoints defined by "file://"
URLs, and in PR git-lfs#3918 we enhanced this by also accepting local file paths
in remote URLs, so we also include comments on how we handle these
cases in our new documentation.

Further, although support for distinct URLs used only for upload operations
has existed since PR git-lfs#949, it is not mentioned in our Server Discovery
documentation, so we add notes on this feature now as well.

4469 - default remotes (lfsdefault, lfspushdefault)
5066 - autodetect, searchall
@github-project-automation github-project-automation bot moved this to Enhancements in Backlog Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Enhancements
Development

No branches or pull requests

2 participants