Apply insteadOf
rewriting rules to Git LFS object URLs by default and without converting to API endpoints
#5547
Labels
Milestone
PR #3590 introduced the
lfs.transfer.enableHrefRewrite
setting which, when set totrue
rather than its default offalse
, causes Git LFS to applyurl.<base>.insteadOf
andurl.<base>.pushInsteadOf
settings to the URLs of individual Git LFS objects.There was some discussion in PR #3590 as to whether this should become the default with the next major release of Git LFS, when a pure SSH transfer protocol would be available. The concern noted at the time was that with a configuration such as the following, Git LFS would attempt to use SSH to download and upload Git LFS objects, when only HTTPS was supported:
In practice, though, this is not what happens, because the implementation from PR #3590 calls the
NewEndpoint()
method of theEndpointFinder
interface in thelfsapi
package, which performs the appropriate rewriting, but then converts any URL with a scheme other thanfile://
orhttp[s]://
into an HTTPS URL. This conversion effectively reconstructs thehttps://
URL, but only after removing any query arguments. For some Git LFS hosting services like GitHub, query arguments are required for successful Git LFS object URL requests, and so these requests would then fail with those services.In effect, when
lfs.transfer.enableHrefRewrite
istrue
, Git LFS applies the same logic to Git LFS object URLs as it normally uses when attempting to guess the appropriate server API URL from the URL of the current branch's remote (or whatever remote is specified with configuration settings such asremote.lfsDefault
) when no explicit server API is defined withlfs.url
orremote.<name>.lfsurl
. This is likely not the intent of the implementation in PR #3590, but it is the current behaviour.One consequence, as described above, is that it is actually not possible at present to rewrite Git LFS object URLs from HTTPS to SSH, even with a
url.<base>.insteadOf
rule like the one above.It is also not possible to rewrite Git LFS object URLs to any protocol other than
http://
orhttps://
, such asfile://
, and use them successfully. The rewrite succeeds, as such, because when the server API URL is found to be afile://
URL, we allow it without change, unlike when the rewrite results in anssh://
URL. However, amissing protocol
error is then generated because the scheme is neitherhttp://
norhttps://
. This is likely reasonable, as we generally expect transfer adapters to use a consistent protocol for both their API queries and individual object transfers.(Also, as reported in #5495, the existing implementation only affects the URL used to download or upload objects, and not the URL used to verify successful uploads.)
We should consider whether we can change the implementation to only perform the necessary rewriting on Git LFS object URLs (and no other transformations as if trying to determine an API endpoint) as a bug fix in a v3.x release, or if we need to wait for the next major release to do so.
And we should review whether we can make this rewriting the default in the next major release, as was proposed in #3590 (review). However, we should only do this when we are confident in the implementation.
That includes consideration of how Git LFS should behave if it is instructed to use different protocols for its batch requests and individual object URL requests.
At the moment, at least for the common case of transfers over HTTP(S), the basic download and upload transfer adapters always use the
newHTTPRequest()
method of theadapterBase
structure, and this is what both performs the rewriting (and attempts conversion as if the determining an API endpoint) and then checks that the result contains an HTTP(S) protocol. So only HTTP(S) requests are expected or allowed.And for the pure SSH transfer adapter, individual objects are expected to be transferred using SSH as well as batch requests, and so only the
ssh
adapter is defined in the internal batch response.Similarly, when the remote server API has a
file://
URL, which by default causes the standalone file custom transfer agent to be used, that implicitly means batches of objects are collected and passed to the agent without a batch query, and the agent maps their paths to map to file paths within the local Git repository referenced by the initialfile://
URL. So no rewriting of individual object URLs occurs at all in this case.Do we, therefore, want to simply disallow the rewriting of per-object URLs to use different protocols? What would it mean, for instance, to rewrite an HTTPS URL into an SSH one (assuming we fix the problem where we try to convert it back into an HTTPS URL to use as an API endpoint); should we then switch over somehow and try to use the SSH transfer adapter for just that one object?
Finally, our test suite appears to only validate the
lfs.transfer.enableHrefRewrite
configuration setting using a pair of negative checks which show object URLs have been rewritten to invalid ones. It would be desirable to have a more complete set of tests which validate successful downloads and uploads after URLs are rewritten, and which also confirm the expected behaviour when URLs are rewritten to different protocols (whether that is permitted or not).The text was updated successfully, but these errors were encountered: