Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] S3 fixes and improvements #101

Merged
merged 4 commits into from
Apr 23, 2022
Merged

Conversation

rcoup
Copy link
Contributor

@rcoup rcoup commented Apr 6, 2022

This is WIP, there's a bunch of requirements upgrades which I ended up doing to get the dev environment working & tests passing, but I can rebase that away.

A series of S3 improvements. Happy to break them up if you want, but more after feedback at this point :-)

s3_external: force the upload Content-Type

git-lfs as of v2.5.0 detects the file Content-Type and provides it during PUT requests to the storage when using the basic transfer. If we don't include the same Content-Type value in the S3 signature generation, then S3 rejects the upload.

Unfortunately git-lfs doesn't pass the Content-Type value to the Batch request, so we can't include it in the S3 signature generation. And while setting lfs.contenttype=false in the client config forces sending application/octet-stream, S3 isn't happy with that either (IIUC it defaults to binary/octet-stream or expects Content-Type to be omitted).

But if the Batch response upload action includes a Content-Type header then git-lfs will always use that for the PUT request. So change Giftless to return application/octet-stream, and also use it for signature generation.

s3_external: use sha256 checksums when uploading

Use the new S3 content integrity support to pass the object sha256 to S3 when uploading objects, so S3 can confirm integrity.

This requires a newer boto3 version.

tests: s3: force us-east-1 region to match urls

If you run the tests with a different default region set in your local S3 config, you get failures. Fix it by forcing to us-east-1.

If you're reasonably happy I can either t

rcoup added 4 commits April 6, 2022 15:45
git-lfs as of v2.5.0 [1] detects the file Content-Type and provides it
during PUT requests to the storage when using the basic transfer. If we
don't include the same `Content-Type` value in the S3 signature
generation, then S3 rejects it during upload.

Unfortunately git-lfs doesn't pass the `Content-Type` value to the Batch
request, so it can't be included in S3 signature generation. And while
setting `lfs.contenttype=false` in the client config forces sending
`application/octet-stream` [2], S3 isn't happy with that either.

If the Batch response upload action includes a `Content-Type` header
then git-lfs will always use that for the PUT request, so always return
`application/octet-stream`, and also use it for signature generation.

[1] git-lfs/git-lfs#3137
[2] git-lfs/git-lfs#3163
- move dev-requirements to "latest"
- upgrade reequirements sensibly
- fix minor azure typing issues
- azure test fixes wrt URL changes (/bucket/prefix%2Frepo%2Forg%2Fsha -> /bucket/prefix/repo/org/sha)
Use the new S3 content integrity support [1] to pass the object sha256
to S3 when uploading objects, so S3 can confirm integrity.

[1] https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
@mtb-xt
Copy link

mtb-xt commented Apr 13, 2022

@rcoup did it work for you for s3_external? I've tried with 'stock' giftless and with your patches applied, I'm still getting rejected by S3. Would you be interested in looking at it?

@rcoup
Copy link
Contributor Author

rcoup commented Apr 13, 2022

@mtb-xt yes, is working well for me. What errors are you seeing? Using something like this should show all the requests & most of the responses that are happening, for both git itself and git-lfs' interaction with the git-lfs server & any storage endpoints:

GIT_TRANSFER_TRACE=1 GIT_CURL_VERBOSE=1 GIT_TRACE=1 git push

FWIW, I'm predominantly using EC2 instance roles with IAM roles/policies for bucket access; and git-lfs v2.13.3

@mtb-xt
Copy link

mtb-xt commented Apr 14, 2022

@rcoup hey, thanks for helping out! I'm getting 400 Bad Requests, which lead me to believe some of the headers are wrong -_-
I'm using git-lfs/3.1.2 (GitHub; linux amd64; go 1.17.7)

Here are some sanitized logs -> https://gist.github.com/mtb-xt/7c8641535aa615e6f4eeea24fc4c2285

@mtb-xt
Copy link

mtb-xt commented Apr 14, 2022

Aha, this weird issue was that my bucket had KMS encryption key managed by AWS, and that requires using SigV4 signatures in presigned urls.
This has nothing to do with your PR, I'll make a PR of my own to make this configurable, thanks for your help! :)

@rcoup
Copy link
Contributor Author

rcoup commented Apr 14, 2022

Argh, their docs say that SigV4 is now the default, but apparently not

@mtb-xt
Copy link

mtb-xt commented Apr 23, 2022

@rufuspollock @shevron are you guys going to accept this PR? There are some more fixes needed for S3, but I'd like to start off from changes in this branch

@rufuspollock rufuspollock merged commit 13af128 into datopian:master Apr 23, 2022
@rufuspollock
Copy link
Member

rufuspollock commented Apr 23, 2022

@rcoup first a big thank-you for the contribution 👍 . Really welcome. This looked good and as per @mtb-xt i have merged this. I did want to check that:

a) the PR was in good state to merge (from review and discussion seemed so but was not sure from your last comments). I do notice build tests are failing for what seems like a dependency issue https://github.com/datopian/giftless/runs/6138620405?check_suite_focus=true
b) you are happy with the MIT license

@rcoup rcoup mentioned this pull request Apr 25, 2022
1 task
@rcoup
Copy link
Contributor Author

rcoup commented Apr 25, 2022

@rufuspollock
a) I've fixed the dependency issue in #103.
b) MIT is good 👍

@rcoup rcoup deleted the rc-s3-fixes branch April 25, 2022 13:21
@rcoup
Copy link
Contributor Author

rcoup commented Apr 25, 2022

@mtb-xt CC me in your PR please?

rufuspollock added a commit that referenced this pull request Apr 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants