The s¹ simple data store
Upload directory trees to S3 or GS cloud buckets as "submissions". Each submission takes a user-assigned identifier and a human-readable name. The cloud location of the submission has the key structure
submissions/{uuid}--{name}/{tree}
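For example, a submission uploaded with identifier my_submission_id and name my_cool_submission_name containing a file data/sample.bam (a hypothetical path) lands at
submissions/my_submission_id--my_cool_submission_name/data/sample.bam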
All uploads are checksummed and verified. Multipart uploads use fixed chunk sizes so that composite S3 ETags can be computed consistently.
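The composite-ETag scheme is standard S3 multipart behavior: the ETag is the MD5 of the concatenated per-part MD5 digests, suffixed with the part count. A minimal sketch of recomputing it locally for verification (the 8 MiB chunk size here is an assumption for illustration; ssds defines its own):

import hashlib

def s3_multipart_etag(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    # S3 computes a multipart ETag as the MD5 of the concatenated
    # per-part MD5 digests, suffixed with "-<part count>". A fixed
    # chunk size makes the result reproducible locally.
    part_digests = []
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            part_digests.append(hashlib.md5(chunk).digest())
    if not part_digests:  # empty file
        return hashlib.md5(b"").hexdigest()
    if len(part_digests) == 1:
        # Single-part uploads get a plain MD5 ETag, no suffix.
        return part_digests[0].hex()
    composite = hashlib.md5(b"".join(part_digests))
    return f"{composite.hexdigest()}-{len(part_digests)}"

Comparing this value against the ETag the bucket reports is enough to verify a multipart upload end to end.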
Install with
pip install git+https://github.com/DataBiosphere/ssds
Make a new submission
ssds staging upload --submission-id my_submission_id --name my_cool_submission_name /local/path/to/my/submission
Update an existing submission
ssds staging upload --submission-id my_existing_submission_id /local/path/to/my/submission
List all staging submissions
ssds staging list
List contents of a staging submission
ssds staging list-submission --submission-id my_submission_id
The above commands can target staging deployments other than the default with the --deployment
argument.
Available deployments can be listed with
ssds deployment list-staging
and
ssds deployment list-release
Submissions can be synced between staging deployments with
ssds staging sync --submission-id my_existing_submission_id --dst-deployment my_dst_deployment
For working with requester-pays Google Storage buckets, the billing project is specified by setting the environment variable GOOGLE_PROJECT, e.g.
export GOOGLE_PROJECT="my-gcp-billing-project"
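ssds reads this variable internally; as an illustration of what the billing project does, here is a minimal google-cloud-storage sketch (the bucket and object names are hypothetical):

import os
from google.cloud import storage

# Passing the billing project as user_project is what authorizes
# requester-pays access: Google charges that project for the request.
client = storage.Client()
bucket = client.bucket("my-requester-pays-bucket", user_project=os.environ["GOOGLE_PROJECT"])
data = bucket.blob("submissions/my_submission_id--my_cool_submission_name/data/sample.bam").download_as_bytes()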
Run tests with
make test
If mypy linting fails, you may need to run
mypy --install-types
Many tests require access to the test buckets listed in ssds/deployment.py. These buckets are in the pangenomics AWS account. Be sure your S3 and GS credentials are configured with access.
Project home page: https://github.com/DataBiosphere/ssds
Please report bugs, issues, feature requests, etc. on GitHub.
¹ super, splendidly, serendipitous, sometimes, sporadically, etc.