title | summary | toc | key | docs_area |
---|---|---|---|---|
Use Cloud Storage | CockroachDB constructs a secure API call to the cloud storage specified in a URL passed to various operation statements. | true | use-cloud-storage-for-bulk-operations.html | manage |
CockroachDB constructs a secure API call to the cloud storage specified in a URL passed to one of the following statements:
- [`BACKUP`]({% link {{ page.version.version }}/backup.md %})
- [`RESTORE`]({% link {{ page.version.version }}/restore.md %})
- [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})
- [`EXPORT`]({% link {{ page.version.version }}/export.md %})
- [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %})
{% include {{ page.version.version }}/misc/note-egress-perimeter-cdc-backup.md %}
{{site.data.alerts.callout_success}} We strongly recommend using cloud/remote storage. {{site.data.alerts.end}}
URLs for the files you want to import must use the format shown below. For examples, see Example file URLs.
[scheme]://[host]/[path]?[parameters]
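As a rough illustration of this format, the following Python sketch splits a storage URL into its scheme, host, path, and parameters using only the standard library. The helper name `split_storage_url` is hypothetical and not part of CockroachDB; the sample URL reuses the placeholder values from the examples on this page.

```python
from urllib.parse import urlsplit, parse_qs


def split_storage_url(url):
    """Split a storage URL into scheme, host, path, and query parameters."""
    parts = urlsplit(url)
    # parse_qs returns a list of values per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        "parameters": params,
    }


# Placeholder credentials, as in the example URL tables.
info = split_storage_url(
    "s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456"
)
print(info)
```

For the sample URL, the scheme is `s3`, the host is the bucket name `acme-co`, and the path is `/employees`.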
{% include {{ page.version.version }}/misc/external-connection-note.md %}
The following table provides a list of the parameters supported by each storage scheme. For detail on authenticating to each cloud storage provider, see the [Cloud Storage Authentication]({% link {{ page.version.version }}/cloud-storage-authentication.md %}) page.
Location | Scheme | Host | Parameters
------------------------------------------------------------+-------------+--------------------------------------------------+----------------------------------------------------------------------------
Amazon S3 | `s3` | Bucket name | [`AUTH`]({% link {{ page.version.version }}/cloud-storage-authentication.md %}#amazon-s3-specified): `implicit` or `specified` (default: `specified`). When using `specified`, pass the user's `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. [`ASSUME_ROLE`]({% link {{ page.version.version }}/cloud-storage-authentication.md %}#set-up-amazon-s3-assume-role) (optional): Pass the ARN of the role to assume. Use in combination with `AUTH=implicit` or `specified`. [`AWS_SESSION_TOKEN`]({% link {{ page.version.version }}/cloud-storage-authentication.md %}) (optional): For more information, see Amazon's guide on temporary credentials. `S3_STORAGE_CLASS` (optional): Specify the Amazon S3 storage class for created objects. Note that Glacier Flexible Retrieval and Glacier Deep Archive are not compatible with incremental backups. Default: `STANDARD`.
Azure Blob Storage | `azure-blob` / `azure` | Storage container | `AZURE_ACCOUNT_NAME`: The name of your Azure account. `AZURE_ACCOUNT_KEY`: Your Azure account key. You must URL-encode your Azure account key before authenticating to Azure Storage. For more information, see [Authentication - Azure Storage]({% link {{ page.version.version }}/cloud-storage-authentication.md %}#azure-blob-storage-specified-authentication). `AZURE_ENVIRONMENT`: (optional) {% include {{ page.version.version }}/misc/azure-env-param.md %} `AZURE_CLIENT_ID`: Application (client) ID for your App Registration. `AZURE_CLIENT_SECRET`: Client credentials secret generated for your App Registration. `AZURE_TENANT_ID`: Directory (tenant) ID for your App Registration. {% include {{ page.version.version }}/backups/azure-storage-tier-support.md %} Note: {% include {{ page.version.version }}/misc/azure-blob.md %}
Google Cloud Storage | `gs` | Bucket name | `AUTH`: `implicit` or `specified` (default: `specified`); `CREDENTIALS`. [`ASSUME_ROLE`]({% link {{ page.version.version }}/cloud-storage-authentication.md %}#set-up-google-cloud-storage-assume-role) (optional): Pass the service account name of the service account to assume. For more information, see [Authentication - Google Cloud Storage]({% link {{ page.version.version }}/cloud-storage-authentication.md %}#google-cloud-storage-specified).
HTTP | `file-http(s)` / `http(s)` | Remote host | N/A. Note: Using `http(s)` without the `file-` prefix is deprecated as a [changefeed sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) scheme. There is continued support for `http(s)`, but it will be removed in a future release. We recommend implementing the `file-http(s)` scheme for changefeed messages. For more information, refer to [Authentication - HTTP]({% link {{ page.version.version }}/cloud-storage-authentication.md %}#http-authentication).
NFS/Local <sup>1</sup> | `nodelocal` | `nodeID` <sup>2</sup> (see Example file URLs) | N/A
S3-compatible services | `s3` | Bucket name | {{site.data.alerts.callout_danger}} While Cockroach Labs actively tests Amazon S3, Google Cloud Storage, and Azure Storage, we do not test S3-compatible services (e.g., MinIO, Red Hat Ceph).{{site.data.alerts.end}} `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `AWS_REGION` <sup>3</sup> (optional), `AWS_ENDPOINT`. For more information, see [Authentication - S3-compatible services]({% link {{ page.version.version }}/cloud-storage-authentication.md %}#s3-compatible-services-authentication).
{{site.data.alerts.callout_success}} The location parameters often contain special characters that need to be URI-encoded. Use JavaScript's `encodeURIComponent` function or Go's `url.QueryEscape` function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters. {{site.data.alerts.end}}
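As one possible way to do this in Python, the sketch below percent-encodes credential values with `urllib.parse.quote` before placing them in an S3 URL. The helper `build_s3_url` and the sample key values are made up for illustration; they are not part of CockroachDB or of any real account.

```python
from urllib.parse import quote


def build_s3_url(bucket, path, access_key_id, secret_access_key):
    """Build an s3:// URL with URI-encoded credential parameters."""
    # safe="" makes quote() also encode "/", which can appear in secret keys.
    key_id = quote(access_key_id, safe="")
    secret = quote(secret_access_key, safe="")
    return (
        f"s3://{bucket}/{path}"
        f"?AWS_ACCESS_KEY_ID={key_id}&AWS_SECRET_ACCESS_KEY={secret}"
    )


# Hypothetical secret containing characters that must be encoded: "/", "+", "=".
url = build_s3_url("acme-co", "employees", "AKIAEXAMPLE", "abc/def+ghi=")
print(url)
# s3://acme-co/employees?AWS_ACCESS_KEY_ID=AKIAEXAMPLE&AWS_SECRET_ACCESS_KEY=abc%2Fdef%2Bghi%3D
```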
{{site.data.alerts.callout_info}}
You can disable the use of implicit credentials when accessing external cloud storage services for various operations by using the [`--external-io-disable-implicit-credentials` flag]({% link {{ page.version.version }}/cockroach-start.md %}#security).
{{site.data.alerts.end}}
<sup>1</sup> The file system backup location on the NFS drive is relative to the path specified by the [`--external-io-dir`]({% link {{ page.version.version }}/cockroach-start.md %}#flags-external-io-dir) flag set while [starting the node]({% link {{ page.version.version }}/cockroach-start.md %}). If the flag is set to `disabled`, then imports from local directories and NFS drives are disabled.
<sup>2</sup> Using a `nodeID` is required and the data files will be in the `extern` directory of the specified node. In most cases (including single-node clusters), using `nodelocal://1/<path>` is sufficient. If every node has the [`--external-io-dir`]({% link {{ page.version.version }}/cockroach-start.md %}#flags-external-io-dir) flag pointed to a common NFS mount, or other form of network-backed, shared, or synchronized storage, you can use the word `self` instead of a node ID to indicate that each node should write individual data files to its own `extern` directory.
<sup>3</sup> The `AWS_REGION` parameter is optional since it is not a required parameter for most S3-compatible services. Specify the parameter only if your S3-compatible service requires it.
Example URLs for [`BACKUP`]({% link {{ page.version.version }}/backup.md %}), [`RESTORE`]({% link {{ page.version.version }}/restore.md %}), or [`EXPORT`]({% link {{ page.version.version }}/export.md %}) given a bucket or container name of `acme-co` and an `employees` subdirectory:
Location | Example
-------------+----------------------------------------------------------------------------------
Amazon S3 | s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456
Azure Blob Storage | azure-blob://acme-co/employees?AUTH=specified&AZURE_ACCOUNT_NAME={account name}&AZURE_CLIENT_ID={client ID}&AZURE_CLIENT_SECRET={client secret}&AZURE_TENANT_ID={tenant ID}
Google Cloud Storage | gs://acme-co/employees?AUTH=specified&CREDENTIALS=encoded-123
NFS/Local | nodelocal://1/path/employees
For detail on forming the URLs and the different authentication methods, refer to the [Cloud Storage Authentication]({% link {{ page.version.version }}/cloud-storage-authentication.md %}) page.
Example URLs for [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %}) given a bucket or container name of `acme-co` and a filename of `employees`:
Location | Example
-------------+----------------------------------------------------------------------------------
Amazon S3 | s3://acme-co/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456
Azure Blob Storage | azure-blob://acme-co/employees.sql?AUTH=specified&AZURE_ACCOUNT_NAME={account name}&AZURE_CLIENT_ID={client ID}&AZURE_CLIENT_SECRET={client secret}&AZURE_TENANT_ID={tenant ID}
Google Cloud Storage | gs://acme-co/employees.sql?AUTH=specified&CREDENTIALS=encoded-123
HTTP | http://localhost:8080/employees.sql
NFS/Local | nodelocal://1/path/employees
Example URLs for [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}):
{% include {{ page.version.version }}/cdc/list-cloud-changefeed-uris.md %}
{{site.data.alerts.callout_info}}
HTTP storage can only be used for [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %}) and [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}).
{{site.data.alerts.end}}
Transport Layer Security (TLS) is used for encryption in transit when transmitting data to or from Amazon S3, Google Cloud Storage, and Azure.
For encryption at rest, if your cloud provider offers transparent data encryption, you can use that to ensure that your backups are not stored on disk in cleartext.
CockroachDB also provides client-side encryption of backup data. For more information, see [Take and Restore Encrypted Backups]({% link {{ page.version.version }}/take-and-restore-encrypted-backups.md %}).
This section describes the minimum permissions required to run CockroachDB operations. While we provide the required permissions for Amazon S3 and Google Cloud Storage, each provider's documentation covers the setup process and the available access management options in detail.
Depending on the actions an operation performs, it will require different access permissions to a cloud storage bucket.
This table outlines the actions that each operation performs against the storage bucket:
Operation | Permission | Description |
---|---|---|
Backup | Write | Backups write the backup data to the bucket/container. During a backup job, a BACKUP CHECKPOINT file will be written that tracks the progress of the backup. |
Backup | Get | Backups need get access after a pause to read the checkpoint files on resume. |
Backup | List | Backups need list access to the files already in the bucket. For example, BACKUP uses list to find previously taken backups when executing an incremental backup and to find the latest checkpoint file. |
Backup | Delete (optional) | To clean up BACKUP CHECKPOINT files that the backup job has written, you also need to include a delete permission in your bucket policy (e.g., s3:DeleteObject). However, delete is not necessary for backups to complete successfully in v22.1 and later. |
Restore | Get | Restores need access to retrieve files from the backup. Restore also requires access to the LATEST file in order to read the latest available backup. |
Restore | List | Restores need list access to the files already in the bucket to find other backups in the backup collection. The collection contains metadata files that describe the backup, the LATEST file, and other versioned subdirectories and files. |
Import | Get | Imports read the requested file(s) from the storage bucket. |
Export | Write | Exports need write access to the storage bucket to create individual export file(s) from the exported data. |
Enterprise changefeeds | Write | Changefeeds will write files to the storage bucket that contain row changes and resolved timestamps. |
These actions are the minimum access permissions to be set in an Amazon S3 bucket policy:
Operation | S3 permission
-------------+----------------------------------------------------------------------------------
Backup | `s3:PutObject`, `s3:GetObject`, `s3:ListBucket`
Restore | `s3:GetObject`, `s3:ListBucket`
Import | `s3:GetObject`
Export | `s3:PutObject`
Enterprise Changefeeds | `s3:PutObject`
See Policies and Permissions in Amazon S3 for detail on setting policies and permissions in Amazon S3.
An example S3 bucket policy for a backup:
    {
        "Version": "2012-10-17",
        "Id": "Example_Policy",
        "Statement": [
            {
                "Sid": "ExampleStatement01",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::{ACCOUNT_ID}:user/{USER}"
                },
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::{BUCKET_NAME}",
                    "arn:aws:s3:::{BUCKET_NAME}/*"
                ]
            }
        ]
    }
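To adapt the policy above to other operations, a small script can fill in the minimum actions from the S3 permission table. The Python sketch below is one way to do that; the function `bucket_policy`, the operation keys, and the sample account ID, user, and bucket name are illustrative assumptions rather than anything CockroachDB provides.

```python
import json

# Minimum S3 actions per operation, copied from the table on this page.
S3_ACTIONS = {
    "backup": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
    "restore": ["s3:GetObject", "s3:ListBucket"],
    "import": ["s3:GetObject"],
    "export": ["s3:PutObject"],
    "changefeed": ["s3:PutObject"],
}


def bucket_policy(operation, account_id, user, bucket):
    """Return a bucket policy dict granting the minimum actions for one operation."""
    return {
        "Version": "2012-10-17",
        "Id": "Example_Policy",
        "Statement": [
            {
                "Sid": "ExampleStatement01",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:user/{user}"},
                "Action": list(S3_ACTIONS[operation]),
                # The bucket ARN and the object ARN pattern, as in the example policy.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


# Hypothetical account ID, user, and bucket name.
policy = bucket_policy("restore", "123456789012", "backup-user", "acme-co")
print(json.dumps(policy, indent=2))
```

The optional `s3:DeleteObject` permission for cleaning up checkpoint files is left out here because it is not part of the minimum set.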
In Google Cloud Storage, you can grant users roles that define their access level to the storage bucket. For the purposes of running CockroachDB operations to your bucket, the following table lists the permissions that represent the minimum level required for each operation. GCS provides different levels of granularity for defining the roles in which these permissions reside. You can assign roles that already have these permissions configured, or make your own custom roles that include these permissions.
For more detail about Predefined, Basic, and Custom roles, see IAM roles for Cloud Storage.
Operation | GCS Permission
-------------+----------------------------------------------------------------------------------
Backup | `storage.objects.create`, `storage.objects.get`, `storage.objects.list`
Restore | `storage.objects.get`, `storage.objects.list`
Import | `storage.objects.get`
Export | `storage.objects.create`
Changefeeds | `storage.objects.create`
For guidance on adding a user to a bucket's policy, see Add a principal to a bucket-level policy.
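When assembling a custom role, it can help to check a candidate permission list against the GCS permission table. The following Python sketch does only that comparison; the function name `missing_gcs_permissions` and the operation keys are hypothetical, and the script does not call any Google Cloud API.

```python
# Minimum GCS permissions per operation, copied from the table on this page.
GCS_REQUIRED = {
    "backup": {"storage.objects.create", "storage.objects.get", "storage.objects.list"},
    "restore": {"storage.objects.get", "storage.objects.list"},
    "import": {"storage.objects.get"},
    "export": {"storage.objects.create"},
    "changefeed": {"storage.objects.create"},
}


def missing_gcs_permissions(operation, granted):
    """Return the required permissions that are absent from `granted`, sorted."""
    return sorted(GCS_REQUIRED[operation] - set(granted))


# A role with read-only access is not enough for a backup.
read_only = ["storage.objects.get", "storage.objects.list"]
print(missing_gcs_permissions("backup", read_only))   # ['storage.objects.create']
print(missing_gcs_permissions("restore", read_only))  # []
```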
To complete a backup successfully, `BACKUP` requires [read and write permissions]({% link {{ page.version.version }}/backup.md %}#required-privileges) to cloud storage buckets. Delete and overwrite permissions are not required. As a result, you can write backups to cloud storage buckets with object locking enabled. This allows you to store backup data using a write-once-read-many (WORM) model, which refers to storage that prevents any kind of deletion, encryption or modification to the objects once written.
{{site.data.alerts.callout_info}} We recommend enabling object locking in cloud storage buckets to protect the validity of a backup for restores. {{site.data.alerts.end}}
For specific cloud-storage provider documentation, see the following:
- AWS S3 Object Lock
- Retention policies and Bucket Lock in Google Cloud Storage
- Immutable storage in Azure Storage
When storing objects in Amazon S3 buckets during [backups]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}), [exports]({% link {{ page.version.version }}/export.md %}), and [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}), you can specify the `S3_STORAGE_CLASS={class}` parameter in the URI to configure a storage class type.
The following S3 connection URI uses the `INTELLIGENT_TIERING` storage class:
's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}&S3_STORAGE_CLASS=INTELLIGENT_TIERING'
While Cockroach Labs supports configuring an AWS storage class, we only test against S3 Standard. We recommend implementing your own testing with other storage classes.
[Incremental backups]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#incremental-backups) are not compatible with the S3 Glacier Flexible Retrieval or Glacier Deep Archive storage classes. Incremental backups require the reading of previous backups on an ad-hoc basis, which is not possible with backup files already in Glacier Flexible Retrieval or Glacier Deep Archive. This is because these storage classes do not allow immediate access to an S3 object without first restoring the archived object to its S3 bucket.
Refer to the AWS documentation on Restoring an archived object for steps.
When you are restoring archived backup files from Glacier Flexible Retrieval or Glacier Deep Archive back to an S3 bucket, you must restore both the full backup and incremental backup layers for that backup. By default, CockroachDB stores the incremental backup layers in a separate top-level directory at the backup's storage location. Refer to [Backup collections]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#backup-collections) for detail on the backup directory structure at its storage location.
Once you have restored all layers of a backup's archived files back to its S3 bucket, you can then [restore]({% link {{ page.version.version }}/restore.md %}) the backup to your CockroachDB cluster.
This table lists the valid CockroachDB parameters that map to an S3 storage class:
CockroachDB parameter | AWS S3 storage class
----------------------+--------------------------
`STANDARD` | S3 Standard
`REDUCED_REDUNDANCY` | Reduced redundancy. Note: Amazon recommends against using this storage class.
`STANDARD_IA` | Standard Infrequent Access
`ONEZONE_IA` | One Zone Infrequent Access
`INTELLIGENT_TIERING` | Intelligent Tiering
`GLACIER` | Glacier Flexible Retrieval
`DEEP_ARCHIVE` | Glacier Deep Archive
`OUTPOSTS` | Outpost
`GLACIER_IR` | Glacier Instant Retrieval
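The table and the incremental backup restriction described above can be turned into a small pre-flight check. This Python sketch appends `S3_STORAGE_CLASS` to a URI only when the value is one of the listed parameters, and rejects the two Glacier archive classes when the caller says the URI is for incremental backups. The helper `with_storage_class` and its `incremental` flag are assumptions made for illustration.

```python
# Valid S3_STORAGE_CLASS values, copied from the table on this page.
VALID_STORAGE_CLASSES = {
    "STANDARD", "REDUCED_REDUNDANCY", "STANDARD_IA", "ONEZONE_IA",
    "INTELLIGENT_TIERING", "GLACIER", "DEEP_ARCHIVE", "OUTPOSTS", "GLACIER_IR",
}
# Glacier Flexible Retrieval and Glacier Deep Archive do not allow immediate reads.
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def with_storage_class(uri, storage_class, incremental=False):
    """Append S3_STORAGE_CLASS to an S3 URI after validating the value."""
    if storage_class not in VALID_STORAGE_CLASSES:
        raise ValueError(f"unknown storage class: {storage_class}")
    if incremental and storage_class in ARCHIVE_CLASSES:
        raise ValueError(f"{storage_class} is not compatible with incremental backups")
    # Use "&" when the URI already has query parameters, "?" otherwise.
    separator = "&" if "?" in uri else "?"
    return f"{uri}{separator}S3_STORAGE_CLASS={storage_class}"


uri = with_storage_class(
    "s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456",
    "INTELLIGENT_TIERING",
)
print(uri)
```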
You can view an object's storage class in the Amazon S3 Console from the object's Properties tab. Alternatively, use the AWS CLI to list objects in a bucket, which will also display the storage class:
aws s3api list-objects-v2 --bucket {bucket-name}
{
"Key": "2022/05/02-180752.65/metadata.sst",
"LastModified": "2022-05-02T18:07:54+00:00",
"ETag": "\"c0f499f21d7886e4289d55ccface7527\"",
"Size": 7865,
"StorageClass": "STANDARD"
},
...
{
"Key": "2022-05-06/202205061217256387084640000000000-1b4e610c63535061-1-2-00000000-users-7.ndjson",
"LastModified": "2022-05-06T12:17:26+00:00",
"ETag": "\"c60a013619439bf83c505cb6958b55e2\"",
"Size": 94596,
"StorageClass": "INTELLIGENT_TIERING"
},
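For buckets with many objects, it may be easier to summarize this output than to read it entry by entry. The Python sketch below counts objects per storage class from the JSON that `aws s3api list-objects-v2` prints, assuming the objects are listed under the top-level `Contents` key. The function name `storage_class_counts` is hypothetical, and the sample input is a trimmed version of the output shown above.

```python
import json
from collections import Counter


def storage_class_counts(list_objects_json):
    """Count objects per storage class in list-objects-v2 JSON output."""
    data = json.loads(list_objects_json)
    # "Contents" is missing when the listing is empty, so default to no objects.
    objects = data.get("Contents", [])
    return Counter(obj.get("StorageClass", "UNKNOWN") for obj in objects)


# Trimmed sample based on the output above.
sample = """
{
  "Contents": [
    {"Key": "2022/05/02-180752.65/metadata.sst", "Size": 7865, "StorageClass": "STANDARD"},
    {"Key": "2022-05-06/202205061217256387084640000000000-1b4e610c63535061-1-2-00000000-users-7.ndjson",
     "Size": 94596, "StorageClass": "INTELLIGENT_TIERING"}
  ]
}
"""
counts = storage_class_counts(sample)
print(dict(counts))
```

In practice the input could come from redirecting the CLI output to a file and reading that file's contents.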
For a specific operation, see the following examples:
- [Back up with an S3 storage class]({% link {{ page.version.version }}/backup.md %}#back-up-with-an-s3-storage-class)
- [Create a changefeed with an S3 storage class]({% link {{ page.version.version }}/create-changefeed.md %}#create-a-changefeed-with-an-s3-storage-class)
- [Export tabular data with an S3 storage class]({% link {{ page.version.version }}/export.md %}#export-tabular-data-with-an-s3-storage-class)
- [`BACKUP`]({% link {{ page.version.version }}/backup.md %})
- [`RESTORE`]({% link {{ page.version.version }}/restore.md %})
- [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %})
- [`EXPORT`]({% link {{ page.version.version }}/export.md %})
- [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %})
- [Cluster Settings]({% link {{ page.version.version }}/cluster-settings.md %})