Use CMEK with Dataproc Serverless

When you use Dataproc Serverless, data is stored on disks on the underlying serverless infrastructure and in a Cloud Storage staging bucket. By default, this data is encrypted using a Google-generated data encryption key (DEK) and key encryption key (KEK). With a customer-managed encryption key (CMEK), you create, use, and revoke the KEK yourself, while Google retains control over the DEK. For more information on Google data encryption keys, see Default encryption at rest.

Use CMEK

Follow the steps in this section to use CMEK to encrypt data that Dataproc Serverless writes to persistent disk and to the Dataproc staging bucket.

  1. Create a key using the Cloud Key Management Service (Cloud KMS).

  2. Copy the resource name.

    The resource name is constructed as follows:

    projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
    
  3. Enable the Compute Engine, Dataproc, and Cloud Storage Service Agent service accounts to use your key:

    1. See Protect resources by using Cloud KMS keys > Required Roles to assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Compute Engine Service Agent service account. If this service account is not listed on the IAM page in Google Cloud console, click Include Google-provided role grants to list it.
    2. Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Dataproc Service Agent service account. You can use the Google Cloud CLI to assign the role:

       gcloud projects add-iam-policy-binding KMS_PROJECT_ID \
       --member serviceAccount:service-PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
       --role roles/cloudkms.cryptoKeyEncrypterDecrypter
      

      Replace the following:

      KMS_PROJECT_ID: the ID of your Google Cloud project that runs Cloud KMS. This project can also be the project that runs Dataproc resources.

      PROJECT_NUMBER: the project number (not the project ID) of your Google Cloud project that runs Dataproc resources.
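
      The member string in the command above follows a fixed pattern built from the project number. As a minimal sketch, the shell helper below assembles that string so it can be checked before it is passed to --member. The function name dataproc_agent_member and the sample project number are hypothetical and used only for illustration.

      ```shell
      # Hypothetical helper: print the IAM member string for the Dataproc
      # Service Agent of the project with the given project number.
      dataproc_agent_member() {
        printf 'serviceAccount:service-%s@dataproc-accounts.iam.gserviceaccount.com\n' "$1"
      }

      # Example with a made-up project number:
      dataproc_agent_member 123456789012

      # To look up a real project number (requires the gcloud CLI):
      #   gcloud projects describe PROJECT_ID --format="value(projectNumber)"
      ```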

    3. Enable the Cloud KMS API on the project that runs Dataproc Serverless resources.

    4. If the Dataproc Service Agent role is not attached to the Dataproc Service Agent service account, add the serviceusage.services.use permission to the custom role that is attached to that service account. Otherwise, skip this step.

    5. Follow the steps to add your key on the Cloud Storage bucket that you will use as the staging bucket.

  4. When you submit a batch workload:

    1. Specify your key in the Batch kmsKey parameter.
    2. Specify the name of your Cloud Storage bucket in the Batch stagingBucket parameter.
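
To show how the key and bucket from the steps above fit together, the following sketch assembles the key resource name and prints a Batch request body that sets kmsKey and stagingBucket. It assumes that both fields sit under environmentConfig.executionConfig in the Batch resource and that the workload is a PySpark batch. The helper names kms_key_name and batch_request_body, and all sample values, are hypothetical. The script only prints JSON and does not call any Google Cloud API.

```shell
# Hypothetical helper: assemble the Cloud KMS key resource name (step 2).
kms_key_name() {
  # Args: PROJECT_ID REGION KEY_RING_NAME KEY_NAME
  printf 'projects/%s/locations/%s/keyRings/%s/cryptoKeys/%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: print a Batch request body with the CMEK settings.
batch_request_body() {
  # Args: KMS key resource name, staging bucket name, main Python file URI
  cat <<EOF
{
  "pysparkBatch": {
    "mainPythonFileUri": "$3"
  },
  "environmentConfig": {
    "executionConfig": {
      "kmsKey": "$1",
      "stagingBucket": "$2"
    }
  }
}
EOF
}

# Example with placeholder values; nothing is sent to Google Cloud here.
key="$(kms_key_name my-project us-central1 my-ring my-key)"
batch_request_body "$key" my-staging-bucket gs://my-staging-bucket/job.py
```

A body like this could then be sent in a batches.create request to https://dataproc.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/batches. Note that stagingBucket takes the bucket name, not a gs:// URI.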