Get started with Batch

This page describes how to get started with Batch for Google Cloud.

Overview

Batch is a fully managed service that lets you schedule, queue, and execute batch processing workloads on Google Cloud resources. For example, consider using Batch for high performance computing (HPC), machine learning (ML), and data-processing workloads. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale.

Using Batch, you don't need to configure and manage third-party job schedulers, provision and deprovision resources, or request resources one zone at a time. To run a job, you specify parameters for the resources required for your workload, then Batch obtains resources and queues the job for execution. Batch provides native integration with other Google Cloud services to aid in the scheduling, execution, storage, and analysis of batch jobs, so you can focus on submitting a job and consuming the results.

Batch consists of the following components:

  • Job: a scheduled program that runs a set of tasks to completion without any user interaction, typically for computational workloads. For example, a job might be a single shell script or a complex, multipart computation.

    Specifically, a Batch job represents an array of one or more tasks and the environment to run those tasks in. You define the program for the job as a sequence of one or more runnables. Each task runs the sequence of runnables on the job's resources. You can configure a job's tasks to run in parallel or sequentially.

  • Task: one execution of a job's sequence of runnables. When a job has multiple tasks, you specify how you want each execution to vary by referencing the environment variable for a task's index in the job's runnables.

  • Runnable: an executable script or container that you define as part of a job.

  • Resources: the infrastructure needed to run a job. At minimum, you need to specify the compute resources required per task: CPU cores, memory, and (if required) extra boot disk storage. Optionally, you can also specify other resource options for the job. Batch automatically creates and deletes the resources that meet your specifications during the job's run time.

    Specifically, each Batch job runs on a regional managed instance group (MIG), which is a group of one or more matching Compute Engine virtual machine (VM) instances that are each located in one of the included zones. Each VM has dedicated hardware for CPU cores and memory—which affect the performance of your job—and a boot disk—which stores an operating system (OS) image and instructions for running your job. If specified, a job might also include or access additional resources—for example, GPUs or external storage volumes. The number of VMs provisioned for a job depends on the hardware requirements and task scheduling options that you specify.

In summary, Batch lets you create and run jobs, each of which automatically provisions and uses the resources required to execute its tasks.

For more information about job behavior and options, see the Job creation and execution overview page.
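As a rough illustration of how a job, its tasks, its runnables, and its resources fit together, the following sketch writes a minimal job configuration file and wraps the submit command in a function. The function names, the file name hello-world.json, the job name, the region, and the resource values are placeholders invented for this example. The field names and the BATCH_TASK_INDEX variable are not named on this page; they are assumed from the Batch JSON job format and should be checked against the Job creation and execution overview before you rely on them.

```shell
# Sketch: write a minimal Batch job config that shows a job, its tasks,
# a runnable, and per-task resources together. All values are illustrative.

write_job_config() {
  local config_file="$1"
  # The quoted heredoc delimiter keeps ${BATCH_TASK_INDEX} literal, so the
  # variable is expanded when each task runs, not by this shell.
  cat > "$config_file" <<'EOF'
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello from task ${BATCH_TASK_INDEX}"
            }
          }
        ],
        "computeResource": {
          "cpuMilli": 2000,
          "memoryMib": 2048
        }
      },
      "taskCount": 4,
      "parallelism": 2
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
EOF
}

# Submit the config as a job. Requires an authenticated gcloud CLI and a
# project with Batch enabled.
submit_job() {
  local job_name="$1"
  local location="$2"
  local config_file="$3"
  gcloud batch jobs submit "$job_name" \
    --location "$location" \
    --config "$config_file"
}

# Example usage (placeholder names):
#   write_job_config hello-world.json
#   submit_job example-job us-central1 hello-world.json
```

In this sketch, the single task group runs four tasks, at most two at a time, and each task runs the same script runnable with a different task index.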

Pricing

There is no additional cost for using Batch. You are only charged for the cost of the underlying resources required to execute your jobs.

For more information about the costs associated with Batch and how to filter Cloud Billing reports to view Batch costs, see Pricing.

Restrictions

Batch has the following restrictions:

  • You cannot exceed the Batch quotas and limits for your project.
  • You can specify only one machine type, which can be predefined or custom, per job.
  • You cannot specify more than one task group per job. All jobs have only one task group named group0.

Prerequisites

To start using Batch, complete the following prerequisites:

  1. If your project has not used Batch before, enable Batch for your project.
  2. Set up Batch for each new user.

Enable Batch for a project

To start using Batch with a project, do the following:

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  2. Make sure that billing is enabled for your Google Cloud project.

  3. Make sure that Batch is enabled for your project:

    1. Enable the APIs for Batch using the Google Cloud console or the Google Cloud CLI.

      Console

      Enable the Batch, Compute Engine, and Cloud Logging APIs.

      gcloud

      Enable the Batch, Compute Engine, and Cloud Logging APIs:

      gcloud services enable batch.googleapis.com compute.googleapis.com logging.googleapis.com
    2. Make sure your project has one or more service accounts with the necessary permissions to create and access the resources for running Batch jobs.

      Importantly, any users that you want to allow to create and run Batch jobs in your project must be granted permission to use one of these service accounts. So, don't select service accounts that have more permissions than you want to grant to these users.

      The service account each job uses by default is the Compute Engine default service account, but you can also customize which service account a job uses.

      Unless you plan to use only the default configuration for your job's service accounts, grant the following IAM roles to any service accounts that your project uses for Batch jobs.

      • Batch Agent Reporter (roles/batch.agentReporter) on the project
      • To let jobs generate logs in Cloud Logging: Logs Writer (roles/logging.logWriter) on the project

      The service account that you use for a job might also require additional roles based on the configuration options that you want to use for your jobs. Refer to the documentation for each task to see if additional permissions are required. For more information about granting roles to service accounts, see Restricting service accounts and Manage access to service accounts.

    3. Make sure you are familiar with your project's Batch service agent:

      After you create a Batch job, the Batch service agent is automatically created for your project with the following name:

      service-PROJECT_NUMBER@gcp-sa-cloudbatch.iam.gserviceaccount.com
      

      Replace PROJECT_NUMBER with the project number of your project.

      The Batch service agent is automatically granted the Google Batch Service Agent (roles/batch.serviceAgent) IAM role. This configuration is required for your project to use Batch.

      However, certain use cases—for example, running a job on a Shared VPC network—require you to grant additional permissions to your project's Batch service agent.

      For more information, see Service agents.
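The role grants and the service agent name described in the preceding steps can be sketched with the gcloud CLI as follows. This is a minimal sketch, not a required procedure: the function names are invented for this example, and the project ID and service account email are placeholders that the caller supplies.

```shell
# Sketch: grant the two project-level roles listed above to a service
# account that Batch jobs use. Arguments are placeholders from the caller.
grant_batch_sa_roles() {
  local project_id="$1"
  local sa_email="$2"
  local role
  for role in roles/batch.agentReporter roles/logging.logWriter; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="serviceAccount:${sa_email}" \
      --role="$role"
  done
}

# Look up the project number that the service agent name is built from.
project_number_for() {
  gcloud projects describe "$1" --format='value(projectNumber)'
}

# Build the Batch service agent email from a project number, following the
# naming pattern shown above.
batch_service_agent_email() {
  local project_number="$1"
  printf 'service-%s@gcp-sa-cloudbatch.iam.gserviceaccount.com\n' "$project_number"
}

# Example usage (placeholder values):
#   grant_batch_sa_roles my-project my-sa@my-project.iam.gserviceaccount.com
#   batch_service_agent_email "$(project_number_for my-project)"
```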

Set up Batch for a new user

To start using Batch as a user, do the following:

  1. To get the permissions that you need to use Batch, ask your administrator to grant you the required IAM roles on the project. Refer to the documentation for each task to see its required permissions.

    For example, if you want to start learning how to use Batch by creating a basic job, consider requesting roles for the following tasks:

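    • To create jobs: Batch Job Editor (roles/batch.jobsEditor) on the project and Service Account User (roles/iam.serviceAccountUser) on the job's service account, which by default is the Compute Engine default service account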
    • To list and describe jobs: Batch Job Editor (roles/batch.jobsEditor) or Batch Job Viewer (roles/batch.jobsViewer) on the project
    • To view logs for jobs: Logs Viewer (roles/logging.viewer) on the project
    • To delete jobs: Batch Job Editor (roles/batch.jobsEditor) on the project

    For more information about granting roles, see Manage access.

  2. If you want to use the command-line examples for Batch, set up the Google Cloud CLI by doing the following. For more information, see authentication for the Google Cloud CLI.

    1. Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init
    2. Recommended: Set a default project using the gcloud config set project command:

      gcloud config set project PROJECT_ID
      

      Replace PROJECT_ID with the project ID of your project.

  3. If you want to use the API examples or client library examples for Batch, see Authenticate to Batch.
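For administrators, the project-level role grants mentioned in step 1 can be sketched as a dry run that prints the gcloud commands instead of executing them, so they can be reviewed first. The function name, project ID, and user email here are placeholders invented for this example, and the role list covers only the project-level roles named above; adjust it to the tasks each user actually needs.

```shell
# Sketch: print (not run) the commands that would grant a user the
# project-level roles for listing, describing, and deleting jobs and for
# viewing job logs.
print_user_role_grants() {
  local project_id="$1"
  local user_email="$2"
  local role
  for role in roles/batch.jobsEditor roles/logging.viewer; do
    printf 'gcloud projects add-iam-policy-binding %s --member=user:%s --role=%s\n' \
      "$project_id" "$user_email" "$role"
  done
}

# Example usage (placeholder values):
#   print_user_role_grants my-project alice@example.com
```

Printing the commands keeps the sketch free of side effects. After checking the output, an administrator can run the printed lines as they are.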

Get support

You can discuss Batch with the community by using the Batch label on Cloud Forums.

If you have issues with Batch, see the troubleshooting documentation.

To get support or provide feedback for Batch, use the following resources:

  • For billing issues with Google Cloud, contact Billing support.

  • If you have a paid support package, contact Google Cloud Support directly for issues with Batch.

    Google Cloud offers different support packages to meet different needs, such as 24/7 coverage, phone support, and access to a technical support manager. For more information, see Google Cloud Support.

  • To provide any feedback or feature requests for Batch, or to report issues for Batch without a paid support package, click the Send feedback button, which you can find at the beginning and end of each Batch documentation page. Then, select one of the following:

    • For feedback related to Batch documentation, select "Documentation feedback."
    • For all other feedback about Batch, select "Product feedback."

What's next