Run TensorFlow code on TPU Pod slices
This document shows you how to perform a calculation using TensorFlow on a TPU Pod. You will perform the following steps:
- Create a TPU Pod slice with TensorFlow software
- Connect to the TPU VM using SSH
- Create and run an example script
The TPU VM relies on a Service Accounts
for permissions to call Cloud TPU API. By default, your TPU VM will use the
default Compute Engine service account
which includes all needed Cloud TPU permissions. If you use your own
service account you need to add the TPU Viewer
role to your service account. For more information on Google Cloud roles, see Understanding roles.
You can specify your own service account using the --service-account
flag when
creating your TPU VM.
Set up your environment
-
In the Cloud Shell, run the following command to make sure you are running the current version of
gcloud
:$ gcloud components update
If you need to install
gcloud
, use the following command:$ sudo apt install -y google-cloud-sdk
Create some environment variables:
$ export PROJECT_ID=project-id $ export TPU_NAME=tpu-name $ export ZONE=europe-west4-a $ export RUNTIME_VERSION=tpu-vm-tf-2.18.0-pod-pjrt $ export ACCELERATOR_TYPE=v3-32
Create a v3-32 TPU Pod slice with TensorFlow runtime
$ gcloud compute tpus tpu-vm create ${TPU_NAME}} \ --zone=${ZONE} \ --accelerator-type=${ACCELERATOR_TYPE} \ --version=${RUNTIME_VERSION}
Command flag descriptions
zone
- The zone where you plan to create your Cloud TPU.
accelerator-type
- The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
version
- The Cloud TPU software version.
Connect to your Cloud TPU VM using SSH
$ gcloud compute tpus tpu-vm ssh ${TPU_NAME} \ --zone=${ZONE}
Create and run an example script
Set the following environment variables.
(vm)$ export TPU_NAME=tpu-name (vm)$ export TPU_LOAD_LIBRARY=0
Create a file named
tpu-test.py
in the current directory and copy and paste the following script into it.import tensorflow as tf print("Tensorflow version " + tf.__version__) cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver() print('Running on TPU ', cluster_resolver.cluster_spec().as_dict()['worker']) tf.config.experimental_connect_to_cluster(cluster_resolver) tf.tpu.experimental.initialize_tpu_system(cluster_resolver) strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver) @tf.function def add_fn(x,y): z = x + y return z x = tf.constant(1.) y = tf.constant(1.) z = strategy.run(add_fn, args=(x,y)) print(z)
Run this script with the following command:
(vm)$ python3 tpu-test.py
This script performs a calculation on a each TensorCore of a TPU Pod slice. The output will look similar to the following:
PerReplica:{ 0: tf.Tensor(2.0, shape=(), dtype=float32), 1: tf.Tensor(2.0, shape=(), dtype=float32), 2: tf.Tensor(2.0, shape=(), dtype=float32), 3: tf.Tensor(2.0, shape=(), dtype=float32), 4: tf.Tensor(2.0, shape=(), dtype=float32), 5: tf.Tensor(2.0, shape=(), dtype=float32), 6: tf.Tensor(2.0, shape=(), dtype=float32), 7: tf.Tensor(2.0, shape=(), dtype=float32), 8: tf.Tensor(2.0, shape=(), dtype=float32), 9: tf.Tensor(2.0, shape=(), dtype=float32), 10: tf.Tensor(2.0, shape=(), dtype=float32), 11: tf.Tensor(2.0, shape=(), dtype=float32), 12: tf.Tensor(2.0, shape=(), dtype=float32), 13: tf.Tensor(2.0, shape=(), dtype=float32), 14: tf.Tensor(2.0, shape=(), dtype=float32), 15: tf.Tensor(2.0, shape=(), dtype=float32), 16: tf.Tensor(2.0, shape=(), dtype=float32), 17: tf.Tensor(2.0, shape=(), dtype=float32), 18: tf.Tensor(2.0, shape=(), dtype=float32), 19: tf.Tensor(2.0, shape=(), dtype=float32), 20: tf.Tensor(2.0, shape=(), dtype=float32), 21: tf.Tensor(2.0, shape=(), dtype=float32), 22: tf.Tensor(2.0, shape=(), dtype=float32), 23: tf.Tensor(2.0, shape=(), dtype=float32), 24: tf.Tensor(2.0, shape=(), dtype=float32), 25: tf.Tensor(2.0, shape=(), dtype=float32), 26: tf.Tensor(2.0, shape=(), dtype=float32), 27: tf.Tensor(2.0, shape=(), dtype=float32), 28: tf.Tensor(2.0, shape=(), dtype=float32), 29: tf.Tensor(2.0, shape=(), dtype=float32), 30: tf.Tensor(2.0, shape=(), dtype=float32), 31: tf.Tensor(2.0, shape=(), dtype=float32) }
Clean up
When you are done with your TPU VM follow these steps to clean up your resources.
Disconnect from the Compute Engine:
(vm)$ exit
Delete your Cloud TPU.
$ gcloud compute tpus tpu-vm delete ${TPU_NAME} \ --zone=${ZONE}
Verify the resources have been deleted by running the following command. Make sure your TPU is no longer listed. The deletion might take several minutes.
$ gcloud compute tpus tpu-vm list \ --zone=${ZONE}