Rescue an inaccessible VM


If your Linux VM is inaccessible for any reason, you can try to rescue it by using the following steps.

Required roles

To get the permissions that you need to rescue a VM, ask your administrator to grant you the following IAM roles on the project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to rescue a VM. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to rescue a VM:

  • compute.instances.create on project
  • compute.disks.create on project
  • compute.instances.get on project
  • compute.disks.createSnapshot on disks
  • compute.instances.attachDisk on new VM
  • compute.disks.use on disk
  • compute.instances.start on new and inaccessible VM
  • compute.instances.stop on new and inaccessible VM

You might also be able to get these permissions with custom roles or other predefined roles.
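
If you prefer a custom role over the predefined roles, the permissions listed above can be bundled into one. The following is a minimal sketch, assuming that the gcloud CLI is installed and that you are allowed to create roles in the project. The role ID vmRescuer, the role title, and both helper function names are hypothetical and are not part of this guide.

```shell
# Permissions copied from the "Required permissions" list above.
RESCUE_PERMISSIONS="compute.instances.create
compute.disks.create
compute.instances.get
compute.disks.createSnapshot
compute.instances.attachDisk
compute.disks.use
compute.instances.start
compute.instances.stop"

# Join the newline-separated list into the comma-separated form
# that the --permissions flag expects.
rescue_permissions_csv() {
  printf '%s\n' "$RESCUE_PERMISSIONS" | paste -s -d, -
}

# Create a project-level custom role that holds only those permissions.
# Usage: create_rescue_role PROJECT_ID [ROLE_ID]
# The default role ID "vmRescuer" is an illustrative assumption.
create_rescue_role() {
  rescue_project_id="$1"
  rescue_role_id="${2:-vmRescuer}"
  gcloud iam roles create "$rescue_role_id" \
    --project="$rescue_project_id" \
    --title="VM rescuer" \
    --permissions="$(rescue_permissions_csv)"
}
```

To try it, call create_rescue_role with your project ID, and then grant the resulting role to the user who performs the rescue.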

Rescue a VM

If you can't connect to your VM, or your boot disk is full, you must create a temporary VM to rescue the inaccessible VM.

  1. (Optional) Stop the inaccessible VM.
  2. Create a snapshot from the boot disk of the inaccessible VM.
  3. Create a temporary VM using the public image that is closest to the inaccessible VM's OS.
  4. Check if you're able to connect to the temporary VM using SSH.
  5. Add an additional disk to the temporary VM by following these steps:

    1. In the Google Cloud console, go to the VM instances page.

      Go to VM instances

    2. Click the name of the temporary VM that you created.

    3. Click Edit.

    4. Under Additional disks, click Add new disk, and then do the following:

      1. Enter a name for the disk, such as my-recovery-disk.
      2. For Source type, select the Snapshot tab.
      3. In the Source snapshot drop-down menu, select the snapshot of the source VM that you created earlier in these steps.
      4. Click Done.
    5. Click Save.

  6. Connect to the temporary VM using SSH.

  7. List the disks and partitions that are attached to the temporary VM:

    ls -l /dev/disk/by-id/google-*

    The output will be similar to the following:

      /dev/disk/by-id/google-my-vm -> ../../sda
      /dev/disk/by-id/google-my-vm-part1 -> ../../sda1
      /dev/disk/by-id/google-my-vm-part14 -> ../../sda14
      /dev/disk/by-id/google-my-vm-part15 -> ../../sda15
      /dev/disk/by-id/google-my-recovery-disk -> ../../sdb
      /dev/disk/by-id/google-my-recovery-disk-part1 -> ../../sdb1
      /dev/disk/by-id/google-my-recovery-disk-part2 -> ../../sdb2
      /dev/disk/by-id/google-my-recovery-disk-part5 -> ../../sdb5
    

    Use the symlinks (/dev/disk/by-id/google-my-recovery-disk-partN) to locate the underlying device and partitions for the newly added disk, for example, /dev/sdb1.

    The symlink for the disk is either google-DISK_NAME or, if you specified a custom device name for the disk, google-DEVICE_NAME. Make note of the device name that the new disk symlink points to.

  8. Create a mount point at /mnt/newdisk:

    sudo mkdir /mnt/newdisk
  9. Mount the additional disk partition to the mount point /mnt/newdisk:

     sudo mount -o discard,defaults DISK_NAME /mnt/newdisk

    Replace DISK_NAME with the device name that you noted earlier in these steps, for example, /dev/sdb1.

    If the mount fails with an error such as Filesystem has duplicate UUID XXXXXX - can't mount, or mount: /mnt/newdisk: wrong fs type, bad option or bad superblock on /dev/sdb, mount the partition with the nouuid option, an XFS mount option that skips the duplicate-UUID check:

     sudo mount -o nouuid DISK_NAME /mnt/newdisk

    The inaccessible VM's file system is now mounted at /mnt/newdisk. You can browse the file system, edit configuration files, fix issues, or retrieve data.
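
Steps 1 through 5 above use the Google Cloud console. As a rough command-line equivalent, the following sketch runs the same sequence with the gcloud CLI. It assumes that gcloud is installed and authenticated, and that the temporary VM is created in the same zone as the inaccessible VM. The function name, the derived snapshot and temporary VM names, and the Debian image defaults are illustrative assumptions. Pick the image family and project that match your VM's OS.

```shell
# Usage: prepare_recovery_disk VM_NAME BOOT_DISK ZONE [IMAGE_FAMILY] [IMAGE_PROJECT]
# BOOT_DISK is the name of the inaccessible VM's boot disk, which is
# often, but not always, the same as the VM name.
prepare_recovery_disk() {
  vm_name="$1"
  boot_disk="$2"
  zone="$3"
  image_family="${4:-debian-12}"       # example default, not from the guide
  image_project="${5:-debian-cloud}"   # example default, not from the guide

  snapshot="${vm_name}-rescue-snapshot"   # hypothetical naming scheme
  temp_vm="${vm_name}-rescue"             # hypothetical naming scheme
  recovery_disk="my-recovery-disk"        # disk name used in this guide

  # Step 1 (optional): stop the inaccessible VM.
  gcloud compute instances stop "$vm_name" --zone="$zone" || return 1

  # Step 2: snapshot the boot disk of the inaccessible VM.
  gcloud compute disks snapshot "$boot_disk" --zone="$zone" \
    --snapshot-names="$snapshot" || return 1

  # Step 3: create the temporary VM from a public image.
  gcloud compute instances create "$temp_vm" --zone="$zone" \
    --image-family="$image_family" --image-project="$image_project" || return 1

  # Step 5: create a disk from the snapshot and attach it to the temporary VM.
  # The device name controls the /dev/disk/by-id/google-* symlink name.
  gcloud compute disks create "$recovery_disk" --zone="$zone" \
    --source-snapshot="$snapshot" || return 1
  gcloud compute instances attach-disk "$temp_vm" --zone="$zone" \
    --disk="$recovery_disk" --device-name="$recovery_disk"
}
```

Step 4, the SSH check, is left out of the function because it is interactive. After the function returns, you can connect with gcloud compute ssh and continue with the mount steps above.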

Revert the changes and restart the inaccessible VM

After you fix the issue or retrieve the data, bring the original VM back online by using the following steps:

  1. Unmount the additional disk that is mounted at /mnt/newdisk on the temporary VM:

     cd ~
     sudo umount /mnt/newdisk
  2. In the Google Cloud console, go to the VM instances page.

    Go to VM instances

    1. Select the temporary VM that you created.

    2. Click Edit.

    3. Under Additional disks, detach the disk that you created in the earlier steps, for example, my-recovery-disk, from the temporary VM.

    4. Click Save.

  3. Go to the VM instances page in the Google Cloud console.

    Go to VM instances

    1. If the inaccessible VM is still running, stop the VM.

    2. Click the name of the inaccessible VM, and then click Edit.

    3. Under Boot disk, click Detach boot disk to detach the existing boot disk from the inaccessible VM.

    4. Click CONFIGURE BOOT DISK to attach the disk that you created and repaired in the Rescue a VM section on this page:

      1. In the Boot Disk section, click the Existing disks tab.
      2. In the drop-down list, select the disk that you created in the previous section, for example, my-recovery-disk.
      3. Click Select and then click Save.
    5. Start the VM.

  4. Connect to the VM using SSH to verify that it is accessible again.
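
The console steps in this section can also be sketched with the gcloud CLI. The following is a hedged example, assuming that gcloud is installed and authenticated, that the repaired disk is already unmounted on the temporary VM, and that both VMs are in the same zone. The function name and its argument order are hypothetical.

```shell
# Usage: restore_boot_disk VM_NAME OLD_BOOT_DISK TEMP_VM ZONE [RECOVERY_DISK]
# OLD_BOOT_DISK is the boot disk currently attached to the inaccessible VM.
# RECOVERY_DISK defaults to my-recovery-disk, the name used in this guide.
restore_boot_disk() {
  vm_name="$1"
  old_boot_disk="$2"
  temp_vm="$3"
  zone="$4"
  recovery_disk="${5:-my-recovery-disk}"

  # Detach the repaired disk from the temporary VM.
  gcloud compute instances detach-disk "$temp_vm" --zone="$zone" \
    --disk="$recovery_disk" || return 1

  # Make sure the inaccessible VM is stopped before swapping its boot disk.
  gcloud compute instances stop "$vm_name" --zone="$zone" || return 1

  # Detach the existing boot disk from the inaccessible VM.
  gcloud compute instances detach-disk "$vm_name" --zone="$zone" \
    --disk="$old_boot_disk" || return 1

  # Attach the repaired disk as the new boot disk, then start the VM.
  gcloud compute instances attach-disk "$vm_name" --zone="$zone" \
    --disk="$recovery_disk" --boot || return 1
  gcloud compute instances start "$vm_name" --zone="$zone"
}
```

The function stops at the first failing command, so the old boot disk stays attached if an earlier step does not succeed. The old boot disk and the temporary VM are not deleted; remove them yourself after you confirm that the VM works.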