ImageGCFailed, unable to delete images and reclaim disk. #45558

**Kubernetes version**: 1.6.2 master and nodes
**Environment**: GKE
**What happened**: The following has been happening for the last week or two.

I noticed loads of pods being evicted with the following message from `kubectl describe`:
```
Node:		gke-prow-build-pool-a89df2af-4bc8/
Status:		Failed
Reason:		Evicted
Message:	The node was low on resource: nodefs.
```
The node shows as Ready, but `kubectl get no` also shows that it has disk pressure:
```yaml
status:
  conditions:
  - lastHeartbeatTime: 2017-05-09T17:38:33Z
    lastTransitionTime: 2017-05-09T15:43:02Z
    message: kubelet has disk pressure
    reason: KubeletHasDiskPressure
    status: "True"
    type: DiskPressure
  - lastHeartbeatTime: 2017-05-09T17:38:33Z
    lastTransitionTime: 2017-05-09T01:07:06Z
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
```
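To make this combination explicit, here is a minimal Python sketch that checks a node's `status.conditions` for `Ready` and `DiskPressure` both having status `"True"`. The helper names (`condition_is_true`, `ready_with_disk_pressure`) are hypothetical. The conditions are dicts hand-copied from the YAML above, not output from a Kubernetes client library. The YAML quotes `"True"`, so the sketch compares strings, not booleans.

```python
from typing import Dict, List

# Hypothetical type: one entry of the node's status.conditions list.
Condition = Dict[str, str]


def condition_is_true(conditions: List[Condition], cond_type: str) -> bool:
    """Return True if the condition of the given type has status "True"."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status") == "True"
    return False  # an absent condition is treated as not set


def ready_with_disk_pressure(conditions: List[Condition]) -> bool:
    """True when the node reports Ready and DiskPressure at the same time."""
    return (condition_is_true(conditions, "Ready")
            and condition_is_true(conditions, "DiskPressure"))


# Values copied from the node status above (timestamps omitted).
node_conditions = [
    {"type": "DiskPressure", "status": "True", "reason": "KubeletHasDiskPressure"},
    {"type": "Ready", "status": "True", "reason": "KubeletReady"},
]

print(ready_with_disk_pressure(node_conditions))  # True
```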
`kubectl describe no` shows many `ImageGCFailed` events:
```
  FirstSeen	LastSeen	Count	From						SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----						-------------	--------	------			-------
  4h		19s		564	kubelet, gke-prow-build-pool-a89df2af-4bc8			Warning		EvictionThresholdMet	Attempting to reclaim nodefs
  4h		18s		54	kubelet, gke-prow-build-pool-a89df2af-4bc8			Warning		ImageGCFailed		(events with common reason combined)

```
The kubelet logs show that it is failing to delete the images and free up disk space. For each image, it logs the following every 10 seconds:
```
I0509 17:59:31.183907    1453 image_gc_manager.go:335] [imageGCManager]: Removing image "sha256:fa60023475d842a7a62d38fa27a0d3f6fd672be5ea1f09e6d07f8459d2c0c60a" to free 1105710474 bytes
E0509 17:59:31.186643    1453 remote_image.go:124] RemoveImage "sha256:fa60023475d842a7a62d38fa27a0d3f6fd672be5ea1f09e6d07f8459d2c0c60a" from image service failed: rpc error: code = 2 desc = Error response from daemon: conflict: unable to delete fa60023475d8 (must be forced) - image is being used by stopped container 8641d5395d30
E0509 17:59:31.186705    1453 kuberuntime_image.go:126] Remove image "sha256:fa60023475d842a7a62d38fa27a0d3f6fd672be5ea1f09e6d07f8459d2c0c60a" failed: rpc error: code = 2 desc = Error response from daemon: conflict: unable to delete fa60023475d8 (must be forced) - image is being used by stopped container 8641d5395d30
```
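The failure mode in these logs can be modeled with a toy Python sketch. It assumes only what the log reports: an image referenced by any container, even a stopped one, cannot be deleted without forcing. `ToyRuntime`, `Container`, and their methods are hypothetical and are not the Docker or CRI API; the short IDs and the byte count come from the log lines above. Under that assumption, image GC can only reclaim the space after the stopped container that pins the image has been removed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    container_id: str
    image_id: str
    running: bool


@dataclass
class ToyRuntime:
    """Hypothetical, simplified image store; not the Docker or CRI API."""
    images: Dict[str, int] = field(default_factory=dict)  # image id -> size in bytes
    containers: List[Container] = field(default_factory=list)

    def remove_image(self, image_id: str) -> int:
        """Delete an image and return the bytes freed, or raise on conflict."""
        users = [c for c in self.containers if c.image_id == image_id]
        if users:
            # Any container, running or stopped, pins the image (as in the log above).
            state = "running" if users[0].running else "stopped"
            raise RuntimeError(
                f"conflict: unable to delete {image_id} - image is being used "
                f"by {state} container {users[0].container_id}"
            )
        return self.images.pop(image_id)

    def remove_stopped_containers(self) -> None:
        """Drop stopped containers so their images are no longer referenced."""
        self.containers = [c for c in self.containers if c.running]


# IDs and size taken from the kubelet log; everything else is illustrative.
rt = ToyRuntime(
    images={"fa60023475d8": 1105710474},
    containers=[Container("8641d5395d30", "fa60023475d8", running=False)],
)
try:
    rt.remove_image("fa60023475d8")
except RuntimeError as err:
    print(err)  # image removal fails while the stopped container exists

rt.remove_stopped_containers()
print(rt.remove_image("fa60023475d8"))  # 1105710474
```

In this model, retrying `remove_image` every 10 seconds never succeeds on its own, which matches the repeating log entries. Only removing the stopped container changes the outcome.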

**What you expected to happen**:

I would be happy if the node were marked unschedulable when it is out of disk, or if the images were successfully cleaned up. As it is, the node just evicts every pod that tries to run on it.

**How to reproduce it**:

I don't know how to reproduce from scratch, but I've cordoned this node and can give access to someone for debugging.

Please let me know if you need more information, and apologies if this is a dupe.

cc @kubernetes/sig-node-bugs 
