Document Draino metrics
Nic Cope committed Aug 28, 2018
1 parent 9137608 commit eb28ba8
Showing 1 changed file with 21 additions and 3 deletions.
24 changes: 21 additions & 3 deletions README.md
@@ -11,7 +11,7 @@ wrong with a node - for instance by watching node logs or running a script. The
Cluster Autoscaler can be configured to delete nodes that are underutilised.
Adding Draino to the mix enables autoremediation:

-1. The Node Problem Detector detects a permanent node issue and sets the
+1. The Node Problem Detector detects a permanent node problem and sets the
corresponding node condition.
2. Draino notices the node condition. It immediately cordons the node to prevent
new pods being scheduled there, and schedules a drain of the node.
@@ -43,7 +43,7 @@ Flags:
grace period for it to have been deleted.
--drain-buffer=10m0s Minimum time between starting each drain. Nodes
are always cordoned immediately.
---node-label=KEY=VALUE ...
+--node-label=KEY=VALUE ...
Only nodes with this label will be eligible for
cordoning and draining. May be specified multiple
times.
@@ -56,4 +56,22 @@ Args:
## Deployment
Draino is automatically built from master and pushed to the [Docker Hub](https://hub.docker.com/r/negz/draino/).
Builds are tagged `negz/draino:latest` and `negz/draino:$(git rev-parse --short HEAD)`.
-An [example Kubernetes deployment manifest](manifest.yml) is provided.
+An [example Kubernetes deployment manifest](manifest.yml) is provided.

## Monitoring
Draino provides a simple healthcheck endpoint at `/healthz` and Prometheus
metrics at `/metrics`. The following metrics exist:

```bash
$ kubectl -n kube-system exec -it ${DRAINO_POD} -- apk add curl
$ kubectl -n kube-system exec -it ${DRAINO_POD} -- curl http://localhost:10002/metrics
# HELP draino_nodes_cordoned Number of nodes cordoned.
# TYPE draino_nodes_cordoned counter
draino_nodes_cordoned{node_name="coolnode",result="succeeded"} 1
draino_nodes_cordoned{node_name="ambivalentnode",result="succeeded"} 1
draino_nodes_cordoned{node_name="lamenode",result="failed"} 1
# HELP draino_nodes_drained Number of nodes drained.
# TYPE draino_nodes_drained counter
draino_nodes_drained{node_name="coolnode",result="succeeded"} 1
draino_nodes_drained{node_name="ambivalentnode",result="failed"} 1
```
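
The counters above are labelled by `node_name` and `result`, so failures can be tallied straight from the text output. The following is a minimal sketch, not part of Draino: `failed_total` is a hypothetical helper name, and it assumes label values contain no spaces so that the sample value is the last whitespace-separated field on each line.

```shell
#!/bin/sh
# failed_total is a hypothetical helper (not shipped with Draino).
# It reads Prometheus text-format metrics on stdin and sums every sample of
# the named counter that carries result="failed".
failed_total() {
  # $1 is the metric name, for example draino_nodes_drained.
  awk -v metric="$1" '
    /^#/ { next }  # skip HELP and TYPE comment lines
    index($0, metric "{") == 1 && /result="failed"/ { sum += $NF }
    END { print sum + 0 }  # print 0 when nothing matched
  '
}

# Sample input copied from the example output above.
sample='# HELP draino_nodes_cordoned Number of nodes cordoned.
# TYPE draino_nodes_cordoned counter
draino_nodes_cordoned{node_name="coolnode",result="succeeded"} 1
draino_nodes_cordoned{node_name="ambivalentnode",result="succeeded"} 1
draino_nodes_cordoned{node_name="lamenode",result="failed"} 1
# HELP draino_nodes_drained Number of nodes drained.
# TYPE draino_nodes_drained counter
draino_nodes_drained{node_name="coolnode",result="succeeded"} 1
draino_nodes_drained{node_name="ambivalentnode",result="failed"} 1'

printf '%s\n' "$sample" | failed_total draino_nodes_drained
```

Against a running pod, the same function could presumably be fed by piping the `curl http://localhost:10002/metrics` command shown above into it instead of the sample text.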
