aptos-monitoring

A monitoring solution for Aptos nodes that runs in Docker containers: Prometheus, Grafana, cAdvisor, and NodeExporter for metrics, plus AlertManager for alerting.

Thank you to the Rhino Stake team for their excellent dashboard!

Install

Clone this repository on your Docker host, cd into the aptos-monitoring directory, and run compose up:

git clone https://github.com/LavenderFive/aptos-monitoring
cd aptos-monitoring

ADMIN_USER=admin ADMIN_PASSWORD=admin ADMIN_PASSWORD_HASH=JDJhJDE0JE91S1FrN0Z0VEsyWmhrQVpON1VzdHVLSDkyWHdsN0xNbEZYdnNIZm1pb2d1blg4Y09mL0ZP docker-compose up -d

Caddy v2 does not accept plaintext passwords; the password MUST be provided as a hash. The hash above corresponds to the ADMIN_PASSWORD 'admin'. To learn how to generate a password hash, refer to Updating Caddy to v2.

Prerequisites:

Docker Engine >= 1.13
Docker Compose >= 1.11

Containers:

Prometheus (metrics database) http://<host-ip>:9090
Prometheus-Pushgateway (push acceptor for ephemeral and batch jobs) http://<host-ip>:9091
AlertManager (alerts management) http://<host-ip>:9093
Alertmanager-discord (disabled by default) http://<host-ip>:9094
Grafana (visualize metrics) http://<host-ip>:3000
- Infinity Plugin
NodeExporter (host metrics collector)
cAdvisor (container metrics collector)
Caddy (reverse proxy and basic auth provider for Prometheus and AlertManager)

Setup Grafana

Aptos Grafana Dashboard

This monitoring solution ships with Rhinostake's Aptos Monitoring dashboard, which needs all of the setup below to work. Grafana, Prometheus, and the Infinity plugin are installed automatically, but you still need to set up the Prometheus jobs.

1. Create Persistent Storage

To support persistent storage, you'll first need to create the volume:

docker volume create grafana-storage
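
A pre-created volume is normally declared as external in the compose file, so that Compose reuses it instead of creating its own. The fragment below is a minimal sketch of that wiring, not a copy of this repository's docker-compose.yml; the mount path and the service layout are assumptions, so check them against the actual file.

```yaml
# Sketch only: service and volume wiring are assumptions.
services:
  grafana:
    volumes:
      - grafana-storage:/var/lib/grafana   # assumed Grafana data path

volumes:
  grafana-storage:
    external: true   # reuse the volume made with `docker volume create`
```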

2. Prometheus Jobs

Add your node endpoints under /prometheus/prometheus.yaml.
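
A scrape job for a node might look like the sketch below. The job name and target address are placeholders, and the port is an assumption (Aptos nodes commonly expose metrics on 9101, but confirm this against your node configuration). The chain label is included because the Aptos alert rule in this repository filters on chain="mainnet".

```yaml
scrape_configs:
  - job_name: 'aptos-validator'         # hypothetical job name
    metrics_path: /metrics
    static_configs:
      - targets: ['<node-ip>:9101']      # placeholder host; assumed metrics port
        labels:
          chain: 'mainnet'               # matched by the node_not_syncing alert
```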

3. Checkly Integration (optional)

Uncomment the checkly block under /prometheus/prometheus.yaml.
Follow the steps outlined by Checkly for Prometheus Integration.

Navigate to http://<host-ip>:3000 and log in with user admin and password admin. You can change the credentials in the compose file or by supplying the ADMIN_USER and ADMIN_PASSWORD environment variables on compose up. Alternatively, reference a config file from the grafana service in the compose file, like this:

grafana:
  image: grafana/grafana:7.2.0
  env_file:
    - config

The config file should contain:

GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=changeme
GF_USERS_ALLOW_SIGN_UP=false

To change the password, you must also remove this volume entry from the grafana service; otherwise the change will not take effect:

- grafana_data:/var/lib/grafana

Grafana is preconfigured with dashboards and Prometheus as the default data source:

Name: Prometheus
Type: Prometheus
Url: http://prometheus:9090
Access: proxy

Monitor Services Dashboard

The Monitor Services Dashboard shows key metrics for monitoring the containers that make up the monitoring stack:

Prometheus container uptime, monitoring stack total memory usage, Prometheus local storage memory chunks and series
Container CPU usage graph
Container memory usage graph
Prometheus chunks to persist and persistence urgency graphs
Prometheus chunks ops and checkpoint duration graphs
Prometheus samples ingested rate, target scrapes and scrape duration graphs
Prometheus HTTP requests graph
Prometheus alerts graph

Define alerts

Two alert groups have been set up within the alert.rules configuration file:

Monitoring services alerts (group name: targets)
Aptos alerts (group name: aptos)

You can modify the alert rules and reload them by making an HTTP POST call to Prometheus:

curl -X POST http://admin:admin@<host-ip>:9090/-/reload

Monitoring services alerts

Trigger an alert if any of the monitoring targets (node-exporter and cAdvisor) are down for more than 30 seconds:

- alert: monitor_service_down
  expr: up == 0
  for: 30s
  labels:
    severity: critical
  annotations:
    summary: "Monitor service non-operational"
    description: "Service {{ $labels.instance }} is down."

Aptos alerts

Trigger an alert if any of the mainnet Aptos nodes fall out of sync for 30 seconds.

- alert: node_not_syncing
  expr: avg(increase(aptos_state_sync_version{chain="mainnet", type="synced"}[30s])) < 1
  for: 15s
  labels:
    severity: critical
  annotations:
    summary: "Aptos Node Not Syncing"
    description: "Service {{ $labels.job }} {{ $labels.chain }} is not syncing."
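
For orientation, here is how the two rules would sit inside their groups in alert.rules. This is a sketch that assumes the standard Prometheus rule-file layout and the group names targets and aptos mentioned earlier; the actual file may contain more rules.

```yaml
groups:
  - name: targets
    rules:
      - alert: monitor_service_down
        expr: up == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Monitor service non-operational"
          description: "Service {{ $labels.instance }} is down."

  - name: aptos
    rules:
      - alert: node_not_syncing
        expr: avg(increase(aptos_state_sync_version{chain="mainnet", type="synced"}[30s])) < 1
        for: 15s
        labels:
          severity: critical
        annotations:
          summary: "Aptos Node Not Syncing"
          description: "Service {{ $labels.job }} {{ $labels.chain }} is not syncing."
```

If promtool is available, running promtool check rules against the file before reloading Prometheus catches indentation and syntax mistakes.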

Setup alerting

The AlertManager service handles alerts sent by the Prometheus server. AlertManager can send notifications via email, Pushover, Slack, HipChat, or any other system that exposes a webhook interface. A complete list of integrations can be found here.

You can view and silence notifications by accessing http://<host-ip>:9093.

The notification receivers can be configured in the alertmanager/config.yml file.

To receive alerts via Slack, you need to create a custom integration by choosing incoming webhooks on your Slack team app page. You can find more details on setting up the Slack integration here.

Copy the Slack Webhook URL into the api_url field and specify a Slack channel.

route:
    receiver: 'slack'

receivers:
    - name: 'slack'
      slack_configs:
          - send_resolved: true
            text: "{{ .CommonAnnotations.description }}"
            username: 'Prometheus'
            channel: '#<channel>'
            api_url: 'https://hooks.slack.com/services/<webhook-id>'
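
Alertmanager can also batch and throttle notifications at the route level. The variation below is a sketch only; the grouping key and timing values are chosen for illustration and are not taken from this repository's config.

```yaml
route:
    receiver: 'slack'
    group_by: ['alertname']   # one notification per alert name
    group_wait: 30s           # delay before the first notification for a new group
    group_interval: 5m        # delay before notifying about alerts added to a group
    repeat_interval: 3h       # delay before re-sending a still-firing alert
```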
