Manage dead hosts that salt still knows about #149

@clarkperkins

Narrative

Sometimes salt keeps tracking keys for hosts that have died or been deleted (usually on spot clusters). This noticeably slows salt down when it is provisioning or orchestrating, so it would be good to delete these keys automatically. After talking with @WLPhoenix, we think a cron job should periodically check the status of all salt minions and delete the keys of minions that are non-responsive.

Development Tasks

Write a cron job that:

  1. Purges the keys for all hosts living in terminated or deleted stacks
  2. For any stack whose hosts are all dead, changes the status to either terminated or stopped, depending on the status of the hosts
  3. Attempts to re-launch and re-provision stacks in which only some hosts have died
  • Step 3 could be kind of hairy, because sometimes the hosts are still alive and only the minion daemon has died
  • The job will never delete a stack from the stackdio database; it only alters the stack's status
  • We DO NOT want to delete keys for stopped stacks. When you start them again, they reuse the same keys. Salt cloud does not regenerate keys in this case.
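
Steps 1 and 2 could be sketched roughly as below. This is only an illustration of the decision logic, not stackdio code: the dict layout, the status strings, and the function names keys_to_purge and new_status are all assumptions, and the handling of stacks whose dead hosts are in mixed states is a guess that the issue does not specify.

```python
# Hypothetical status names; stackdio's real status values may differ.
# "stopped" is deliberately absent: restarted stacks reuse their keys.
PURGE_KEYS = {"terminated", "deleted"}


def keys_to_purge(stacks):
    """Return the minion ids whose keys are safe to delete (step 1).

    Each stack is assumed to be a dict with a "status" string and a
    "hosts" dict mapping minion id -> cloud state.
    """
    purge = []
    for stack in stacks:
        if stack["status"] in PURGE_KEYS:
            purge.extend(sorted(stack["hosts"]))
    return purge


def new_status(stack):
    """Return an updated status when every host is dead (step 2), else None."""
    states = set(stack["hosts"].values())
    if not states or "running" in states:
        return None  # no hosts to inspect, or at least one is still alive
    if states == {"stopped"}:
        return "stopped"
    if states == {"terminated"}:
        return "terminated"
    return None  # mixed dead states (assumption): leave for a human to decide
```

Neither function removes a stack; they only report which keys to purge and which status to set, which matches the constraint that the job never deletes anything from the stackdio database.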

Note: This would all be done using some combination of salt-cloud -Q to check the status of the hosts and salt-run manage.down to get a list of unresponsive minions.
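
A minimal sketch of the salt side, assuming the job runs on the salt master and that the manage.down runner prints a JSON list of minion ids when called with --out=json (worth verifying against the installed salt version). The function names and the dry_run flag are hypothetical. Parsing salt-cloud -Q output is left out because its structure depends on the configured providers.

```python
import json
import subprocess


def parse_down_minions(output):
    """Parse JSON text from the manage.down runner into a set of minion ids."""
    data = json.loads(output) if output.strip() else []
    return set(data)


def list_down_minions(run=subprocess.run):
    """Ask the salt master which minions are not responding."""
    result = run(
        ["salt-run", "manage.down", "--out=json"],
        capture_output=True, text=True, check=True,
    )
    return parse_down_minions(result.stdout)


def delete_keys(minion_ids, run=subprocess.run, dry_run=True):
    """Build (and optionally run) one salt-key delete command per minion.

    With dry_run=True (the default) nothing is executed; the commands
    are only returned so they can be logged or reviewed first.
    """
    commands = [["salt-key", "-y", "-d", minion_id]
                for minion_id in sorted(minion_ids)]
    if not dry_run:
        for command in commands:
            run(command, check=True)
    return commands
```

Because an unresponsive minion can mean only that the minion daemon has died, the ids returned by list_down_minions should be checked against the host status from salt-cloud -Q, and against the stack status, before anything is passed to delete_keys. Keys for stopped stacks should never be included.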
