Skip to content

Add pulse checks for ensuring disk health on VmHosts #2202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 25, 2025

Conversation

mohi-kalantari
Copy link
Contributor

Multiple checks using different linux tools are added to determine the health of the disks. The code also supports both raided and not raided VmHosts

Copy link

github-actions bot commented Nov 6, 2024

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@mohi-kalantari mohi-kalantari changed the title Add pulse checks for ensuring disk health on VmHosts WIP: Add pulse checks for ensuring disk health on VmHosts Nov 6, 2024
@mohi-kalantari mohi-kalantari force-pushed the vmh-disk-monitoring branch 3 times, most recently from cd3bc90 to e2bd310 Compare November 8, 2024 08:55
@aamederen
Copy link
Contributor

Since this is also a warm-up, it could be nice to split this change into multiple commits and explain each change in the commit messages. The migration could be one commit, population of the UNIX devices could be another and finally the change in the health check logic, each with their own specs.

@mohi-kalantari mohi-kalantari force-pushed the vmh-disk-monitoring branch 2 times, most recently from 453f521 to e1c64f0 Compare November 12, 2024 11:44
@mohi-kalantari
Copy link
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@mohi-kalantari
Copy link
Contributor Author

recheck

@mohi-kalantari mohi-kalantari force-pushed the vmh-disk-monitoring branch 2 times, most recently from 17deb53 to a928c4a Compare November 12, 2024 12:27
Copy link
Contributor

@aamederen aamederen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I would however like to have one more eye on this

@aamederen aamederen requested a review from a team November 13, 2024 11:53
Copy link
Collaborator

@fdr fdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at a minimum, two things:

  • If storing an array, why not use either a text array, or even more likely, a jsonb
  • The migration should be broken out from other application changes to not make rollbacks fussy.

@fdr fdr requested review from jeremyevans and byucesoy November 19, 2024 02:44
@mohi-kalantari
Copy link
Contributor Author

I'll also fix the commits and separate the migration commit and others when review is done. thanks for pointing out. @fdr

Copy link
Contributor

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I only have one remaining comment.

Copy link
Contributor

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous comment was addressed. One additional comment.

@mohi-kalantari mohi-kalantari force-pushed the vmh-disk-monitoring branch 2 times, most recently from f00ed76 to df2f8ce Compare November 25, 2024 06:31
Copy link
Member

@byucesoy byucesoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks OK IMO.

@mohi-kalantari, do we have any blocker for this PR, other than Daniel's comments?

@mohi-kalantari mohi-kalantari force-pushed the vmh-disk-monitoring branch 4 times, most recently from 69517bf to 76b482d Compare January 20, 2025 11:21
@mohi-kalantari mohi-kalantari changed the title WIP: Add pulse checks for ensuring disk health on VmHosts Add pulse checks for ensuring disk health on VmHosts Jan 24, 2025
@mohi-kalantari mohi-kalantari force-pushed the vmh-disk-monitoring branch 4 times, most recently from 1501ef3 to 6bc194f Compare February 24, 2025 13:06
In order to determine and save the disk devices of each VmHost
We store an array of strings representing the device names. Also
updated the schema caches.
Added functions for determining the disk devices and also
extracting the underlying device names of raided disks.
smartmontools is installed when bootstrapping a VmHost. It will be used
as a healthcheck for disk devices of VmHosts.
Multiple healthchecks were added to check disks using different tools
like smartctl, nvme cli, read/write on disk and reading kernel logs.
@mohi-kalantari mohi-kalantari merged commit 88cd29d into ubicloud:main Feb 25, 2025
3 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 25, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants