
doc/cephfs: document purge queue and its perf counters #60794

Merged: 1 commit into ceph:main on Dec 30, 2024

Conversation

@dparmar18 (Contributor) commented Nov 21, 2024:

Fixes: https://tracker.ceph.com/issues/68571
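
For context, the counters documented here live under the MDS "purge_queue" perf section; one way to view them at runtime (a sketch, assuming a daemon named mds.a and access to its admin socket):

    # Dump only the purge_queue perf counters from a running MDS
    ceph daemon mds.a perf dump purge_queue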

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@github-actions github-actions bot added cephfs Ceph File System documentation labels Nov 21, 2024
@dparmar18 dparmar18 marked this pull request as ready for review November 25, 2024 13:29
@dparmar18 dparmar18 requested a review from a team as a code owner November 25, 2024 13:29
@dparmar18 dparmar18 requested a review from a team November 25, 2024 13:29
@dparmar18 (Contributor, Author):

Requesting @ceph/cephfs for suggestions/corrections.

doc/cephfs/purge-queue.rst: 3 review threads (outdated, resolved)
@vshankar vshankar self-assigned this Nov 26, 2024
doc/cephfs/purge-queue.rst: 3 more resolved review threads
@zdover23 (Contributor):

jenkins test docs

@zdover23 (Contributor):

@dparmar18, I'll take care of all the corrections and alterations. Thanks for this contribution!


Review thread on an excerpt from doc/cephfs/purge-queue.rst:

    MDS maintains a data structure known as **Purge Queue** which is responsible
    for managing and executing the sequential deletion of files.
    There is a Purge queue for every MDS rank. Purge queues consist of purge items

Reviewer (Contributor):

The first Purge here should be lowercased.

Reply (Contributor):

There's a lot of stuff that needs to be rewritten here. I'm going to make a "purge queue" PR to hash out the purge queue definition, and I will also make a grammar-and-elegance pass. This suggestion will be incorporated into that latter PR. As always, good catch.

@dparmar18 (Contributor, Author):

From what I understand, @zdover23 will take care of all the grammar and touch-ups in this PR, but just FYI, we'd still need an approval from @ceph/cephfs to validate the content's credibility.

@zdover23 (Contributor):

> From what I understand, @zdover23 will take care of all the grammar and touch-ups in this PR, but just FYI, we'd still need an approval from @ceph/cephfs to validate the content's credibility.

@dparmar18, this is correct. If you get someone from CephFS to verify the technical accuracy of this content, I will merge this PR and raise a new PR in which I clean up the English and make sure that the RST file is ready to be backported to the release branches. @vshankar, could you assign someone to check the technical accuracy of the information added to the docs in this PR?

@vshankar (Contributor):

> @vshankar, could you assign someone to check the technical accuracy of the information added to the docs in this PR?

I will have a look at this now.

@zdover23 (Contributor):

> I will have a look at this now.

Cool.


Review thread on an excerpt from doc/cephfs/purge-queue.rst:

    .. note:: Generally, the defaults are adequate for most clusters. However, in
              case of huge clusters, if the need arises, values might be tuned to
              4-5 times of the default value as a starting point and further

Reviewer (Contributor):

4x-5x for all the above configurations?

Reviewer (Contributor):

You should also provide a sample configuration.

Author (@dparmar18):

> 4x-5x for all the above configurations?

So, users can start with the most basic one, i.e. setting filer_max_purge_ops to 40-50, which should work in most cases. If it doesn't, the remaining three options mentioned above can be tuned to 4x-5x.
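
As a rough illustration of that starting point (a sketch, assuming the standard ceph config set CLI; the values are just 4x-5x multiples of the defaults listed later in this thread, not tested recommendations):

    # Start with the basic knob: raise filer_max_purge_ops from its default of 10
    ceph config set mds filer_max_purge_ops 40
    # If purges still lag behind, scale the other three to roughly 4x-5x default
    ceph config set mds mds_max_purge_files 256       # default 64
    ceph config set mds mds_max_purge_ops 32768       # default 8192
    ceph config set mds mds_max_purge_ops_per_pg 2    # default 0.5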

Author (@dparmar18):

> You should also provide a sample configuration.

I can, but it's usually just one or two config value changes (as mentioned above). Do you want me to add an example of how it's done to the doc?

Reviewer (Contributor):

Yeah, I think so. Providing a sample would help users avoid second-guessing. The configs may not be perfect for their cluster, but it gives them an understanding of what to change.

Author (@dparmar18):

> Yeah, I think so. Providing a sample would help users avoid second-guessing. The configs may not be perfect for their cluster, but it gives them an understanding of what to change.

Added some examples, PTAL.

doc/cephfs/purge-queue.rst: 1 more review thread (outdated, resolved)

Review thread on an excerpt from doc/cephfs/purge-queue.rst:

    When a client requests deletion of a directory (say ``rm -rf``):

    - MDS queues the files and subdirectories (purge items) from journal in the

Reviewer (Contributor):

So, there is a purge queue journal (pq). It's a bit unclear which journal is being referred to here.

Author (@dparmar18):

The journal here comes from osdc/Journaler.h. Since this is on the MDS side, would it be correct to call it the MDS journal?

Reviewer (Contributor):

So, there is the MDS metadata journal (mdlog) and the MDS purge queue journal (pq). Both of course use the osdc/Journaler.h class, but my question here is which of the two journals is being referred to.

Author (@dparmar18):

It's the purge queue journal:

ceph/src/mds/PurgeQueue.cc, lines 129 to 131 (at commit 5061b31):

    journaler("pq", MDS_INO_PURGE_QUEUE + rank, metadata_pool,
              CEPH_FS_ONDISK_MAGIC, objecter_, nullptr, 0,
              &finisher),
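
In other words, each MDS rank has its own purge queue journal, tagged "pq" and stored in the metadata pool under the inode MDS_INO_PURGE_QUEUE + rank, separate from the mdlog. A sketch of inspecting it directly (assuming a filesystem named cephfs and rank 0):

    # Inspect the purge queue journal, as opposed to the default --journal=mdlog
    cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect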

Reviewer (Contributor):

Right. So, let's mention that explicitly in the doc.

Reviewer (Contributor):

@vshankar, I'll get this in the document when I get back to my office. It'll be in today.

Reviewer (Contributor):

At ease @zdover23. I have requested one small update from @dparmar18 and then it's good to go 👍

Reviewer (Contributor):

Roger wilco, @vshankar.

@dparmar18 (Contributor, Author) commented Dec 24, 2024:

@zdover23, I made some minor changes and fixes while also adding some more content I felt was necessary. PTAL: https://github.com/ceph/ceph/compare/ddd994291dba374fe122a7d59dfc3835fe665356..65bd5c2f0e08c68f18e02e0c21466bdd8fa4c8ee

@zdover23 (Contributor):

> @zdover23, I made some minor changes and fixes while also adding some more content I felt was necessary. [...]

@dparmar18 I'm on it.

@dparmar18 (Contributor, Author):

Oh shoot, while resolving conflicts I messed up this part:

    .. confval:: filer_max_purge_ops
    .. confval:: mds_max_purge_files
    .. confval:: mds_max_purge_ops
    .. confval:: mds_max_purge_ops_per_pg

    - filer_max_purge_ops (default 10)
    - mds_max_purge_files (default 64)
    - mds_max_purge_ops (default 8192)
    - mds_max_purge_ops_per_pg (default 0.5)

@dparmar18 (Contributor, Author):

> Oh shoot, while resolving conflicts I messed up this part: [...]

Removed the redundant entries: https://github.com/ceph/ceph/compare/65bd5c2f0e08c68f18e02e0c21466bdd8fa4c8ee..611478de3af6a38c593949cc3101d14a3e549311

@zdover23 (Contributor):

> Removed the redundant entries: [...]

@dparmar18 Just add the information that Venky asked for, and I'll make sure that this builds correctly. Let me know when it's ready for me to fix up.

@vshankar (Contributor) left a review comment:

Good start documenting this. Nice work @dparmar18

@zdover23 FYI

@zdover23 (Contributor):

> Good start documenting this. Nice work @dparmar18

@vshankar, Which release branches should I backport this change to?

@vshankar (Contributor):

> @vshankar, Which release branches should I backport this change to?

Updated the tracker: quincy, reef, squid.

@zdover23 zdover23 merged commit ff1134f into ceph:main Dec 30, 2024
12 checks passed
@zdover23 (Contributor):

#61193 - Squid backport
#61194 - Reef backport
#61195 - Quincy backport
