-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd: Ec optimisations main collaboration branch #60782
Open
aainscow
wants to merge
53
commits into
ceph:main
Choose a base branch
from
aainscow:ec_optimisations_main
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The align function does not need to determine whether an interval overlaps an existing interval, because the insert code has been changed recently to allow multiple inserts of overlapping regions. Also worth noting that the original code did not quite do the right thing and may have led to overlapping inserts anyway. Also, added a unit test and fixed a template issue with align. Signed-off-by: Alexander H Ainscow <[email protected]> interval_set: Allow interval_sets to be aligned
The current implementation of interval set only allows inserts to be made when the new interval does not overlap the old. The "union_insert" method works around this by creating a new interval map, then performing a union of the two sets - an inefficient process. This commit adapts the insert method to have the flexibility of union_insert, but with the efficiency of the old insert method. We should add a similar erase method in the future, although there is no need for this PR. Signed-off-by: Alexander H Ainscow <[email protected]>
Before this commit, the erase function would assert that intervals to be erased entirely exist within an existing interval. This leads to a number of issues: 1. Easy to create hard-to-test edge conditions for client code. 2. Often calling code would do inefficient thing, such a execute intersect, then subtract. 3. It is a particular surprise that this issue also applies to subtract. This new version of the erase method allows for any sized interval to be used in erase. This allows for both non or partially overlapping erases, but also can recurse over multiple contained regions, making it much more convenient to erase. The new code should be as efficient for existing erases supported by the code. For a single non-overlapping erase, the complexity is O(log n) and the constant penalty should be roughly the same. For multiple erases, the complexity is somewhere between O(log n) and O(n), scaling with the number of entries which must be erased. The complexity of subtract is the same as before O(n log m), but should have a lower constant than previously held. I have also removed various workarounds used in interval_set required by the limitations of the old erase and insert functions. The added unit tests should be extensive enough to give confidence that the new code has a functional superset of the old code. Signed-off-by: Alexander H Ainscow <[email protected]>
* More fixes for multi-region reads * Enable partial stripe reads that span stripes Signed-off-by: Bill Scales <[email protected]>
As preperation for later commits, here we move the decision about which extents are read to an earlier point in the read operation. Later commits will then extend this to perform fewer reads. Signed-off-by: Alexander H Ainscow <[email protected]>
In a previous commit, there was some inefficient code, which: 1. Creates a pair 2. Passes pair to function 3. Function does some trivial maths, then populates a new pair 4. Pair is returned and immediately deconstructed. This is inefficient, but possibly more importantly generated messy code. This code is a minor refactor to remove some of the pairs. Signed-off-by: Alexander H Ainscow <[email protected]>
The restrictions here assert that IOs are rounded to chunk size, which is no longer a requirement. Signed-off-by: Alexander H Ainscow <[email protected]>
These are utilities for consumption by future commits. These equality operators will largely be used by test harnesses and are intented to work intuitively. Signed-off-by: Alexander H Ainscow <[email protected]>
Add a harness to mock up ECCommon::ReadPipeline, thus allowing more testing of functions with read pipeline. This is a very minimal stub and will be extended when required for tests. Signed-off-by: Alexander H Ainscow <[email protected]>
Adapt the functions which determine the extents of each shard that need to be read, such that they pick the minimal reads possible. This code was tested with subsequent test harness commits only and is not expected to work outside such an environment, as the IO complete path cannot yet cope with minimal reads. There is quite a bit of fiddly integer maths going on here and this will be neatened up in a later commit once further testing has taken place and this maths is no longer undergoing frequent changes. Signed-off-by: Alexander H Ainscow <[email protected]>
Further code-neatness enhancements to remove temporary pair objects. It is noted that the compiler likely optimises out such pairs, however, where they reduce readability, they should be removed. Signed-off-by: Alexander H Ainscow <[email protected]>
…tests. Fix some warnings from clang-tidy. Also fix a unit test which was broken due to a rebase. Signed-off-by: Alexander H Ainscow <[email protected]>
This test was added to debug a problem found in an more complex test. Signed-off-by: Alexander H Ainscow <[email protected]>
…equests. FIXME: Need more detail Signed-off-by: Alexander H Ainscow <[email protected]>
This adds a new setting called osd_ec_partial_reads_experimental. When enabled, the EC code will issue much more optimal reads, however there are known scenarios in which reads will not be recovered correctly. This most likely will lead to OSD processes crashing and data becoming inaccessible. When disabled, most of the new code is still being exercised, but much of the efficiency is lost, particularly for large chunk sizes. Signed-off-by: Alexander H Ainscow <[email protected]>
Signed-off-by: Alexander H Ainscow <[email protected]>
…function. There is already a FIXME comment in this function, but asserting here is a good prompt to make sure this is fixed in the future. Signed-off-by: Alexander H Ainscow <[email protected]>
Signed-off-by: Alexander H Ainscow <[email protected]>
…shards Following a review comment, observing that read_request is never null and as such can be passed by reference, rather than by pointer. A further review request suggested that this function should return read_request in a tl::expected wrapper. However, this ends up with passing this function more parameters to enable it to construct the read request, thus making the code less efficient, so I did not implement this. Signed-off-by: Alexander H Ainscow <[email protected]>
This method returns true if the interval entirely contains the specified range, without gaps. Signed-off-by: Alexander H Ainscow <[email protected]>
Signed-off-by: Alexander H Ainscow <[email protected]>
Major changes 1. Move the bulk of the logic to ECUtil, for re-use in the write path 2. Remove references to reads in the method. 3. Move from using vectors to maps for want to read shards. The vectors initially were thought to be more efficient, but with multiple loops over the vector, this probably wasnt the case (although I never measured any performance). Signed-off-by: Alexander H Ainscow <[email protected]>
When performing optimised reads, prior to this commit, if the experimental flag was enabled, then recovery would be impossible if it involved re-reading an existing chunk. For example. A client performs a read which covers end of chunk A and the start of chunk B. If chunk B fails, then the start of chunk A is required for recovery. Signed-off-by: Alexander H Ainscow <[email protected]>
This utility tracks an extent map per shard for erasure coding. It provides a number of capabilities tailored to the requirements of the EC write path. This commit is starting a move to a new convention with erasure coding referring to which address space offsets/lengths refer to. The convention will be: "Rados object offset/size/start/end/length" refers to offsets within the object itself. This will be shortened to a prefix of "ro_" for the most part. "Shard offset/size/start/end/length" refers to offsets within the individual shards. Typically both "shard" and "offset" are required. No shortening has been established as yet. Examples: insert_in_shard(shard, offset, bufferlist) Insert a buffer into a particular shard. insert_ro_extent_map?(ro_extent_map) Convert ro_extent_map to shard addressing, then insert (merge) it into the sharded extent maps. Additionally, the (generalised) get_min_shards function has been renamed to reflect its primary role of converting between ro and shard address spaces. Signed-off-by: Alexander H Ainscow <[email protected]>
The sinfo structure is more efficient to access and now stashes: * k * m * shard mapping (aka chunk mapping) - map of shard indexing used in plugin to the k-then-m indexing assumed by many utilities. Signed-off-by: Alexander H Ainscow <[email protected]>
This commit should disappear on a reabase. Signed-off-by: Alexander H Ainscow <[email protected]>
Reorganisers the main function to be more modular, moving functionality into a TestRunner object and anonymous namespaces. Adds a new "interactive" test mode that allows you to direct the order of operations rather than using the preset sequences. This allows complete precise control of the tool when you want to test specific IOs. Signed-off-by: Jon Bailey <[email protected]>
Signed-off-by: Alexander H Ainscow <[email protected]>
Signed-off-by: Alexander H Ainscow <[email protected]>
…and delete old interfaces.
1. Add interval_set.erase_after, to avoid fiddling around with interval_set.erase(X, large_value). 2. Optimise "subtract" so that it ignores non-overlapping ranges efficiently. 3. Add interval_map interface to get the first extent, without the last.
1. Do not send a dummy write if there are any IOs or commit in flight at all to that PG. 2. Change the request interface to a two phase approach, to allow the caller to schedule multuple cache requests. 3. Remove some unnecessary debug code (check)buffers_pinned call)
1. Remove some dead code 2. Tweak some variable names 3. Move new methods.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We have multiple people working EC optimisations. This pull request is here to check that the basic acceptance tests are passing. Anyone is welcome to review it!
I will add more detail to the PR as and when it is expected to no longer be a draft.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
x
between the brackets:[x]
. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e