osd: Ec optimisations main collaboration branch #60782

aainscow · 2024-11-20T10:45:19Z

We have multiple people working EC optimisations. This pull request is here to check that the basic acceptance tests are passing. Anyone is welcome to review it!

I will add more detail to the PR as and when it is expected to no longer be a draft.

Contribution Guidelines

To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

Show available Jenkins commands

jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e

The align function does not need to determine whether an interval overlaps an existing interval, because the insert code has been changed recently to allow multiple inserts of overlapping regions. Also worth noting that the original code did not quite do the right thing and may have led to overlapping inserts anyway. Also, added a unit test and fixed a template issue with align. Signed-off-by: Alexander H Ainscow <[email protected]> interval_set: Allow interval_sets to be aligned

The current implementation of interval set only allows inserts to be made when the new interval does not overlap the old. The "union_insert" method works around this by creating a new interval map, then performing a union of the two sets - an inefficient process. This commit adapts the insert method to have the flexibility of union_insert, but with the efficiency of the old insert method. We should add a similar erase method in the future, although there is no need for this PR. Signed-off-by: Alexander H Ainscow <[email protected]>

Before this commit, the erase function would assert that intervals to be erased entirely exist within an existing interval. This leads to a number of issues: 1. Easy to create hard-to-test edge conditions for client code. 2. Often calling code would do inefficient thing, such a execute intersect, then subtract. 3. It is a particular surprise that this issue also applies to subtract. This new version of the erase method allows for any sized interval to be used in erase. This allows for both non or partially overlapping erases, but also can recurse over multiple contained regions, making it much more convenient to erase. The new code should be as efficient for existing erases supported by the code. For a single non-overlapping erase, the complexity is O(log n) and the constant penalty should be roughly the same. For multiple erases, the complexity is somewhere between O(log n) and O(n), scaling with the number of entries which must be erased. The complexity of subtract is the same as before O(n log m), but should have a lower constant than previously held. I have also removed various workarounds used in interval_set required by the limitations of the old erase and insert functions. The added unit tests should be extensive enough to give confidence that the new code has a functional superset of the old code. Signed-off-by: Alexander H Ainscow <[email protected]>

* More fixes for multi-region reads * Enable partial stripe reads that span stripes Signed-off-by: Bill Scales <[email protected]>

As preperation for later commits, here we move the decision about which extents are read to an earlier point in the read operation. Later commits will then extend this to perform fewer reads. Signed-off-by: Alexander H Ainscow <[email protected]>

In a previous commit, there was some inefficient code, which: 1. Creates a pair 2. Passes pair to function 3. Function does some trivial maths, then populates a new pair 4. Pair is returned and immediately deconstructed. This is inefficient, but possibly more importantly generated messy code. This code is a minor refactor to remove some of the pairs. Signed-off-by: Alexander H Ainscow <[email protected]>

The restrictions here assert that IOs are rounded to chunk size, which is no longer a requirement. Signed-off-by: Alexander H Ainscow <[email protected]>

These are utilities for consumption by future commits. These equality operators will largely be used by test harnesses and are intented to work intuitively. Signed-off-by: Alexander H Ainscow <[email protected]>

Add a harness to mock up ECCommon::ReadPipeline, thus allowing more testing of functions with read pipeline. This is a very minimal stub and will be extended when required for tests. Signed-off-by: Alexander H Ainscow <[email protected]>

Adapt the functions which determine the extents of each shard that need to be read, such that they pick the minimal reads possible. This code was tested with subsequent test harness commits only and is not expected to work outside such an environment, as the IO complete path cannot yet cope with minimal reads. There is quite a bit of fiddly integer maths going on here and this will be neatened up in a later commit once further testing has taken place and this maths is no longer undergoing frequent changes. Signed-off-by: Alexander H Ainscow <[email protected]>

Further code-neatness enhancements to remove temporary pair objects. It is noted that the compiler likely optimises out such pairs, however, where they reduce readability, they should be removed. Signed-off-by: Alexander H Ainscow <[email protected]>

…tests. Fix some warnings from clang-tidy. Also fix a unit test which was broken due to a rebase. Signed-off-by: Alexander H Ainscow <[email protected]>

This test was added to debug a problem found in an more complex test. Signed-off-by: Alexander H Ainscow <[email protected]>

…equests. FIXME: Need more detail Signed-off-by: Alexander H Ainscow <[email protected]>

This adds a new setting called osd_ec_partial_reads_experimental. When enabled, the EC code will issue much more optimal reads, however there are known scenarios in which reads will not be recovered correctly. This most likely will lead to OSD processes crashing and data becoming inaccessible. When disabled, most of the new code is still being exercised, but much of the efficiency is lost, particularly for large chunk sizes. Signed-off-by: Alexander H Ainscow <[email protected]>

Signed-off-by: Alexander H Ainscow <[email protected]>

…function. There is already a FIXME comment in this function, but asserting here is a good prompt to make sure this is fixed in the future. Signed-off-by: Alexander H Ainscow <[email protected]>

Signed-off-by: Alexander H Ainscow <[email protected]>

…shards Following a review comment, observing that read_request is never null and as such can be passed by reference, rather than by pointer. A further review request suggested that this function should return read_request in a tl::expected wrapper. However, this ends up with passing this function more parameters to enable it to construct the read request, thus making the code less efficient, so I did not implement this. Signed-off-by: Alexander H Ainscow <[email protected]>

This method returns true if the interval entirely contains the specified range, without gaps. Signed-off-by: Alexander H Ainscow <[email protected]>

Signed-off-by: Alexander H Ainscow <[email protected]>

Major changes 1. Move the bulk of the logic to ECUtil, for re-use in the write path 2. Remove references to reads in the method. 3. Move from using vectors to maps for want to read shards. The vectors initially were thought to be more efficient, but with multiple loops over the vector, this probably wasnt the case (although I never measured any performance). Signed-off-by: Alexander H Ainscow <[email protected]>

When performing optimised reads, prior to this commit, if the experimental flag was enabled, then recovery would be impossible if it involved re-reading an existing chunk. For example. A client performs a read which covers end of chunk A and the start of chunk B. If chunk B fails, then the start of chunk A is required for recovery. Signed-off-by: Alexander H Ainscow <[email protected]>

This utility tracks an extent map per shard for erasure coding. It provides a number of capabilities tailored to the requirements of the EC write path. This commit is starting a move to a new convention with erasure coding referring to which address space offsets/lengths refer to. The convention will be: "Rados object offset/size/start/end/length" refers to offsets within the object itself. This will be shortened to a prefix of "ro_" for the most part. "Shard offset/size/start/end/length" refers to offsets within the individual shards. Typically both "shard" and "offset" are required. No shortening has been established as yet. Examples: insert_in_shard(shard, offset, bufferlist) Insert a buffer into a particular shard. insert_ro_extent_map?(ro_extent_map) Convert ro_extent_map to shard addressing, then insert (merge) it into the sharded extent maps. Additionally, the (generalised) get_min_shards function has been renamed to reflect its primary role of converting between ro and shard address spaces. Signed-off-by: Alexander H Ainscow <[email protected]>

The sinfo structure is more efficient to access and now stashes: * k * m * shard mapping (aka chunk mapping) - map of shard indexing used in plugin to the k-then-m indexing assumed by many utilities. Signed-off-by: Alexander H Ainscow <[email protected]>

This commit should disappear on a reabase. Signed-off-by: Alexander H Ainscow <[email protected]>

Reorganisers the main function to be more modular, moving functionality into a TestRunner object and anonymous namespaces. Adds a new "interactive" test mode that allows you to direct the order of operations rather than using the preset sequences. This allows complete precise control of the tool when you want to test specific IOs. Signed-off-by: Jon Bailey <[email protected]>

Signed-off-by: Alexander H Ainscow <[email protected]>

…and delete old interfaces.

1. Add interval_set.erase_after, to avoid fiddling around with interval_set.erase(X, large_value). 2. Optimise "subtract" so that it ignores non-overlapping ranges efficiently. 3. Add interval_map interface to get the first extent, without the last.

1. Do not send a dummy write if there are any IOs or commit in flight at all to that PG. 2. Change the request interface to a two phase approach, to allow the caller to schedule multuple cache requests. 3. Remove some unnecessary debug code (check)buffers_pinned call)

1. Remove some dead code 2. Tweak some variable names 3. Move new methods.

aainscow and others added 30 commits November 19, 2024 11:32

osd: further EC partial stripe read fixes

4067c25

* More fixes for multi-region reads * Enable partial stripe reads that span stripes Signed-off-by: Bill Scales <[email protected]>

erasure-code: Remove overly strict assertions in ECUtils

0314a26

The restrictions here assert that IOs are rounded to chunk size, which is no longer a requirement. Signed-off-by: Alexander H Ainscow <[email protected]>

erasure-code: Override operator== for some some ec structures

30a84a2

These are utilities for consumption by future commits. These equality operators will largely be used by test harnesses and are intented to work intuitively. Signed-off-by: Alexander H Ainscow <[email protected]>

erasure-code: Harness for TestECBackend.cc

403a2d8

Add a harness to mock up ECCommon::ReadPipeline, thus allowing more testing of functions with read pipeline. This is a very minimal stub and will be extended when required for tests. Signed-off-by: Alexander H Ainscow <[email protected]>

erasure-code: ECUtil - remove pairs

39908b0

Further code-neatness enhancements to remove temporary pair objects. It is noted that the compiler likely optimises out such pairs, however, where they reduce readability, they should be removed. Signed-off-by: Alexander H Ainscow <[email protected]>

erasure-code: Clean up TestECBackend.cc warnings and fix some broken …

e982eb5

…tests. Fix some warnings from clang-tidy. Also fix a unit test which was broken due to a rebase. Signed-off-by: Alexander H Ainscow <[email protected]>

erasure-code: new TestECBackend test

47d4159

This test was added to debug a problem found in an more complex test. Signed-off-by: Alexander H Ainscow <[email protected]>

erasure-code: Adapt io completions to cope with more efficient read r…

ced320a

…equests. FIXME: Need more detail Signed-off-by: Alexander H Ainscow <[email protected]>

erasure_code: Fix some minor white space issues.

2f73f1b

Signed-off-by: Alexander H Ainscow <[email protected]>

erasure_code: Assert if code attempts to execute not-yet-implemented …

8858e0a

…function. There is already a FIXME comment in this function, but asserting here is a good prompt to make sure this is fixed in the future. Signed-off-by: Alexander H Ainscow <[email protected]>

erasure_code: Fix unit tests by enabling experimental mode

2563dc4

Signed-off-by: Alexander H Ainscow <[email protected]>

interval_map: Add "contains" for interval_map.const_iterator.

fa8f417

This method returns true if the interval entirely contains the specified range, without gaps. Signed-off-by: Alexander H Ainscow <[email protected]>

interval_map: add unit test for contains

3da8f2f

Signed-off-by: Alexander H Ainscow <[email protected]>

const inverval map change

e338ddd

This commit should disappear on a reabase. Signed-off-by: Alexander H Ainscow <[email protected]>

test/osd: Fix chunksize parameter on ceph_test_radio_io_sequence

9290be8

Signed-off-by: Alexander H Ainscow <[email protected]>

test/osd Bill's test injects

a157cb4

Signed-off-by: Alexander H Ainscow <[email protected]>

osd/test: Cherry Picks from Jon Bailey

3ea2c03

aainscow added 23 commits November 19, 2024 11:32

osd: Rollback code cherry picked from Bill Scales

5e4585f

osd/test: Update interface in ceph erasure code tool

f8e45bc

osd/test - Updated test harnesses

0565f01

osd: Erasure coding write optimisations

3b4c34c

osd: Fix for rebase conflict

91146e0

osd/test: Fix extent cache test

9440d5c

osd: Create shard extent set class

694efbc

osd: Some C++ modernisations

09aa427

osd: Fix shard_extent_set.at

b4a450d

osd: Add new EC Extent Cache

a3e3474

osd/test: Add missing IO exerciser files

16eaf70

osd: add various missing files

56e557e

osd: Remove spurious include

98c702e

osd/test: Adapt erasure code tool to use new decode/encode interface …

06e43cf

…and delete old interfaces.

osd: Fix up ec non-optimised path

d664ebb

osd: Remove dead code

fa7740a

osd: Remove dead code

78fd38c

osd: Turn off EC debug and some misc fixes

d37174b

osd: Optimised debug, so its less intrusive if its left on by mistake.

839d903

osd: Various minor improvements to EC code.

3d891e6

1. Remove some dead code 2. Tweak some variable names 3. Move new methods.

osd/test: Make test_extent_cache.cc work

62f167a

aainscow requested a review from a team as a code owner November 20, 2024 10:45

github-actions bot added build/ops common core mon tests labels Nov 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

osd: Ec optimisations main collaboration branch #60782

osd: Ec optimisations main collaboration branch #60782

aainscow commented Nov 20, 2024

osd: Ec optimisations main collaboration branch #60782

Are you sure you want to change the base?

osd: Ec optimisations main collaboration branch #60782

Conversation

aainscow commented Nov 20, 2024

Contribution Guidelines

Checklist