Conversation

@bluepal-avanthi-dundigala commented Dec 9, 2025

Scope & Purpose

(Please describe the changes in this PR for reviewers, motivation, rationale - mandatory)

  • 💩 Bugfix
  • 🍕 New feature
  • 🔥 Performance improvement
  • 🔨 Refactoring/simplification

Checklist

  • Tests
    • Regression tests
    • C++ Unit tests
    • Integration tests
    • Resilience tests
  • 📖 CHANGELOG entry made
  • 📚 documentation written (release notes, API changes, ...)
  • Backports
    • Backport for 3.12.0: (Please link PR)
    • Backport for 3.11: (Please link PR)
    • Backport for 3.10: (Please link PR)

Related Information

(Please reference tickets / specification / other PRs etc)

  • Docs PR:
  • Enterprise PR:
  • GitHub issue / Jira ticket:
  • Design document:

Note

Adds documentation for six new shard-related metrics (shard totals, not-replicated and out-of-sync counts, and follower counts) across the coordinator and the DBServer.

  • Metrics docs added:
    • Coordinator (Statistics):
      • Documentation/Metrics/arangodb_metadata_total_number_of_shards.yaml: arangodb_metadata_total_number_of_shards
      • Documentation/Metrics/arangodb_metadata_number_follower_shards.yaml: arangodb_metadata_number_follower_shards
      • Documentation/Metrics/arangodb_metadata_number_not_replicated_shards.yaml: arangodb_metadata_number_not_replicated_shards
      • Documentation/Metrics/arangodb_metadata_number_out_of_sync_shards.yaml: arangodb_metadata_number_out_of_sync_shards
    • DBServer (Replication):
      • Documentation/Metrics/arangodb_shards_follower_number.yaml: arangodb_shards_follower_number
      • Documentation/Metrics/arangodb_shard_followers_out_of_sync_number.yaml: arangodb_shard_followers_out_of_sync_number
  • All metrics are marked introducedIn: "3.12.7", are of type gauge, and include descriptions and troubleshooting guidance.

Written by Cursor Bugbot for commit 35cf940.

@bluepal-avanthi-dundigala requested a review from a team as a code owner December 9, 2025 06:15
@cla-bot bot added the cla-signed label Dec 9, 2025
@bluepal-avanthi-dundigala marked this pull request as draft December 9, 2025 06:15
@bluepal-avanthi-dundigala requested review from jbajic and removed request for a team December 9, 2025 06:16
Comment on lines 1 to 2
name: arangodb_shards_follower_number
introducedIn: "3.12.7"
Contributor

Isn't this the same as arangodb_metadata_number_follower_shards.yaml?

I added a separate file for the coordinator metrics as we previously agreed. If you prefer to use the existing 'arangodb_metadata_number_follower_shards.yaml', we need to update it by including 'coordinator' in the 'exposedBy' section.

Member

I guess this file (arangodb_shards_follower_number.yaml) was taken from PR #22121 (presumably as an example to work with). It can be deleted in this branch, as it will be added when that PR is merged. The same goes for arangodb_shard_followers_out_of_sync_number.yaml.

Deleting files.

Contributor

@KVS85 left a comment

See review.

@@ -0,0 +1,35 @@
name: arangodb_metadata_number_follower_shards
introducedIn: "3.12.7"
Contributor

Please put 3.12.8 here instead.

@@ -0,0 +1,35 @@
name: arangodb_metadata_number_not_replicated_shards
introducedIn: "3.12.7"
Contributor

Please put 3.12.8 here instead.

@@ -0,0 +1,35 @@
name: arangodb_metadata_number_out_of_sync_shards
introducedIn: "3.12.7"
Contributor

Please put 3.12.8 here instead.

@@ -0,0 +1,36 @@
name: arangodb_metadata_total_number_of_shards
introducedIn: "3.12.7"
Contributor

Please put 3.12.8 here instead.

@@ -0,0 +1,23 @@
name: arangodb_shard_followers_out_of_sync_number
introducedIn: "3.12.7"
Contributor

Please put 3.12.8 here instead.

Comment on lines 1958 to +1973
updateMetadataMetrics();

// Update coordinator specific shard metrics computed from Plan
if (_metadataMetrics.has_value()) {
updateCoordinatorPlanMetrics();
}
Member

Until now, updateMetadataMetrics() has been the only method that updates _metadataMetrics. Now you needed to add another one that is called during loadCurrent() (updateCoordinatorCurrentMetrics()), so I also understand the sibling updateCoordinatorPlanMetrics() here.

Nevertheless, the current situation with these three methods is a little confusing, at least because updateMetadataMetrics() no longer updates all of the metadata metrics.

To mitigate that, you could do one or both of the following:

  • rename the methods to clarify their respective responsibilities
  • merge updateCoordinatorPlanMetrics() with updateMetadataMetrics()

Later, when we follow through with loading Plan and Current synchronously, it can all be merged into one call.

Comment on lines +6373 to +6377
if (it == _shardsToPlanServers.end() || !it->second) {
// no plan servers recorded
++notReplicated;
continue;
}
Member

This case should be impossible: _shards and _shardsToPlanServers have to be consistent here.

It's certainly sensible to be defensive and handle it safely regardless. But since it would hint at a bug somewhere else, this should get a maintainer-mode assertion (e.g. TRI_ASSERT(it != _shardsToPlanServers.end() && it->second)), and possibly an error-level log message in production, though the latter doesn't feel necessary to me.

FYI, maintainer-mode assertions (TRI_ASSERT) are only compiled into maintainer builds and are not part of release builds.
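
For illustration, a minimal sketch of how this suggestion could be applied to the branch quoted above; it reuses the names from the diff (it, _shardsToPlanServers, notReplicated) and assumes the surrounding loop, so it is not a verbatim patch:

// Plan and _shardsToPlanServers are expected to be consistent, so flag a
// violation loudly in maintainer builds, but keep the defensive branch for
// release builds.
TRI_ASSERT(it != _shardsToPlanServers.end() && it->second);
if (it == _shardsToPlanServers.end() || !it->second) {
  // no plan servers recorded -- should be unreachable
  ++notReplicated;
  continue;
}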

Comment on lines +6386 to +6388
} else {
++notReplicated;
}
Member

As far as I can tell, this case should not be possible either. Let's also add a maintainer-mode assertion (e.g. TRI_ASSERT(!servers.empty())) to cement this.

uint64_t notReplicated = 0;

// read plan under read lock to compare leaders/replication factors
READ_LOCKER(readLocker, _planProt.lock);
Member

This is possibly dangerous, as we are already holding a write lock on _currentProt.lock. We need to be certain this can't introduce deadlocks (e.g. if another code path acquires a write lock on _planProt.lock and then tries to get a read lock on _currentProt.lock, it could deadlock with this code).

AFAIR, ClusterInfo generally only uses one of those locks at a time, but I am not certain; I haven't checked the current code. If there is already code that takes both, we must follow a common pattern that avoids deadlocks. And regardless of whether such code exists, this should be documented at both locks.

Alternatively, we could change the caller to hold only a read lock instead of a write lock on _currentProt.lock: as long as we only ever hold both locks as read locks at the same time, this prevents deadlocks as well.

I think I prefer the latter solution (only holding read locks), as it also means holding the write lock for a shorter time and reduces contention with readers, which can be detrimental otherwise. Note that this holds not only for updateCoordinatorPlanMetrics() but also for the existing updateMetadataMetrics(): it would be preferable to execute all of them under read locks only. Also note that loadPlan() and loadCurrent() hold _planProt.mutex and _currentProt.mutex, respectively, for their entire duration; this can be relied on to prevent concurrent calls to the update*Metrics functions.
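
A rough sketch of the read-locks-only variant described above, reusing the lock names from this PR; where exactly the locks end up being taken (caller vs. callee) is an assumption, not the actual patch:

// Both protections are taken as read locks only, so there is no
// write/read lock-order cycle between _currentProt and _planProt.
READ_LOCKER(currentLocker, _currentProt.lock);
READ_LOCKER(planLocker, _planProt.lock);
// Concurrent update*Metrics calls are assumed to be serialized by the
// _planProt.mutex / _currentProt.mutex held in loadPlan()/loadCurrent().
// ... compute and store the shard counters here ...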

auto const& shardId = it.first;
auto const& serversPtr = it.second;
++totalShards;
size_t currentN = (serversPtr ? serversPtr->size() : 0);
Member

serversPtr must never be null, or there's a bug somewhere else. Double-checking it is absolutely fine, but I'd also add a maintainer-mode assertion to spot possible bugs during testing (e.g. TRI_ASSERT(serversPtr != nullptr)).

Comment on lines +6424 to +6426
if (currentN > 0) {
followers += (currentN - 1);
}
Member

Similarly: In addition to serversPtr not being null, I think *serversPtr must never be empty. I'd also add a maintainer-mode assertion to that end. Though in this case, I am not 100% certain, so please do keep the check. :-)

// read plan under read lock to compare leaders/replication factors
READ_LOCKER(readLocker, _planProt.lock);

for (auto const& it : _shardsToCurrentServers) {
Member

Note that Current can currently have stale entries, i.e. for collections/shards that have already been deleted, which could cause miscounts. It may help to think of the Plan as the actual cluster-wide metadata for e.g. a collection, and of the corresponding Current entries as a last reported state.

I suggest flipping it around: loop over _shardsToPlanServers instead, and look up the corresponding entries in _shardsToCurrentServers.
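
A minimal sketch of the suggested inversion, assuming both containers are keyed by shard ID and reusing the names from the snippets above; the value types and the counting body are placeholders:

// Iterate the authoritative Plan entries and look up the last reported
// Current state per shard, so stale Current entries are simply skipped.
for (auto const& it : _shardsToPlanServers) {
  auto const& shardId = it.first;
  auto const& plannedServersPtr = it.second;
  TRI_ASSERT(plannedServersPtr != nullptr);
  auto cur = _shardsToCurrentServers.find(shardId);
  size_t currentN = (cur != _shardsToCurrentServers.end() && cur->second)
                        ? cur->second->size()
                        : 0;
  // ... derive follower / out-of-sync / not-replicated counts from
  //     *plannedServersPtr and currentN ...
}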

Comment on lines +6397 to +6399
_metadataMetrics->numberOutOfSyncShards.store(0, std::memory_order_relaxed);
_metadataMetrics->numberNotReplicatedShards.store(notReplicated,
std::memory_order_relaxed);
Member

These two metrics should only be updated in updateCoordinatorCurrentMetrics():

Suggested change
_metadataMetrics->numberOutOfSyncShards.store(0, std::memory_order_relaxed);
_metadataMetrics->numberNotReplicatedShards.store(notReplicated,
std::memory_order_relaxed);

Which also means we don't need to calculate notReplicated here (and we can't, really).

Comment on lines +6460 to +6463
_metadataMetrics->totalNumberOfShards.store(totalShards,
std::memory_order_relaxed);
_metadataMetrics->numberFollowerShards.store(followers,
std::memory_order_relaxed);
Member

These two metrics are already being reported by updateCoordinatorCurrentMetrics(), and should not be reported here as well.

Suggested change
_metadataMetrics->totalNumberOfShards.store(totalShards,
std::memory_order_relaxed);
_metadataMetrics->numberFollowerShards.store(followers,
std::memory_order_relaxed);

Comment on lines +6445 to +6457
// out of sync: leader changed or missing
if (plannedN == 0) {
// No plan present: treat as out-of-sync
++outOfSync;
} else if (currentLeader.empty()) {
++outOfSync;
} else if (!plannedLeader.empty() && plannedLeader != currentLeader) {
++outOfSync;
}

if (plannedN > currentN) {
++notReplicated;
}
Member

IMO outOfSync and notReplicated don't quite count the right thing; let me try to explain my idea of them (high-level, skipping over some details):

If there is only one in-sync replica (the leader), the shard should be counted as notReplicated. This is also the case for shards that are planned with replicationFactor: 1 (i.e. the Plan already contains the leader as the only replica).

If any follower from Plan is missing from the list in Current, it is regarded as out-of-sync.

Note that a shard can be counted as both notReplicated and outOfSync, or any one of them, or neither.

But note that there are some additional edge cases to consider, e.g. during a leader failover or move shard operation. Let's discuss these separately.
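
For illustration only, a self-contained sketch of these counting rules under the simplifications above; the names and container types are made up for the example (not the actual ClusterInfo types), and the failover/move-shard edge cases are deliberately left out:

#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

struct ShardHealthCounters {
  std::uint64_t notReplicated = 0;
  std::uint64_t outOfSync = 0;
};

// plannedServers: servers from Plan, leader first; currentInSync: servers
// reported as in-sync in Current (including the leader).
void countShard(std::vector<std::string> const& plannedServers,
                std::unordered_set<std::string> const& currentInSync,
                ShardHealthCounters& counters) {
  // notReplicated: only the leader is in sync. This also covers shards
  // planned with replicationFactor 1, where Plan only contains the leader.
  if (currentInSync.size() <= 1) {
    ++counters.notReplicated;
  }
  // outOfSync: at least one planned follower is missing from Current.
  for (std::size_t i = 1; i < plannedServers.size(); ++i) {
    if (currentInSync.count(plannedServers[i]) == 0) {
      ++counters.outOfSync;
      break;
    }
  }
  // A shard can contribute to both counters, one of them, or neither.
}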
