Support hot reload for cluster runtime configs #17975

Open
CRZbulabula wants to merge 2 commits into master from analysis/confignode-hot-reload-configs

Conversation

@CRZbulabula

Contributor

Description

This PR adds hot-reload support for several cluster runtime configuration items managed by ConfigNode:

  • heartbeat_interval_in_ms
  • continuous_query_min_every_interval_in_ms
  • schema_region_group_extension_policy and data_region_group_extension_policy
  • default_schema_region_group_num_per_database and default_data_region_group_num_per_database
  • schema_region_per_data_node and data_region_per_data_node
  • read_consistency_level

The implementation updates ConfigNode and DataNode hot-reload paths, removes stale cached configuration snapshots from region extension and read-consistency paths, and reschedules heartbeat-related ConfigNode services when the heartbeat interval changes. Region-per-DataNode changes trigger max region group recalculation only on the current ConfigNode leader.
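The rescheduling step described above can be sketched as follows. This is a minimal illustration, not the PR's code: `ReschedulableService`, `reloadInterval`, and `currentIntervalMs` are hypothetical names, and the sketch assumes the service runs on a `ScheduledExecutorService` with a fixed delay.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a heartbeat-driven service; not a class from the PR. */
class ReschedulableService {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  private final Runnable task;
  private ScheduledFuture<?> future; // guarded by "this"
  private long intervalMs; // guarded by "this"

  ReschedulableService(Runnable task, long intervalMs) {
    this.task = task;
    this.intervalMs = intervalMs;
  }

  /** Starts the periodic task with the interval known at start time. */
  synchronized void start() {
    if (future == null) {
      future = executor.scheduleWithFixedDelay(task, 0, intervalMs, TimeUnit.MILLISECONDS);
    }
  }

  /** Reload hook: cancels the old schedule and creates a new one if the interval changed. */
  synchronized void reloadInterval(long newIntervalMs) {
    if (newIntervalMs <= 0) {
      throw new IllegalArgumentException("interval must be positive: " + newIntervalMs);
    }
    if (newIntervalMs == intervalMs) {
      return; // nothing to do, keep the existing schedule
    }
    intervalMs = newIntervalMs;
    if (future != null) {
      // Let an in-flight run finish, then schedule with the new interval.
      future.cancel(false);
      future =
          executor.scheduleWithFixedDelay(
              task, newIntervalMs, newIntervalMs, TimeUnit.MILLISECONDS);
    }
  }

  synchronized long currentIntervalMs() {
    return intervalMs;
  }

  /** Stops the periodic task and releases the executor thread. */
  synchronized void stop() {
    if (future != null) {
      future.cancel(false);
      future = null;
    }
    executor.shutdownNow();
  }
}
```

Cancelling with `cancel(false)` rather than interrupting is one possible choice here: it avoids aborting a heartbeat round that is already running when the new interval arrives.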

Load-balancing-related parameters were also reviewed. Region extension policies and region-per-DataNode limits now support hot reload. By contrast, region_group_allocate_policy and the Ratis auto leader-balance settings remain restart-only because the corresponding balancer/allocator instances are selected during service initialization.

Testing

  • mvn spotless:apply -pl iotdb-core/confignode
  • mvn spotless:apply -pl iotdb-core/datanode
  • mvn spotless:apply -pl integration-test -P with-integration-tests
  • mvn compile -pl iotdb-core/confignode,iotdb-core/datanode -am -DskipTests -DskipUTs
  • mvn verify -DskipUTs -Dit.test=IoTDBSetConfigurationIT -DfailIfNoTests=false -Dfailsafe.failIfNoSpecifiedTests=false -Drat.skip=true -pl integration-test -am -P with-integration-tests
  • git diff --check

Copilot AI left a comment

Contributor

Pull request overview

This PR adds hot-reload support for multiple cluster runtime configuration items that are managed by ConfigNode and need to take effect without restarting nodes (e.g., heartbeat interval, CQ min interval, region extension policies/limits, and read consistency). It updates both ConfigNode and DataNode reload paths, removes stale cached config snapshots from execution paths, and reschedules heartbeat-driven services when the heartbeat interval changes.

Changes:

  • Mark several cluster config items as hot_reload in the system properties template and implement their hot-reload parsing/validation in ConfigNode/DataNode descriptors.
  • Remove static cached snapshots of runtime config values (region extension policies, region-per-DataNode, read consistency) so new values take effect immediately.
  • Add rescheduling/rebuild hooks for heartbeat-related services and failure detectors when the heartbeat interval changes, plus integration tests covering the hot-reload behavior.
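To illustrate why removing cached snapshots matters, here is a minimal sketch contrasting a value captured at construction with a live read of a `volatile` field. All class names (`RuntimeConfig`, `SnapshotPlanner`, `LivePlanner`) are hypothetical, and the string values are placeholders rather than a statement of the valid settings.

```java
/** Hypothetical config holder; a volatile field makes updates visible across threads. */
class RuntimeConfig {
  private volatile String readConsistencyLevel = "strong"; // placeholder default

  String getReadConsistencyLevel() {
    return readConsistencyLevel;
  }

  void setReadConsistencyLevel(String level) {
    this.readConsistencyLevel = level;
  }
}

/** Captures the value once, so later hot reloads are never observed. */
class SnapshotPlanner {
  private final String level;

  SnapshotPlanner(RuntimeConfig config) {
    this.level = config.getReadConsistencyLevel(); // stale after a reload
  }

  String levelForRouting() {
    return level;
  }
}

/** Reads the config on every call, so a hot reload takes effect on the next decision. */
class LivePlanner {
  private final RuntimeConfig config;

  LivePlanner(RuntimeConfig config) {
    this.config = config;
  }

  String levelForRouting() {
    return config.getReadConsistencyLevel();
  }
}
```

With both planners built from the same `RuntimeConfig`, a later call to `setReadConsistencyLevel` changes what `LivePlanner` returns while `SnapshotPlanner` keeps returning the value it saw at construction.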

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| iotdb-core/node-commons/src/assembly/resources/conf/iotdb-system.properties.template | Marks selected cluster configs as hot_reload in the template. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/AbstractFragmentParallelPlanner.java | Stops caching read consistency at construction; reads live config for routing decisions. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/conf/IoTDBDescriptor.java | Adds hot-reload parsing for CQ min interval and read consistency level on DataNode. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/conf/IoTDBConfig.java | Makes hot-reloaded fields volatile and keeps CQ min interval fields in sync. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/schema/ClusterSchemaManager.java | Removes static cached region-per-node limits; reads live ConfigNode config. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/RetryFailedTasksThread.java | Schedules retry loop using live heartbeat interval; adds reload hook. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/partition/PartitionManager.java | Removes static cached region extension policies; reads live ConfigNode config. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/service/TopologyService.java | Uses shared failure-detector builder and adds reload hook. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/service/StatisticsService.java | Schedules with live heartbeat interval; adds reload hook. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/service/HeartbeatService.java | Schedules with live heartbeat interval; adds reload hook. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/service/EventService.java | Schedules with live heartbeat interval; adds reload hook. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/LoadManager.java | Adds a centralized heartbeat-interval reload method to reschedule services and rebuild detectors. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/cache/region/RegionGroupCache.java | Adds failure-detector reload fan-out to region caches. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/cache/LoadCache.java | Adds failure-detector reload fan-out to all cache types. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/cache/AbstractLoadCache.java | Makes detector volatile, centralizes detector creation, and adds reload method. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/ConfigManager.java | Triggers service rescheduling and leader-only max-region recalculation on hot-reload. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/conf/ConfigNodeDescriptor.java | Refactors config parsing into helpers; adds hot-reload validation/apply for new hot-reload items. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/conf/ConfigNodeConfig.java | Makes newly hot-reloaded config fields volatile. |
| integration-test/src/test/java/org/apache/iotdb/db/it/IoTDBSetConfigurationIT.java | Adds IT coverage for hot-reload of heartbeat interval, CQ min interval, region extension config, and read consistency. |
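The "validation/apply" split mentioned for the descriptor can be sketched roughly as below: parse and validate every new value first, and only write to the config once all of them pass, so a bad value cannot leave the config half-updated. `ClusterRuntimeConfig`, `HotReloader`, and the default values are assumptions for illustration; only the property keys come from the PR description.

```java
import java.util.Properties;

/** Hypothetical config holder; the default values are placeholders. */
class ClusterRuntimeConfig {
  volatile long heartbeatIntervalInMs = 1000L;
  volatile int schemaRegionPerDataNode = 1;
}

/** Hypothetical reloader that validates all values before applying any of them. */
class HotReloader {
  static void reload(Properties props, ClusterRuntimeConfig config) {
    // Phase 1: parse and validate into locals; any failure throws before state changes.
    long heartbeat =
        parsePositive(props, "heartbeat_interval_in_ms", config.heartbeatIntervalInMs);
    long regionsPerNode =
        parsePositive(props, "schema_region_per_data_node", config.schemaRegionPerDataNode);
    if (regionsPerNode > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("schema_region_per_data_node is too large");
    }

    // Phase 2: apply. Reached only when every value passed validation.
    config.heartbeatIntervalInMs = heartbeat;
    config.schemaRegionPerDataNode = (int) regionsPerNode;
  }

  /** Returns the current value when the key is absent; rejects non-numeric or non-positive input. */
  private static long parsePositive(Properties props, String key, long current) {
    String raw = props.getProperty(key);
    if (raw == null) {
      return current;
    }
    long value = Long.parseLong(raw.trim()); // NumberFormatException is an IllegalArgumentException
    if (value <= 0) {
      throw new IllegalArgumentException(key + " must be positive, got " + value);
    }
    return value;
  }
}
```

In this sketch a reload that sets a valid heartbeat interval but an invalid region limit throws and leaves both fields at their previous values.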

@codecov

codecov Bot commented Jun 17, 2026

Codecov Report

❌ Patch coverage is 18.18182% with 171 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.19%. Comparing base (b443006) to head (6dc94ed).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...he/iotdb/confignode/conf/ConfigNodeDescriptor.java | 0.00% | 85 Missing ⚠️ |
| ...apache/iotdb/confignode/manager/ConfigManager.java | 0.00% | 17 Missing ⚠️ |
| ...tdb/confignode/manager/RetryFailedTasksThread.java | 0.00% | 11 Missing ⚠️ |
| .../confignode/manager/load/service/EventService.java | 0.00% | 11 Missing ⚠️ |
| ...fignode/manager/load/service/HeartbeatService.java | 0.00% | 11 Missing ⚠️ |
| ...ignode/manager/load/service/StatisticsService.java | 0.00% | 11 Missing ⚠️ |
| ...che/iotdb/confignode/manager/load/LoadManager.java | 0.00% | 6 Missing ⚠️ |
| ...iotdb/confignode/manager/load/cache/LoadCache.java | 0.00% | 4 Missing ⚠️ |
| ...onfignode/manager/schema/ClusterSchemaManager.java | 0.00% | 4 Missing ⚠️ |
| ...nfignode/manager/load/cache/AbstractLoadCache.java | 70.00% | 3 Missing ⚠️ |

... and 4 more

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17975      +/-   ##
============================================
- Coverage     41.21%   41.19%   -0.02%     
  Complexity      318      318              
============================================
  Files          5258     5258              
  Lines        365944   366060     +116     
  Branches      47330    47340      +10     
============================================
- Hits         150825   150803      -22     
- Misses       215119   215257     +138     
