Skip to content

Respect pre-set env vars and add cgroup memory detection in env scripts#17765

Open
MileaRobertStefan wants to merge 3 commits into
apache:masterfrom
MileaRobertStefan:fix_env_scripts_memory_size_envvar
Open

Respect pre-set env vars and add cgroup memory detection in env scripts#17765
MileaRobertStefan wants to merge 3 commits into
apache:masterfrom
MileaRobertStefan:fix_env_scripts_memory_size_envvar

Conversation

@MileaRobertStefan

@MileaRobertStefan MileaRobertStefan commented May 25, 2026

Copy link
Copy Markdown

Description

Adds cgroup v1/v2 memory detection.

This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

@CRZbulabula CRZbulabula left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this — JVM auto-sizing in containers reading host memory is a real and well-known footgun, and the cgroup v1/v2 dual handling here is the right shape. A few inline suggestions below; nothing blocking. Two file-level notes:

  1. No automated tests. Cgroup-gated shell behavior is awkward to unit-test, but a small harness that parameterizes the cgroup mount point (default /sys/fs/cgroup) and runs the function against fixture files would catch regressions cheaply. Acceptable to skip given the PR description confirms manual cluster verification.
  2. Duplication. The same ~17-line block is copy-pasted across confignode-env.sh and datanode-env.sh. The existing calculate_memory_sizes() is already duplicated in both files, so this is consistent with the current style — but a follow-up that sources a shared helper from scripts/conf/iotdb-common.sh would reduce drift.

Comment thread scripts/conf/confignode-env.sh Outdated
Comment thread scripts/conf/confignode-env.sh Outdated
Comment thread scripts/conf/datanode-env.sh Outdated
Comment thread scripts/conf/datanode-env.sh Outdated
Comment thread scripts/conf/windows/confignode-env.bat Outdated
@CRZbulabula

Copy link
Copy Markdown
Contributor

The two failing unit-test ... datanode jobs are a stale-base false positive, not caused by this PR. A rebase will clear them.

This PR only touches shell/bat scripts (no Java), but the two failures are in a Java pipe test — TsFileInsertionEventParserTest#testScanParserSkipsUnnecessaryBitMaps and #testTableParserSkipsUnnecessaryBitMaps — so the change cannot be the cause.

Root cause: the failing workflow run (26503672419) was created on 2026-05-27. Those two tests were introduced by #17770 and were broken on master at that time; they were only fixed afterwards by #17779 (2026-05-28), #17807 (2026-06-03), and #17844 (2026-06-04) — all of which landed after this run's merge base. Clicking "Re-run failed jobs" reuses the original 2026-05-27 merge commit and does not pull in newer master, so re-running will keep failing with the identical result no matter how many times it's retried. Recent unrelated PRs (e.g. #17937 / #17935 / #17926) pass these same datanode unit-test jobs on current master, confirming the tests are green now.

Fix: rebase this branch onto the latest master and force-push (or use the "Update branch" button). That generates a fresh merge base containing the fixes above, and the datanode unit-test jobs should go green. A plain "Re-run" will not work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants