
Fix CUDA flaky tests for stochastic gates by using CPU-seeded RNG (#1802)

Open
cyrjano wants to merge 1 commit into meta-pytorch:master from cyrjano:export-D97775614

Conversation


@cyrjano cyrjano commented Mar 23, 2026

Summary:

## Problem

CUDA RNG produces different random sequences on different GPU architectures (e.g. V100 vs A100 vs H100) even with the same seed set via `torch.manual_seed()`. This causes stochastic gate CUDA tests to be flaky in CI — the same test passes on one GPU type but fails on another because expected values were hardcoded for a specific architecture's RNG output.
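The property the fix relies on is that PyTorch's CPU generator, unlike the per-architecture CUDA generators, yields the same sequence for a given seed regardless of hardware. A minimal sketch of that premise (the helper name and parameters are illustrative, not from the PR):

```python
import torch

def cpu_seeded_noise(shape, seed=42):
    """Sample Gaussian noise on CPU, where torch.manual_seed
    is deterministic, and return the tensor."""
    torch.manual_seed(seed)
    return torch.empty(shape).normal_(mean=0.0, std=0.5)

# Re-seeding reproduces the exact same tensor on every run.
a = cpu_seeded_noise((2, 3))
b = cpu_seeded_noise((2, 3))
assert torch.equal(a, b)
```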

Additionally, `test_p_norm_decay` uses exact `assert ==` for floating-point tensor comparison, which fails on GPU due to floating-point precision differences.

## Solution

**CPU-seeded RNG approach**: In CUDA test subclasses, patch `_sample_gate_values` to generate random noise on CPU (where `torch.manual_seed` is deterministic across all hardware) and then move the tensor to the GPU device. This keeps the full training codepath exercised (noise + mu → clamp → gather → multiply) while ensuring cross-architecture determinism.
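A minimal sketch of this patching approach for the Gaussian gate. The `GateSketch` class and its attributes (`mu`, `std`, `n_gates`) are stand-in assumptions based on the description, not the library's actual internals:

```python
import torch

class GateSketch:
    """Toy stand-in for a stochastic gate module; `mu`, `std`,
    and `n_gates` are assumed names, not the real API."""
    def __init__(self, n_gates, std=0.5, device="cpu"):
        self.n_gates = n_gates
        self.std = std
        self.mu = torch.full((n_gates,), 0.5, device=device)

def cpu_seeded_sample_gate_values(gates, batch_size, seed=0):
    # Draw noise on CPU, where torch.manual_seed is deterministic
    # across hardware, then move it to the module's device before
    # the usual noise + mu -> clamp path.
    torch.manual_seed(seed)
    noise = torch.empty(batch_size, gates.n_gates).normal_(std=gates.std)
    return torch.clamp(noise.to(gates.mu.device) + gates.mu, 0.0, 1.0)

# Falls back to CPU when no GPU is available, so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
g = GateSketch(n_gates=3, device=device)
vals = cpu_seeded_sample_gate_values(g, batch_size=2)
assert vals.shape == (2, 3)
```

Because the seed-sensitive sampling happens entirely on CPU, the same expected values hold for every GPU architecture the test runs on.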

For `LazyGaussianStochasticGates`, both `initialize_parameters` (mu initialization) and `_sample_gate_values` (noise sampling) happen on-device after `.to(cuda)`, so both are patched to use CPU RNG.

Since CUDA tests now produce identical values to CPU tests, the `if cpu / elif cuda` branches in base test files are removed, along with associated `pyre-fixme[61]` comments.

For `test_p_norm_decay`, exact `assert ==` is replaced with `assertTensorAlmostEqual` with `delta=0.01` tolerance.
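`assertTensorAlmostEqual` is Captum's test helper; the same idea can be shown with the generic `torch.allclose` as a stand-in:

```python
import torch

# Exact equality is brittle for floating-point results: the same
# reduction can round differently across backends.
a = torch.tensor([0.1, 0.2], dtype=torch.float64).sum()
b = torch.tensor(0.3, dtype=torch.float64)
assert a != b                           # exact comparison fails
assert torch.allclose(a, b, atol=0.01)  # tolerance-based check passes
```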

## Files Changed

- `test_gaussian_stochastic_gates_cuda.py`: Patch `_sample_gate_values` with CPU-seeded `normal_()` sampling
- `test_kuma_stochastic_gates_cuda.py`: Patch `_sample_gate_values` with CPU-seeded `uniform_()` sampling + Kumaraswamy transform
- `test_lazy_gaussian_stochastic_gates_cuda.py`: Patch both `initialize_parameters` and `_sample_gate_values`
- `test_gaussian_stochastic_gates.py`: Remove cpu/cuda branches (4 tests)
- `test_kuma_stochastic_gates.py`: Remove cpu/cuda branches (4 tests)
- `test_lazy_gaussian_stochastic_gates.py`: Remove cpu/cuda branches (12 tests)
- `test_p_norm_decay.py`: Use `assertTensorAlmostEqual` instead of exact equality (2 tests)
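The Kuma case mentioned above combines CPU-seeded `uniform_()` noise with the Kumaraswamy inverse CDF. A sketch, assuming shape parameters named `a` and `b` (not necessarily the library's parameter names):

```python
import torch

def cpu_seeded_kuma_sample(shape, a, b, seed=0, device="cpu"):
    """Draw uniform noise on CPU for cross-architecture
    determinism, then apply the Kumaraswamy inverse CDF
    x = (1 - (1 - u)^(1/b))^(1/a) and move to the target device."""
    torch.manual_seed(seed)
    # Avoid the 0/1 endpoints, where the transform is degenerate.
    u = torch.empty(shape).uniform_(1e-6, 1.0 - 1e-6)
    x = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
    return x.to(device)

s = cpu_seeded_kuma_sample((4,), a=2.0, b=3.0)
assert s.shape == (4,)
```

Samples land in (0, 1), matching a Kumaraswamy(a, b) distribution, while the randomness itself comes entirely from the deterministic CPU generator.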

Differential Revision: D97775614

@meta-cla meta-cla bot added the cla signed label Mar 23, 2026

meta-codesync bot commented Mar 23, 2026

@cyrjano has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97775614.

@meta-codesync meta-codesync bot changed the title Fix CUDA flaky tests for stochastic gates by using CPU-seeded RNG Fix CUDA flaky tests for stochastic gates by using CPU-seeded RNG (#1802) Mar 23, 2026
