
Fix CUDA flaky tests for stochastic gates by using CPU-seeded RNG (#1802)

Open
cyrjano wants to merge 1 commit into meta-pytorch:master from cyrjano:export-D97775614

Conversation


@cyrjano cyrjano commented Mar 23, 2026

Summary:

## Problem

CUDA RNG produces different random sequences on different GPU architectures (e.g. V100 vs A100 vs H100) even with the same seed set via `torch.manual_seed()`. This causes stochastic gate CUDA tests to be flaky in CI — the same test passes on one GPU type but fails on another because expected values were hardcoded for a specific architecture's RNG output.
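The property the fix relies on is that PyTorch's CPU generator, unlike the per-architecture CUDA generators, yields the same sequence for a given seed regardless of hardware. A minimal sketch of that premise (the helper name and parameters are illustrative, not from the PR):

```python
import torch

def cpu_seeded_noise(shape, seed=42):
    """Sample Gaussian noise on CPU, where torch.manual_seed
    is deterministic, and return the tensor."""
    torch.manual_seed(seed)
    return torch.empty(shape).normal_(mean=0.0, std=0.5)

# Re-seeding reproduces the exact same tensor on every run.
a = cpu_seeded_noise((2, 3))
b = cpu_seeded_noise((2, 3))
assert torch.equal(a, b)
```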

Additionally, `test_p_norm_decay` uses exact `assert ==` for floating-point tensor comparison, which fails on GPU due to floating-point precision differences.

## Solution

**CPU-seeded RNG approach**: In CUDA test subclasses, patch `_sample_gate_values` to generate random noise on CPU (where `torch.manual_seed` is deterministic across all hardware) and then move the tensor to the GPU device. This keeps the full training codepath exercised (noise + mu → clamp → gather → multiply) while ensuring cross-architecture determinism.
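A minimal sketch of this patching approach for the Gaussian gate. The `GateSketch` class and its attributes (`mu`, `std`, `n_gates`) are stand-in assumptions based on the description, not the library's actual internals:

```python
import torch

class GateSketch:
    """Toy stand-in for a stochastic gate module; `mu`, `std`,
    and `n_gates` are assumed names, not the real API."""
    def __init__(self, n_gates, std=0.5, device="cpu"):
        self.n_gates = n_gates
        self.std = std
        self.mu = torch.full((n_gates,), 0.5, device=device)

def cpu_seeded_sample_gate_values(gates, batch_size, seed=0):
    # Draw noise on CPU, where torch.manual_seed is deterministic
    # across hardware, then move it to the module's device before
    # the usual noise + mu -> clamp path.
    torch.manual_seed(seed)
    noise = torch.empty(batch_size, gates.n_gates).normal_(std=gates.std)
    return torch.clamp(noise.to(gates.mu.device) + gates.mu, 0.0, 1.0)

# Falls back to CPU when no GPU is available, so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
g = GateSketch(n_gates=3, device=device)
vals = cpu_seeded_sample_gate_values(g, batch_size=2)
assert vals.shape == (2, 3)
```

Because the seed-sensitive sampling happens entirely on CPU, the same expected values hold for every GPU architecture the test runs on.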

For `LazyGaussianStochasticGates`, both `initialize_parameters` (mu initialization) and `_sample_gate_values` (noise sampling) happen on-device after `.to(cuda)`, so both are patched to use CPU RNG.

Since CUDA tests now produce identical values to CPU tests, the `if cpu / elif cuda` branches in base test files are removed, along with associated `pyre-fixme[61]` comments.

For `test_p_norm_decay`, exact `assert ==` is replaced with `assertTensorAlmostEqual` with `delta=0.01` tolerance.
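`assertTensorAlmostEqual` is Captum's test helper; the same idea can be shown with the generic `torch.allclose` as a stand-in:

```python
import torch

# Exact equality is brittle for floating-point results: the same
# reduction can round differently across backends.
a = torch.tensor([0.1, 0.2], dtype=torch.float64).sum()
b = torch.tensor(0.3, dtype=torch.float64)
assert a != b                           # exact comparison fails
assert torch.allclose(a, b, atol=0.01)  # tolerance-based check passes
```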

## Files Changed

- `test_gaussian_stochastic_gates_cuda.py`: Patch `_sample_gate_values` with CPU-seeded `normal_()` sampling
- `test_kuma_stochastic_gates_cuda.py`: Patch `_sample_gate_values` with CPU-seeded `uniform_()` sampling + Kumaraswamy transform
- `test_lazy_gaussian_stochastic_gates_cuda.py`: Patch both `initialize_parameters` and `_sample_gate_values`
- `test_gaussian_stochastic_gates.py`: Remove cpu/cuda branches (4 tests)
- `test_kuma_stochastic_gates.py`: Remove cpu/cuda branches (4 tests)
- `test_lazy_gaussian_stochastic_gates.py`: Remove cpu/cuda branches (12 tests)
- `test_p_norm_decay.py`: Use `assertTensorAlmostEqual` instead of exact equality (2 tests)
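The Kuma case mentioned above combines CPU-seeded `uniform_()` noise with the Kumaraswamy inverse CDF. A sketch, assuming shape parameters named `a` and `b` (not necessarily the library's parameter names):

```python
import torch

def cpu_seeded_kuma_sample(shape, a, b, seed=0, device="cpu"):
    """Draw uniform noise on CPU for cross-architecture
    determinism, then apply the Kumaraswamy inverse CDF
    x = (1 - (1 - u)^(1/b))^(1/a) and move to the target device."""
    torch.manual_seed(seed)
    # Avoid the 0/1 endpoints, where the transform is degenerate.
    u = torch.empty(shape).uniform_(1e-6, 1.0 - 1e-6)
    x = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
    return x.to(device)

s = cpu_seeded_kuma_sample((4,), a=2.0, b=3.0)
assert s.shape == (4,)
```

Samples land in (0, 1), matching a Kumaraswamy(a, b) distribution, while the randomness itself comes entirely from the deterministic CPU generator.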

Differential Revision: D97775614

@meta-cla meta-cla bot added the cla signed label Mar 23, 2026

meta-codesync bot commented Mar 23, 2026

@cyrjano has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97775614.

@meta-codesync meta-codesync bot changed the title Fix CUDA flaky tests for stochastic gates by using CPU-seeded RNG Fix CUDA flaky tests for stochastic gates by using CPU-seeded RNG (#1802) Mar 23, 2026
