Fix CUDA flaky tests for stochastic gates by using CPU-seeded RNG (#1802)
Open
cyrjano wants to merge 1 commit into meta-pytorch:master
Conversation
Contributor
Summary:

## Problem

CUDA RNG produces different random sequences on different GPU architectures (e.g. V100 vs A100 vs H100), even with the same seed set via `torch.manual_seed()`. This makes the stochastic gate CUDA tests flaky in CI: the same test passes on one GPU type but fails on another, because the expected values were hardcoded for a specific architecture's RNG output.

Additionally, `test_p_norm_decay` uses exact `assert ==` for floating-point tensor comparison, which fails on GPU due to floating-point precision differences.

## Solution

**CPU-seeded RNG approach**: in the CUDA test subclasses, patch `_sample_gate_values` to generate random noise on CPU (where `torch.manual_seed` is deterministic across all hardware) and then move the tensor to the GPU device. This keeps the full training codepath exercised (noise + mu → clamp → gather → multiply) while ensuring cross-architecture determinism.

For `LazyGaussianStochasticGates`, both `initialize_parameters` (mu initialization) and `_sample_gate_values` (noise sampling) happen on-device after `.to(cuda)`, so both are patched to use CPU RNG.

Since the CUDA tests now produce values identical to the CPU tests, the `if cpu / elif cuda` branches in the base test files are removed, along with the associated `pyre-fixme[61]` comments.

For `test_p_norm_decay`, the exact `assert ==` is replaced with `assertTensorAlmostEqual` using a `delta=0.01` tolerance.
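A minimal sketch of the CPU-seeded sampling pattern described above. The helper name and shapes are illustrative, not Captum's actual test API; the point is that the random draw happens on the CPU RNG, so only a deterministic data transfer ever touches the GPU:

```python
import torch


def sample_noise_cpu_seeded(shape, device, seed=42):
    """Sample Gaussian noise on CPU, where torch.manual_seed is
    deterministic across all hardware, then move it to the target
    device (e.g. a CUDA GPU)."""
    torch.manual_seed(seed)               # seeds the CPU generator
    noise = torch.empty(shape).normal_()  # drawn with the CPU RNG
    return noise.to(device)               # transfer only; no CUDA RNG use


# Same seed => identical values regardless of GPU architecture,
# because the random draw itself never uses the CUDA RNG.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = sample_noise_cpu_seeded((2, 3), device)
b = sample_noise_cpu_seeded((2, 3), device)
assert torch.equal(a.cpu(), b.cpu())
```

In the actual tests this sampling is monkey-patched over `_sample_gate_values` in the CUDA subclasses, so the rest of the forward pass (clamp, gather, multiply) still runs on the GPU unchanged.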
## Files Changed

- `test_gaussian_stochastic_gates_cuda.py`: patch `_sample_gate_values` with CPU-seeded `normal_()` sampling
- `test_kuma_stochastic_gates_cuda.py`: patch `_sample_gate_values` with CPU-seeded `uniform_()` sampling plus the Kumaraswamy transform
- `test_lazy_gaussian_stochastic_gates_cuda.py`: patch both `initialize_parameters` and `_sample_gate_values`
- `test_gaussian_stochastic_gates.py`: remove cpu/cuda branches (4 tests)
- `test_kuma_stochastic_gates.py`: remove cpu/cuda branches (4 tests)
- `test_lazy_gaussian_stochastic_gates.py`: remove cpu/cuda branches (12 tests)
- `test_p_norm_decay.py`: use `assertTensorAlmostEqual` instead of exact equality (2 tests)

Differential Revision: D97775614
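The `test_p_norm_decay` change replaces exact equality with a tolerance-based comparison. A self-contained sketch of that idea (the helper here is a hypothetical stand-in for the `assertTensorAlmostEqual` test utility named above):

```python
import torch


def assert_tensor_almost_equal(actual, expected, delta=0.01):
    # Hypothetical stand-in for the test helper: compare element-wise
    # within an absolute tolerance instead of exact equality, which is
    # brittle under GPU floating-point arithmetic.
    diff = (actual - expected).abs().max().item()
    assert diff <= delta, f"max abs diff {diff} exceeds delta {delta}"


# Exact `==` can fail after GPU reductions reorder operations;
# a small delta absorbs those last-bit differences.
a = torch.tensor([1.0000, 2.0000])
b = torch.tensor([1.0005, 1.9995])
assert_tensor_almost_equal(a, b, delta=0.01)
```

This is the same effect as `torch.testing`-style closeness checks: the test asserts the decayed norm is within `delta` of the expected value rather than bit-identical to it.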
64024ac to 621ca43
test_gaussian_stochastic_gates_cuda.py: Patch_sample_gate_valueswith CPU-seedednormal_()samplingtest_kuma_stochastic_gates_cuda.py: Patch_sample_gate_valueswith CPU-seededuniform_()sampling + Kumaraswamy transformtest_lazy_gaussian_stochastic_gates_cuda.py: Patch bothinitialize_parametersand_sample_gate_valuestest_gaussian_stochastic_gates.py: Remove cpu/cuda branches (4 tests)test_kuma_stochastic_gates.py: Remove cpu/cuda branches (4 tests)test_lazy_gaussian_stochastic_gates.py: Remove cpu/cuda branches (12 tests)test_p_norm_decay.py: UseassertTensorAlmostEqualinstead of exact equality (2 tests)Differential Revision: D97775614