Pulse · pytorch/pytorch · GitHub

January 17, 2025 – January 24, 2025

Overview

195 Active pull requests

293 Active issues

2 Pull requests merged by 2 people

Fix staging for CPU tensors in OSS DCP async_save
#145408 merged Jan 23, 2025
Prevent legacy_load when weights_only=True (correctly)
#145111 merged Jan 17, 2025

193 Pull requests opened by 122 people

Revert "Fix for MSVC problem on Windows Arm64 (#136765)"
#145076 opened Jan 17, 2025
Remove FFT from stride incorrect ops
#145080 opened Jan 17, 2025
partitioner: avoid inserting duplicates into heap
#145082 opened Jan 17, 2025
cpp_wrapper: Move #includes to per-device header files
#145083 opened Jan 17, 2025
`torch.distributions`: replace `numbers.Number` with `torch.types.Number`.
#145086 opened Jan 17, 2025
[POC] Extend torch function support to ALL arguments, not just scalar type (but not insides of list)
#145089 opened Jan 17, 2025
Test
#145090 opened Jan 17, 2025
cpp_wrapper/aot_inductor: handle conjugation and negation dispatch keys
#145095 opened Jan 17, 2025
Use STL string_view header
#145098 opened Jan 17, 2025
Maintain multiple configs
#145103 opened Jan 17, 2025
further scheduler changes for invoke_quant: prologue low prec, (slightly) more aggressive fusion
#145104 opened Jan 17, 2025
WIP remove -E workaround for nvcc
#145116 opened Jan 17, 2025
[EXPERIMENTAL][dynamo] optimize `DictGetItemGuardAccessor`
#145117 opened Jan 17, 2025
WIP sccache simplified
#145119 opened Jan 17, 2025
[triton] Update triton pin to include warp specialization support
#145120 opened Jan 17, 2025
inductor: Don't throw an internal error when an nn.Module is missing an attribute
#145122 opened Jan 17, 2025
Repro collective timeout and FR dump
#145125 opened Jan 18, 2025
test trigger dispatch
#145126 opened Jan 18, 2025
[executorch hash update] update the pinned executorch hash
#145128 opened Jan 18, 2025
[cuBLAS][cuBLASLt] Unify `cuBLASLt` workspaces with `cuBLAS` workspaces
#145130 opened Jan 18, 2025
[dynamo] Log guard latency
#145132 opened Jan 18, 2025
[inductor] [bug fix] Fix `conv` on processing uint
#145136 opened Jan 18, 2025
improve perf for layer_norm
#145146 opened Jan 18, 2025
[BE][Easy] increase pip timeout for nightly tool: 15s -> 60s
#145147 opened Jan 18, 2025
[BE][PYFMT] bump `ruff format` target version to py39: add parentheses around long `with`-statements
#145148 opened Jan 18, 2025
[inductor] Simplify _inductor/utils.py slightly
#145150 opened Jan 18, 2025
[BE]: Apply ruff PERF401 to torch
#145153 opened Jan 18, 2025
Make `inductor_utils.requires_gpu` accept MPS
#145156 opened Jan 18, 2025
[BE]: Update NCCL submodule to 2.24.3
#145167 opened Jan 19, 2025
Added weight to MSELoss Criterion
#145169 opened Jan 19, 2025
[BE]: Update CUTLASS submodule to 3.7.0
#145172 opened Jan 19, 2025
[scan] scan dim handling in user-facing scan()
#145179 opened Jan 19, 2025
Added torch check to ensure indices are not empty
#145180 opened Jan 19, 2025
Add transpose support for CppMicroGemmFP32Vec
#145194 opened Jan 20, 2025
Guard size oblivious within empty_tensor_restride_symint
#145196 opened Jan 20, 2025
Use std::string_view in get_fully_qualified_type_name
#145197 opened Jan 20, 2025
CI test: TestAutograd.test_gradcheck_nondeterministic
#145205 opened Jan 20, 2025
Update slow tests
#145206 opened Jan 20, 2025
Fix incorrect citation of authors in documentation
#145209 opened Jan 20, 2025
solve apl dependency issue
#145215 opened Jan 20, 2025
Refactoring Distributed test cases to be device agnostic [1/n]
#145222 opened Jan 20, 2025
Raise MutationError if there are side effects when returning generator
#145223 opened Jan 20, 2025
update sympy version 1.13.3 in setup.py (previously updated only in requirements.txt)
#145224 opened Jan 20, 2025
fix test_cublas_workspace_explicit_allocation for gfx12
#145227 opened Jan 20, 2025
Expose the rendezvous keepalive arguments
#145228 opened Jan 20, 2025
Improve typing in torch/types.py
#145237 opened Jan 21, 2025
Improve typing in torch/__init__.py
#145238 opened Jan 21, 2025
Improve typing in torch/_C/__init__.pyi.in
#145239 opened Jan 21, 2025
add grad_output shape check for adaptive_avg_pool2d_backward
#145241 opened Jan 21, 2025
[inductor] Make serialized inductor patterns path configurable instead of using …
#145243 opened Jan 21, 2025
Improve typing by using bool and int
#145244 opened Jan 21, 2025
[Quant][CPU] add a wrapper op in quantized_decomposed for _weight_int4pack_mm_for_cpu
#145245 opened Jan 21, 2025
[Inductor UT] Set input tensors to corresponding device for test case in test_aot_inductor.py
#145248 opened Jan 21, 2025
[Inductor][CPU] Add a lowering pass for _weight_int4pack_mm_for_cpu
#145250 opened Jan 21, 2025
change the test wheel to release wheel when release wheel available
#145252 opened Jan 21, 2025
[TEST] tmp storage with CONSTANTHANDLE
#145254 opened Jan 21, 2025
[ARM] Add test_ops and test_memory_profiler to aarch64 tests
#145260 opened Jan 21, 2025
Fix SEGFAULT when None arg was passed in GraphContext.op(..)
#145265 opened Jan 21, 2025
Improve the caching allocator test for raw alloc
#145269 opened Jan 21, 2025
Remove unnecessary HPUHooksInterface method
#145272 opened Jan 21, 2025
Updates NCCL user buffer registration test for NCCL 2.24.3
#145285 opened Jan 21, 2025
update get start xpu
#145286 opened Jan 21, 2025
[ROCm] miopen benchmark behavior now better aligns with cudnn
#145294 opened Jan 21, 2025
Add unique identifier to bmm thread_mm functions
#145303 opened Jan 21, 2025
[BE] Remove test_ops from FIXME_inductor_dont_reset_dynamo
#145307 opened Jan 21, 2025
[Utilization] post-test-process workflow
#145310 opened Jan 21, 2025
windows builds with VS2022
#145319 opened Jan 21, 2025
inductor: Explicitly test that torch.compile(option=...) does something
#145321 opened Jan 21, 2025
fix a small typo in comments
#145323 opened Jan 21, 2025
Add stft option to align window for center = false
#145324 opened Jan 21, 2025
[utilization] pipeline to create clean db records
#145327 opened Jan 22, 2025
[WIP] [AOTInductor] Use AtenTensorHandle as the constant map's holder.
#145331 opened Jan 22, 2025
PEP585: Missed conversions
#145342 opened Jan 22, 2025
[dtensor][cp] experiment: call flex_attention on DTensor
#145353 opened Jan 22, 2025
enhance logging statically known by adding size_oblivious(..)
#145354 opened Jan 22, 2025
[WIP] Fix avg_pool crash with negative numbers
#145358 opened Jan 22, 2025
removed check for ConvTranspose3D on MPS
#145366 opened Jan 22, 2025
[ARM] Fix broken tests in test_tensor_creation_ops on AArch64
#145367 opened Jan 22, 2025
Enable C++ API parity tests on AArch64
#145370 opened Jan 22, 2025
[inductor][BE] Enable test_cpu_cpp_wrapper in fbcode
#145373 opened Jan 22, 2025
[torchbench] Increase tolerance for amp only poolformer_m36
#145375 opened Jan 22, 2025
Use AOTI as inductor backend when fullgraph_package is enabled.
#145381 opened Jan 22, 2025
[DO NOT MERGE] pre-merge runs only on MI200 and post-merge runs on both MI300
#145389 opened Jan 22, 2025
Update NJT linear_backward to return non-aliased tensor bias grad
#145399 opened Jan 22, 2025
[auto_functionalized] Support `Tensor(a!)[]?`
#145400 opened Jan 22, 2025
Update OSS nested tensor docs to focus on NJT
#145402 opened Jan 22, 2025
Use guard_size_oblivious in debug tensor writer
#145403 opened Jan 22, 2025
[distributions] Catch inf gradient in beta distribution
#145404 opened Jan 22, 2025
[export][be] Clean up local imports from export [2/n]
#145406 opened Jan 22, 2025
[dynamo][fbcode] Turn on inline_inbuilt_nn_modules
#145407 opened Jan 22, 2025
Add Torchao docs link to Pytorch libraries
#145412 opened Jan 22, 2025
[dynamo][not ready - just for CI] Remove all builtin skiplist
#145415 opened Jan 22, 2025
TopK ROCm Tuning
#145416 opened Jan 22, 2025
[dynamo][hop] test torch.compiling all user-facing HOPs
#145422 opened Jan 22, 2025
Tag storages with offset in file when with FakeTensorMode
#145424 opened Jan 22, 2025
Fix aot inductor intermediate debug printing
#145426 opened Jan 22, 2025
add pt2 callbacks for backward pass
#145427 opened Jan 23, 2025
[ca][hop] test CA on all HOPs
#145429 opened Jan 23, 2025
[dynamo] Re-enable `test_torch_name_rule_map_updated`
#145431 opened Jan 23, 2025
[dynamo] save/restore system random state more carefully
#145435 opened Jan 23, 2025
Advance past fc window for stft center
#145437 opened Jan 23, 2025
[draft_export] add LOC for data-dep error logging
#145443 opened Jan 23, 2025
[Docs] Add clarification for target types in CrossEntropyLoss doc
#145444 opened Jan 23, 2025
Record inputs at time of tracing, constrain to them for triton fn
#145448 opened Jan 23, 2025
Fix incorrect type comparison
#145449 opened Jan 23, 2025
Replace is_same with is_same_v for concise syntax
#145450 opened Jan 23, 2025
Add check that envvar configs are boolean
#145454 opened Jan 23, 2025
Update TorchBench commit to main
#145455 opened Jan 23, 2025
[c10d] Flush file in file recorder
#145458 opened Jan 23, 2025
[AOTInductor] Align behavior between CPU and GPU
#145459 opened Jan 23, 2025
[TEST ONLY] Conv with `oc = 0`
#145462 opened Jan 23, 2025
OpenReg: Refactor impl_registry
#145465 opened Jan 23, 2025
[Intel GPU] Add TORCH_API macro to export symbol NestedTensor_to_mask for libtorch_xpu
#145467 opened Jan 23, 2025
fix test_convolution error when use cudnn.flags
#145474 opened Jan 23, 2025
[dynamo] added support to trace torch.cuda.is_current_stream_capturing
#145475 opened Jan 23, 2025
Adapt Dynamo Tests to HPUs
#145476 opened Jan 23, 2025
Modify enable logic of COLLECTIVE_COMM profiler activity type
#145478 opened Jan 23, 2025
[Dynamo] Fix names collisions with foreach decomps
#145479 opened Jan 23, 2025
simplify torch.utils.cpp_extension.include_paths; use it in cpp_builder
#145480 opened Jan 23, 2025
feat: add SVE dispatch for non-FBGEMM qembeddingbag
#145486 opened Jan 23, 2025
Remove unnecessary "special linking" for `BLAS_LIBRARIES`
#145487 opened Jan 23, 2025
[torchbench] Add meta function for _cudnn_rnn_flatten_weight
#145488 opened Jan 23, 2025
cpp_wrapper: Move #includes to per-device header files
#145490 opened Jan 23, 2025
[BE]: Fix OrderedSet equality oversight
#145492 opened Jan 23, 2025
[inductor] Make triton kernel autotune config defaults backward-compatible
#145494 opened Jan 23, 2025
[compiled_autograd] Rename interface to pyinterface
#145495 opened Jan 23, 2025
Remove truncated normal initialization for 16-bit (and lower) tensors
#145499 opened Jan 23, 2025
[ROCm] Update workflow to use root user instead of jenkins user
#145504 opened Jan 23, 2025
Bump AOTriton to 0.8.2b
#145508 opened Jan 23, 2025
[dynamo][guards] Log guard latency to tlparse
#145509 opened Jan 23, 2025
Add istft option to align window for center = false
#145510 opened Jan 23, 2025
Add missing autoreleasepool around runUniqueGraph to prevent leaks
#145512 opened Jan 23, 2025
[WIP][inductor][5/N] triton support post-#5512, fix 1 and None handling
#145515 opened Jan 23, 2025
[ROCm] MI300 skips for flaky unit tests
#145518 opened Jan 23, 2025
Fix allow_mutation_on_saved_tensors for inplace foreach
#145520 opened Jan 23, 2025
General Changes for multi accelerators
#145521 opened Jan 23, 2025
inductor.config.descriptive_names = False is not actually supported
#145523 opened Jan 23, 2025
Fix IndentationError in code example
#145525 opened Jan 23, 2025
[MPS] Add bilinear2d_aa implementation
#145526 opened Jan 23, 2025
fix intermediate debug information with cpp_wrapper
#145527 opened Jan 23, 2025
[utils] add try_import method for importing optional modules
#145528 opened Jan 23, 2025
Work around buggy use_const_ref_for_mutable_tensors
#145530 opened Jan 23, 2025
Disable slow gradcheck for nn.Transformer ModuleInfo
#145531 opened Jan 23, 2025
Make sure that benchmark_harness is set before running
#145532 opened Jan 23, 2025
Remove det_singular OpInfo
#145533 opened Jan 23, 2025
Increase the number of perf benchmark shards
#145534 opened Jan 23, 2025
Removes threadfence from topk kernel to improve AMD performance
#145536 opened Jan 23, 2025
[dynamo] Properly model torch profiler context objects
#145537 opened Jan 23, 2025
Make sure not using cpp wrapper when setting nvtx training annotation
#145538 opened Jan 23, 2025
Add accuracy issue support in AOTI Minifier
#145539 opened Jan 23, 2025
[BE][hop] make it easier to use speculate_subgraph
#145540 opened Jan 23, 2025
[BE] Type annotate wrapper_benchmark.py and cuda_combined_scheduling.py
#145542 opened Jan 23, 2025
Testing #144594
#145546 opened Jan 23, 2025
[dynamo][refactor] Move collections.namedtuple out of SkipFunctionVariable
#145547 opened Jan 23, 2025
fix unbacked + view incorrectness
#145548 opened Jan 23, 2025
[ca] add test_reset for 2.6 release validation
#145549 opened Jan 23, 2025
If mypy fails it should report the error back to lintrunner
#145550 opened Jan 23, 2025
Remove incorrect BuiltinVariable.call_hasattr()
#145551 opened Jan 23, 2025
Turn on mypy for _dynamo/variables/builtin.py
#145552 opened Jan 23, 2025
Fix call to create_load_global
#145553 opened Jan 23, 2025
Fix dynamo use of `list[int]` in graph break
#145554 opened Jan 23, 2025
Add CUDA 12.8 installation and Linux CD Docker images
#145557 opened Jan 23, 2025
[dynamo][builtin-skipfile-cleanup] Support tuple.__new__
#145558 opened Jan 23, 2025
[dynamo][builtin-skipfile-cleanup] Remove collections
#145559 opened Jan 23, 2025
[Not for land] hacking up mx
#145562 opened Jan 24, 2025
Refactor fuzzer and add support for Dynamo
#145565 opened Jan 24, 2025
Advance docker release latest version to cuda 12.4
#145566 opened Jan 24, 2025
Add CUDA 12.8 installation and manylinux-cuda12.8
#145567 opened Jan 24, 2025
[mps] Hoist erfinv logic out of the kernel in preparation for moving.
#145568 opened Jan 24, 2025
[aotinductor] update unbacked symint runtime assertion msg
#145569 opened Jan 24, 2025
[BE] mv test/inductor_skips/* to test/inductor_expected_failures/
#145572 opened Jan 24, 2025
[inductor/profiler] add kernel kwargs instrumentation
#145573 opened Jan 24, 2025
[inductor][3/N] triton support post-#5512, tt.divisibility format
#145575 opened Jan 24, 2025
[Inductor][CPP] fix torch logit decomposition
#145576 opened Jan 24, 2025
[inductor] Fix duplicate detection in _dynamic_scale_rblock
#145577 opened Jan 24, 2025
Spruce up docs for emulate_precision_casts
#145579 opened Jan 24, 2025
[MPS][BE] Implement bilinear2d as shader
#145581 opened Jan 24, 2025
[inductor][4/N] triton support post-#5512, fix constexpr signatures
#145583 opened Jan 24, 2025
[ROCm] Eliminate the need for divisions in layernorm for default vector size
#145584 opened Jan 24, 2025
[Custom Ops] Add a new API to allow users to register an autocast for the custom op
#145588 opened Jan 24, 2025
Replace decorators in UTs to cover additional devices
#145589 opened Jan 24, 2025
[dynamo][benchmarks] Stop benchmarking compile time of dead code
#145590 opened Jan 24, 2025
[CCA] remove TODO for hardware_destructive_interference_size
#145591 opened Jan 24, 2025
Fix constants with non-functional operators
#145593 opened Jan 24, 2025
[micro_pipeline_tp] add logging for all-gather-matmul fusion
#145594 opened Jan 24, 2025
[micro_pipeline_tp] support pattern matching row-wise scaled_mm with sharded scale
#145595 opened Jan 24, 2025
Add `torch._foreach_copy_` doc
#145597 opened Jan 24, 2025
[NFC] Fix some minor typos.
#145599 opened Jan 24, 2025
Add device support for chunk_cat, all_gather_copy_in, and split_with_…
#145600 opened Jan 24, 2025
[ATen][CUDA][Transformers] Add Blackwell support to SDPA
#145602 opened Jan 24, 2025
[dynamo] refactor dynamo__custom_eval_frame to C++, refactor SKIP_CODE[_RECURSIVE]
#145603 opened Jan 24, 2025
WIP error_prop sc
#145605 opened Jan 24, 2025
[BE][CI] bump ruff to 0.9.3
#145606 opened Jan 24, 2025

148 Issues closed by 54 people

DISABLED test_arange_dynamic_cuda (__main__.TestInductorDynamicCUDA)
#127067 closed Jan 24, 2025
torch.sin/cos/tan+torch.floor/round may bring wrong results with torch.compile
#145466 closed Jan 24, 2025
torch crashes on ubuntu:24.04 during SDPA-CuDNN test
#145580 closed Jan 24, 2025
Error: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate more than 1EB memory
#145369 closed Jan 24, 2025
Inference super slow with torchvision model fasterrcnn_mobilenet_v3_large_fpn
#145032 closed Jan 24, 2025
DISABLED test_bw_decoding_fails_float16 (__main__.TestFlexDecoding)
#141761 closed Jan 24, 2025
Int8 Inference Slowdown Comparing to FP32 when using PyTorch 2 Export Quantization with X86 Backend through Inductor
#144434 closed Jan 24, 2025
lerp_ doesn't correctly type promote
#140601 closed Jan 24, 2025
HOP input mutation analysis is not comprehensive
#137639 closed Jan 23, 2025
CUDA memory leak in model_container::run_const_fold
#126059 closed Jan 23, 2025
Flexattention: ValueError: Shape element 1 must be a power of 2
#133321 closed Jan 23, 2025
DISABLED test_fn_grad_linalg_det_singular_cuda_complex128 (__main__.TestBwdGradientsCUDA)
#93044 closed Jan 23, 2025
DISABLED test_forward_mode_AD_linalg_det_singular_cuda_complex128 (__main__.TestFwdGradientsCUDA)
#93045 closed Jan 23, 2025
qmul.cpp:34:10: error: redefinition of 'xnn_binary_params' 34 | struct xnn_binary_params { | ^
#145497 closed Jan 23, 2025
Async distributed checkpointing works incorrectly with tensors on CPU
#144657 closed Jan 23, 2025
cannot pickle 'torch._C._aoti.AOTIModelPackageLoader' object
#145411 closed Jan 23, 2025
DISABLED test_trigger_bisect_on_error (__main__.ExcTests)
#131303 closed Jan 23, 2025
PR #89436 looks like it causes or enables a memory leak
#90464 closed Jan 23, 2025
"Unknown builtin op" error during jit.load() of TorchScript module with @custom_op
#143773 closed Jan 23, 2025
Memory Leak in MPS Backend During LSTM Iterations (Out of Memory Error)
#145374 closed Jan 23, 2025
cloning third_party/kleidiai fails
#145273 closed Jan 23, 2025
DISABLED test_unbacked_bindings_for_divisible_u_symint_training_ir_to_decomp_non_strict (__main__.TrainingIRToRunDecompExportNonStrictTestExport)
#138582 closed Jan 23, 2025
DISABLED test_unbacked_bindings_for_divisible_u_symint (__main__.TestExport)
#138586 closed Jan 23, 2025
DISABLED test_unbacked_bindings_for_divisible_u_symint_training_ir_to_decomp (__main__.TrainingIRToRunDecompExportTestExport)
#138583 closed Jan 23, 2025
DISABLED test_unbacked_bindings_for_divisible_u_symint_non_strict (__main__.NonStrictExportTestExport)
#138585 closed Jan 23, 2025
DISABLED test_unbacked_bindings_for_divisible_u_symint_retraceability (__main__.RetraceExportTestExport)
#138584 closed Jan 23, 2025
DISABLED test_unbacked_bindings_for_divisible_u_symint_retraceability_non_strict (__main__.RetraceExportNonStrictTestExport)
#138676 closed Jan 23, 2025
DISABLED test_slice_with_floordiv_retraceability_non_strict (__main__.RetraceExportNonStrictTestExport)
#138675 closed Jan 23, 2025
DISABLED test_slice_with_floordiv_training_ir_to_decomp_non_strict (__main__.TrainingIRToRunDecompExportNonStrictTestExport)
#131136 closed Jan 23, 2025
DISABLED test_slice_with_floordiv_non_strict (__main__.NonStrictExportTestExport)
#131088 closed Jan 23, 2025
DISABLED test_slice_with_floordiv_retraceability (__main__.RetraceExportTestExport)
#131083 closed Jan 23, 2025
[inductor] [cuda] [silence] `F.gumbel_softmax` returns inconsistent results compared with eager
#145470 closed Jan 23, 2025
Failed to export the model to ONNX
#144750 closed Jan 23, 2025
DISABLED test_slice_with_floordiv_training_ir_to_decomp (__main__.TrainingIRToRunDecompExportTestExport)
#131082 closed Jan 23, 2025
DISABLED test_slice_with_floordiv_serdes_non_strict (__main__.SerDesExportNonStrictTestExport)
#138884 closed Jan 23, 2025
DISABLED test_slice_with_floordiv_serdes (__main__.SerDesExportTestExport)
#131119 closed Jan 23, 2025
DISABLED test_slice_with_floordiv (__main__.TestExport)
#131101 closed Jan 23, 2025
Link to `third_party/eigen` git submodule is broken
#145496 closed Jan 23, 2025
[Tracking Issue] Mixed precision does not work with ignored modules
#90318 closed Jan 23, 2025
torch.jit.trace wrong function mapping: > maps to aten::lt
#145485 closed Jan 23, 2025
libtorch_python.dylib not getting symlinked correctly in OSX 13 with pytorch-cpu
#145469 closed Jan 23, 2025
torch.backends.cudnn.flags use error when test
#145472 closed Jan 23, 2025
Non_blocking copy behavior on non-cuda/non-privateuse1 accelerator might be unexpected
#143641 closed Jan 23, 2025
DISABLED test_device_mode_ops_sparse_sampled_addmm_cpu_float32 (__main__.TestDeviceUtilsCPU)
#132720 closed Jan 23, 2025
torch._neg_view correctness
#145428 closed Jan 23, 2025
[RFC] Improve performance for softmax op for cuda in some specific size
#144645 closed Jan 23, 2025
DISABLED test_autograd_function_backed_op (__main__.TestCustomOpWithCompiledAutograd)
#121342 closed Jan 22, 2025
DISABLED test_aot_sequence_nr (__main__.DynamicShapesAotAutogradFallbackTests)
#106440 closed Jan 22, 2025
DISABLED test_no_grad_copy (__main__.TestAutograd)
#139734 closed Jan 22, 2025
Footgun: tracer.root.register_module( in HOPs
#140760 closed Jan 22, 2025
DISABLED test_second_order_accurate (__main__.TestGradient)
#116746 closed Jan 22, 2025
No period in docstring of torch.compiler.disable
#145365 closed Jan 22, 2025
DISABLED test_large_weight_non_abi_compatible_cuda (__main__.AOTInductorTestNonABICompatibleCuda)
#127068 closed Jan 22, 2025
DISABLED test_large_mmaped_weights_non_abi_compatible_cuda (__main__.AOTInductorTestNonABICompatibleCuda)
#127202 closed Jan 22, 2025
DISABLED test_cpp_frontend_module_has_same_output_as_python (__main__.TestCppExtensionJIT)
#116105 closed Jan 22, 2025
[aarch64] multiple inductor test failures related to vec128_bfloat16
#144818 closed Jan 22, 2025
[Inductor][GPU] Input is padded with incorrect value when executing `torch.nn.functional.pad` on gpu
#144462 closed Jan 22, 2025
[inductor][gpu] torch.fft.fft outputs incorrect results when `n>1`
#143719 closed Jan 22, 2025
[inductor][gpu] torch.nn.functional.avg_pool1d outputs incorrect result when input.numel() is 1
#143720 closed Jan 22, 2025
Release PyTorch version 2.6.0 on PyPI
#145142 closed Jan 22, 2025
[Device] `ConvTranspose` behaves differently on CPU and CUDA when `out_channels=0`
#142466 closed Jan 22, 2025
DISABLED test_fs_preserve_sharing (__main__.TestMultiprocessing)
#91467 closed Jan 22, 2025
DISABLED test_min_cut_partitioner_recomputable_ops (__main__.TestPartitioning)
#104327 closed Jan 22, 2025
make latexpdf
#145221 closed Jan 22, 2025
DISABLED test_cpp_extension_recommends_custom_ops_dynamic_shapes (__main__.DynamicShapesMiscTests)
#127813 closed Jan 22, 2025
DISABLED test_fake_crossref_backward_no_amp_index_fill_cuda_float32 (__main__.TestFakeTensorCUDA)
#99126 closed Jan 22, 2025
Dtype available for `torch.optim.Adam` and `torch.optim.AdamW` when `fused=True` is different from described
#145282 closed Jan 22, 2025
Optimizer state cannot get offloaded to CPU
#144397 closed Jan 22, 2025
When calling a custom function of a LlamaForCausalLM using FSDP causes RuntimeError
#145281 closed Jan 22, 2025
XPU builds validations
#145290 closed Jan 22, 2025
DISABLED test_open_device_registration (__main__.TestCppExtensionOpenRgistration)
#100152 closed Jan 22, 2025
DISABLED test_basic (__main__.TestPythonDispatch)
#145096 closed Jan 22, 2025
DISABLED test_max_autotune_remote_caching_dynamic_False (__main__.TestMaxAutotuneRemoteCache)
#145360 closed Jan 22, 2025
loss.backward() breaking somewhere when modulating a nested tensor using scale and shift (RuntimeError: Function AddBackward0 returned an invalid gradient at index 0 - got [1, 4, 64] but expected shape compatible with [4, 1, 64])
#145256 closed Jan 22, 2025
DISABLED test_comprehensive_fft_ifft_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#127344 closed Jan 22, 2025
nn.Embedding backwards pass for nested tensors
#145257 closed Jan 22, 2025
DISABLED test_aot_module_simplified_fake_tensor_gm_raises (__main__.TestAOTModuleSimplified)
#124590 closed Jan 22, 2025
Dynamo graph break on PEP585 generic types
#145226 closed Jan 22, 2025
trace.save_real_tensors segfaults on resnet
#143524 closed Jan 21, 2025
Investigate potential cost savings for inductor workflows
#138476 closed Jan 21, 2025
DISABLED test_identity_float32 (__main__.TestTemplatedSDPA)
#124659 closed Jan 21, 2025
Accessing secrets variables in CI
#144853 closed Jan 21, 2025
[CD] Nightly Release Linux Manywheel builds add size check
#137362 closed Jan 21, 2025
Flakybot fails to fetch test ownership information
#144964 closed Jan 21, 2025
Binaries Python 3.13t failing linux-aarch64-binary-manywheel and linux-binary-manywheel
#145234 closed Jan 21, 2025
DISABLED test_alibi_causal_float32 (__main__.TestTemplatedSDPA)
#124588 closed Jan 21, 2025
DISABLED test_alibi_bias_float32 (__main__.TestTemplatedSDPA)
#124526 closed Jan 21, 2025
torch/_prims/executor.py #TODO : caching
#145171 closed Jan 21, 2025
isin prevents dynamic shapes in modules
#142507 closed Jan 21, 2025
DISABLED test_autograd_in_attr (__main__.TestPythonDispatch)
#145068 closed Jan 21, 2025
DISABLED test_returning_symint (__main__.TestPythonRegistration)
#144920 closed Jan 21, 2025
DISABLED test_register_functional_op_multiple_returns (__main__.TestPythonRegistration)
#142807 closed Jan 21, 2025
DISABLED test_override_aten_ops_with_multiple_libraries (__main__.TestPythonRegistration)
#142460 closed Jan 21, 2025
DISABLED test_register_fallthrough (__main__.TestPythonRegistration)
#142494 closed Jan 21, 2025
DISABLED test_override_cuda_with_jiterator (__main__.TestPythonRegistration)
#142495 closed Jan 21, 2025
DISABLED test_register_functional_op_with_optional (__main__.TestPythonRegistration)
#117871 closed Jan 21, 2025
DISABLED test_register_functional_op_no_returns (__main__.TestPythonRegistration)
#117834 closed Jan 21, 2025
Performance regression when using @torch.compile compared to no compilation
#144822 closed Jan 21, 2025
F.scaled_dot_product_attention get query @ key
#145276 closed Jan 21, 2025
`torch.compile` may produce wrong result with `BicubicInterp+Neg+Linear+Tan`.
#145264 closed Jan 21, 2025
`torch.compile` may produce wrong result with `torch.nn.functional.interpolate`.
#145268 closed Jan 21, 2025
Calculation Results Become NaN After Using `torch.compile` with `Matmul+Concat4+Mul+Linear+Tan`.
#145266 closed Jan 21, 2025
A confusion about Bidirectional GRU
#145073 closed Jan 21, 2025
CPU-only PyTorch on M1 MacBook always gets "RuntimeError: Placeholder storage has not been allocated on MPS device!"
#145229 closed Jan 21, 2025
Exporting a model with dynamic axes and dynamo fails with `TypeError: unhashable type: 'list'`
#144860 closed Jan 21, 2025
Cannot build static windows libraries
#111905 closed Jan 21, 2025
Investigate CUDA enabled build-time difference between MSVC and GCC+WSL
#91623 closed Jan 21, 2025
DISABLED test_out_of_order_index_ds (__main__.TestOutOfOrderDataLoader)
#142343 closed Jan 21, 2025
BUG: torch.exp for complex types on Linux chokes in some cases
#136063 closed Jan 21, 2025
torch.asin returns incorrect value with complex input on cpu
#138327 closed Jan 21, 2025
Floating point exception (core dumped) in `thnn_conv2d`
#143489 closed Jan 21, 2025
[XPU] Nightly binary builds for XPU Linux and Windows are failing since 01.11.2025
#144967 closed Jan 20, 2025
[Perf] Flash-Attn Bwd slow down w/ cutlass 3.6.0 in General
#144729 closed Jan 20, 2025
When using `torch.jit.trace` with `Linear+MaxPool2d+BatchNorm2d`, different results are observed.
#145207 closed Jan 20, 2025
Inconsistent results between CPU and GPU for many operators with complex inputs containing `Inf`
#141487 closed Jan 20, 2025
torch.nn.functional.normalize producing nan values with a large p value and tensor of complex numbers
#135428 closed Jan 20, 2025
`1/torch.inf` produce inconsistent results
#106845 closed Jan 20, 2025
torch.sigmoid producing nan for tensor of negative complex numbers on cpu
#135777 closed Jan 20, 2025
[complex] torch.{exp}: does not match numpy
#48010 closed Jan 20, 2025
DISABLED test_mm_concat_cuda (__main__.FreezingGpuTests)
#145185 closed Jan 20, 2025
torch.onnx.export failed with Process finished with exit code 136 (interrupted by signal 8:SIGFPE)
#144144 closed Jan 20, 2025
[XPU] Keep `ciflow/xpu` jobs running when the first test case fails.
#145048 closed Jan 20, 2025
DISABLED test_comprehensive_fft_fft_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#122715 closed Jan 20, 2025
unexpected behaviour of `torch.chunk`
#145026 closed Jan 19, 2025
[DCP] BUG: FsspecWriter calls os.fsync on .finish(), therefore program crashes on checkpoint save
#144752 closed Jan 19, 2025
massive number of runtime asserts can hamper compile times
#144792 closed Jan 18, 2025
DISABLED test_integers_t1_uint8_np_longlong (__main__.TestArrayFromScalar)
#145135 closed Jan 18, 2025
DISABLED test_dtype_passthrough_dtype_complex128 (__main__.TestDLPack)
#145134 closed Jan 18, 2025
Noisy warning - torch.fx.experimental.symbolic_shapes: [WARNING] Ignored guard (...), this could result in accuracy problems
#101265 closed Jan 18, 2025
DISABLED test_flex_attention (__main__.TestCompiledAutograd)
#144912 closed Jan 18, 2025
DISABLED test_register_functional_op_one_return (__main__.TestPythonRegistration)
#117816 closed Jan 18, 2025
[Inductor] Test failure in test_comprehensive_nn_functional_max_pool2d_cuda
#131072 closed Jan 17, 2025
[AOTI] AOTI doesn't work well with torch.select
#132360 closed Jan 17, 2025
"index_cuda" not implemented for 'Float8_e4m3fn'
#133605 closed Jan 17, 2025
eps in layernorm.cpp causes a numerical transformation
#140092 closed Jan 17, 2025
[Compiled_autograd] running deepspeed Zero3 failed for torch.compile with compiled_autograd
#141646 closed Jan 17, 2025
DISABLED test_autograd_cpp_node_data_dependent (__main__.TestCompiledAutograd)
#125579 closed Jan 17, 2025
DISABLED test_autograd_cpp_node_saved (__main__.TestCompiledAutograd)
#131103 closed Jan 17, 2025
DISABLED test_autograd_cpp_node_saved_float (__main__.TestCompiledAutograd)
#133197 closed Jan 17, 2025
DISABLED test_autograd_cpp_node_saved_int (__main__.TestCompiledAutograd)
#133283 closed Jan 17, 2025
DISABLED test_non_traceable_autograd_cpp_node (__main__.TestCompiledAutograd)
#134738 closed Jan 17, 2025
DISABLED test_autograd_cpp_node_saved_dynamic (__main__.TestCompiledAutograd)
#135685 closed Jan 17, 2025
`unbind_copy` gives unexpected results on 1-dimensional inputs, or 0-dimensional outputs
#130829 closed Jan 17, 2025
[inductor][cpu]pyhpc_isoneutral_mixing performance regression in 2024-07-30 nightly release
#132281 closed Jan 17, 2025
[XPU] unrecognized device for new_qtensor: xpu:0
#144848 closed Jan 17, 2025
torch.select could not guard on data-dependent expression error
#143249 closed Jan 17, 2025
> if graph capture is thread local
#137844 closed Jan 17, 2025
non-strict export doesn't work with nn.Sequential slicing
#137455 closed Jan 17, 2025

145 Issues opened by 85 people

`torch.ops.aten.copy` causes SIGSEGV when handling sparse CSR tensors with invalid metadata
#145604 opened Jan 24, 2025
add scalar inputs with out causes error in torch.compile
#145598 opened Jan 24, 2025
[Dynamo] compile torch.logit with different data types
#145596 opened Jan 24, 2025
_pickle.UnpicklingError: invalid load key, ''.
#145592 opened Jan 24, 2025
[dynamo] mark_dynamic not working as intended with input shapes
#145587 opened Jan 24, 2025
[compile / strict export] torch._dynamo.exc.Unsupported: CollectiveFunctionRewriteVariable can't support async_op=True for <function all_reduce at 0x7f40be5724d0>
#145574 opened Jan 24, 2025
[BE] Automate update stable_cuda version so that we can set it when introducing new cuda version
#145571 opened Jan 24, 2025
Enable CUDA 12.8.0
#145570 opened Jan 24, 2025
[dynamo] Dynamo doesn't prune dead input cell object
#145564 opened Jan 24, 2025
Unable to build pytorch after #143806
#145563 opened Jan 24, 2025
Confusing as_storage_and_layout(x, want_contiguous=True) behavior
#145561 opened Jan 24, 2025
Error `RuntimeError: CUDA error: no kernel image is available for execution on the device` when doing `!=` operation on Jetson orin agx.
#145560 opened Jan 23, 2025
Docs fonts are bold on Mac, in 2.7
#145556 opened Jan 23, 2025
need to document `FlopCounterMode`
#145555 opened Jan 23, 2025
[RFC] Cuda support matrix for Release 2.7
#145544 opened Jan 23, 2025
Module.to() fail in dynamo when swap_module_params_on_conversion is true
#145529 opened Jan 23, 2025
use_const_ref_for_mutable_tensors doesn't work with out= overloads
#145522 opened Jan 23, 2025
Make debugging flaky tests easier by having relevant logs in one place
#145516 opened Jan 23, 2025
Can't properly implement backward method for custom op in C++ when the op takes List of tensors as argument
#145514 opened Jan 23, 2025
Activation Checkpointing composability with split backward computation
#145511 opened Jan 23, 2025
pip failure when trying to download nightly whl from pytorch.download.org : ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE
#145501 opened Jan 23, 2025
`torch._inductor.aoti_compile_and_package` fails when using dynamic shapes (PyTorch 2.6.0 RC)
#145500 opened Jan 23, 2025
Unexpected behavior of `torch.nn.init.trunc_normal` with bf16 tensors
#145498 opened Jan 23, 2025
Add a lint rule to avoid the word `interface` in C++
#145493 opened Jan 23, 2025
Cannot print symbolic tensors from C++
#145491 opened Jan 23, 2025
OrderedSet is backed by normal Dict, does not check ordering in equality
#145489 opened Jan 23, 2025
[custom ops] [2.7 nightly] custom ops with typing.List breaks when importing annotations from future
#145481 opened Jan 23, 2025
[XPU] torch 2.7.0.dev20250121+xpu Import Error
#145477 opened Jan 23, 2025
torch.backends.cudnn.flags use error when test
#145473 opened Jan 23, 2025
Is there a PyTorch version that can work properly on the Thor platform based on the Blackwell architecture?
#145471 opened Jan 23, 2025
[inductor][torchbench] Unsupported operator issue when running the torch_multimodal_clip model with batch size 4.
#145468 opened Jan 23, 2025
Mark Dynamic does not work for nn module constructor inputs
#145463 opened Jan 23, 2025
Incomplete check of LR as a tensor in Optimizer
#145461 opened Jan 23, 2025
Flex Attention not support score_mod with gradients
#145460 opened Jan 23, 2025
DISABLED test_tensor_subclass_basic (__main__.TestCompiledAutograd)
#145457 opened Jan 23, 2025
torch._dynamo.exc.Unsupported: Graph break due to unsupported builtin torch._C._dynamo.eval_frame.set_eval_frame.
#145456 opened Jan 23, 2025
[CUDA] Illegal Memory Access with `AdaptiveMaxPool2d`
#145453 opened Jan 23, 2025
[Dynamo]while_loop raise an exception
#145451 opened Jan 23, 2025
[dynamo] fix graph break on random.random
#145446 opened Jan 23, 2025
[dynamo] `random.Random` gives wrong result on second call
#145445 opened Jan 23, 2025
seg fault in aot_inductor_package on arm GPU with 2.6.0 RC
#145441 opened Jan 23, 2025
Crash in wrapper_benchmark.py with --profile enabled
#145434 opened Jan 23, 2025
XPU - UserWarning: Failed to initialize XPU devices. when run on Machine without without Intel GPU Driver
#145433 opened Jan 23, 2025
aot inductor intermediate tensor debug printing (setting 2) not working
#145425 opened Jan 22, 2025
torch.compile has different numerics for var_mean
#145401 opened Jan 22, 2025
[EXPORT AOTI] `aoti_compile_and_package` custom_ops dependecies
#145394 opened Jan 22, 2025
create DISABLED issues for specific runner labels
#145388 opened Jan 22, 2025
DISABLED test_view_of_slice_cuda (__main__.TestUnbackedSymintsCUDA)
#145386 opened Jan 22, 2025
Windows Pytorch compiler crash some version of cl.exe. Fix provided
#145383 opened Jan 22, 2025
flaky test issues should close themselves if the test doesn't exist anymore
#145382 opened Jan 22, 2025
torch.logit works incorrectly when input < eps after torch.compile
#145379 opened Jan 22, 2025
Loading weights using `torch.distributed.checkpoint` leads to large loss values
#145378 opened Jan 22, 2025
Inductor autograd raises an error in the second run may because of fx graph cache
#145377 opened Jan 22, 2025
distributed.new_group with backend GLOO hangs when distributed.split_group was called before
#145376 opened Jan 22, 2025
[XPU] torch.nn.functional.pad brings wrong results with torch.compile on Intel GPU
#145372 opened Jan 22, 2025
Set `size` when `is_coalesced` is set in `torch.sparse_coo_tensor()`
#145371 opened Jan 22, 2025
The possible error in the pytorch documentation of RNN.
#145368 opened Jan 22, 2025
DISABLED test_cache_hot_load_device_cuda_bfloat16_dynamic_False (__main__.TestFxGraphCache)
#145364 opened Jan 22, 2025
DISABLED test_cache_load_function_device_cuda_float32_dynamic_False_bundle_triton_True_grad_False (__main__.TestFxGraphCache)
#145363 opened Jan 22, 2025
DISABLED test_max_autotune_remote_caching_dynamic_False (__main__.TestMaxAutotuneRemoteCache)
#145361 opened Jan 22, 2025
DISABLED test_comprehensive_svd_lowrank_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#145362 opened Jan 22, 2025
DISABLED test_linear_and_cel_max_autotune (__main__.InplacePaddingTest)
#145359 opened Jan 22, 2025
FP8: E5M2: The FP8 E5M2 result is not `inf` when casting a FP32 value larger than max normal value of FP8 E5M2 (57344)
#145357 opened Jan 22, 2025
[autograd] inconsistent jvp results
#145356 opened Jan 22, 2025
Missing docs for `torch._foreach_copy_`
#145355 opened Jan 22, 2025
DISABLED test_extern (__main__.NumBytesMetricTests)
#145352 opened Jan 22, 2025
[CUDA] Illegal Memory Access with `ReplicationPad2D`
#145350 opened Jan 22, 2025
[CUDA] Illegal Memory Access with `AdaptiveAvgPool2d`
#145349 opened Jan 22, 2025
DISABLED test_graph_break_inside_ctx_with_side_effects (__main__.ContextlibContextManagerTests)
#145346 opened Jan 22, 2025
DISABLED test_partitioning_with_view (__main__.MinCutPartitioningTests)
#145345 opened Jan 22, 2025
DISABLED test_cat (__main__.NumBytesMetricTests)
#145344 opened Jan 22, 2025
DISABLED test_partitioning_unremat_bw (__main__.MinCutPartitioningTests)
#145343 opened Jan 22, 2025
internal compiler error: in extract_insn when compiling pytorch with xpu with gcc 12
#145340 opened Jan 22, 2025
[libTorch] Model initialization on multi-device is slow. It seems to run sequentially in multi-thread
#145337 opened Jan 22, 2025
DISABLED test_cache_load_function_device_cuda_float32_dynamic_False_bundle_triton_True_grad_True (__main__.TestFxGraphCache)
#145336 opened Jan 22, 2025
DISABLED test_mm_plus_mm (__main__.TestPatternMatcher)
#145335 opened Jan 22, 2025
DISABLED test_reorder_peak_memory (__main__.TestOperatorReorderForPeakMemory)
#145332 opened Jan 22, 2025
DISABLED test_warn_on_invalid_torch_function_standalone_class (__main__.TestTorchFunctionWarning)
#145333 opened Jan 22, 2025
DISABLED test_cache_hot_load_device_cuda_bfloat16_dynamic_False (__main__.AOTAutogradCacheTests)
#145334 opened Jan 22, 2025
[dynamo] Save/restore system random state more carefully
#145329 opened Jan 22, 2025
[inductor][triton] refactor ASTSource.make_ir integration
#145326 opened Jan 21, 2025
Using torch.cond in a model intended for onnx.export(dynamo=True,...) has issues with the functions provided.
#145300 opened Jan 21, 2025
[dynamo] `torch.compile` ICE on using a sourceless unspecialized NN module as branching condition
#145284 opened Jan 21, 2025
Torch Compile edge case with != versus is not
#145277 opened Jan 21, 2025
AttributeError: '_OpNamespace' 'aten' object has no attribute 'momentum'
#145274 opened Jan 21, 2025
`torch.ops.aten.embedding_dense_backward` Crashes with Out-of-Bounds Indices On CPU
#145267 opened Jan 21, 2025
[Pipelining] Problem using `torch.distributed.pipelining` on `Gemma2ForCausalLM`
#145263 opened Jan 21, 2025
Missing create_graph arguments in torch.func apis
#145262 opened Jan 21, 2025
Custom symbolic functions for ONNX export with None args causes SEGFAULT
#145261 opened Jan 21, 2025
No Range Check for `storage_offset` in `as_strided` Function
#145259 opened Jan 21, 2025
Missing Length Check for `reflection_pad3d` `padding` Argument
#145258 opened Jan 21, 2025
PyObject preservation does not prevent weakrefs being cleared by Python garbage collector
#145253 opened Jan 21, 2025
[Break XPU] device type in test_aot_inductor.py is not passed correctly to cpp_builder.
#145247 opened Jan 21, 2025
Expose configurable path instead of using fixed path in the inductor module for serialized pattern generation
#145242 opened Jan 21, 2025
Not using set_num_threads results in very slow .all()
#145233 opened Jan 21, 2025
Flaky Dynamo test: TestAutograd.test_gradcheck_nondeterministic
#145231 opened Jan 21, 2025
The `sympy` dependency spec for pytorch on PyPi wheel is still unchanged.
#145225 opened Jan 20, 2025
Regression in the compilation of the torch.all operation in PyTorch version 2.6.0 compared to 2.5.1
#145220 opened Jan 20, 2025
`torch.compile` may produce wrong result with `Linear+MaxPool2d+BatchNorm2d`.
#145219 opened Jan 20, 2025
getting different results when adding `torch.Tensor` or python number to a DTensor - Is that expected?
#145218 opened Jan 20, 2025
DISABLED test_cache_load_function_device_cuda_bfloat16_dynamic_False_bundle_triton_True_grad_False (__main__.TestFxGraphCache)
#145217 opened Jan 20, 2025
[ARM] - test_quantized_module.py test_lstm_api fails on Aarch64
#145216 opened Jan 20, 2025
Nested tensor support for pointwise matrix multiplication of nested tensor and normal tensor
#145214 opened Jan 20, 2025
Significant precision error from torch.compile
#145213 opened Jan 20, 2025
DISABLED test_remote_cache_load_function_device_cuda_float32_dynamic_False_bundle_triton_False (__main__.TestFxGraphCache)
#145212 opened Jan 20, 2025
DISABLED test_aoti (__main__.TestMemoryPlanning)
#145211 opened Jan 20, 2025
DISABLED test_reorder_peak_memory_lpmf (__main__.TestOperatorReorderForPeakMemory)
#145210 opened Jan 20, 2025
Some FlexAttention learned bias bugs/limitations
#145208 opened Jan 20, 2025
Indexed ^= (XOR in-place) operation doesn't work as expected on MPS backend
#145203 opened Jan 20, 2025
DISABLED test_reuse_kernel_cuda (__main__.AOTInductorTestABICompatibleGpu)
#145193 opened Jan 20, 2025
DISABLED test_mixed_mm (__main__.TestPatternMatcher)
#145192 opened Jan 20, 2025
DISABLED test_remote_cache_load_function_device_cuda_float32_dynamic_False_bundle_triton_True (__main__.TestFxGraphCache)
#145191 opened Jan 20, 2025
DISABLED test_remote_cache_load_function_device_cuda_bfloat16_dynamic_False_bundle_triton_True (__main__.TestFxGraphCache)
#145190 opened Jan 20, 2025
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 opened Jan 20, 2025
DISABLED test_sdpa_rewriter_12_cuda (__main__.SDPAPatternRewriterCudaDynamicTests)
#145188 opened Jan 20, 2025
DISABLED test_sdpa_rewriter_12_cuda (__main__.SDPAPatternRewriterCudaTests)
#145187 opened Jan 20, 2025
DISABLED test_mm_concat_cuda (__main__.FreezingGpuTests)
#145186 opened Jan 20, 2025
DISABLED test_aoti_eager_cache_hit_dynamic_shapes_cuda (__main__.DynamicShapesCodegenGPUTests)
#145184 opened Jan 20, 2025
DISABLED test_reorder_peak_memory_dfs (__main__.TestOperatorReorderForPeakMemory)
#145183 opened Jan 20, 2025
DISABLED test_remote_cache_load_function_device_cuda_bfloat16_dynamic_False_bundle_triton_False (__main__.TestFxGraphCache)
#145182 opened Jan 20, 2025
DISABLED test_cache_load_function_device_cuda_bfloat16_dynamic_False_bundle_triton_True_grad_True (__main__.TestFxGraphCache)
#145181 opened Jan 20, 2025
CUDA initialization error with vLLM 0.5.4 and PyTorch 2.4.0+cu121
#145170 opened Jan 19, 2025
empty_cache does not work for CUDAPluggableAllocator + MemPool
#145168 opened Jan 19, 2025
Pytorch matmul for nested 4D tensors in jagged layout doesn't work
#145158 opened Jan 18, 2025
The latest PyTorch XPU wheel 2.7.0.dev20250117+xpu does not work on Windows
#145155 opened Jan 18, 2025
Driver Allocated Memory grows unrestricted when using torch.unique on MPS device
#145151 opened Jan 18, 2025
[RFC] Improve performance for layer_norm op for cuda with revectorized
#145145 opened Jan 18, 2025
Please add fp16 to MPS devices.
#145144 opened Jan 18, 2025
Bracket indexing not working
#145143 opened Jan 18, 2025
Obey sm_carveout (limit on number of SMs) in inductor persistent kernel
#145115 opened Jan 17, 2025
torch._C._IncludeDispatchKeyGuard is very broken?
#145108 opened Jan 17, 2025
`torch.onnx.export` (dynamo=False) fails with uninformative error when exporting `apply_rotary_pos_emb`/`repeat_interleave`
#145100 opened Jan 17, 2025
Tracking issue: Incorrect Meta Strides / Turn On PyDispatcher in FakeTensor Mode
#145094 opened Jan 17, 2025
Inductor aten.clone lowering ignores Conjugate and Negative dispatch keys
#145093 opened Jan 17, 2025
[torchbench] torch._dynamo.exc.Unsupported: Graph break due to unsupported builtin None.morphologyEx
#145088 opened Jan 17, 2025
partitioner hangs for some long chains of ops with many users
#145081 opened Jan 17, 2025
list comprehension in SkipFiles are always skipped with no way to override
#145079 opened Jan 17, 2025
Negative values in stride causing error in `avg_pool2d` (on both CPU and CUDA)
#145077 opened Jan 17, 2025
AssertionError: increase TRITON_MAX_BLOCK['X'] to 4096 Again!
#145074 opened Jan 17, 2025
Segmentation fault when passing an empty tensor to `_local_scalar_dense`
#145072 opened Jan 17, 2025
Illegal memory access and segmentation fault due to large `storage_offset` in `as_strided`
#145071 opened Jan 17, 2025
Segment fault on CPU and IndexError on CUDA for `_adaptive_avg_pool2d_backward`
#145070 opened Jan 17, 2025
DISABLED test_sparse_add_cuda_complex64 (__main__.TestSparseCSRCUDA)
#145069 opened Jan 17, 2025
SIGSEGV error when passing a 0-sized tensor to `_local_scalar_dense`
#145066 opened Jan 17, 2025
SIGFPE error when passing very large kernel_size to `avg_pool1d`
#145065 opened Jan 17, 2025

430 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[inductor] Kernel memory analysis for use in heuristics
#142026 commented on Jan 24, 2025 • 19 new comments
[Inductor] Unifiy Low Precision FP Legalization for to_dtype_bitcast & constant
#144646 commented on Jan 24, 2025 • 14 new comments
[PGNCCL] Add an API to get the status/error code at the PG level
#144498 commented on Jan 24, 2025 • 12 new comments
Use `typing.IO[bytes]` instead of `io.BytesIO` in annotations
#144994 commented on Jan 24, 2025 • 11 new comments
Update test_c10d_object_collectives.py with DistributedTestBase class
#145056 commented on Jan 24, 2025 • 10 new comments
[inductor] Add type annotations to _inductor/utils.py
#144108 commented on Jan 23, 2025 • 9 new comments
[inductor] [cpp] Support vectorization for score and mask in FlexAttention CPU
#143638 commented on Jan 24, 2025 • 8 new comments
pickler for GraphModule
#141659 commented on Jan 24, 2025 • 8 new comments
Add noncontiguous OpInfo tests for MPS
#142202 commented on Jan 24, 2025 • 6 new comments
[dcp] Add ZStandard transformer
#143360 commented on Jan 24, 2025 • 6 new comments
serde unbacked bindings
#144894 commented on Jan 23, 2025 • 6 new comments
Parallelize epilogue/prologue benchmarking
#143408 commented on Jan 21, 2025 • 5 new comments
[torch.special] Adding betainc, betaincc, betaincinv, betainccinv, betaln and beta with backward operation
#132135 commented on Jan 23, 2025 • 4 new comments
Fix Throughputbenchmark issue
#144669 commented on Jan 24, 2025 • 4 new comments
Align CPU behavior with CUDA for `ConvTranspose` when `out_channels=0`
#142859 commented on Jan 24, 2025 • 4 new comments
Introduce the public API for all_gather_scaled_matmul
#141053 commented on Jan 20, 2025 • 4 new comments
[Intel GPU] qconv_pointwise.binary XPU support
#135189 commented on Jan 23, 2025 • 4 new comments
Add prepacking for linear weights
#139387 commented on Jan 23, 2025 • 4 new comments
[CI] enable operator benchmark on CPU
#143733 commented on Jan 23, 2025 • 4 new comments
[Intel GPU] qlinear at XPU backend
#133307 commented on Jan 23, 2025 • 3 new comments
Add test cases of fp8 datatypes in pt2e
#144388 commented on Jan 24, 2025 • 3 new comments
[Intel CPU] Fix issue #143483.
#144854 commented on Jan 24, 2025 • 3 new comments
[Dynamo][autograd.Function] Relax backward speculation strict mode
#142830 commented on Jan 22, 2025 • 3 new comments
[Intel CPU] Fix issue #143484.
#144950 commented on Jan 24, 2025 • 2 new comments
OpenReg: fix issue of pin_memory
#145046 commented on Jan 24, 2025 • 2 new comments
[Inductor] Fix starvation issue when threads attempt to acquire write…
#144460 commented on Jan 23, 2025 • 2 new comments
add fp8 scaled_mm for XPU
#140972 commented on Jan 20, 2025 • 2 new comments
Add overloads to diagonal docs
#144214 commented on Jan 22, 2025 • 2 new comments
[compiled autograd] Always proxy autograd.Function nodes; handle AOT backwards
#143405 commented on Jan 24, 2025 • 2 new comments
[Do NOT merge] Enable inductor-periodic testing for ROCm on MI300
#144594 commented on Jan 24, 2025 • 2 new comments
Add option to limit number of SMs used by matmul kernels
#144974 commented on Jan 22, 2025 • 2 new comments
[inductor] Fix an aten.squeeze stride computation issue
#143683 commented on Jan 23, 2025 • 1 new comment
Implement cuda graphs implementation of torch.cond and torch.while_loop
#140979 commented on Jan 23, 2025 • 1 new comment
inductor_config_logging: Don't drop keys
#144700 commented on Jan 24, 2025 • 1 new comment
Avoid data-dependent errors in NJT tests via capture_scalar_outputs=True
#144588 commented on Jan 21, 2025 • 1 new comment
[compiled autograd] support Tensor Subclasses in AOTBackward
#144115 commented on Jan 24, 2025 • 1 new comment
[BE] Add stride check in `torch.max_pool1d()`
#144023 commented on Jan 23, 2025 • 1 new comment
Exclude upsample_bilinear2d.vec from default core ATen decomposition table
#141791 commented on Jan 18, 2025 • 1 new comment
Fix a number of flexattention issues (cse, cudagraph, etc.)
#145059 commented on Jan 22, 2025 • 1 new comment
[Inductor changes] Invoke Quant
#139102 commented on Jan 17, 2025 • 1 new comment
[Intel CPU] Fix issue #143482.
#144760 commented on Jan 24, 2025 • 1 new comment
Fix fft jit ops cpu
#143894 commented on Jan 21, 2025 • 1 new comment
Enable CPP Extension Open Registration tests on Arm
#144774 commented on Jan 21, 2025 • 1 new comment
Update ck
#144799 commented on Jan 24, 2025 • 1 new comment
Support narrow() on batch dim for NJT
#142063 commented on Jan 17, 2025 • 1 new comment
Fix flash attention seed/offset overflow when seed/offset larger than int64
#144844 commented on Jan 23, 2025 • 1 new comment
Enable SVE ACLE implementation for tanH Aten op for FP32 dType.
#143741 commented on Jan 21, 2025 • 1 new comment
Support remaining *_like factory functions for NJT
#144889 commented on Jan 22, 2025 • 1 new comment
[Inductor UT] Refactor FlexAttention UT and add CPU tests
#144953 commented on Jan 22, 2025 • 1 new comment
[ATen][CUDA] Implement 128 bit vectorization
#141959 commented on Jan 24, 2025 • 0 new comments
Add AOT inductor support for _scaled_mm for CPU
#141961 commented on Jan 21, 2025 • 0 new comments
Permute test
#140261 commented on Jan 19, 2025 • 0 new comments
support condition branch in ao debug handler
#141302 commented on Jan 21, 2025 • 0 new comments
[sympy] Make solve of Mul for Eq int replacement friendly
#141347 commented on Jan 23, 2025 • 0 new comments
Save models in OCI registry
#141354 commented on Jan 21, 2025 • 0 new comments
Use std::string_view in torchgen
#141735 commented on Jan 17, 2025 • 0 new comments
[Don't merge] test only
#141468 commented on Jan 24, 2025 • 0 new comments
[Store log]Test log struct
#141439 commented on Jan 23, 2025 • 0 new comments
WIP delta graph logging
#141416 commented on Jan 23, 2025 • 0 new comments
[hop] fix unbacked_bindings meta for while_loop
#143559 commented on Jan 24, 2025 • 0 new comments
Fix FSDP hanging
#143540 commented on Jan 23, 2025 • 0 new comments
[hop][inductor] track the dependency on unbacked symbols correctly with constant_args for hops
#143456 commented on Jan 24, 2025 • 0 new comments
[compiled autograd] stop specializing on metadata during initial trace
#143417 commented on Jan 24, 2025 • 0 new comments
[compiled autograd] Proxy nodes for user-defined C++ torch::autograd::Function
#143387 commented on Jan 24, 2025 • 0 new comments
[EXPERIMENTAL][dynamo] Turn on `inline_inbuilt_nn_modules` for fbcode
#143313 commented on Jan 17, 2025 • 0 new comments
[compiled autograd] Proxy a node for CopyBackwards into the graph
#143304 commented on Jan 24, 2025 • 0 new comments
[compiled autograd] Proxy opaque nodes for built-in autograd nodes
#143296 commented on Jan 24, 2025 • 0 new comments
Set proper `LD_LIBRARY_PATH` on Linux in nightly venv in nightly pull tool
#143262 commented on Jan 17, 2025 • 0 new comments
[TorchGen] Simplify argumenttype_type
#143254 commented on Jan 20, 2025 • 0 new comments
[Testing only] Add python cycle detection
#143204 commented on Jan 22, 2025 • 0 new comments
Unify use of `enableCollectiveHashDebug_` and trivial updates
#142865 commented on Jan 18, 2025 • 0 new comments
Fix RMSNorm epsilon value type for BF16 or FP16
#142848 commented on Jan 22, 2025 • 0 new comments
Set `enable_faithful_generator_behavior` flag to True
#142513 commented on Jan 23, 2025 • 0 new comments
parallelize sort
#142391 commented on Jan 23, 2025 • 0 new comments
Fix type annotation of `Linear.bias`
#142326 commented on Jan 23, 2025 • 0 new comments
[inductor] Decide cooperative RSPLIT with same algorithm as split reductions
#142295 commented on Jan 24, 2025 • 0 new comments
[scan] Refactoring of input checking and dynamo invocation
#142125 commented on Jan 23, 2025 • 0 new comments
support condition branch in ao debug handler
#140256 commented on Jan 19, 2025 • 0 new comments
[Environment Variable][7/N] Use thread-safe getenv functions
#140211 commented on Jan 21, 2025 • 0 new comments
[Environment Variable][6/N] Use thread-safe getenv functions
#140200 commented on Jan 19, 2025 • 0 new comments
Cleanup stale Dynamo Feature Flags
#140147 commented on Jan 18, 2025 • 0 new comments
[export] Add custom op profiles and generate meta kernel
#140048 commented on Jan 21, 2025 • 0 new comments
[associative_scan] Lifted arguments
#140043 commented on Jan 23, 2025 • 0 new comments
Add torch._scaled_mm for CPU
#139975 commented on Jan 22, 2025 • 0 new comments
[Don't Review] Test CI
#139971 commented on Jan 21, 2025 • 0 new comments
[associative_scan] scan dim handling in user-facing associative_scan()
#139864 commented on Jan 24, 2025 • 0 new comments
Add Windows Arm64 Nightly Builds
#139760 commented on Jan 23, 2025 • 0 new comments
Add support for loading model `state_dict()`in C++ which are OrderedDicts
#139750 commented on Jan 21, 2025 • 0 new comments
[cuDNN] Add an option to force cuDNN usage (incl. SDPA)
#139699 commented on Jan 23, 2025 • 0 new comments
[export] Serialize draft export report
#139384 commented on Jan 18, 2025 • 0 new comments
Allow BUILD for classes with types built via functions/classes allowed for REDUCE (i.e. not GLOBALs in checkpoint)
#139302 commented on Jan 17, 2025 • 0 new comments
[do not review] saving things for NJT metadata cache
#139247 commented on Jan 20, 2025 • 0 new comments
Use the device interface for detecting Triton availability
#139171 commented on Jan 22, 2025 • 0 new comments
Unify shallow_copy_and_detach overloads by passing c10::VariableVersion
#138999 commented on Jan 24, 2025 • 0 new comments
[c10d] Remove ProcessGroupGloo + CUDA tests
#138998 commented on Jan 23, 2025 • 0 new comments
Tensor .cuda() very slow with specific array sizes
#138964 commented on Jan 21, 2025 • 0 new comments
Fix bug of torch.nn.functional.kl_div when broadcast happened
#138810 commented on Jan 23, 2025 • 0 new comments
[Docs] Optimize parameter description to declare allowed type (3/N)
#138798 commented on Jan 19, 2025 • 0 new comments
Switch back to the default checkout action
#138739 commented on Jan 22, 2025 • 0 new comments
inductor `full_like` decompositions give incorrect strides
#144699 commented on Jan 21, 2025 • 0 new comments
Expose torch.autograd.graph.is_backward_executing
#141276 commented on Jan 21, 2025 • 0 new comments
Fix InductorLower when attribute is shape ()
#141226 commented on Jan 21, 2025 • 0 new comments
[Inductor] be able to disable cache for test
#141195 commented on Jan 24, 2025 • 0 new comments
[CI] Reduce distributed test timeout to 60s
#141168 commented on Jan 21, 2025 • 0 new comments
Specific attribute for device DTensor RNG support indication.
#141141 commented on Jan 20, 2025 • 0 new comments
[WIP][Inductor XPU] Support mkldnn fusion in freezing for XPU.
#141096 commented on Jan 19, 2025 • 0 new comments
[pytree] Save namedtuple fields
#141084 commented on Jan 19, 2025 • 0 new comments
Suport generators
#141055 commented on Jan 23, 2025 • 0 new comments
[EZ] Remove TODO because it already works
#141002 commented on Jan 18, 2025 • 0 new comments
dynamo: Support custom attributes in tensor subclasses
#140978 commented on Jan 21, 2025 • 0 new comments
Add option to split Linear gates for Quantizable LSTM into separate ops
#140868 commented on Jan 21, 2025 • 0 new comments
[aoti] Avoid DCE unbacked symint node
#140858 commented on Jan 18, 2025 • 0 new comments
Fix TORCH_CUDA_ARCH_LIST for SBSA+CUDA build
#140844 commented on Jan 18, 2025 • 0 new comments
Enable CUDA 12.6 OSS CI
#140793 commented on Jan 24, 2025 • 0 new comments
Enable C++ dynamic shape guards by default
#140756 commented on Jan 23, 2025 • 0 new comments
fix torchrec on inductor
#140747 commented on Jan 18, 2025 • 0 new comments
[Intel GPU] Enable fp64 GEMM
#140677 commented on Jan 21, 2025 • 0 new comments
Add boolean conversion support for SymNodeVariable
#140621 commented on Jan 17, 2025 • 0 new comments
add NHWC support to GroupNorm backward pass and optimize NHWC GroupNorm kernels
#140440 commented on Jan 18, 2025 • 0 new comments
Auto SAC - Automated SAC (Selective Activation Checkpointing) Policy Construction and Wrapping
#140410 commented on Jan 22, 2025 • 0 new comments
remove redundant assign
#140399 commented on Jan 24, 2025 • 0 new comments
Fix inconsistent results from integral linspace on MPS
#140371 commented on Jan 18, 2025 • 0 new comments
[WIP] SimpleFSDP prototype frontend changes
#140360 commented on Jan 21, 2025 • 0 new comments
[scan] Refactored testcases
#140321 commented on Jan 18, 2025 • 0 new comments
[Inductor] optimize welford reduction
#145061 commented on Jan 24, 2025 • 0 new comments
optimize the decomposition of aten.native_group_norm
#144733 commented on Jan 21, 2025 • 0 new comments
[Intel GPU] Support SparseCsrXPU codegen
#144722 commented on Jan 18, 2025 • 0 new comments
functional compiled autograd
#144707 commented on Jan 24, 2025 • 0 new comments
Output of nonzero is transposed, fix fake tensor
#144695 commented on Jan 23, 2025 • 0 new comments
Generalize poison fork logic for each device backend
#144664 commented on Jan 17, 2025 • 0 new comments
Fix torch.logsumexp dim description
#144661 commented on Jan 22, 2025 • 0 new comments
[MPS] lu factor ex implementation
#144651 commented on Jan 21, 2025 • 0 new comments
remove Windows XPU build workaround.
#144644 commented on Jan 23, 2025 • 0 new comments
[Not4Land] test `optree` version compatibility
#144642 commented on Jan 17, 2025 • 0 new comments
[inductor] Add features to docstring_linter (see #142496)
#144620 commented on Jan 23, 2025 • 0 new comments
Collect packages with importlib in collect_env
#144616 commented on Jan 24, 2025 • 0 new comments
[device_mesh] improve device selection logic
#144600 commented on Jan 18, 2025 • 0 new comments
Fix DTensorTestBase to barrier with device ids
#144599 commented on Jan 18, 2025 • 0 new comments
Implemented dropout usage for RNN with MIOpen backend
#144572 commented on Jan 23, 2025 • 0 new comments
[BE][CI] bump `ruff` to 0.9.0: string quote styles
#144569 commented on Jan 24, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format`
#144547 commented on Jan 18, 2025 • 0 new comments
[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements
#144546 commented on Jan 24, 2025 • 0 new comments
Fix clang-tidy warnings of performance from uncovered files
#144542 commented on Jan 21, 2025 • 0 new comments
Save integral tensor data for ET
#144508 commented on Jan 24, 2025 • 0 new comments
blocked benchmarking to avoid queue limit
#144507 commented on Jan 18, 2025 • 0 new comments
better overlapping of sleep and memory warmup
#144505 commented on Jan 18, 2025 • 0 new comments
add most basic event packing
#144501 commented on Jan 18, 2025 • 0 new comments
patch for block-wise quantization + pt2e
#144492 commented on Jan 17, 2025 • 0 new comments
prov logging
#145047 commented on Jan 20, 2025 • 0 new comments
[Easy] Replace paper description with link to make a concise description.
#145031 commented on Jan 23, 2025 • 0 new comments
Made partitioning more(?) deterministic
#145024 commented on Jan 23, 2025 • 0 new comments
[CD] Annotate linux/arm64 cuda wheels with consistent nvidia dependencies
#145021 commented on Jan 21, 2025 • 0 new comments
[dynamo/export] call local_scalar_dense when full() value is scalar tensor
#144999 commented on Jan 24, 2025 • 0 new comments
Introduce new template heuristic for triton autotune configs
#144985 commented on Jan 24, 2025 • 0 new comments
[inductor] Fix for pattern file contains 'getitem' fails during impor…
#144980 commented on Jan 23, 2025 • 0 new comments
[test] fix unit test
#144977 commented on Jan 17, 2025 • 0 new comments
Let `tensor_a.new_tensor()` be on `tensor_a.device` by default
#144958 commented on Jan 22, 2025 • 0 new comments
[Dynamo] Allow `format()` to handle int
#144956 commented on Jan 24, 2025 • 0 new comments
Replacing explicit backend search with api call
#144944 commented on Jan 24, 2025 • 0 new comments
[ROCm][TunableOp] Improve selection criteria for fastest solution
#144942 commented on Jan 23, 2025 • 0 new comments
Binary upload checksum
#144887 commented on Jan 23, 2025 • 0 new comments
update guard_size_oblivious comment
#144880 commented on Jan 22, 2025 • 0 new comments
[64-bit] Int64 casting for UpSampleNearest3D
#144865 commented on Jan 23, 2025 • 0 new comments
WIP pp_cp test
#144834 commented on Jan 18, 2025 • 0 new comments
Added swizzle searching, disabled fp16 accum, and enabled ping-pong for cutlass
#144829 commented on Jan 24, 2025 • 0 new comments
Test of RST to MD
#144804 commented on Jan 22, 2025 • 0 new comments
[c10d][NCCL] Implement ncclCommInitRankScalable (merging #136789)
#144794 commented on Jan 24, 2025 • 0 new comments
[caffe2] Use the manifold cache backend as the default
#144773 commented on Jan 24, 2025 • 0 new comments
Unconditional dependency on setuptools
#144763 commented on Jan 18, 2025 • 0 new comments
[Reopen] [Intel GPU] Set higher tolerance for some models only on XPU Device
#144756 commented on Jan 21, 2025 • 0 new comments
[cherry-pick][dtensor] expose the __create_chunk_list__ in the doc (#144100)
#144741 commented on Jan 18, 2025 • 0 new comments
c10::string_view -> std::string_view in torchgen
#144177 commented on Jan 18, 2025 • 0 new comments
[Submodule] Turning flash-attention integration into 3rd party submod
#144120 commented on Jan 23, 2025 • 0 new comments
Avoid overflow in vector_norm for scalar input
#144073 commented on Jan 23, 2025 • 0 new comments
Native channel shuffle floating point exception
#144010 commented on Jan 18, 2025 • 0 new comments
cpp_wrapper: Precompile device-specific header files
#144002 commented on Jan 22, 2025 • 0 new comments
[poc][not-ready-for-review] visualize dynamic shapes shape env mutations over time
#143961 commented on Jan 20, 2025 • 0 new comments
Add ability to skip compute capability checks for Triton
#143956 commented on Jan 17, 2025 • 0 new comments
Fix an unnecessary CPU to GPU copy within flex_attention
#143928 commented on Jan 23, 2025 • 0 new comments
Using acc_t for log_softmax
#143896 commented on Jan 21, 2025 • 0 new comments
Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True
#143880 commented on Jan 24, 2025 • 0 new comments
Remove lexicographical sorting of storage keys in torch.save
#143879 commented on Jan 24, 2025 • 0 new comments
[FlexAttention] make bm creation cuda-graphable
#143872 commented on Jan 23, 2025 • 0 new comments
[1/N]Add Intel GPU Support to Torch Test Cases
#143833 commented on Jan 17, 2025 • 0 new comments
[inductor] Used fixed configs for contiguous reductions
#143812 commented on Jan 24, 2025 • 0 new comments
Enable clang-tidy on torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp
#143806 commented on Jan 24, 2025 • 0 new comments
[don't merge] use vs2022 build windows cpu wheel.
#143791 commented on Jan 23, 2025 • 0 new comments
[Intel GPU] Avoid copy when the input of Matmul is broadcasted
#143784 commented on Jan 17, 2025 • 0 new comments
Modify the tolerance level in TIMM benchmark for XPU PreCI
#143739 commented on Jan 23, 2025 • 0 new comments
nn.MultiheadAttention string representation
#143724 commented on Jan 23, 2025 • 0 new comments
Getattr access for subclasses in pre-dispatch
#143671 commented on Jan 23, 2025 • 0 new comments
Fix the build errors in ONEDNN+BLIS Path
#143642 commented on Jan 17, 2025 • 0 new comments
[triton pin 3.2] Cherry pick additional device context fix
#143622 commented on Jan 23, 2025 • 0 new comments
Add the max_autotune tests in the periodic jobs.
#143560 commented on Jan 21, 2025 • 0 new comments
Introduce cache clearing APIs for the lazy graph executor
#144489 commented on Jan 18, 2025 • 0 new comments
Support Swiglu for Module and functional
#144465 commented on Jan 24, 2025 • 0 new comments
improve WOQ first token performance on CPU
#144463 commented on Jan 23, 2025 • 0 new comments
Support negative values for fill with uint tensors
#144458 commented on Jan 21, 2025 • 0 new comments
[CUDA][cuBLAS] Add fp16 accumulate option to cuBLAS/cuBLASLt
#144441 commented on Jan 24, 2025 • 0 new comments
Implement `generator.throw(exception)`
#144424 commented on Jan 23, 2025 • 0 new comments
Implement `generator.close()`
#144423 commented on Jan 23, 2025 • 0 new comments
Implement `generator.send(..)`
#144422 commented on Jan 23, 2025 • 0 new comments
Implement `generator.__iter__()`
#144421 commented on Jan 23, 2025 • 0 new comments
Add `CLEANUP_THROW` bytecode
#144420 commented on Jan 23, 2025 • 0 new comments
fix a bug for constant_pad_nd
#144394 commented on Jan 23, 2025 • 0 new comments
`torch.linalg.solve`: doc update on dealing with rank-deficient systems which admit a solution
#144390 commented on Jan 19, 2025 • 0 new comments
Fix lowering to inductor IR for triton CPU
#144389 commented on Jan 24, 2025 • 0 new comments
[Intel GPU] fix memory leak in deconv backward
#144385 commented on Jan 17, 2025 • 0 new comments
Filter out iGPU if dGPU is found on XPU
#144378 commented on Jan 24, 2025 • 0 new comments
[Don't Merge] Fix poison child process issue when calling getAccelerator()
#144368 commented on Jan 22, 2025 • 0 new comments
implement LazyInductorBenchmarker
#144365 commented on Jan 18, 2025 • 0 new comments
Improve torchrun documentation
#144354 commented on Jan 24, 2025 • 0 new comments
implement pruning for GroupedInductorBenchmarker
#144353 commented on Jan 18, 2025 • 0 new comments
codecache.py: Utilize precompiled headers for CPP python bindings
#144349 commented on Jan 22, 2025 • 0 new comments
codecache: Remove cpp_prefix.h duplication per build, then precompile it
#144293 commented on Jan 22, 2025 • 0 new comments
[inductor] Only call triton.compile in worker processes
#144288 commented on Jan 22, 2025 • 0 new comments
Cholesky mps implementation
#144193 commented on Jan 23, 2025 • 0 new comments
aot_inductor TIMM convit_base inference regression on dashboard
#144772 commented on Jan 21, 2025 • 0 new comments
TorchBench mobilenet_v2 cudagraphs_freezing inference regression
#144891 commented on Jan 21, 2025 • 0 new comments
TIMM cudagraphs_freezing inference regression
#144888 commented on Jan 21, 2025 • 0 new comments
torch.export fails for whisper tiny
#144906 commented on Jan 21, 2025 • 0 new comments
Dynamo is not thread safe
#118260 commented on Jan 21, 2025 • 0 new comments
torch.accelerator.is_available() raise RuntimeError if no available CUDA/XPU devices
#144567 commented on Jan 21, 2025 • 0 new comments
`torch.device(0)` makes CUDA init fail in subprocess since `2.5.0`
#144152 commented on Jan 21, 2025 • 0 new comments
Make flex_attention work if `score_mod`'s output doesn't require gradients at all
#145050 commented on Jan 21, 2025 • 0 new comments
[torchbench] stable_diffusion_unet compilation failure
#144991 commented on Jan 21, 2025 • 0 new comments
CUDAGraph outputs will be overwritten by a subsequent run?
#144961 commented on Jan 21, 2025 • 0 new comments
Inconsistency of `tensor.new_tensor(data)` between eager and dynamo
#144957 commented on Jan 21, 2025 • 0 new comments
UserWarning: cuDNN SDPA backward got grad_output.strides() != output.strides()
#144913 commented on Jan 21, 2025 • 0 new comments
Issue: Illegal Memory Access in Backward Pass of `scaled_dot_product_attention` with Custom Attention Mask
#145040 commented on Jan 21, 2025 • 0 new comments
[compiled autograd] It would be nice if the compiled autograd graph was actually runnable
#144982 commented on Jan 21, 2025 • 0 new comments
ONNX: Wrong output shape for ceil_mode Pooling
#71549 commented on Jan 21, 2025 • 0 new comments
Numerical error when using torch.nn.functional.pad with a large array on MPS device
#121961 commented on Jan 21, 2025 • 0 new comments
[inductor][cpu]float32 dynamic shape maml_omniglot performance regression in 2025-01-13 nightly release
#144937 commented on Jan 21, 2025 • 0 new comments
[inductor][cpu]amp fp16 llama dynamic shape cpp wrapper performance regression in 2025-01-07 nightly release
#144932 commented on Jan 21, 2025 • 0 new comments
[inductor][cpu] fused attention Inductor tests fails with an error " name 'getitem' is not defined "
#144674 commented on Jan 21, 2025 • 0 new comments
[Inductor] Unify the data type propagation between Triton and CPP Backend
#144246 commented on Jan 21, 2025 • 0 new comments
Add API to detect if activation checkpointing is enabled in the current region or not
#144928 commented on Jan 21, 2025 • 0 new comments
Torch compile cache
#144859 commented on Jan 21, 2025 • 0 new comments
_pickle.UnpicklingError: pickle data was truncated - Windows multiprocessing during training
#69611 commented on Jan 21, 2025 • 0 new comments
Multiple CPU processes using same GPU model for inference
#16943 commented on Jan 21, 2025 • 0 new comments
[RFC] Disable CMake find_library(libm) on Windows, and solve libm conflict to MSVC runtime lib(ucrt.lib).
#141946 commented on Jan 21, 2025 • 0 new comments
DISABLED test_method_overloading (__main__.TestScript)
#131104 commented on Jan 17, 2025 • 0 new comments
DISABLED test_mixed_mm_epi_works (__main__.TestPatternMatcher)
#126489 commented on Jan 22, 2025 • 0 new comments
DISABLED test_batch_linear_post_grad_fusion (__main__.TestPostGradBatchLinearFusion)
#120280 commented on Jan 22, 2025 • 0 new comments
custom gradient for int8
#129889 commented on Jan 22, 2025 • 0 new comments
torch.compile() within TorchDispatchMode always causes an unknown guard failure.
#144787 commented on Jan 22, 2025 • 0 new comments
DISABLED test_sdpa_mask_fp16_L6_S17_NH23_HS121 (__main__.TestSDPA)
#138905 commented on Jan 22, 2025 • 0 new comments
RFC: Dynamically Quantized 4 bit matmul API and usage
#143289 commented on Jan 22, 2025 • 0 new comments
Code fails with "Expected curr_block->next == nullptr to be true, but got false"
#140419 commented on Jan 22, 2025 • 0 new comments
DISABLED test_mixed_mm_bad_cases (__main__.TestPatternMatcher)
#128487 commented on Jan 22, 2025 • 0 new comments
ModuleNotFoundError: No module named 'torch.privateuseone'
#144955 commented on Jan 22, 2025 • 0 new comments
DISABLED test_comprehensive_argsort_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#131158 commented on Jan 22, 2025 • 0 new comments
Adding Infiniband to RDZV Backend for optimal torch run training
#144779 commented on Jan 21, 2025 • 0 new comments
user-defined triton kernels + inductor stride re-ordering can lead to silent incorrectness
#130243 commented on Jan 21, 2025 • 0 new comments
asynchronous copies from accelerator to cpu: what should be the expected behaviour?
#140296 commented on Jan 21, 2025 • 0 new comments
`torch._foreach_mul` does not support autograd
#144580 commented on Jan 21, 2025 • 0 new comments
Connection Limitation in PyTorch Distributed (Vanilla) with c10d Rendezvous Backend
#144856 commented on Jan 21, 2025 • 0 new comments
CheckpointError with `torch.distributed.algorithms._checkpoint.checkpoint_wrapper` and `torch.compile`
#144637 commented on Jan 21, 2025 • 0 new comments
Better mergebot messages when reverting a PR
#139680 commented on Jan 21, 2025 • 0 new comments
[CI] Manywheel image should use hash based on `.ci/docker` directory
#142218 commented on Jan 21, 2025 • 0 new comments
ONNX: wrong operator for ceil_mode Pooling in case of skip the last window
#131272 commented on Jan 21, 2025 • 0 new comments
DataLoader hangs when object fails during pickling
#142884 commented on Jan 21, 2025 • 0 new comments
python-3.13t binaries are only available for Linux x86
#144357 commented on Jan 21, 2025 • 0 new comments
Bug when using reparameterized model evaluating with DDP
#145043 commented on Jan 21, 2025 • 0 new comments
torch.distributed hangs between Linux (X86) and Mac (M2 Pro)
#144851 commented on Jan 21, 2025 • 0 new comments
Consider making torch.cond return zero rather than None for the gradients of tensors that are in the not-taken branch of the if-else.
#141301 commented on Jan 21, 2025 • 0 new comments
Partitioner stores fp8 copy of all weights between fwd and bwd, causing OOM
#141881 commented on Jan 21, 2025 • 0 new comments
[PassRate] TorchBench training PassRate is less than 100
#143414 commented on Jan 21, 2025 • 0 new comments
torch.compile does not work with Flash attention 3
#144540 commented on Jan 21, 2025 • 0 new comments
DISABLED test_grad_scaling_autocast_cuda (__main__.TestTorchDeviceTypeCUDA)
#119154 commented on Jan 17, 2025 • 0 new comments
DISABLED test_tensor_subclasses (__main__.TestScript)
#119949 commented on Jan 17, 2025 • 0 new comments
DISABLED test_comprehensive_cross_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#140355 commented on Jan 17, 2025 • 0 new comments
DISABLED test_dunder_round_edgecases_val_2147483647_ndigits_-1 (__main__.TestNonarrayArgs)
#116121 commented on Jan 17, 2025 • 0 new comments
DISABLED test_is_isnot (__main__.TestScript)
#120694 commented on Jan 17, 2025 • 0 new comments
Is it possible to remove NCCL submodule and use only nccl binaries from pypi instead ?
#144768 commented on Jan 17, 2025 • 0 new comments
Cannot create and distribute array in torch.func.grad
#134462 commented on Jan 17, 2025 • 0 new comments
[Dynamo] Do an audit on skipfiles and mark more files as inline
#142395 commented on Jan 17, 2025 • 0 new comments
DISABLED test_profiler_mark_wrapper_call_dynamic_shapes_cuda (__main__.DynamicShapesGPUTests)
#145002 commented on Jan 17, 2025 • 0 new comments
DTensor RNG state for non CUDA backends
#138329 commented on Jan 17, 2025 • 0 new comments
amp.custom_fwd has incomplete support for library.custom_op
#137033 commented on Jan 17, 2025 • 0 new comments
ExpandableMemorySegments not working on H100s/A100s
#122057 commented on Jan 17, 2025 • 0 new comments
AMP doesn't gracefully handle optimizers for disabled regions
#47128 commented on Jan 17, 2025 • 0 new comments
Support loading and executing a ExportedProgram from torch.export in C++ environment
#144663 commented on Jan 17, 2025 • 0 new comments
[Pipelining] PP+DDP does not work for Zero Bubble
#144530 commented on Jan 17, 2025 • 0 new comments
`_pdist_forward` causes segmentation fault for 3D tensor with last dimension of size 0
#145064 commented on Jan 17, 2025 • 0 new comments
auto-grad graph replicate split_with_sizes(lengths) X times where X = len(lengths) affecting compile time
#140835 commented on Jan 17, 2025 • 0 new comments
`torch.compiler.disable()` on module hooks will disable `module.compile()`
#142358 commented on Jan 17, 2025 • 0 new comments
DISABLED test_serialize_export_scan_simple_cuda_float32 (__main__.TestHOPCUDA)
#139073 commented on Jan 17, 2025 • 0 new comments
DISABLED test_retrace_export_scan_simple_cuda_float32 (__main__.TestHOPCUDA)
#139074 commented on Jan 17, 2025 • 0 new comments
Support torch.func.grad for Flex Attention
#144810 commented on Jan 17, 2025 • 0 new comments
torch.stack for sequences
#144671 commented on Jan 17, 2025 • 0 new comments
nn.LSTM documentation
#139582 commented on Jan 17, 2025 • 0 new comments
DISABLED test_device_mode_ops_sparse_sampled_addmm_cpu_complex64 (__main__.TestDeviceUtilsCPU)
#132686 commented on Jan 17, 2025 • 0 new comments
DISABLED test_dtype_sympy_expr_dynamic_shapes_cpu (__main__.DynamicShapesCodegenCpuTests)
#135213 commented on Jan 17, 2025 • 0 new comments
DISABLED test_tmp_not_defined_issue2_dynamic_shapes_cpu (__main__.DynamicShapesCodegenCpuTests)
#135212 commented on Jan 17, 2025 • 0 new comments
LibTorch -> TorchScript -> PyTorch (Python) fails with `AttributeError: 'RecursiveScriptModule' object has no attribute 'forward'`
#68559 commented on Jan 20, 2025 • 0 new comments
TorchInductor CPU Performance Dashboard
#93531 commented on Jan 20, 2025 • 0 new comments
Set up Mac builds with clang >= 17 even though Xcode only has at most clang 16
#143913 commented on Jan 20, 2025 • 0 new comments
Python 3.13 support for PyTorch
#130249 commented on Jan 20, 2025 • 0 new comments
tts_angular: fail_to_run, torch._dynamo.exc.Unsupported: call_method NNModuleVariable() flatten_parameters [] {}
#105532 commented on Jan 20, 2025 • 0 new comments
Illegal Memory Access With `torch.compile`
#139628 commented on Jan 20, 2025 • 0 new comments
TypeError: Type parameter +RV without a default follows type parameter with a default in _inductor/utils.py
#140914 commented on Jan 20, 2025 • 0 new comments
DISABLED test_matmul_triton_kernel_benchmark (__main__.TestKernelBenchmark)
#115002 commented on Jan 20, 2025 • 0 new comments
Device assert throws a runtime error in cuda backend and results in a crash in xpu backend
#142135 commented on Jan 20, 2025 • 0 new comments
Matmul with int32 parameters on Intel GPU leads to errors
#144766 commented on Jan 20, 2025 • 0 new comments
[Inductor] [CPU] `GroupNorm` triggers inconsistency when using Inductor
#141541 commented on Jan 20, 2025 • 0 new comments
[RFC] Intel GPU distributed Backend integration in `torch-xpu-ops` and registration in PyTorch
#141741 commented on Jan 20, 2025 • 0 new comments
DISABLED test_profiler_mark_wrapper_call_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135294 commented on Jan 20, 2025 • 0 new comments
[torch.export] _insert_copy_for_mutations can't generate proper copy nodes for pure inplace ops
#144954 commented on Jan 20, 2025 • 0 new comments
ARM build failed with recent XNNPACK update: third_party/XNNPACK/src/reference/unary-elementwise.cc:125:14: error: invalid ‘static_cast’ from type ‘xnn_bfloat16’ to type ‘_Float16’
#141083 commented on Jan 20, 2025 • 0 new comments
DISABLED test_comprehensive_pca_lowrank_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139828 commented on Jan 20, 2025 • 0 new comments
Torch profiler corrupted names with Python 3.11
#121219 commented on Jan 19, 2025 • 0 new comments
[pytree] Handling of `None` in torch.utils._pytree is inconsistent with JAX.
#119328 commented on Jan 19, 2025 • 0 new comments
Add BufferDict container
#37386 commented on Jan 19, 2025 • 0 new comments
Responses from `https://download.pytorch.org/whl/cpu` have `cache-control: no-cache` set in their headers
#130571 commented on Jan 18, 2025 • 0 new comments
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on Jan 18, 2025 • 0 new comments
DISABLED test_input_mutation2_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135295 commented on Jan 18, 2025 • 0 new comments
Make streams used for NCCL operations configurable
#67158 commented on Jan 17, 2025 • 0 new comments
DISABLED test_device_mode_ops_sparse_sampled_addmm_cpu_float64 (__main__.TestDeviceUtilsCPU)
#132737 commented on Jan 17, 2025 • 0 new comments
Observing CUDA OOM errors in more recent versions of PyTorch nightly (post-`2.6.0.dev20241126`)
#141904 commented on Jan 17, 2025 • 0 new comments
DISABLED test_torch_to (__main__.TestTEFuserStatic)
#121876 commented on Jan 17, 2025 • 0 new comments
DISABLED test_torch_to (__main__.TestTEFuserDynamic)
#121875 commented on Jan 17, 2025 • 0 new comments
[dynamo] Fix constant propagation in builtins and UserClasses
#131354 commented on Jan 21, 2025 • 0 new comments
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 commented on Jan 20, 2025 • 0 new comments
[inductor] enable bf32 test for mkldnn conv
#127293 commented on Jan 21, 2025 • 0 new comments
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on Jan 21, 2025 • 0 new comments
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on Jan 21, 2025 • 0 new comments
refine fp32 precision api
#125888 commented on Jan 21, 2025 • 0 new comments
[vision hash update] update the pinned vision hash
#125806 commented on Jan 24, 2025 • 0 new comments
Automated submodule update: FBGEMM
#115316 commented on Jan 24, 2025 • 0 new comments
Automated submodule update: kineto
#106149 commented on Jan 22, 2025 • 0 new comments
[Inductor test failure] torch:inductor/test_select_algorithm TestSelectAlgorithm.test_convolution1 with cuda 12.6.3
#143412 commented on Jan 24, 2025 • 0 new comments
[export] run_decompositions fails on `torch.ops.aten.index_put_`
#141336 commented on Jan 24, 2025 • 0 new comments
compile time regression 1/9
#144775 commented on Jan 24, 2025 • 0 new comments
[RFC] Add CPP INT8 SDPA Template for Inductor CPU
#144941 commented on Jan 24, 2025 • 0 new comments
DISABLED TCPStoreTest.testMultiTenantStores (__main__.TCPStoreTest)
#142030 commented on Jan 24, 2025 • 0 new comments
Performance regression in torch.compile
#136254 commented on Jan 24, 2025 • 0 new comments
assert size/strides for fallback kernel
#144717 commented on Jan 24, 2025 • 0 new comments
FFT half precision only let CUDA pass the check
#143112 commented on Jan 24, 2025 • 0 new comments
-fno-omit-frame-pointer by default in our builds
#51151 commented on Jan 24, 2025 • 0 new comments
TorchDispatchMode can't capture the operator which name is aten::index_put_ impl_
#145041 commented on Jan 24, 2025 • 0 new comments
[inductor][cpu] With inductor_max_autotune, constants missing from frozen FxGraph.
#143144 commented on Jan 24, 2025 • 0 new comments
Support LayerNorm2d
#144223 commented on Jan 24, 2025 • 0 new comments
DISABLED test_aot_export_with_torch_cond (__main__.TestAOTExport)
#139998 commented on Jan 24, 2025 • 0 new comments
DISABLED test_flip_cpu (__main__.CpuTests)
#142863 commented on Jan 24, 2025 • 0 new comments
Allow generic python data structure input for torch.autograd.Function
#144159 commented on Jan 23, 2025 • 0 new comments
[ONNX][RFC] Migrate torchlib from onnxscript
#139301 commented on Jan 23, 2025 • 0 new comments
Release 2.6.0 validations checklist and cherry-picks
#144503 commented on Jan 23, 2025 • 0 new comments
DISABLED TCPStoreTest.testMultiTenantStoresUV (__main__.TCPStoreTest)
#139150 commented on Jan 23, 2025 • 0 new comments
Add overflow check for integer division
#138684 commented on Jan 17, 2025 • 0 new comments
[Docker] Create an independent dependencies layer
#138612 commented on Jan 22, 2025 • 0 new comments
Prototype Triton kernel for torch.bmm(NJT, T)
#138555 commented on Jan 23, 2025 • 0 new comments
Update test_function_base.py for Numpy 2.0 +
#138463 commented on Jan 21, 2025 • 0 new comments
Replace use of PyTorch 2.0 with torch.compile, and minor edits
#138436 commented on Jan 23, 2025 • 0 new comments
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on Jan 22, 2025 • 0 new comments
[POC][FX][pytree] cleanup fx pytree implementation
#138202 commented on Jan 23, 2025 • 0 new comments
update _unsafe_set_version_counter to accept lists of tensors
#137921 commented on Jan 24, 2025 • 0 new comments
[Intel GPU] allow_tf32 context at XPU backend
#137570 commented on Jan 20, 2025 • 0 new comments
Add back DistributedDataParallel types that were lost when pyi was removed
#136835 commented on Jan 20, 2025 • 0 new comments
Avoid sqrt calculations with values less than zero
#136824 commented on Jan 24, 2025 • 0 new comments
[Intel GPU] qlinear.pointwise with mixed dtype support
#136753 commented on Jan 23, 2025 • 0 new comments
[Partitioner] Reduce time consuming of partitions merger
#136614 commented on Jan 23, 2025 • 0 new comments
[Partitioner] Remove unnecessary upstream nodes in dependency viewer
#136608 commented on Jan 23, 2025 • 0 new comments
add generalized pareto distribution (GPD)
#135968 commented on Jan 24, 2025 • 0 new comments
[Intel GPU] qconv.pointwise with mixed dtype XPU support
#135465 commented on Jan 23, 2025 • 0 new comments
add supports_coalescing property in c10d::Backend to determine whether backend supports coalescing
#135338 commented on Jan 23, 2025 • 0 new comments
[Intel GPU] qlinear_pointwise.binary[_tensor] XPU support
#135337 commented on Jan 23, 2025 • 0 new comments
Pass ideep:lowp_kind to matmul_forward::compute on cache misses
#135058 commented on Jan 23, 2025 • 0 new comments
Add decompositions for median and nonmedian
#134881 commented on Jan 23, 2025 • 0 new comments
[ROCm] Add support for SymmetricMemory and Intra Node Comm
#134817 commented on Jan 17, 2025 • 0 new comments
add ranking for grouped benchmarks
#133287 commented on Jan 18, 2025 • 0 new comments
Make IPC features extendable on third-party devices
#133222 commented on Jan 23, 2025 • 0 new comments
basic GroupedInductorBenchmarker
#133121 commented on Jan 18, 2025 • 0 new comments
xpu: support sycl with torch.utils.cpp_extension APIs
#132945 commented on Jan 18, 2025 • 0 new comments
Added dist utility API to get backend from a device object
#132735 commented on Jan 24, 2025 • 0 new comments
[xla hash update] update the pinned xla hash
#132021 commented on Jan 20, 2025 • 0 new comments
DISABLED test_cuda_event_created_outside_of_graph (__main__.CtxManagerTests)
#133828 commented on Jan 23, 2025 • 0 new comments
DISABLED test_cuda_event_created_outside_of_graph_dynamic_shapes (__main__.DynamicShapesCtxManagerTests)
#133837 commented on Jan 23, 2025 • 0 new comments
[Tensorboard] Problem with subfolders from SummaryWriter
#32651 commented on Jan 23, 2025 • 0 new comments
Runners, torchbench, & the future
#143215 commented on Jan 23, 2025 • 0 new comments
Need clarification on torch.nn.CrossEntropyLoss
#137188 commented on Jan 23, 2025 • 0 new comments
torch.library.opcheck generates gradients with strides of 0
#132857 commented on Jan 23, 2025 • 0 new comments
`torch.nn.function.one_hot` and `torch.Tensor.as_subclass` API not available under `torch.compile`
#129651 commented on Jan 23, 2025 • 0 new comments
ExportedModule default print of graph signature is unreadable
#141243 commented on Jan 23, 2025 • 0 new comments
TORCH_PYTHON_API contains breaking changes in same version 2.6.0a0
#144966 commented on Jan 22, 2025 • 0 new comments
int_mm seems broken due to Triton upgrade
#144705 commented on Jan 22, 2025 • 0 new comments
[feature request] Varlen indexing function for lookup and concat of varlen BPE tokens from a tensor vocab (i.e. `detokenize(...)` and arrays of strings)
#135704 commented on Jan 22, 2025 • 0 new comments
[triton 3.2] test_convolution_as_mm failure on A100
#141079 commented on Jan 22, 2025 • 0 new comments
`torch.ops.aten._local_scalar_dense` crashed on empty size tensor
#145063 commented on Jan 22, 2025 • 0 new comments
Method for loading a distributed checkpoint into a single state_dict is being deprecated without alternative, request to make it possible to keep that feature
#125777 commented on Jan 22, 2025 • 0 new comments
[DCP]Distributed checkpoint `set_optimizer_state_dict` cause optimizer step error when optimizer contains empty param group
#143828 commented on Jan 22, 2025 • 0 new comments
Should coordinator_rank in class _DistWrapper be the global_rank instead of local rank in its process group?
#141825 commented on Jan 22, 2025 • 0 new comments
Compile error for custom op with optional mutable tensor list argument
#144072 commented on Jan 22, 2025 • 0 new comments
Really slow compilation times for torch.compile causing distributed training errors
#108971 commented on Jan 22, 2025 • 0 new comments
On Kaggle : libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12
#134929 commented on Jan 22, 2025 • 0 new comments
Support SDPA flash attention/ memory efficient attn on ROCm gfx908
#141958 commented on Jan 22, 2025 • 0 new comments
torch compile error with `torch.Tensor.unsqueeze_`
#129673 commented on Jan 22, 2025 • 0 new comments
Wrong meta function for constant_pad_nd
#144187 commented on Jan 22, 2025 • 0 new comments
Add ATen functions in native_functions.yaml to torch_in_graph_functions list automatically
#145014 commented on Jan 22, 2025 • 0 new comments
torch.nn.functional.scaled_dot_product_attention is_causal fails for kv-cache case (sequential and further parallel attention)
#144858 commented on Jan 22, 2025 • 0 new comments
TIMM Training cudagraphs poolformer_m36 regression
#144893 commented on Jan 22, 2025 • 0 new comments
PyTorch source code build failed on some Windows 11 environment caused by C++ protocol buffer compiler
#143795 commented on Jan 22, 2025 • 0 new comments
Fix `torch.stft` and `torch.istft` when using `center=False`, non-rectangular windows and `win_length==hop_length`
#134323 commented on Jan 23, 2025 • 0 new comments
DISABLED test_reentrant_parent_error_on_cpu_cuda (__main__.TestAutogradDeviceTypeCUDA)
#86735 commented on Jan 23, 2025 • 0 new comments
Meta implementations of FFT operators often have incorrect strides
#106623 commented on Jan 23, 2025 • 0 new comments
Adding Levenberg-marquardt optimizer in PyTorch
#83529 commented on Jan 23, 2025 • 0 new comments
Update TorchInductor to support removed AttrsDescriptor in upstream Triton
#144103 commented on Jan 23, 2025 • 0 new comments
[v.2.6.0] Release Tracker
#142814 commented on Jan 23, 2025 • 0 new comments
compiled autograd + dynamic shapes fails with constraint violation
#133575 commented on Jan 23, 2025 • 0 new comments
[feature request]: Update max onnx opset to 21 for onnxruntime==1.18 compatibility
#127167 commented on Jan 23, 2025 • 0 new comments
DISABLED test_allocation_id_uniqueness (__main__.TestTorchTidyProfiler)
#125021 commented on Jan 23, 2025 • 0 new comments
AWS A100 runners reliability issue
#140332 commented on Jan 23, 2025 • 0 new comments
[MPS] Indexing Returns 0 if OOB
#144824 commented on Jan 23, 2025 • 0 new comments
capture_dynamic_output_shape_ops=True changing expected output between eager and compiled versions
#130290 commented on Jan 23, 2025 • 0 new comments
DISABLED test_angle_cpu (__main__.CpuTritonTests)
#136124 commented on Jan 23, 2025 • 0 new comments
pytorch with xpu support fails to eval pre trained models
#143996 commented on Jan 23, 2025 • 0 new comments
FSDP does not work on GLOO backend
#74041 commented on Jan 23, 2025 • 0 new comments
AOTAutogradCache implementation
#128234 commented on Jan 23, 2025 • 0 new comments
FlexAttention + ROCM Issue Tracker
#140855 commented on Jan 23, 2025 • 0 new comments
MaxPool2D memory leakage on device MPS
#125217 commented on Jan 23, 2025 • 0 new comments
General MPS op coverage tracking issue
#77764 commented on Jan 23, 2025 • 0 new comments
DISABLED test_crash (__main__.TestCompileWorker)
#131064 commented on Jan 23, 2025 • 0 new comments
[torchbench] Missing meta function for aten::_cudnn_rnn_flatten_weight
#144989 commented on Jan 23, 2025 • 0 new comments
[Tracker] Nested tensor op coverage requests
#118107 commented on Jan 23, 2025 • 0 new comments
Issues linking to libtorch on M2 mac
#143571 commented on Jan 23, 2025 • 0 new comments
[RFC] PyTorch - PyPi PEP-759 proposal (wheel-next)
#139761 commented on Jan 23, 2025 • 0 new comments
DISABLED test_log_traced_frames (__main__.LoggingTests)
#137461 commented on Jan 23, 2025 • 0 new comments
Inconsistent computation of gradient in MaxUnPooling
#80827 commented on Jan 23, 2025 • 0 new comments
DISABLED test_new_spectral_norm_forward_swap_True (__main__.TestNNParametrization)
#131089 commented on Jan 23, 2025 • 0 new comments