97727 Commits

Author SHA1 Message Date
Pian Pawakapan
47f048afa5 [torchfuzz] add dtensor_placements template (#170136)
Fuzzes over `[Replicate(), Shard(i), Partial()]` placements for DTensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170136
Approved by: https://github.com/bobrenjc93
2026-01-02 06:27:43 +00:00
Guilherme Leobas
39839dbc39 Include one level of stack trace in the lru_cache warning msg (#171496)
Fixes #167991

Example of the new warning message:

```python
/home/guilhermel/git/pytorch313/torch/_dynamo/variables/functions.py:2159: UserWarning: Dynamo detected a call to a `functools.lru_cache`-wrapped function at 'script.py:12'. Dynamo ignores the cache wrapper and directly traces the wrapped function. Silent incorrectness is only a *potential* risk, not something we have observed. Enable TORCH_LOGS=+dynamo for a DEBUG stack trace.

This call originates from:
  File "/path/to/script.py", line 12, in bar
    return baz(x)

  torch._dynamo.utils.warn_once(msg)
```
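
For reference, a minimal sketch that should trigger this warning (the file and function names are illustrative, standing in for the `script.py` quoted above):

```python
import functools
import torch

@functools.lru_cache(maxsize=None)
def baz(x):
    return x * 2  # Dynamo traces this body, ignoring the cache wrapper

def bar(x):
    return baz(x)  # the call site reported in the warning's stack trace

compiled = torch.compile(bar)
compiled(torch.ones(3))  # emits the UserWarning shown above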

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171496
Approved by: https://github.com/Lucaskabela
2026-01-01 18:17:57 +00:00
pytorchbot
b35a75b73d Update inductor expected accuracy files (#171533)
## Summary

This PR updates the expected accuracy CSV files for inductor benchmarks based on CI results from PyTorch commit 3c98eef883.

These files serve as reference points for dynamo/inductor CI to track:
- Graph breaks
- Model accuracy

## Changes

- Updated CUDA expected accuracy files in `benchmarks/dynamo/ci_expected_accuracy/`
- Updated ROCm expected accuracy files in `benchmarks/dynamo/ci_expected_accuracy/rocm/`

## Test Plan

- [ ] Verify that the CI jobs pass with the updated expected accuracy files
- [ ] Review the diff to ensure changes are reasonable and expected
- [ ] Check that no unexpected regressions are being marked as "expected"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171533
Approved by: https://github.com/jataylo, https://github.com/atalman
2026-01-01 15:07:33 +00:00
Nikhil Patel
f7f91ec63a [Inductor][NV Universal GEMM] Ensure benchmarking launches kernels on the current CUDA stream (#171362)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171362
Approved by: https://github.com/drisspg
ghstack dependencies: #170623
2026-01-01 05:28:51 +00:00
albanD
845ea00ae1 Remove assert meta/prims/refs (#170776)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170776
Approved by: https://github.com/ezyang, https://github.com/cyyever
ghstack dependencies: #170598
2026-01-01 05:27:53 +00:00
albanD
1913ee1aec Assert in github, docs, setup and top (#170598)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170598
Approved by: https://github.com/ezyang, https://github.com/cyyever
2026-01-01 05:27:53 +00:00
can-gaa-hou
9def334cd1 [inductor] Fix OverflowError when truncate infinity number (#166636)
Fixes #163833
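
For background, a generic sketch of the failure mode and guard (not the PR's exact code): Python's `int()` raises `OverflowError` on infinite floats and `ValueError` on NaN.

```python
import math

def safe_trunc(x: float) -> float:
    # int() raises OverflowError for +/-inf and ValueError for NaN,
    # so pass non-finite values through instead of truncating them.
    if not math.isfinite(x):
        return x
    return float(int(x))  # truncate toward zero

assert safe_trunc(3.7) == 3.0
assert math.isinf(safe_trunc(float("inf")))
```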

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166636
Approved by: https://github.com/ezyang
2026-01-01 05:24:42 +00:00
Tom Ritchford
5ad95e64e0 Fix pyrefly errors by using pyrefly check --suppress-errors (#171188)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171188
Approved by: https://github.com/lolpack, https://github.com/cyyever
2026-01-01 05:18:13 +00:00
Anshul Sinha
452abac61c [dtensor][partial] fixes unnecessary redistributions when subtracting two partial tensors (#170040)
**Summary:** Currently, whenever we subtract two partial DTensors, we redistribute, since linearity is -1 for `aten.sub.Tensor`. However, this redistribution is unnecessary and can be avoided in the same way as its add counterpart. I moved the op to `linear_ops` and ensured that subtracting a scalar from a partial DTensor continues to redistribute (see the sketch after the test cases).

**Test Cases:**
1. pytest test/distributed/tensor/test_pointwise_ops.py -k test_add_sub_scalar_norm_partial
2. pytest test/distributed/tensor/test_pointwise_ops.py -k test_add_sub_scalar_partial
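
A minimal sketch of why subtraction can stay partial, with plain tensors standing in for two ranks' local shards (illustrative, not the PR's code):

```python
import torch

# Two ranks' local shards of Partial() DTensors a and b; a Partial()
# DTensor's full value is the sum of its local shards.
a_shards = [torch.tensor([1.0, 2.0]), torch.tensor([3.0, 4.0])]
b_shards = [torch.tensor([0.5, 0.5]), torch.tensor([1.0, 1.0])]

# Subtraction is linear, so subtracting shard-wise and summing afterwards
# matches summing first -- no all-reduce (redistribution) is needed.
shardwise = sum(a - b for a, b in zip(a_shards, b_shards))
assert torch.equal(shardwise, sum(a_shards) - sum(b_shards))
```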

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170040
Approved by: https://github.com/wconstab
ghstack dependencies: #170030, #170035
2025-12-31 23:52:03 +00:00
William Wen
76a53f9626 [dynamo] remove most InstructionTranslator.current_tx() callsites (#170234)
We will eventually remove `current_tx` in favor of passing it directly to VTs. We also eventually intend to change callsites involving TXes so that the leaf TX is always passed. Currently, this is inconsistent, since `InstructionTranslator.current_tx()` returns the root TX.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170234
Approved by: https://github.com/guilhermeleobas
2025-12-31 23:47:34 +00:00
William Wen
dd6a12daec [dynamo] remove most Unsupported subclasses (#171486)
These subclasses were either deleted outright or modified to no longer subclass `Unsupported`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171486
Approved by: https://github.com/guilhermeleobas
ghstack dependencies: #170587, #171358
2025-12-31 23:46:41 +00:00
Jongsok Choi
78ff7c86be [pallas backend] Fix scalar store shape mismatch (#171581)
When storing a scalar value into a buffer, use `jnp.full` to handle the shape difference.
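
A small sketch of the `jnp.full` pattern referenced here (the buffer shape and value are illustrative):

```python
import jax.numpy as jnp

out_shape = (8, 128)
scalar = jnp.float32(3.0)

# Assigning a bare scalar where a shaped buffer is expected trips a
# shape mismatch; jnp.full materializes it at the buffer's shape first.
filled = jnp.full(out_shape, scalar, dtype=jnp.float32)
assert filled.shape == out_shape
```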

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171581
Approved by: https://github.com/oulgen
ghstack dependencies: #171571, #171579
2025-12-31 19:10:10 +00:00
Jongsok Choi
69c5884a75 [pallas backend] Update xfail files with current error messages (#171579)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171579
Approved by: https://github.com/oulgen
ghstack dependencies: #171571
2025-12-31 19:10:10 +00:00
Jongsok Choi
6be2baa9fb [pallas backend] Fix 1D buffer broadcasting for batch norm operations (#171571)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171571
Approved by: https://github.com/oulgen
2025-12-31 19:10:01 +00:00
Eddie Yan
178ebac3a9 [cuDNN][SDPA] Use same prefer-cuDNN settings for Blackwell and Blackwell Ultra (#170800)
GB300 can run GB200 SDPA kernels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170800
Approved by: https://github.com/Skylion007
2025-12-31 17:35:59 +00:00
Aaron Gokaslan
766be25a17 [BE]: Update cpp-httplib submodule to 0.29.0 (#171333)
Updates the cpp-httplib submodule to 0.29.0, which removes a lot of redundant copies I found in the codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171333
Approved by: https://github.com/drisspg
2025-12-31 17:05:34 +00:00
Aleksandar Samardžić
49e614ea32 Fix synthetic offsets calculation for grouped MM auto-tuning (#171316)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171316
Approved by: https://github.com/NikhilAPatel
2025-12-31 15:31:57 +00:00
Yuanyuan Chen
77470cdbfb Remove old CUDA conditions (#171235)
This PR removes old code paths for CUDA <= 12.3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171235
Approved by: https://github.com/ezyang
2025-12-31 09:05:08 +00:00
vishalgoyal316
7c467cad4a Improve RNN dtype mismatch error message (#166946)
Enhance the error message to explain the mismatch and provide two actionable fixes: convert the input with `input.to(dtype)` or convert the model with `model.to(dtype)`. Add a test to validate the error message and verify that both suggested fixes work (sketched below).

Fixes #136931
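
The two suggested fixes, sketched (layer sizes and dtypes are illustrative):

```python
import torch

rnn = torch.nn.RNN(input_size=4, hidden_size=8).to(torch.float64)
x = torch.randn(5, 3, 4)  # float32 input vs. float64 model -> mismatch

# Fix 1: convert the input to the model's dtype.
out1, _ = rnn(x.to(torch.float64))

# Fix 2: convert the model to the input's dtype (note .to() on a module
# converts it in place and returns it).
out2, _ = rnn.to(torch.float32)(x)
```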

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166946
Approved by: https://github.com/mikaylagawarecki, https://github.com/cyyever
2025-12-31 06:51:06 +00:00
Jongsok Choi
30fd43528e [pallas backend] Add atomic_add store mode support (#171567)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171567
Approved by: https://github.com/oulgen
2025-12-31 05:59:49 +00:00
Andrew Megalaa
4dcec41041 [ROCm] [CI] Fix deterministic scan kernel edge case and enable test (#170763)
Fixes #168862

Previously the test would break on MI300x on this assertion
```cpp
TORCH_INTERNAL_ASSERT(2 * BLOCK_THREADS >= grid_size);
```
because the MI300x has 304 SMs, so the grid_size would be set to at least 304, while the number of threads per block would be 128 for 16-byte types (2 * 128 = 256, which is not >= 304).

It seems the reason for this assertion was that the kernel performed a simple reduction over the block aggregates: each thread held a block's aggregate, and if there were fewer threads per block than blocks, each thread would add up two aggregates (hence the assertion as a safeguard). Changing the conditional to a loop should incur very minimal overhead, since it executes at most one more time per thread than the old behavior.
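
An illustrative Python model of the conditional-to-loop change (not the kernel's actual code; names are made up):

```python
BLOCK_THREADS = 128
grid_size = 304  # e.g. one block per SM on an MI300x

def thread_sum(t, aggregates):
    # The old code assumed 2 * BLOCK_THREADS >= grid_size, so each thread
    # added at most two aggregates; a loop handles any grid size.
    total, i = 0.0, t
    while i < grid_size:
        total += aggregates[i]
        i += BLOCK_THREADS
    return total

aggregates = [1.0] * grid_size
assert sum(thread_sum(t, aggregates) for t in range(BLOCK_THREADS)) == grid_size
```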

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170763
Approved by: https://github.com/jeffdaily

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-12-31 05:35:10 +00:00
Jongsok Choi
f79dc8a549 [pallas backend] skip flaky test test_constant_pad_2d_strides_nonpositive_cpu_pallas (#171564)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171564
Approved by: https://github.com/oulgen
2025-12-31 04:56:20 +00:00
PyTorch UpdateBot
c8ae6c8618 [vllm hash update] update the pinned vllm hash (#171557)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171557
Approved by: https://github.com/pytorchbot
2025-12-31 04:51:52 +00:00
Yu, Guangye
3c620e7eff [xpu][fix] Fix test_wrap_triton_handled_during_tracing on XPU (#171512)
# Motivation
This PR aims to fix the failure introduced by https://github.com/pytorch/pytorch/pull/171289 on the XPU backend.
Some UTs under `test/dynamo/` are meant to be device-agnostic, so CUDA-specific code raises the following error on the XPU backend.
```python
AssertionError: Torch not compiled with CUDA enabled
```
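
A sketch of a device-agnostic pattern such tests can use (this assumes the `torch.accelerator` API; illustrative, not the PR's actual fix):

```python
import torch

# Pick whichever accelerator is available (CUDA, XPU, ...) instead of
# hardcoding "cuda", which asserts on builds compiled without CUDA.
acc = torch.accelerator.current_accelerator()
device = acc.type if acc is not None else "cpu"
x = torch.ones(4, device=device)
```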

# Additional Context
Fix https://github.com/pytorch/pytorch/issues/171508 and https://github.com/pytorch/pytorch/issues/171509

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171512
Approved by: https://github.com/ezyang
2025-12-31 02:41:42 +00:00
shunting314
e03d126b68 [autochunker] override num-chunks (#171477)
Previously, we could not override `auto_chunker.num_chunks` via the `options` argument of `torch.compile` due to a lack of type annotation. The config's type was inferred from its default value, which is `None`, so overriding it with an integer during compilation triggered a type mismatch and failed.

Adding type annotations fixes that.
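
A sketch of the kind of annotation change involved (the field name follows the message; the module layout is illustrative):

```python
from typing import Optional

# Without an annotation, the config system infers the field's type from
# its default (None) and rejects an integer override as a type mismatch.
num_chunks: Optional[int] = None

# With the annotation, something like
#   torch.compile(fn, options={"auto_chunker.num_chunks": 4})
# can override the value without tripping the type check.
```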

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171477
Approved by: https://github.com/v0i0, https://github.com/eellison
ghstack dependencies: #171359
2025-12-31 01:51:38 +00:00
shunting314
7b5761f816 [autochunker] support gradient accumulation (#171359)
With gradient accumulation, there will be an `fx.Node` dividing the loss by a scalar (the number of gradient accumulation steps). Update the propagation rule to handle that.
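
For context, the loss-scaling pattern that introduces the dividing node (a generic training-loop sketch, not the PR's code):

```python
import torch

accum_steps = 4
model = torch.nn.Linear(8, 1)

for step, x in enumerate(torch.randn(8, 8).split(2)):
    # The division below is the scalar-dividing fx.Node the propagation
    # rule must now handle when this loop is compiled.
    loss = model(x).sum() / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        pass  # optimizer.step(); optimizer.zero_grad() -- elided
```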

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171359
Approved by: https://github.com/eellison
2025-12-31 01:51:38 +00:00
Zhang, Jianyi
99d55c3193 [xpu][fix] Fix UT test_fuse_mix_order_reductions_combo_kernels (#170297)
Fixes #170296

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170297
Approved by: https://github.com/jansel
2025-12-31 01:07:25 +00:00
Parshant Sharma
6535e1e69e Fix share_memory_ compile (#171162)
Fixes #166623

### Summary:
Fixes share_memory_ in compile mode

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171162
Approved by: https://github.com/Lucaskabela
2025-12-31 00:52:27 +00:00
Jongsok Choi
f65bb97673 [pallas backend] Refactor load and store functions. (#171546)
This refactoring extracts common logic from the load() and store() methods
into reusable helper functions, improving code readability and maintainability.

New helper methods added:
- _get_iter_vars(), _get_used_iter_vars(), _get_indirect_vars(): Variable extraction
- _safe_int(): Safe integer conversion
- _get_buffer_info(): Buffer metadata retrieval
- _compute_output_numel_from_index(): Output size computation
- _get_index_coefficients(), _check_gather_pattern(): Index analysis
- _needs_strided_indexing(), _adjust_index_for_buffer_shape(): Indexing decisions
- _build_load_expr(), _build_store_expr(): Expression building
- _detect_scatter_pattern(), _detect_point_scatter(), _detect_iter_scatter(): Scatter detection
- _check_im2col_pattern(), _check_load_is_strided_input(): Pattern matching
- _check_store_needs_transpose(), _build_full_array_store_expr(): Store helpers
- _maybe_squeeze_intermediate_buffer(): Broadcasting fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171546
Approved by: https://github.com/oulgen
2025-12-31 00:02:12 +00:00
Oguz Ulgen
4ecfdeb39e [pallas backend] Add automatic padding to align tensor sizes to WARPGROUP_SIZE (128) (#171539)
Mosaic requires tiles of size 128, and there's no way to mask since `jnp.arange` is not supported. I'm told padding is the only way. Until I find a better alternative, pad to 128.
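
A sketch of the round-up-and-pad computation (sizes are illustrative):

```python
import jax.numpy as jnp

WARPGROUP_SIZE = 128

def pad_to_warpgroup(x):
    n = x.shape[-1]
    target = -(-n // WARPGROUP_SIZE) * WARPGROUP_SIZE  # ceil to a multiple of 128
    return jnp.pad(x, (0, target - n)) if target != n else x

assert pad_to_warpgroup(jnp.ones(200)).shape == (256,)
```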

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171539
Approved by: https://github.com/choijon5
ghstack dependencies: #171475, #171485, #171531
2025-12-30 22:39:38 +00:00
Oguz Ulgen
98bc4d77e1 [pallas backend] Require sm90+ for mosaic (#171531)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171531
Approved by: https://github.com/choijon5
ghstack dependencies: #171475, #171485
2025-12-30 22:39:38 +00:00
Oguz Ulgen
2fff179ded [pallas backend] More passing tests after 0.8.2 upgrade (#171485)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171485
Approved by: https://github.com/choijon5
ghstack dependencies: #171475
2025-12-30 22:03:30 +00:00
Oguz Ulgen
f43c42a437 [pallas backend] Swap from triton to mosaic for gpu (#171475)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171475
Approved by: https://github.com/choijon5
2025-12-30 22:03:29 +00:00
zpcore
af45e0c41f [DTensor] refactor redistribute_cost function (#170108)
[No functional change] Refactor the `redistribute_cost` code by extracting the logic that computes the cost of a single collective op into `_compute_placement_transition_cost`. This lets `DTensorRedistributePlanner` use the same single-collective-op cost from `_collective_utils.py` when traversing the graph. Below is what the call graph looks like:
```
DTensorRedistributePlanner --> one_step_redistribute_cost ----------------|
                                                                          | -----> _compute_placement_transition_cost
                                                                          |
redistribute_cost ---> DTensorRedistributePlanner (get transform_infos)---|
```

Without the refactor, `redistribute_cost` and `DTensorRedistributePlanner` would call each other circularly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170108
Approved by: https://github.com/mori360
ghstack dependencies: #170106, #170107
2025-12-30 21:42:14 +00:00
zpcore
2a11e4430f [DTensor] Fix redistribute_cost using incorrect comm_bytes_gb (#170107)
Noticed that `redistribute_cost` is incorrect while looking into https://github.com/pytorch/pytorch/issues/169439 and verifying the cost against the following condition:
```
redistribute_cost(SRC, DST) <= redistribute_cost(SRC, INT) + redistribute_cost(INT, DST) for all INT
```
The failing case is:

1. For SRC --> DST, the redistribution path is: `S(1)S(0)[0]S(0)[1]->S(1)S(0)R->S(1)[0]S(0)S(1)[1]->S(1)[0]RS(1)[1]->S(1)[0]S(1)[2]S(1)[1]`
The redistribute cost is then summed from the following four costs:
```
current=S(0), target=R, comm_bytes_gb=1.1920928955078125e-07, step_cost=7.2006796424717825
current=R, target=S(1), comm_bytes_gb=1.1920928955078125e-07, step_cost=0.0    <<<<<<<<<<<<<<<<<<< comm_bytes_gb incorrect
current=S(0), target=R, comm_bytes_gb=2.384185791015625e-07, step_cost=7.201359284943566 <<< mismatch with number 7.2006796424717825
current=R, target=S(1), comm_bytes_gb=2.384185791015625e-07, step_cost=0.0
```
2. For SRC --> INT, the redistribution path is: `S(1)S(0)[0]S(0)[1]->S(1)S(0)R->S(1)[0]S(0)S(1)[1]`
The redistribute cost is then summed from the following two costs:
```
current=S(0), target=R, comm_bytes_gb=1.1920928955078125e-07, step_cost=7.2006796424717825
current=R, target=S(1), comm_bytes_gb=1.1920928955078125e-07, step_cost=0.0
```
3. For INT --> DST, the redistribution path is: `S(1)[0]S(0)S(1)[1]->S(1)[0]RS(1)[1]->S(1)[0]S(1)[2]S(1)[1]`
The redistribute cost is then summed from the following two costs:
```
current=S(0), target=R, comm_bytes_gb=1.1920928955078125e-07, step_cost=7.2006796424717825
current=R, target=S(1), comm_bytes_gb=1.1920928955078125e-07, step_cost=0.0
```
As we can see, `redistribute_cost(SRC, DST) > redistribute_cost(SRC, INT) + redistribute_cost(INT, DST)` in this failing test. The difference comes from converting `R` to `S(1)`, which produces an incorrect `comm_bytes_gb` for the subsequent cost computation.
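
The consistency property being checked, as a generic sketch (`cost_fn` stands in for DTensor's internal `redistribute_cost`):

```python
def check_triangle_inequality(cost_fn, src, dst, intermediates):
    # A consistent cost model should never make the direct path more
    # expensive than routing through some intermediate placement.
    direct = cost_fn(src, dst)
    for inter in intermediates:
        via = cost_fn(src, inter) + cost_fn(inter, dst)
        assert direct <= via, f"direct {direct} > via {inter!r}: {via}"
```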

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170107
Approved by: https://github.com/mori360
ghstack dependencies: #170106
2025-12-30 21:42:14 +00:00
zpcore
dee666e7ce [DTensor] Fix redistribute_cost to detect shard_order (#170106)
When the source and target placements are the same, we may still need to compute the redistribute cost, because they can have different shard orders.

Skipped the test case for this PR, as there will be a stricter test in #170109.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170106
Approved by: https://github.com/mori360
2025-12-30 21:42:14 +00:00
Aaron Gokaslan
ae272eebd0 [BE][Ez]: Simplify flex attention typing imports (#171528)
Saw this pattern in #171487 and it's not necessary. `typing_extensions` already devolves to a plain type alias on newer Python. Once the minimum Python version is raised enough, ruff will clean it up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171528
Approved by: https://github.com/drisspg
2025-12-30 21:27:30 +00:00
PyTorch MergeBot
a69ed6babc Revert "[precompile] E2e AOT compilation w/ regional inductor backend. (#170153)"
This reverts commit 5f1bed3d93.

Reverted https://github.com/pytorch/pytorch/pull/170153 on behalf of https://github.com/zhxchen17 due to breaking internal tests ([comment](https://github.com/pytorch/pytorch/pull/170153#issuecomment-3700377268))
2025-12-30 20:07:55 +00:00
PyTorch MergeBot
c8ac3a7f1b Revert "[precompile] Respect torch.compile() dynamic config. (#170844)"
This reverts commit 8b581ce5c0.

Reverted https://github.com/pytorch/pytorch/pull/170844 on behalf of https://github.com/zhxchen17 due to breaking internal tests ([comment](https://github.com/pytorch/pytorch/pull/170153#issuecomment-3700377268))
2025-12-30 20:07:55 +00:00
PyTorch MergeBot
a38bbbab87 Revert "[precompile] Cache serialized code object by identity weak map. (#170845)"
This reverts commit dcdf0acbfa.

Reverted https://github.com/pytorch/pytorch/pull/170845 on behalf of https://github.com/zhxchen17 due to breaking internal tests ([comment](https://github.com/pytorch/pytorch/pull/170153#issuecomment-3700377268))
2025-12-30 20:07:55 +00:00
William Wen
44cbd4f8df [dynamo] Remove SkipCodeRecursiveException and RecompileLimitExceeded, add frame_exec_strategy attribute (#171358)
Replace the exception-based control-flow pattern with an attribute-based approach for
SkipCodeRecursiveException and RecompileLimitExceeded.

Instead of using specific exception types for control flow, add a `frame_exec_strategy`
attribute to `TorchDynamoException` that allows exceptions to optionally specify how
`convert_frame` should handle them (a sketch follows the change list below).

Benefits:
- Cleaner separation of concerns (exceptions for errors, attributes for control flow)
- More flexible - any exception can specify a frame execution strategy
- Easier to extend - no need for new exception types for new strategies
- Better type safety with isinstance(e, exc.TorchDynamoException) check

Changes:
- torch/_dynamo/exc.py:
  * Add frame_exec_strategy attribute to TorchDynamoException with documentation
  * Remove SkipCodeRecursiveException and RecompileLimitExceeded classes
- torch/_dynamo/convert_frame.py:
  * Remove imports of removed exception classes
  * Replace isinstance checks with frame_exec_strategy attribute check
  * Set frame_exec_strategy on Unsupported exception in recompile limit handler
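
A sketch of the attribute-based pattern described above (class and strategy names are illustrative, not the exact Dynamo API):

```python
class TorchDynamoException(RuntimeError):
    # Optional hint telling convert_frame how to proceed; None means
    # no special handling, so ordinary errors are unaffected.
    frame_exec_strategy = None

def handle_frame_exception(exc):
    if isinstance(exc, TorchDynamoException) and exc.frame_exec_strategy is not None:
        return exc.frame_exec_strategy  # e.g. a "skip this code recursively" strategy
    raise exc
```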

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171358
Approved by: https://github.com/Lucaskabela, https://github.com/guilhermeleobas
ghstack dependencies: #170587
2025-12-30 19:51:13 +00:00
Dylan Maloy
6c619f0f05 support __torch_function__ in torch.autograd.backward (#171473)
Summary: I'd like to call `torch.autograd.backward` on a list of tensor subclasses that implement `__torch_function__` :)

Differential Revision: D89834724

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171473
Approved by: https://github.com/zhxchen17
2025-12-30 18:18:11 +00:00
Jongsok Choi
88052ace3e [pallas backend] Add FMA support. (#171518)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171518
Approved by: https://github.com/oulgen
2025-12-30 17:22:08 +00:00
cyy
963bd0a31f [BC-BREAKING] Remove torch::cuda::profiler::init (#169202)
This function throws on CUDA 12+ and ROCm, which means it is useless.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/169202
Approved by: https://github.com/ezyang
2025-12-30 17:04:45 +00:00
Aaron Gokaslan
11d2161833 [BE][Ez]: Fix correctness and perf issues with ParamHasherUtil (#171467)
Enables more efficient hashing in STL containers and fixes an incorrect invariant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171467
Approved by: https://github.com/drisspg
2025-12-30 16:20:04 +00:00
PyTorch MergeBot
8c66e4b884 Revert "Add custom torch dispatch mode in aot_autograd runtime wrapper to analyze custom ops under config (#166545)"
This reverts commit 807f51f99f.

Reverted https://github.com/pytorch/pytorch/pull/166545 on behalf of https://github.com/atalman due to Failing internally ([comment](https://github.com/pytorch/pytorch/pull/166545#issuecomment-3699701857))
2025-12-30 15:27:14 +00:00
Andrey Talman
bfb99436c8 [CI] Bump jax version (#171478)
Same as https://github.com/pytorch/pytorch/pull/171211

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171478
Approved by: https://github.com/oulgen, https://github.com/huydhn
2025-12-30 15:08:19 +00:00
Andrey Talman
a810249adf Bump torchbench version (#171490)
To include https://github.com/pytorch/benchmark/pull/2661

Pull Request resolved: https://github.com/pytorch/pytorch/pull/171490
Approved by: https://github.com/huydhn, https://github.com/oulgen
2025-12-30 15:03:53 +00:00
Jack Taylor
3c98eef883 [ROCm] enable decompose k tests for functional coverage (#169948)
Fixes https://github.com/pytorch/pytorch/issues/168617
Fixes https://github.com/pytorch/pytorch/issues/168615
Fixes https://github.com/pytorch/pytorch/issues/168614
Fixes https://github.com/pytorch/pytorch/issues/168613
Fixes https://github.com/pytorch/pytorch/issues/168599
Fixes https://github.com/pytorch/pytorch/issues/168600
Fixes https://github.com/pytorch/pytorch/issues/168601
Fixes https://github.com/pytorch/pytorch/issues/168602
Fixes https://github.com/pytorch/pytorch/issues/168603
Fixes https://github.com/pytorch/pytorch/issues/168604
Fixes https://github.com/pytorch/pytorch/issues/168605
Fixes https://github.com/pytorch/pytorch/issues/168606
Fixes https://github.com/pytorch/pytorch/issues/168607

Enables testing for decompose K mode on ROCm. This is still disabled by default pending perf testing, but we can get functional coverage by adding an inductor config to enable decompose K.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/169948
Approved by: https://github.com/jansel, https://github.com/eellison, https://github.com/PaulZhang12
2025-12-30 09:48:53 +00:00
Jongsok Choi
9ade6aad80 [pallas backend] Fix iteration variable reshaping for reductions. (#171504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171504
Approved by: https://github.com/oulgen
2025-12-30 08:51:53 +00:00