Commit Graph

97256 Commits

Author SHA1 Message Date
atalman
7e73d62699 [Release 2.10] Update Matrix and Cutting a release branch preparations (#170603)
1. Updates Matrix For release 2.10
2. Updates Cutting a release branch preparations conditions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170603
Approved by: https://github.com/seemethere, https://github.com/malfet
2025-12-17 01:10:50 +00:00
Eddie Yan
d8f8e6aff8 [cuDNN][submodule] Upgrade to cuDNN frontend 1.16.1 (#170591)
For https://github.com/pytorch/pytorch/issues/169849

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170591
Approved by: https://github.com/Skylion007, https://github.com/malfet
2025-12-17 00:17:17 +00:00
drisspg
493fde4add Refactor tests (#170463)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170463
Approved by: https://github.com/v0i0
ghstack dependencies: #170397
2025-12-17 00:07:15 +00:00
angelayi
bce1d4994c [effects] Handle single return (#170460)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170460
Approved by: https://github.com/fxdawnn
2025-12-16 23:59:09 +00:00
JessicaZhong
4c5fae2efd [BE] Rename MaskPartial back to _MaskPartial (#170423)
`MaskPartial` was made a public class in https://github.com/pytorch/pytorch/pull/164414; this PR changes it back to a private class, as it is [used internally](95f7d70fba/torch/distributed/tensor/_ops/_embedding_ops.py (L52)) to achieve vocab parallelism.

Test:
```
python3 test/distributed/tensor/test_embedding_ops.py
python3 test/distributed/tensor/test_tensor_ops.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170423
Approved by: https://github.com/tianyu-l
2025-12-16 23:33:43 +00:00
Eli Uriegas
224a1d35ff ci: Switch inductor-pallas to autolabel (#170620)
The pull_request trigger path and the ciflow path caused jobs to get triggered twice, creating some confusion, so this switches it over to use the same autolabel path as everything else.

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170620
Approved by: https://github.com/oulgen
2025-12-16 23:32:58 +00:00
Isuru Fernando
5813323218 [inductor] fix typing for TensorBox.create (#169992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/169992
Approved by: https://github.com/eellison
2025-12-16 23:30:10 +00:00
angelayi
4ddc16192e [export] Fix isin decomposition (#170362)
Fixes https://github.com/pytorch/pytorch/issues/170023

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170362
Approved by: https://github.com/yushangdi
2025-12-16 22:49:22 +00:00
albanD
efb64175dc assert removal in ci, github, numpy ref, wo, backends, nn and onnx (#170328)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170328
Approved by: https://github.com/justinchuby, https://github.com/liangel-02
ghstack dependencies: #170327
2025-12-16 22:41:29 +00:00
albanD
037b363115 Remove asserts in nn (#170327)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170327
Approved by: https://github.com/ezyang
2025-12-16 22:41:29 +00:00
Divyansh Khanna
5e4df381f8 Update persons of interest list for torchdata, torchvision, torchcodec (#170275)
Following the same rules listed here https://github.com/pytorch/pytorch/pull/136672

Making TV, TC changes after consulting with Nicolas.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170275
Approved by: https://github.com/ramanishsingh
2025-12-16 22:25:26 +00:00
Edward Yang
e7ab9015e7 Fix flaky tests related to multithreaded stringstream access (#170586)
Authored with claude code

The flaky test is PyTorchStreamWriterAndReader.LoadWithMultiThreads

Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170586
Approved by: https://github.com/Skylion007
2025-12-16 22:09:40 +00:00
eqy
3e5e0a3e4e [CUDA][CUDA Graphs] Restore previous current stream with green context and check underlying stream at capture end (#170317)
@galv pointed out that #170148 is extraneous, as the default stream within a green context should be capturable. Indeed, this doesn't seem to be the reason that graph capture wasn't working; rather, the issue was that the parent stream at capture time was not properly restored, and the check for it was also too restrictive (the underlying `CUDAStream` needs to be equal, not the wrapped object).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170317
Approved by: https://github.com/malfet, https://github.com/ngimel
2025-12-16 22:04:27 +00:00
drisspg
60b0b7ee33 [WIP] Add bwd score-mod support to flex-flash (#170397)
# Summary

Adds the ability to lower score-mods correctly to the bwd impl; also rebases on the latest PR so that we don't need to copy out grads.

Needs to wait for: https://github.com/Dao-AILab/flash-attention/pull/2070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170397
Approved by: https://github.com/v0i0
2025-12-16 21:58:59 +00:00
Jagadish Krishnamoorthy
a69907a41e [ROCm] Make grouped GEMM CK opt‑in via env and default to fallback path (#170159)
On ROCm, the fast path routes to group_gemm_ck and the slow path to _grouped_mm_fallback. By default the fallback (fast path = False) route is used, since the CK path is not yet performant. To enable the CK path, set the ROCM_ALLOW_GROUP_GEMM_CK=1 environment variable.
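
A minimal sketch of the opt-in gating (illustrative only; the real dispatch lives in the ROCm grouped-GEMM code, and the exact parsing of the variable is an assumption):
```python
import os

# Hypothetical sketch: the CK grouped-GEMM path is only taken when the opt-in
# variable is set; otherwise the fallback path is used by default.
def use_ck_grouped_gemm() -> bool:
    return os.environ.get("ROCM_ALLOW_GROUP_GEMM_CK", "0") == "1"

# e.g. `ROCM_ALLOW_GROUP_GEMM_CK=1 python train.py` opts into the CK path.
print("group_gemm_ck" if use_ck_grouped_gemm() else "_grouped_mm_fallback")
```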

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170159
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-12-16 21:30:29 +00:00
Will Constable
8656dea039 [DTensor] Add OpSchema.args_meta, kwargs_meta helpers (#170358)
Similar to args_strategy, which gets a flat list of OpStrategies out of
the args_spec, args_meta helps provide a version of args_spec usable by
single_dim_strategy functions: it has the same pytree structure as the
original args_spec and contains all non-tensor args, but OpStrategy
values are replaced with TensorMeta values and TupleStrategy values are
replaced with tuples of TensorMeta.
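
A minimal, self-contained sketch of that mapping, using toy stand-ins for OpStrategy / TupleStrategy / TensorMeta rather than the real DTensor classes:
```python
from dataclasses import dataclass
import torch.utils._pytree as pytree

# Toy stand-ins for the real DTensor classes, just to show the shape of the mapping.
@dataclass(frozen=True)
class TensorMeta:
    shape: tuple

@dataclass(frozen=True)
class OpStrategy:
    tensor_meta: TensorMeta

@dataclass(frozen=True)
class TupleStrategy:
    children: tuple  # tuple of OpStrategy

def args_meta(args_spec):
    def to_meta(arg):
        if isinstance(arg, OpStrategy):
            return arg.tensor_meta
        if isinstance(arg, TupleStrategy):
            return tuple(c.tensor_meta for c in arg.children)
        return arg  # non-tensor args (ints, dims, ...) pass through unchanged
    return pytree.tree_map(to_meta, args_spec)

spec = (OpStrategy(TensorMeta((4, 4))), 1, [TupleStrategy((OpStrategy(TensorMeta((2,))),))])
print(args_meta(spec))  # same pytree structure, strategies replaced by TensorMeta
```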
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170358
Approved by: https://github.com/weifengpy
ghstack dependencies: #170197
2025-12-16 20:50:44 +00:00
Shangdi Yu
95da763dc2 [invoke_subgraph] Add tangents aliasing to invoke subgraph cache (#170485)
Fixes `test_grad_accuracy_check` unit test

The root cause is that in backward graphs, the tangents' aliasing behavior can change: e.g. when you call it the first time, two tangents are aliases, but in the next call they are not. If they share the same backward graph, the gradients can be wrong.

This doesn't happen in the forward graph, because Dynamo already handles it; no inputs will be aliases.

We add the aliasing information to the invoke_subgraph_cache, so if an input's aliasing changes, we re-trace the backward graph.
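
A minimal sketch of the idea (illustrative only, not the actual invoke_subgraph cache code): derive an aliasing signature from which tangents share storage, and make it part of the cache key so a change in aliasing forces a re-trace.
```python
import torch

def aliasing_signature(tangents):
    # Group tangent indices by underlying storage pointer; indices in the same
    # group alias each other.
    groups = {}
    for i, t in enumerate(tangents):
        groups.setdefault(t.untyped_storage().data_ptr(), []).append(i)
    return tuple(tuple(g) for g in groups.values())

a = torch.randn(4)
print(aliasing_signature([a, a.view(2, 2), torch.randn(4)]))    # ((0, 1), (2,)) -> 0 and 1 alias
print(aliasing_signature([a, torch.randn(4), torch.randn(4)]))  # ((0,), (1,), (2,))
```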

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170485
Approved by: https://github.com/zou3519
2025-12-16 20:47:08 +00:00
Natalia Gimelshein
530ff92655 don't check mat1 sizes when deciding whether to fuse bias (#170529)
Checking mat1 sizes was added in #163955, but it's not needed: mat1 and mat2 can be swapped only if the output is column-major, but we've already checked that the output is contiguous.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170529
Approved by: https://github.com/drisspg, https://github.com/nikitaved
2025-12-16 20:33:57 +00:00
Jeff Daily
a0c0eb172b Revert "[ROCm][CUDA] add unit test utility busy_wait_for_flag (#166218)" (#170462)
This fully reverts commit d401e4e70a.

CUDA revert was here https://github.com/pytorch/pytorch/pull/169312.

Causes deadlock on both CUDA and ROCm. The initial implementation is probably UB on both CUDA and ROCm; there are no concurrency guarantees for streams. The ROCm deadlock was only detected when compiling for the amdgcnspirv backend, which is not covered by PyTorch CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170462
Approved by: https://github.com/malfet
2025-12-16 20:23:32 +00:00
Tristan Rice
a88bcba3b6 third-party/gloo: bumped submodule version to fix windows support (#170491)
This bumps the Gloo submodule to fix Windows support.

See https://github.com/pytorch/pytorch/issues/150381 for more details.

Test plan:

CI

https://github.com/pytorch/pytorch/issues/150381#issuecomment-3605200991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170491
Approved by: https://github.com/fduwjj
2025-12-16 20:09:00 +00:00
Divyansh Khanna
045cd79633 Update docs/source/data.md (#170313)
Fixes #169252

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170313
Approved by: https://github.com/aelavender, https://github.com/ramanishsingh
2025-12-16 19:29:25 +00:00
albanD
8d0d602651 Fix refcount bug in mode poping (#169148)
Fixes #165857

When the stack is the only owner of the Python object, `mode` is the sole owner.
So when `mode` goes out of scope, the mode is destroyed just before we incref it again!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/169148
Approved by: https://github.com/Skylion007
2025-12-16 19:21:31 +00:00
William Wen
b9aa4cdbc2 [dynamo] fix missing step_unsupported graph break message (#170115)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170115
Approved by: https://github.com/mlazos
ghstack dependencies: #169742, #170031
2025-12-16 19:06:29 +00:00
Brian Hirsh
44c28cade6 fix input mutation handling for subclasses that perform intermediate compute during copy_ (DTensor) (#170467)
Before this PR, graph capturing a program that performed an input mutation on a DTensor under training would hard error.

The context:

(1) When AOTAutograd traces out a joint graph and detects an input mutation, it makes a call to `old_input.copy_(new_input)`, and relies on the current make_fx call to capture this copy_ as a node in the joint graph ([code](https://github.com/pytorch/pytorch/blob/main/torch/_functorch/_aot_autograd/graph_capture_wrappers.py#L733))

(2) Ordinarily, capturing `old_input.copy_(new_input)` doesn't **need** to generate any fresh proxies in the graph, as the output of `copy_` is the self argument, which we expect to already have a proxy. Why does this matter? @IvanKobzarev added some logic to handle the case where a buffer is mutated during both the fw and the bw ([PR](https://github.com/pytorch/pytorch/pull/155354)), and as part of doing so, tweaked the input mutation handling in AOTAutograd so that these copy_ calls are generated under a [context manager](https://github.com/pytorch/pytorch/blob/main/torch/_functorch/_aot_autograd/graph_capture_wrappers.py#L979C29-L979C72) that prevents proxy_tensor from adding new proxies to the graph. The idea is that we are applying this context manager in a very limited region, where we know no new proxies need to be created.

(3) However, this is not true for DTensor. When you call `dtensor.copy_(dtensor)`, DTensor runs fake tensor prop under the hood, which involves constructing fresh FakeTensors for the inputs with which to run the fake prop on ([code](https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/_op_schema.py#L510))

The net result is that we end up *not* constructing proxies for these fake tensor inputs, and we get a "proxy not found" error immediately afterwards when attempting to use them as DTensor runs fake prop ([here](https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/_sharding_prop.py#L243)).

The way I fixed this was just by tweaking the "don't clobber proxies" context manager to be a bit more general: it will still generate proxies for inputs that don't already have proxies, and it simply won't overwrite an existing proxy with a new one when you trace an inplace op.

One alternative would have been to disable proxy tracing when DTensor runs fake prop, since after all we don't really care about the ops that DTensor ran during fake prop. I decided not to do this because that code has changed a bunch recently and is pretty fragile, but I'm hoping to do it if people prefer that path.
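
A minimal sketch of the relaxed behavior (names are illustrative, not the actual proxy_tensor internals): rather than refusing to create any proxy inside the input-mutation region, only refuse to overwrite a proxy that already exists.
```python
def record_proxy(proxy_table: dict, fake_tensor, proxy, in_input_mutation_region: bool):
    # Keep an existing proxy instead of clobbering it when tracing an in-place op...
    if in_input_mutation_region and fake_tensor in proxy_table:
        return proxy_table[fake_tensor]
    # ...but still create proxies for fresh tensors (e.g. the FakeTensors DTensor
    # constructs for fake prop), which avoids the "proxy not found" error.
    proxy_table[fake_tensor] = proxy
    return proxy

table = {}
record_proxy(table, "fake_input", "proxy_0", in_input_mutation_region=True)  # created
record_proxy(table, "fake_input", "proxy_1", in_input_mutation_region=True)  # existing kept
print(table)  # {'fake_input': 'proxy_0'}
```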

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170467
Approved by: https://github.com/IvanKobzarev
2025-12-16 18:38:46 +00:00
eellison
a094bb8292 Add back in return None that got lost in refactoring (#170557)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170557
Approved by: https://github.com/IvanKobzarev
2025-12-16 18:32:25 +00:00
Janani Sriram
a70a322f33 [Inductor][Triton] Disable mm_template for main loop scaling templates (#170139)
Summary:
Disable `mm_template` as a template option for `scaled_mm`. `mm_template` leverages epilogue scaling for `scaled_mm`; however, block-wise scaling, which uses the main loop scaling template, must apply scaling to each input tensor before accumulation, rather than as an epilogue after accumulation.

NOTE: It is interesting that for small-enough shapes, `mm_template` seems to actually pass in Inductor without a CUDA error or IMA, and it is the winner over the main loop scaling template. The same generated kernel fails when attempting to repro locally. As a follow-up, it would be useful to investigate whether there is an implementation gap in `mm_template` or `main_loop_scaling_template`.
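
A toy NumPy illustration (not the Triton templates) of why block-wise scales have to be applied in the main loop: once the scale varies along K, folding a single scale into an epilogue after accumulation no longer matches the block-by-block reference.
```python
import numpy as np

np.random.seed(0)
K_BLOCK = 2
a = np.random.randn(2, 4)            # [M, K]
b = np.random.randn(4, 2)            # [K, N]
a_scales = np.array([0.5, 2.0])      # one scale per K block of A

# Main-loop scaling: apply each block's scale before it enters the accumulator.
ref = sum(
    (a[:, k:k + K_BLOCK] * a_scales[k // K_BLOCK]) @ b[k:k + K_BLOCK, :]
    for k in range(0, a.shape[1], K_BLOCK)
)

# Epilogue scaling: accumulate first, apply one scale afterwards. Only equivalent
# when the scale is constant along K, which it is not here.
epilogue = (a @ b) * a_scales.mean()

print(np.allclose(ref, epilogue))    # False
```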

Test Plan:
```
CUDA_LAUNCH_BLOCKING=1 MAX_AUTOTUNE_PRUNE_CHOICES_BASED_ON_SHARED_MEM=1 TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TRITON_PRINT_AUTOTUNING=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 ENABLE_PERSISTENT_TMA_MATMUL=1 TORCHINDUCTOR_CACHE_DIR=~/personal/cache_dir TORCH_LOGS=+inductor,"output_code" python3 run.py --op fp8_gemm --only torch_fp8_gemm,pt2_fp8_gemm --metrics accuracy,tflops,latency --m 32768 --n 8192 --k 8192 --output ~/personal/fp8_scaling_benchmarks/blockwise1x128_blockwise128x128.csv --scaling-pair BlockWise1x128,BlockWise128x128 --bypass-fail 2<&1 | tee ~/personal/fp8_scaling_benchmarks/blockwise1x128_blockwise128x128_2.log
```

Differential Revision: D88900710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170139
Approved by: https://github.com/NikhilAPatel
2025-12-16 18:28:04 +00:00
Angela Yi
5aefcb2586 [export] Deepcopy graph signature when unlifting (#170461)
Previously, https://github.com/pytorch/pytorch/pull/167231 modified ep.module() such that ep is deepcopied first, so that ep.module() does not affect the original ep. However, we didn't deepcopy the graph signature, which caused some issues.

Differential Revision: D89209276

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170461
Approved by: https://github.com/ydwu4
2025-12-16 17:29:20 +00:00
Anshul Sinha
770def6077 [dtensor][partial] fixed adding scalar to any Partial (#170030)
**Summary:** When we add a scalar to a Partial DTensor, we don't redistribute, causing the scalar to be added to each part of the partial DTensor, as reported in https://github.com/pytorch/pytorch/issues/149768 and https://github.com/pytorch/pytorch/issues/163193. This PR addresses this issue so that we always redistribute for Partial when adding a scalar. The iterative process of arriving at this PR can be viewed in https://github.com/pytorch/pytorch/pull/167813.
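
A plain-Python illustration of the bug (no torch.distributed involved): a Partial placement stores per-rank partial values whose sum is the logical value, so adding the scalar shard-wise counts it once per rank.
```python
partial_shards = [1.0, 2.0]          # logical value = 1.0 + 2.0 = 3.0 across 2 ranks
scalar = 10.0

# Without redistributing first, the scalar is added to every shard:
buggy = [s + scalar for s in partial_shards]
print(sum(buggy))                    # 23.0 -> the scalar is counted once per rank

# Redistributing (summing the partials) before the pointwise add gives the expected result:
print(sum(partial_shards) + scalar)  # 13.0
```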

**Test Case**
1. pytest test/distributed/tensor/test_pointwise_ops.py -k test_add_scalar_partial

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170030
Approved by: https://github.com/wconstab
2025-12-16 17:25:45 +00:00
Huy Do
5587e43ea2 Switch vLLM CI jobs to CUDA 12.9 (#170513)
This matches vLLM's current CI setup on CUDA 12.9, avoiding any funny business between 12.8 and 12.9.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170513
Approved by: https://github.com/yangw-dev, https://github.com/zou3519
2025-12-16 17:21:06 +00:00
PyTorch MergeBot
a4876af74a Revert "Move non-tensor nodes on boundary after split (#163605)"
This reverts commit f927e00362.

Reverted https://github.com/pytorch/pytorch/pull/163605 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/163605#issuecomment-3661422614))
2025-12-16 16:37:21 +00:00
Edward Z. Yang
4ee69c9189 Optimize bs=1 case for allgather on dim 1 to not split/cat (#169404)
Authored with claude code

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/169404
Approved by: https://github.com/ngimel, https://github.com/albanD
2025-12-16 16:30:15 +00:00
PyTorch MergeBot
beb6c77f8a Revert "Checking if the input is finite before calculation in lowering of pow func (#167723)"
This reverts commit 71ecbc44fb.

Reverted https://github.com/pytorch/pytorch/pull/167723 on behalf of https://github.com/seemethere due to This is breaking halide tests see 71ecbc44fb ([comment](https://github.com/pytorch/pytorch/pull/167723#issuecomment-3661231385))
2025-12-16 15:53:07 +00:00
Byron Wang
f927e00362 Move non-tensor nodes on boundary after split (#163605)
Summary: See details in the attached doc
[AOTI Non-Tensor on Boundary Issue.pdf](https://github.com/user-attachments/files/22608882/AOTI.Non-Tensor.on.Boundary.Issue.pdf)

Meta internal post: https://fb.workplace.com/groups/aoti.productionization/permalink/743060011864813/

Test Plan: unit test

Differential Revision: D80903201

Steps to show how this feature works:
- checkout commit: 01eb5dd5c3dc3e5d62fc975da9bcfee4befbd421
- run this local lowering command: P2072906967
You will see an error like the one below, which means there is a node on the boundary whose dtype is `torch.dtype`; the corresponding model code is [this line](https://www.internalfb.com/code/fbsource/[e9ae35402388]/fbcode/minimal_viable_ai/umia_v1/ig/omni/root_model.py?lines=2409)
```
Found <class 'torch.dtype'> in output, which is not a known type.
```
- Then search for `move_non_tensor_nodes_on_boundary` in the file `fbcode/caffe2/torch/fb/model_transform/lower_presets/aoti/ig_ss_t2i_retrieval_aot_inductor.py`; you will see it is commented out. Now set `move_non_tensor_nodes_on_boundary` to true and rerun the command from the last step, and you will see the tracing completes and `_exported_run_on_acc_1_readable.txt` is generated (although there is an IMA error, which is not related to this diff).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163605
Approved by: https://github.com/ColinPeppler
2025-12-16 15:47:19 +00:00
Oguz Ulgen
3854d691ce [pallas backend] Fix load and reduction codegen for mean operations (#170464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170464
Approved by: https://github.com/yf225
ghstack dependencies: #170451
2025-12-16 09:03:19 +00:00
Oguz Ulgen
5694d890f3 remove passing test (#170451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170451
Approved by: https://github.com/yf225
2025-12-16 09:03:19 +00:00
Pawel Swider
da892d2ddf Enable test_triton_fx_graph_with_et_xpu to run with XPU (#169181)
Change hardcoded `"cuda:0"` to `device` param to allow to running `test_triton_fx_graph_with_et` on different devices, especially test now passed on XPU.

Simplify skip conditions and do some minor refactoring.

Fixes https://github.com/intel/torch-xpu-ops/issues/2040

Pull Request resolved: https://github.com/pytorch/pytorch/pull/169181
Approved by: https://github.com/guangyey, https://github.com/jansel, https://github.com/EikanWang
2025-12-16 08:41:20 +00:00
Zhang, Liangang
4816fd9122 [xpu][test][FlexAttention]Enable the test_GQA on Intel XPU (#166376)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166376
Approved by: https://github.com/drisspg, https://github.com/EikanWang
2025-12-16 08:31:07 +00:00
Nikita Shulga
ed2e92b4b6 [BE][MPS] Don't pass nnz to mark_segments (#170403)
Fixes the following unused-variable warning:
```
/Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/sparse/mps/kernels/SparseTensorMath.metal:288:27: warning: unused parameter 'nnz' [-Wunused-parameter]
    constant uint&        nnz     [[buffer(2)]],
```

Also, use short-circuit evaluation to make the kernel more compact.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170403
Approved by: https://github.com/Skylion007
2025-12-16 07:52:37 +00:00
Masaki Kozuki
494bce3ff4 fix typo in aot_compile error message: enable_aot_config -> enable_aot_compile (#170441)
As per title, this pull request corrects the error messages related to enabling AOT (Ahead-Of-Time) compilation in the `torch._dynamo.eval_frame` module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170441
Approved by: https://github.com/Lucaskabela
2025-12-16 07:06:15 +00:00
Weishi.Deng
9d9ecdb349 [xpu][feature] Enable triton online softmax kernels on XPU. (#163251)
This PR enables Triton online softmax kernels for XPU devices by adding a device check in prepare_softmax_extra_check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163251
Approved by: https://github.com/etaf, https://github.com/EikanWang, https://github.com/mlazos
2025-12-16 06:37:48 +00:00
Jane Xu
d24276fc2a Add supported StableIValue types to docs (#168385)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/168385
Approved by: https://github.com/albanD
2025-12-16 06:34:05 +00:00
Jeff Daily
225496166b [CI] fix test_pointwise_ops.py test_mul_div_scalar_partial (#170510)
Support any world size: 2, 3, or 4.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170510
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-12-16 06:24:44 +00:00
drisspg
66407ac9cb Fix vllm issue for flex (#170499)
Fixes #170499

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170499
Approved by: https://github.com/zou3519
2025-12-16 06:00:10 +00:00
Rohit Singh Rathaur
9d0d198cb5 [c10d] Add thread safety when calling ncclCommGetAsyncError (#170424)
Fixes #169484

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170424
Approved by: https://github.com/kwen2501
2025-12-16 05:34:37 +00:00
PyTorch MergeBot
bd7e9213d4 Revert "[effects] Handle single return (#170460)"
This reverts commit 6b7d588570.

Reverted https://github.com/pytorch/pytorch/pull/170460 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/170460#issuecomment-3658913666))
2025-12-16 05:32:25 +00:00
shunting314
66bc0d0b1e support multi BLOCK_M for flex-decoding (#170343)
Flex-decoding was previously only active in vLLM in very narrow cases. We use flex-decoding if the workload can be processed with a single BLOCK_M, i.e. `BLOCK_M >= seq_len_q * G`, where G is the ratio between Q heads and KV heads. We use BLOCK_M = 16 in the vLLM integration.

Take Llama 3 8B as an example: G is 4 there, which means we use flex-decoding only if `seq_len_q` <= 4. This is very restrictive and causes flex-decoding to be skipped most of the time.

This PR enhances the flex-decoding kernel to work with multiple BLOCK_M blocks.
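
A toy recreation of the previous single-block gating (names and the BLOCK_M default are illustrative, not the actual kernel-selection code):
```python
def use_flex_decoding_single_block(seq_len_q: int, num_q_heads: int, num_kv_heads: int,
                                   block_m: int = 16) -> bool:
    g = num_q_heads // num_kv_heads   # GQA ratio between Q heads and KV heads
    return block_m >= seq_len_q * g   # old restriction: one BLOCK_M must cover all query rows

# Llama 3 8B-style config: 32 Q heads, 8 KV heads -> G = 4,
# so the single-block path only applied for seq_len_q <= 4.
print(use_flex_decoding_single_block(seq_len_q=4, num_q_heads=32, num_kv_heads=8))  # True
print(use_flex_decoding_single_block(seq_len_q=8, num_q_heads=32, num_kv_heads=8))  # False
```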

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170343
Approved by: https://github.com/drisspg
2025-12-16 05:12:00 +00:00
PyTorch UpdateBot
569af32498 [vllm hash update] update the pinned vllm hash (#170408)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170408
Approved by: https://github.com/pytorchbot
2025-12-16 05:08:13 +00:00
vishalgoyal316
956911937d Add input_size validation for RNN modules with clear error messages (#166302)
Validates that input_size is an int > 0, raising a clear TypeError/ValueError instead of cryptic torch.empty() errors. Applies to nn.RNN, nn.LSTM, and nn.GRU.
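
A minimal sketch of the kind of validation described (the assumed shape of the check, not the exact code added to the RNN modules):
```python
def _check_input_size(input_size):
    # Reject non-int types (including bool) with a TypeError, and non-positive ints
    # with a ValueError, so the failure mode is obvious at construction time.
    if not isinstance(input_size, int) or isinstance(input_size, bool):
        raise TypeError(f"input_size must be an int, got {type(input_size).__name__}")
    if input_size <= 0:
        raise ValueError(f"input_size must be greater than 0, got {input_size}")

_check_input_size(64)      # ok
# _check_input_size(0)     # would raise ValueError
# _check_input_size("64")  # would raise TypeError
```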

Fixes #136936

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166302
Approved by: https://github.com/mikaylagawarecki
2025-12-16 05:01:39 +00:00
PyTorch UpdateBot
a19620b5b5 [audio hash update] update the pinned audio hash (#170409)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170409
Approved by: https://github.com/pytorchbot
2025-12-16 04:46:57 +00:00
Zheng, Zhaoqiong
5be28ea833 update get start xpu with new client gpu & update format (#169810)
- Update the get started with XPU doc for new client GPUs
- Update the script format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/169810
Approved by: https://github.com/EikanWang, https://github.com/gujinghui

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-16 04:37:57 +00:00