Commit Graph

2459 Commits

Author SHA1 Message Date
frost-intel
9b4adc4db7 [fr] [xpu] Add FlightRecorder support for ProcessGroupXCCL (#158568)
Adds support for FlightRecorder in ProcessGroupXCCL.

See https://github.com/intel/torch-xpu-ops/pull/1867 for XCCL implementation and more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158568
Approved by: https://github.com/guangyey, https://github.com/fduwjj
2025-08-22 09:03:35 +00:00
dolpm
ff4f5dd8ed [nativert] oss layout planner tests (#160942)
Summary: att - changed one of the tests to get rid of torcharrow dep.

Test Plan:
```
buck2 test //caffe2/test/cpp/nativert:layout_planner_tests
Tests finished: Pass 15. Fail 0. Fatal 0. Skip 0. Build failure 0
```

Rollback Plan:

Reviewed By: SherlockNoMad

Differential Revision: D80108549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160942
Approved by: https://github.com/georgiaphillips, https://github.com/henryoier
2025-08-22 00:26:25 +00:00
dolpm
958f9ca88e [nativert] oss static kernel tests (#161087)
Summary: att - should be no-op

Test Plan:
buck2 test //caffe2/test/cpp/nativert:static_kernel_ops_tests
Tests finished: Pass 24. Fail 0. Fatal 0. Skip 0. Build failure 0

Rollback Plan:

Differential Revision: D80216488

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161087
Approved by: https://github.com/georgiaphillips, https://github.com/henryoier
2025-08-21 19:42:21 +00:00
dolpm
67b98da1b2 [nativert] oss static kernel test utils (#161086)
Summary: att - should be a no-op

Test Plan:
ci

Rollback Plan:

Differential Revision: D80214768

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161086
Approved by: https://github.com/georgiaphillips
2025-08-21 04:49:06 +00:00
dolpm
1471b20cb3 add static dispatch kernel registration to open source (#160439)
Summary: static dispatch registry should be moved to open source. the rest can maintain internally for now, since delegates will all go through ET hop.

Test Plan: spot checked existing tests and didn't see any missing registrations

Differential Revision: D80099377

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160439
Approved by: https://github.com/SherlockNoMad, https://github.com/zhxchen17
2025-08-20 17:58:00 +00:00
dolpm
b439675ae2 [nativert] oss pass graph pass registration (#160859)
Summary: att

Test Plan:
CI

Rollback Plan:

Differential Revision: D80368343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160859
Approved by: https://github.com/georgiaphillips
2025-08-18 22:23:38 +00:00
dolpm
138413907a [nativert] oss subgraph rewriter (#160780)
Summary: att

Test Plan:
ci

Rollback Plan:

Differential Revision: D80367765

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160780
Approved by: https://github.com/SherlockNoMad, https://github.com/georgiaphillips
2025-08-18 04:25:05 +00:00
Catherine Lee
80dd05e31e Disable flaky cpp test RecordDebugHandles.Basic (#160577)
Test is flaky and sometimes hangs in CI

Here's an example of the failure:
https://github.com/pytorch/pytorch/actions/runs/16946153494/job/48027937663
```

2025-08-13T20:54:00.1223688Z ==================================== RERUNS ====================================
2025-08-13T20:54:00.1224156Z ___________________________ RecordDebugHandles.Basic ___________________________
2025-08-13T20:54:00.1224682Z [gw2] linux -- Python 3.13.5 /opt/conda/envs/py_3.13/bin/python3.13
2025-08-13T20:54:00.1225568Z Internal Error: calling /opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit for test RecordDebugHandles.Basic failed (returncode=-6):
2025-08-13T20:54:00.1226430Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1226988Z Note: Google Test filter = RecordDebugHandles.Basic-*_CUDA:*_MultiCUDA
2025-08-13T20:54:00.1227450Z [==========] Running 1 test from 1 test suite.
2025-08-13T20:54:00.1227792Z [----------] Global test environment set-up.
2025-08-13T20:54:00.1228145Z [----------] 1 test from RecordDebugHandles
2025-08-13T20:54:00.1228492Z [ RUN      ] RecordDebugHandles.Basic
2025-08-13T20:54:00.1228822Z [       OK ] RecordDebugHandles.Basic (1 ms)
2025-08-13T20:54:00.1229204Z [----------] 1 test from RecordDebugHandles (1 ms total)
2025-08-13T20:54:00.1229501Z
2025-08-13T20:54:00.1229666Z [----------] Global test environment tear-down
2025-08-13T20:54:00.1230033Z [==========] 1 test from 1 test suite ran. (1 ms total)
2025-08-13T20:54:00.1230355Z [  PASSED  ] 1 test.
2025-08-13T20:54:00.1230727Z terminate called after throwing an instance of 'std::system_error'
2025-08-13T20:54:00.1231154Z   what():  Invalid argument
2025-08-13T20:54:00.1231416Z unknown file:0: C++ failure
2025-08-13T20:54:00.1231788Z ------------------------------ Captured c++ call -------------------------------
2025-08-13T20:54:00.1232262Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1232745Z Note: Google Test filter = RecordDebugHandles.Basic-*_CUDA:*_MultiCUDA
2025-08-13T20:54:00.1233199Z [==========] Running 1 test from 1 test suite.
2025-08-13T20:54:00.1233557Z [----------] Global test environment set-up.
2025-08-13T20:54:00.1233915Z [----------] 1 test from RecordDebugHandles
2025-08-13T20:54:00.1234247Z [ RUN      ] RecordDebugHandles.Basic
2025-08-13T20:54:00.1234590Z [       OK ] RecordDebugHandles.Basic (1 ms)
2025-08-13T20:54:00.1235020Z [----------] 1 test from RecordDebugHandles (1 ms total)
2025-08-13T20:54:00.1235304Z
2025-08-13T20:54:00.1235431Z [----------] Global test environment tear-down
2025-08-13T20:54:00.1235793Z [==========] 1 test from 1 test suite ran. (1 ms total)
2025-08-13T20:54:00.1236126Z [  PASSED  ] 1 test.
2025-08-13T20:54:00.1236481Z terminate called after throwing an instance of 'std::system_error'
2025-08-13T20:54:00.1236906Z   what():  Invalid argument
2025-08-13T20:54:00.1237287Z ___________________________ RecordDebugHandles.Basic ___________________________
2025-08-13T20:54:00.1237800Z [gw2] linux -- Python 3.13.5 /opt/conda/envs/py_3.13/bin/python3.13
2025-08-13T20:54:00.1238686Z Internal Error: calling /opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit for test RecordDebugHandles.Basic failed (returncode=-6):
2025-08-13T20:54:00.1239551Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1240048Z Note: Google Test filter = RecordDebugHandles.Basic-*_CUDA:*_MultiCUDA
2025-08-13T20:54:00.1240495Z [==========] Running 1 test from 1 test suite.
2025-08-13T20:54:00.1240848Z [----------] Global test environment set-up.
2025-08-13T20:54:00.1241199Z [----------] 1 test from RecordDebugHandles
2025-08-13T20:54:00.1241542Z [ RUN      ] RecordDebugHandles.Basic
2025-08-13T20:54:00.1241871Z [       OK ] RecordDebugHandles.Basic (1 ms)
2025-08-13T20:54:00.1242249Z [----------] 1 test from RecordDebugHandles (1 ms total)
2025-08-13T20:54:00.1242503Z
2025-08-13T20:54:00.1242641Z [----------] Global test environment tear-down
2025-08-13T20:54:00.1242993Z [==========] 1 test from 1 test suite ran. (19 ms total)
2025-08-13T20:54:00.1243329Z [  PASSED  ] 1 test.
2025-08-13T20:54:00.1243697Z terminate called after throwing an instance of 'std::system_error'
2025-08-13T20:54:00.1244113Z   what():  Invalid argument
2025-08-13T20:54:00.1244392Z unknown file:0: C++ failure
2025-08-13T20:54:00.1244759Z ------------------------------ Captured c++ call -------------------------------
2025-08-13T20:54:00.1245235Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1283768Z ============== 1 failed, 568 passed, 2 rerun in 115.57s (0:01:55) ==============
```

Here's an example of the hang:
https://github.com/pytorch/pytorch/actions/runs/16942186826/job/48015238944
Logs aren't super helpful other than stating that it took a long time.  Usually this file takes <2min to run
```
2025-08-13T18:43:24.6586481Z [gw0] [ 97%] PASSED [1.4119s] ../../../../../opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit::PyTorch/LiteInterpreterDynamicTypeTestFixture::Conformance/8
2025-08-13T18:43:24.6587278Z [gw1] [ 97%] PASSED [1.4866s] ../../../../../opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit::PyTorch/LiteInterpreterDynamicTypeTestFixture::Conformance/9 Command took >30min, returning 124
2025-08-13T18:43:24.6587288Z
2025-08-13T18:43:24.6587632Z FINISHED PRINTING LOG FILE of cpp/test_jit 1/1 (test/test-reports/cpp.test_jit_1.1_c259e5a152845991_.log)
2025-08-13T18:43:24.6587639Z
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160577
Approved by: https://github.com/huydhn
2025-08-15 15:59:21 +00:00
cyy
10e3514c96 Remove tensorexpr tests (#158928)
The tests are not maintained.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158928
Approved by: https://github.com/albanD, https://github.com/malfet
2025-08-09 02:21:22 +00:00
Jane Xu
1690c0c3a0 [Reland] Migrate ScalarType to headeronly (#159911)
The non ghstack version of #159416, to make sure we don't get reverted again
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159911
Approved by: https://github.com/mikaylagawarecki
2025-08-06 07:36:37 +00:00
Jane Xu
3ddfd46bd2 Cut a version of TORCH_ERROR_CODE_CHECK in headeronly from AOTI (#159604)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159604
Approved by: https://github.com/albanD, https://github.com/desertfire
2025-08-06 00:29:56 +00:00
yewentao256
fd6655a0f5 Feature: Implement support for cudnn_batch_norm_out kernel to replace the autogen approach. (#123020)
Fixes #115611

Autogen kernel may cause redundant copy, so we develop the kernel to improve efficiency.

Test Case:

```c++
#include <torch/torch.h>
#include <iostream>
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

int main() {
    auto input = torch::rand({2, 3, 4, 4}, torch::device(torch::kCUDA));
    auto weight = torch::randn({3}, torch::device(torch::kCUDA));
    auto bias = torch::randn({3}, torch::device(torch::kCUDA));
    auto running_mean = torch::zeros({3}, torch::device(torch::kCUDA));
    auto running_var = torch::ones({3}, torch::device(torch::kCUDA));

    bool training = true;
    double exponential_average_factor = 0.1;
    double epsilon = 1e-5;

    auto output = torch::empty_like(input);
    auto save_mean = torch::empty({3}, torch::device(torch::kCUDA));
    auto save_var = torch::empty({3}, torch::device(torch::kCUDA));
    auto reserve = torch::empty({0}, torch::device(torch::kCUDA)); // empty place-holder

    at::native::cudnn_batch_norm_out(input, weight, bias, running_mean, running_var, training, exponential_average_factor, epsilon, output, save_mean, save_var, reserve);
    auto outputs = at::native::cudnn_batch_norm(input, weight, bias, running_mean, running_var, training, exponential_average_factor, epsilon);

    bool is_close_output = torch::allclose(output, std::get<0>(outputs));
    bool is_close_save_mean = torch::allclose(save_mean, std::get<1>(outputs));
    bool is_close_save_var = torch::allclose(save_var, std::get<2>(outputs));
    bool is_close_reserve = torch::allclose(reserve, std::get<3>(outputs));

    std::cout << "Is output close: " << is_close_output << std::endl;
    std::cout << "Is save_mean close: " << is_close_save_mean << std::endl;
    std::cout << "Is save_var close: " << is_close_save_var << std::endl;
    std::cout << "Is reserve close: " << is_close_reserve << std::endl;

    return 0;
}
```

Please CC @albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123020
Approved by: https://github.com/andrewor14, https://github.com/eqy, https://github.com/albanD
2025-08-04 22:40:33 +00:00
PyTorch MergeBot
7e8197e34d Revert "Migrate ScalarType to headeronly (#159416)"
This reverts commit 1371a98b0e.

Reverted https://github.com/pytorch/pytorch/pull/159416 on behalf of https://github.com/izaitsevfb due to breaking internal builds, see D79452481 ([comment](https://github.com/pytorch/pytorch/pull/159416#issuecomment-3152138508))
2025-08-04 19:55:09 +00:00
Jane Xu
8ea86a6e31 Actually test STD_TORCH_CHECK, add testfile to CMake (#159603)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159603
Approved by: https://github.com/Skylion007, https://github.com/albanD
2025-08-01 19:53:41 +00:00
Jane Xu
1371a98b0e Migrate ScalarType to headeronly (#159416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159416
Approved by: https://github.com/albanD
ghstack dependencies: #159415, #159411
2025-08-01 16:07:01 +00:00
Jane Xu
b95cf5c91d Move complex to headeronly (#159411)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159411
Approved by: https://github.com/albanD
ghstack dependencies: #159415
2025-07-31 22:05:43 +00:00
Jane Xu
5e2ef2a465 Move Float8 variations to headeronly (#159415)
This PR is a big copy pasta from `c10/util/Float8*` -> `torch/headeronly/util/` which is why we are breaking PR sanity :C (sorry @albanD!).

Why is it not a clean copy paste?
- For BC reasons, we have to keep the old c10 file around so that OSS devs relying on those files can still get the same APIs
- Because we reexpose APIs that are headeronly through torch::headeronly, so there is an extra chunk of code in the new torch::headeronly files to do that.

Outside of the copy paste, I:
- changed the tests to call torch::headeronly instead of c10
- updated header_only_apis.txt
- added `// NOLINTNEXTLINE(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions)` to pass lint (which was previously skipped for -inl.h files)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159415
Approved by: https://github.com/albanD
2025-07-31 22:05:43 +00:00
Sherlock Huang
c1722db0f7 [NativeRT] Make VariadicOpConverter and FuseListUnpackConverter for cpu nodes only (#159519)
Summary:
VariadicOpConverter and FuseListUnpackConverter would introduce ops that only have CPU kernels.

Currently, the graph passes are ran if static_dispatch is enabled.

As we plan to enable static_dispatch by default, this diff add the additional check for the graph pass to only work on the node that has all the inputs/outputs on CPU.

Test Plan:
CI

Rollback Plan:

Differential Revision: D79295640

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159519
Approved by: https://github.com/dolpm, https://github.com/henryoier
2025-07-31 18:17:21 +00:00
Jane Xu
c57382a493 Move BFloat16.h to headeronly (#159412)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159412
Approved by: https://github.com/desertfire
2025-07-31 15:29:17 +00:00
Jane Xu
259e79e3ff Move Half to headeronly (#159172)
Essence of this copypasta:
- combine Half-inl.h and Half.h in c10/util -> torch/headeronly/util/Half.h
- Add NOLINTNEXTLINE's to the portions of Half-inl.h that were previously in the ignore list of clangtidy
- Re-expose all APIs in namespaces and through includes of the original files. Ideally, we would have the APIs in torch::headeronly and reexpose them in c10, but that runs into BC issues (see D78997465) so for now we are keeping the APIs in c10 but reexposing them in torch::headeronly.
- Change test cases in test_aoti_abi_check to test torch::headeronly::Half vs c10::Half (they're the same thing but we eventually want all the tests for headeronly APIs to only import from headeronly).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159172
Approved by: https://github.com/albanD, https://github.com/desertfire
2025-07-30 16:11:58 +00:00
Jane Xu
b268f22ab2 Move Float4 to headeronly (#159414)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159414
Approved by: https://github.com/desertfire
2025-07-30 15:34:01 +00:00
PyTorch MergeBot
eaadd1282c Revert "Move Half to headeronly (#159172)"
This reverts commit 6d0f4566e2.

Reverted https://github.com/pytorch/pytorch/pull/159172 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/16613893793/job/47002486679) [HUD commit link](6d0f4566e2).  Note to self: why isn't Dr. CI updating ([comment](https://github.com/pytorch/pytorch/pull/159172#issuecomment-3136769493))
2025-07-30 15:10:26 +00:00
Jane Xu
6d0f4566e2 Move Half to headeronly (#159172)
Essence of this copypasta:
- combine Half-inl.h and Half.h in c10/util -> torch/headeronly/util/Half.h
- Add NOLINTNEXTLINE's to the portions of Half-inl.h that were previously in the ignore list of clangtidy
- Re-expose all APIs in namespaces and through includes of the original files. Ideally, we would have the APIs in torch::headeronly and reexpose them in c10, but that runs into BC issues (see D78997465) so for now we are keeping the APIs in c10 but reexposing them in torch::headeronly.
- Change test cases in test_aoti_abi_check to test torch::headeronly::Half vs c10::Half (they're the same thing but we eventually want all the tests for headeronly APIs to only import from headeronly).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159172
Approved by: https://github.com/albanD, https://github.com/desertfire
2025-07-30 05:02:13 +00:00
Jane Xu
96ac64d00c Migrate easy q(u)int/bits stuff to torch/headeronly (#159302)
Straightup copy pasta. Keeps APIs in c10 and reexposes them to torch::headeronly.

It is arguable that we should just get rid of some of these unused dtypes but that is outside the scope of this PR, which is meant to build up to ScalarType moving to headeronly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159302
Approved by: https://github.com/malfet, https://github.com/albanD
2025-07-30 03:41:27 +00:00
PyTorch MergeBot
e288c258f7 Revert "Remove tensorexpr tests (#158928)"
This reverts commit d742a2896c.

Reverted https://github.com/pytorch/pytorch/pull/158928 on behalf of https://github.com/yangw-dev due to this breaks bunch of internal dependency since some tests are still using the deleted test files from this pr, the internal reviewer please help fix this using codev ([comment](https://github.com/pytorch/pytorch/pull/158928#issuecomment-3134378616))
2025-07-29 23:32:07 +00:00
Zhengxu Chen
8460131087 [nativert] Add OSS version of ModelRunner (#159268)
Summary: Implement a ModelRunner from scratch with the minimum features for OSS only

Test Plan:
test_export -r NativeRT

Rollback Plan:

Differential Revision: D78979812

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159268
Approved by: https://github.com/dolpm
2025-07-29 21:08:14 +00:00
Jane Xu
222fa451a2 Move some of vec into headeronly in preparation for Half.h (#158976)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158976
Approved by: https://github.com/albanD, https://github.com/desertfire
2025-07-29 05:43:53 +00:00
Mu-Chu Lee
19ce1beb05 [AOTInductor] Add test for enabling CUDACachingAllocator for AOTInductor's Weight (#159279)
Summary:
Add test for enabling CUDACachingAllocator for AOTInductor's Weight.
Implementation TBD

Test Plan:
N/A, commit is adding a test.

Rollback Plan:

Differential Revision: D79107507

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159279
Approved by: https://github.com/desertfire, https://github.com/jingsh
2025-07-29 02:52:10 +00:00
zhxchen17
c06164a9c5 [nativert][ez] Remove unused dist collectives ops. (#159220)
Removing dependency to c10d/ in ExecutionFrame.h. We don't need c10d::Work in the frame.

Differential Revision: [D79041618](https://our.internmc.facebook.com/intern/diff/D79041618/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159220
Approved by: https://github.com/SherlockNoMad, https://github.com/dolpm
2025-07-28 16:03:14 +00:00
Sherlock Huang
1abff80fae Reland D78841818 (#159216)
Summary: Relanding D78841818 with fixes

Test Plan:
Tested all failing tests

buck build --config fbcode.use_link_groups=true --flagfile fbcode//mode/dev-nosan fbcode//sigmoid/core/executor/memory/test:layout_planner_tests

buck test 'fbcode//mode/opt' fbcode//sigmoid/inference/test:test_passes

Rollback Plan:

Reviewed By: hl475

Differential Revision: D79038615

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159216
Approved by: https://github.com/dolpm
2025-07-28 07:39:35 +00:00
cyy
d742a2896c Remove tensorexpr tests (#158928)
The tests are not maintained.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158928
Approved by: https://github.com/albanD, https://github.com/malfet
2025-07-27 07:13:27 +00:00
PyTorch MergeBot
f62772f365 Revert "Remove tensorexpr tests (#158928)"
This reverts commit 517eebc1dd.

Reverted https://github.com/pytorch/pytorch/pull/158928 on behalf of https://github.com/ZainRizvi due to Sorry but this breaks trunk test_jit_fuser_te.py::TestNNCOpInfoCPU::test_nnc_correctness_frac_cpu_bfloat16 [GH job link](https://github.com/pytorch/pytorch/actions/runs/16534544469/job/46768022799) [HUD commit link](517eebc1dd) ([comment](https://github.com/pytorch/pytorch/pull/158928#issuecomment-3122158944))
2025-07-26 17:01:54 +00:00
PyTorch MergeBot
3db8623dcb Revert "[NativeRT] Apply Device placement once when loading the graph (#158996)"
This reverts commit 28ee8be5bf.

Reverted https://github.com/pytorch/pytorch/pull/158996 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/158996#issuecomment-3121540050))
2025-07-26 09:05:26 +00:00
cyy
517eebc1dd Remove tensorexpr tests (#158928)
The tests are not maintained.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158928
Approved by: https://github.com/albanD, https://github.com/malfet
2025-07-26 01:21:01 +00:00
Yidi Wu
0f31e9a656 [torchbind] fix fakifying a staitc tensor returns dynamic accidentally (#158607)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158607
Approved by: https://github.com/zou3519
ghstack dependencies: #158583, #158606
2025-07-25 20:55:41 +00:00
Yidi Wu
0427e439aa [test][torchbind] turn on inductor backend for compile torchbind tests (#158606)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158606
Approved by: https://github.com/zou3519
ghstack dependencies: #158583
2025-07-25 20:55:41 +00:00
Yidi Wu
4aa69ae336 [torchbind] support register_autocast for torchbind custom op (#158583)
Fix https://github.com/pytorch/pytorch/issues/158414

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158583
Approved by: https://github.com/zou3519
2025-07-25 20:55:41 +00:00
Sherlock Huang
28ee8be5bf [NativeRT] Apply Device placement once when loading the graph (#158996)
Summary:
Placement is leaked to too many classes!

In this diff, we consolidate all placement lookup into one place: Graph::ApplyDevicePlacement.

After applying placement, the in-memory graph, tensorMeta, weightMeta would already have the re-mapped device.
The subsequence weight loading, sample input loading, target device inference would look up the re-mapped device from graph's tensorMeta.

graph's tensorMeta becomes the only ground truth!

Test Plan:
Need to add some tests before landing.
This is a big change.

Rollback Plan:

Differential Revision: D78841818

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158996
Approved by: https://github.com/henryoier
2025-07-25 20:11:35 +00:00
PyTorch MergeBot
9535995bbc Revert "Remove tensorexpr tests (#158928)"
This reverts commit a0bc865123.

Reverted https://github.com/pytorch/pytorch/pull/158928 on behalf of https://github.com/clee2000 due to broke cpp static runtime test? [GH job link](https://github.com/pytorch/pytorch/actions/runs/16517697273/job/46715871457) [HUD commit link](a0bc865123) ([comment](https://github.com/pytorch/pytorch/pull/158928#issuecomment-3118554478))
2025-07-25 15:22:51 +00:00
cyy
a0bc865123 Remove tensorexpr tests (#158928)
The tests are not maintained.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158928
Approved by: https://github.com/albanD
2025-07-25 08:37:51 +00:00
PyTorch MergeBot
751285cb22 Revert "Move some of vec into headeronly in preparation for Half.h (#158976)"
This reverts commit 5564f2ca2e.

Reverted https://github.com/pytorch/pytorch/pull/158976 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. See D78924504 for details. To validate your fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158976#issuecomment-3115198443))
2025-07-24 22:31:49 +00:00
PyTorch MergeBot
13398dab79 Revert "Remove tensorexpr tests (#158928)"
This reverts commit a3f9f79f59.

Reverted https://github.com/pytorch/pytorch/pull/158928 on behalf of https://github.com/clee2000 due to Theres still some references to the things removed in this PR in test.sh, the jobs on this PR are failing because of that but log classifier is probably pointing to a wrong line, should be an easy fix tho ([comment](https://github.com/pytorch/pytorch/pull/158928#issuecomment-3114873706))
2025-07-24 20:45:30 +00:00
Jane Xu
5564f2ca2e Move some of vec into headeronly in preparation for Half.h (#158976)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158976
Approved by: https://github.com/albanD, https://github.com/desertfire
2025-07-24 20:32:33 +00:00
cyy
a3f9f79f59 Remove tensorexpr tests (#158928)
The tests are not maintained.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158928
Approved by: https://github.com/albanD
2025-07-24 15:38:36 +00:00
Sherlock Huang
fb067de550 [NativeRT] Remove device_ member from OpKernel base class (#158944)
Summary:
In general, device_ is not very useful in OpKernel.  Remove it to avoid misuse.

Also, the meaning of `device_` is also ambiguous in the OpKernel.
For StaticDispatch kernels, we always call cpu kernel.
For C10Kernel, we rely on input tensor's device and dispatcher to determine which device to run on.
For ops involves multiple device, e.g. aten._to_copy(device), the meaning of device is ill-defined.

Test Plan:
CI

Rollback Plan:

Reviewed By: henryoier, dolpm, kqfu, zhxchen17

Differential Revision: D78704840

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158944
Approved by: https://github.com/dolpm
2025-07-24 09:21:37 +00:00
Benjamin Glass
4060f30042 [AOTI] Convert C-struct zip handling to RAII container (#158687)
Attempts to fix a memory leak reported in #158614 by wrapping manually managed MiniZ C-structs in an RAII container. I have been unable to reproduce the reported leak, but this seems like the most likely candidate.

Fixes #158614 (hopefully)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158687
Approved by: https://github.com/desertfire
2025-07-22 16:01:51 +00:00
cyy
3639d29ea1 Fix warnings of unused-variable (#158627)
Fixes
```
/var/lib/jenkins/workspace/test/cpp/tensorexpr/test_kernel.cpp:42:22: error: unused variable 'verification_pattern' [-Werror,-Wunused-variable]
```
and also extra semicolons.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158627
Approved by: https://github.com/albanD
2025-07-22 02:49:06 +00:00
PyTorch MergeBot
97d7dc197f Revert "[AOTI] Convert C-struct zip handling to RAII container (#158687)"
This reverts commit 8ed5e1844c.

Reverted https://github.com/pytorch/pytorch/pull/158687 on behalf of https://github.com/ZainRizvi due to Sorry but I had to revert this PR in order to revert https://github.com/pytorch/pytorch/pull/158671 ([comment](https://github.com/pytorch/pytorch/pull/158687#issuecomment-3099515618))
2025-07-21 22:13:26 +00:00
Benjamin Glass
8ed5e1844c [AOTI] Convert C-struct zip handling to RAII container (#158687)
Attempts to fix a memory leak reported in #158614 by wrapping manually managed MiniZ C-structs in an RAII container. I have been unable to reproduce the reported leak, but this seems like the most likely candidate.

Fixes #158614 (hopefully)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158687
Approved by: https://github.com/desertfire
2025-07-21 18:53:14 +00:00
Tristan Rice
ab557421a4 [cca] [c10d] Refactor CUDAEventCache into separate files (#158616)
Summary:
Refactored CUDAEventCache from ProcessGroupNCCL.hpp/.cpp into dedicated header and implementation files for better code organization and maintainability.

Split out CUDAEventCache into:
- New header file: CUDAEventCache.hpp
- New implementation file: CUDAEventCache.cpp
- Updated build_variables.bzl to include the new file

This change improves code maintainability, readability, and follows better code organization practices.
---
> Generated by [Confucius Code Assist (CCA)](https://www.internalfb.com/wiki/Confucius/Analect/Shared_Analects/Confucius_Code_Assist_(CCA)/)
[Session](https://www.internalfb.com/confucius?session_id=61b9029a-636b-11f0-9d9a-f1bcc55be1ce&tab=Chat), [Trace](https://www.internalfb.com/confucius?session_id=61b9029a-636b-11f0-9d9a-f1bcc55be1ce&tab=Trace)

Test Plan:
Verified build with:
```
buck build //caffe2/test/distributed:c10d
```
---
> Generated by [Confucius Code Assist (CCA)](https://www.internalfb.com/wiki/Confucius/Analect/Shared_Analects/Confucius_Code_Assist_(CCA)/)
[Session](https://www.internalfb.com/confucius?session_id=61b9029a-636b-11f0-9d9a-f1bcc55be1ce&tab=Chat), [Trace](https://www.internalfb.com/confucius?session_id=61b9029a-636b-11f0-9d9a-f1bcc55be1ce&tab=Trace)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158616
Approved by: https://github.com/fduwjj
2025-07-19 02:51:28 +00:00