pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2026-01-15 12:15:51 +00:00

Author	SHA1	Message	Date
frost-intel	9b4adc4db7	[fr] [xpu] Add FlightRecorder support for ProcessGroupXCCL (#158568 ) Adds support for FlightRecorder in ProcessGroupXCCL. See https://github.com/intel/torch-xpu-ops/pull/1867 for XCCL implementation and more details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158568 Approved by: https://github.com/guangyey, https://github.com/fduwjj	2025-08-22 09:03:35 +00:00
dolpm	ff4f5dd8ed	[nativert] oss layout planner tests (#160942 ) Summary: att - changed one of the tests to get rid of torcharrow dep. Test Plan: ``` buck2 test //caffe2/test/cpp/nativert:layout_planner_tests Tests finished: Pass 15. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` Rollback Plan: Reviewed By: SherlockNoMad Differential Revision: D80108549 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160942 Approved by: https://github.com/georgiaphillips, https://github.com/henryoier	2025-08-22 00:26:25 +00:00
dolpm	958f9ca88e	[nativert] oss static kernel tests (#161087 ) Summary: att - should be no-op Test Plan: buck2 test //caffe2/test/cpp/nativert:static_kernel_ops_tests Tests finished: Pass 24. Fail 0. Fatal 0. Skip 0. Build failure 0 Rollback Plan: Differential Revision: D80216488 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161087 Approved by: https://github.com/georgiaphillips, https://github.com/henryoier	2025-08-21 19:42:21 +00:00
dolpm	67b98da1b2	[nativert] oss static kernel test utils (#161086 ) Summary: att - should be a no-op Test Plan: ci Rollback Plan: Differential Revision: D80214768 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161086 Approved by: https://github.com/georgiaphillips	2025-08-21 04:49:06 +00:00
dolpm	1471b20cb3	add static dispatch kernel registration to open source (#160439 ) Summary: static dispatch registry should be moved to open source. the rest can be maintained internally for now, since delegates will all go through ET hop. Test Plan: spot checked existing tests and didn't see any missing registrations Differential Revision: D80099377 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160439 Approved by: https://github.com/SherlockNoMad, https://github.com/zhxchen17	2025-08-20 17:58:00 +00:00
dolpm	b439675ae2	[nativert] oss pass graph pass registration (#160859 ) Summary: att Test Plan: CI Rollback Plan: Differential Revision: D80368343 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160859 Approved by: https://github.com/georgiaphillips	2025-08-18 22:23:38 +00:00
dolpm	138413907a	[nativert] oss subgraph rewriter (#160780 ) Summary: att Test Plan: ci Rollback Plan: Differential Revision: D80367765 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160780 Approved by: https://github.com/SherlockNoMad, https://github.com/georgiaphillips	2025-08-18 04:25:05 +00:00
Catherine Lee	80dd05e31e	Disable flaky cpp test RecordDebugHandles.Basic (#160577 ) Test is flaky and sometimes hangs in CI. Here's an example of the failure: https://github.com/pytorch/pytorch/actions/runs/16946153494/job/48027937663
```
2025-08-13T20:54:00.1223688Z ==================================== RERUNS ====================================
2025-08-13T20:54:00.1224156Z ___________________________ RecordDebugHandles.Basic ___________________________
2025-08-13T20:54:00.1224682Z [gw2] linux -- Python 3.13.5 /opt/conda/envs/py_3.13/bin/python3.13
2025-08-13T20:54:00.1225568Z Internal Error: calling /opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit for test RecordDebugHandles.Basic failed (returncode=-6):
2025-08-13T20:54:00.1226430Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1226988Z Note: Google Test filter = RecordDebugHandles.Basic-_CUDA:_MultiCUDA
2025-08-13T20:54:00.1227450Z [==========] Running 1 test from 1 test suite.
2025-08-13T20:54:00.1227792Z [----------] Global test environment set-up.
2025-08-13T20:54:00.1228145Z [----------] 1 test from RecordDebugHandles
2025-08-13T20:54:00.1228492Z [ RUN ] RecordDebugHandles.Basic
2025-08-13T20:54:00.1228822Z [ OK ] RecordDebugHandles.Basic (1 ms)
2025-08-13T20:54:00.1229204Z [----------] 1 test from RecordDebugHandles (1 ms total)
2025-08-13T20:54:00.1229501Z
2025-08-13T20:54:00.1229666Z [----------] Global test environment tear-down
2025-08-13T20:54:00.1230033Z [==========] 1 test from 1 test suite ran. (1 ms total)
2025-08-13T20:54:00.1230355Z [ PASSED ] 1 test.
2025-08-13T20:54:00.1230727Z terminate called after throwing an instance of 'std::system_error'
2025-08-13T20:54:00.1231154Z what(): Invalid argument
2025-08-13T20:54:00.1231416Z unknown file:0: C++ failure
2025-08-13T20:54:00.1231788Z ------------------------------ Captured c++ call -------------------------------
2025-08-13T20:54:00.1232262Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1232745Z Note: Google Test filter = RecordDebugHandles.Basic-_CUDA:_MultiCUDA
2025-08-13T20:54:00.1233199Z [==========] Running 1 test from 1 test suite.
2025-08-13T20:54:00.1233557Z [----------] Global test environment set-up.
2025-08-13T20:54:00.1233915Z [----------] 1 test from RecordDebugHandles
2025-08-13T20:54:00.1234247Z [ RUN ] RecordDebugHandles.Basic
2025-08-13T20:54:00.1234590Z [ OK ] RecordDebugHandles.Basic (1 ms)
2025-08-13T20:54:00.1235020Z [----------] 1 test from RecordDebugHandles (1 ms total)
2025-08-13T20:54:00.1235304Z
2025-08-13T20:54:00.1235431Z [----------] Global test environment tear-down
2025-08-13T20:54:00.1235793Z [==========] 1 test from 1 test suite ran. (1 ms total)
2025-08-13T20:54:00.1236126Z [ PASSED ] 1 test.
2025-08-13T20:54:00.1236481Z terminate called after throwing an instance of 'std::system_error'
2025-08-13T20:54:00.1236906Z what(): Invalid argument
2025-08-13T20:54:00.1237287Z ___________________________ RecordDebugHandles.Basic ___________________________
2025-08-13T20:54:00.1237800Z [gw2] linux -- Python 3.13.5 /opt/conda/envs/py_3.13/bin/python3.13
2025-08-13T20:54:00.1238686Z Internal Error: calling /opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit for test RecordDebugHandles.Basic failed (returncode=-6):
2025-08-13T20:54:00.1239551Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1240048Z Note: Google Test filter = RecordDebugHandles.Basic-_CUDA:_MultiCUDA
2025-08-13T20:54:00.1240495Z [==========] Running 1 test from 1 test suite.
2025-08-13T20:54:00.1240848Z [----------] Global test environment set-up.
2025-08-13T20:54:00.1241199Z [----------] 1 test from RecordDebugHandles
2025-08-13T20:54:00.1241542Z [ RUN ] RecordDebugHandles.Basic
2025-08-13T20:54:00.1241871Z [ OK ] RecordDebugHandles.Basic (1 ms)
2025-08-13T20:54:00.1242249Z [----------] 1 test from RecordDebugHandles (1 ms total)
2025-08-13T20:54:00.1242503Z
2025-08-13T20:54:00.1242641Z [----------] Global test environment tear-down
2025-08-13T20:54:00.1242993Z [==========] 1 test from 1 test suite ran. (19 ms total)
2025-08-13T20:54:00.1243329Z [ PASSED ] 1 test.
2025-08-13T20:54:00.1243697Z terminate called after throwing an instance of 'std::system_error'
2025-08-13T20:54:00.1244113Z what(): Invalid argument
2025-08-13T20:54:00.1244392Z unknown file:0: C++ failure
2025-08-13T20:54:00.1244759Z ------------------------------ Captured c++ call -------------------------------
2025-08-13T20:54:00.1245235Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1283768Z ============== 1 failed, 568 passed, 2 rerun in 115.57s (0:01:55) ==============
```
Here's an example of the hang: https://github.com/pytorch/pytorch/actions/runs/16942186826/job/48015238944 Logs aren't super helpful other than stating that it took a long time. Usually this file takes <2min to run
```
2025-08-13T18:43:24.6586481Z [gw0] [ 97%] PASSED [1.4119s] ../../../../../opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit::PyTorch/LiteInterpreterDynamicTypeTestFixture::Conformance/8
2025-08-13T18:43:24.6587278Z [gw1] [ 97%] PASSED [1.4866s] ../../../../../opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit::PyTorch/LiteInterpreterDynamicTypeTestFixture::Conformance/9
Command took >30min, returning 124
2025-08-13T18:43:24.6587288Z
2025-08-13T18:43:24.6587632Z FINISHED PRINTING LOG FILE of cpp/test_jit 1/1 (test/test-reports/cpp.test_jit_1.1_c259e5a152845991_.log)
2025-08-13T18:43:24.6587639Z
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160577 Approved by: https://github.com/huydhn	2025-08-15 15:59:21 +00:00
cyy	10e3514c96	Remove tensorexpr tests (#158928 ) The tests are not maintained. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158928 Approved by: https://github.com/albanD, https://github.com/malfet	2025-08-09 02:21:22 +00:00
Jane Xu	1690c0c3a0	[Reland] Migrate ScalarType to headeronly (#159911 ) The non ghstack version of #159416, to make sure we don't get reverted again Pull Request resolved: https://github.com/pytorch/pytorch/pull/159911 Approved by: https://github.com/mikaylagawarecki	2025-08-06 07:36:37 +00:00
Jane Xu	3ddfd46bd2	Cut a version of TORCH_ERROR_CODE_CHECK in headeronly from AOTI (#159604 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159604 Approved by: https://github.com/albanD, https://github.com/desertfire	2025-08-06 00:29:56 +00:00
yewentao256	fd6655a0f5	Feature: Implement support for `cudnn_batch_norm_out` kernel to replace the autogen approach. (#123020 ) Fixes #115611 Autogen kernel may cause redundant copy, so we develop the kernel to improve efficiency. Test Case:
```c++
#include <torch/torch.h>
#include <iostream>
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

int main() {
  auto input = torch::rand({2, 3, 4, 4}, torch::device(torch::kCUDA));
  auto weight = torch::randn({3}, torch::device(torch::kCUDA));
  auto bias = torch::randn({3}, torch::device(torch::kCUDA));
  auto running_mean = torch::zeros({3}, torch::device(torch::kCUDA));
  auto running_var = torch::ones({3}, torch::device(torch::kCUDA));
  bool training = true;
  double exponential_average_factor = 0.1;
  double epsilon = 1e-5;

  auto output = torch::empty_like(input);
  auto save_mean = torch::empty({3}, torch::device(torch::kCUDA));
  auto save_var = torch::empty({3}, torch::device(torch::kCUDA));
  auto reserve = torch::empty({0}, torch::device(torch::kCUDA)); // empty place-holder

  at::native::cudnn_batch_norm_out(input, weight, bias, running_mean, running_var, training, exponential_average_factor, epsilon, output, save_mean, save_var, reserve);

  auto outputs = at::native::cudnn_batch_norm(input, weight, bias, running_mean, running_var, training, exponential_average_factor, epsilon);

  bool is_close_output = torch::allclose(output, std::get<0>(outputs));
  bool is_close_save_mean = torch::allclose(save_mean, std::get<1>(outputs));
  bool is_close_save_var = torch::allclose(save_var, std::get<2>(outputs));
  bool is_close_reserve = torch::allclose(reserve, std::get<3>(outputs));

  std::cout << "Is output close: " << is_close_output << std::endl;
  std::cout << "Is save_mean close: " << is_close_save_mean << std::endl;
  std::cout << "Is save_var close: " << is_close_save_var << std::endl;
  std::cout << "Is reserve close: " << is_close_reserve << std::endl;

  return 0;
}
```
Please CC @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/123020 Approved by: https://github.com/andrewor14, https://github.com/eqy, https://github.com/albanD	2025-08-04 22:40:33 +00:00
PyTorch MergeBot	7e8197e34d	Revert "Migrate ScalarType to headeronly (#159416 )" This reverts commit `1371a98b0e`. Reverted https://github.com/pytorch/pytorch/pull/159416 on behalf of https://github.com/izaitsevfb due to breaking internal builds, see D79452481 ([comment](https://github.com/pytorch/pytorch/pull/159416#issuecomment-3152138508))	2025-08-04 19:55:09 +00:00
Jane Xu	8ea86a6e31	Actually test STD_TORCH_CHECK, add testfile to CMake (#159603 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159603 Approved by: https://github.com/Skylion007, https://github.com/albanD	2025-08-01 19:53:41 +00:00
Jane Xu	1371a98b0e	Migrate ScalarType to headeronly (#159416 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159416 Approved by: https://github.com/albanD ghstack dependencies: #159415, #159411	2025-08-01 16:07:01 +00:00
Jane Xu	b95cf5c91d	Move complex to headeronly (#159411 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159411 Approved by: https://github.com/albanD ghstack dependencies: #159415	2025-07-31 22:05:43 +00:00
Jane Xu	5e2ef2a465	Move Float8 variations to headeronly (#159415 ) This PR is a big copy pasta from `c10/util/Float8*` -> `torch/headeronly/util/` which is why we are breaking PR sanity :C (sorry @albanD!). Why is it not a clean copy paste? - For BC reasons, we have to keep the old c10 file around so that OSS devs relying on those files can still get the same APIs - Because we reexpose APIs that are headeronly through torch::headeronly, so there is an extra chunk of code in the new torch::headeronly files to do that. Outside of the copy paste, I: - changed the tests to call torch::headeronly instead of c10 - updated header_only_apis.txt - added `// NOLINTNEXTLINE(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions)` to pass lint (which was previously skipped for -inl.h files) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159415 Approved by: https://github.com/albanD	2025-07-31 22:05:43 +00:00
Sherlock Huang	c1722db0f7	[NativeRT] Make VariadicOpConverter and FuseListUnpackConverter for cpu nodes only (#159519 ) Summary: VariadicOpConverter and FuseListUnpackConverter would introduce ops that only have CPU kernels. Currently, the graph passes are run if static_dispatch is enabled. As we plan to enable static_dispatch by default, this diff adds an additional check so that the graph pass only works on nodes that have all their inputs/outputs on CPU. Test Plan: CI Rollback Plan: Differential Revision: D79295640 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159519 Approved by: https://github.com/dolpm, https://github.com/henryoier	2025-07-31 18:17:21 +00:00
Jane Xu	c57382a493	Move BFloat16.h to headeronly (#159412 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159412 Approved by: https://github.com/desertfire	2025-07-31 15:29:17 +00:00
Jane Xu	259e79e3ff	Move Half to headeronly (#159172 ) Essence of this copypasta: - combine Half-inl.h and Half.h in c10/util -> torch/headeronly/util/Half.h - Add NOLINTNEXTLINE's to the portions of Half-inl.h that were previously in the ignore list of clangtidy - Re-expose all APIs in namespaces and through includes of the original files. Ideally, we would have the APIs in torch::headeronly and reexpose them in c10, but that runs into BC issues (see D78997465) so for now we are keeping the APIs in c10 but reexposing them in torch::headeronly. - Change test cases in test_aoti_abi_check to test torch::headeronly::Half vs c10::Half (they're the same thing but we eventually want all the tests for headeronly APIs to only import from headeronly). Pull Request resolved: https://github.com/pytorch/pytorch/pull/159172 Approved by: https://github.com/albanD, https://github.com/desertfire	2025-07-30 16:11:58 +00:00
Jane Xu	b268f22ab2	Move Float4 to headeronly (#159414 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159414 Approved by: https://github.com/desertfire	2025-07-30 15:34:01 +00:00
PyTorch MergeBot	eaadd1282c	Revert "Move Half to headeronly (#159172 )" This reverts commit `6d0f4566e2`. Reverted https://github.com/pytorch/pytorch/pull/159172 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/16613893793/job/47002486679) [HUD commit link](`6d0f4566e2`). Note to self: why isn't Dr. CI updating ([comment](https://github.com/pytorch/pytorch/pull/159172#issuecomment-3136769493))	2025-07-30 15:10:26 +00:00
Jane Xu	6d0f4566e2	Move Half to headeronly (#159172 ) Essence of this copypasta: - combine Half-inl.h and Half.h in c10/util -> torch/headeronly/util/Half.h - Add NOLINTNEXTLINE's to the portions of Half-inl.h that were previously in the ignore list of clangtidy - Re-expose all APIs in namespaces and through includes of the original files. Ideally, we would have the APIs in torch::headeronly and reexpose them in c10, but that runs into BC issues (see D78997465) so for now we are keeping the APIs in c10 but reexposing them in torch::headeronly. - Change test cases in test_aoti_abi_check to test torch::headeronly::Half vs c10::Half (they're the same thing but we eventually want all the tests for headeronly APIs to only import from headeronly). Pull Request resolved: https://github.com/pytorch/pytorch/pull/159172 Approved by: https://github.com/albanD, https://github.com/desertfire	2025-07-30 05:02:13 +00:00
Jane Xu	96ac64d00c	Migrate easy q(u)int/bits stuff to torch/headeronly (#159302 ) Straightup copy pasta. Keeps APIs in c10 and reexposes them to torch::headeronly. It is arguable that we should just get rid of some of these unused dtypes but that is outside the scope of this PR, which is meant to build up to ScalarType moving to headeronly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159302 Approved by: https://github.com/malfet, https://github.com/albanD	2025-07-30 03:41:27 +00:00
PyTorch MergeBot	e288c258f7	Revert "Remove tensorexpr tests (#158928 )" This reverts commit `d742a2896c`. Reverted https://github.com/pytorch/pytorch/pull/158928 on behalf of https://github.com/yangw-dev due to this breaks bunch of internal dependency since some tests are still using the deleted test files from this pr, the internal reviewer please help fix this using codev ([comment](https://github.com/pytorch/pytorch/pull/158928#issuecomment-3134378616))	2025-07-29 23:32:07 +00:00
Zhengxu Chen	8460131087	[nativert] Add OSS version of ModelRunner (#159268 ) Summary: Implement a ModelRunner from scratch with the minimum features for OSS only Test Plan: test_export -r NativeRT Rollback Plan: Differential Revision: D78979812 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159268 Approved by: https://github.com/dolpm	2025-07-29 21:08:14 +00:00
Jane Xu	222fa451a2	Move some of vec into headeronly in preparation for Half.h (#158976 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158976 Approved by: https://github.com/albanD, https://github.com/desertfire	2025-07-29 05:43:53 +00:00
Mu-Chu Lee	19ce1beb05	[AOTInductor] Add test for enabling CUDACachingAllocator for AOTInductor's Weight (#159279 ) Summary: Add test for enabling CUDACachingAllocator for AOTInductor's Weight. Implementation TBD Test Plan: N/A, commit is adding a test. Rollback Plan: Differential Revision: D79107507 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159279 Approved by: https://github.com/desertfire, https://github.com/jingsh	2025-07-29 02:52:10 +00:00
zhxchen17	c06164a9c5	[nativert][ez] Remove unused dist collectives ops. (#159220 ) Removing dependency to c10d/ in ExecutionFrame.h. We don't need c10d::Work in the frame. Differential Revision: [D79041618](https://our.internmc.facebook.com/intern/diff/D79041618/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159220 Approved by: https://github.com/SherlockNoMad, https://github.com/dolpm	2025-07-28 16:03:14 +00:00
Sherlock Huang	1abff80fae	Reland D78841818 (#159216 ) Summary: Relanding D78841818 with fixes Test Plan: Tested all failing tests buck build --config fbcode.use_link_groups=true --flagfile fbcode//mode/dev-nosan fbcode//sigmoid/core/executor/memory/test:layout_planner_tests buck test 'fbcode//mode/opt' fbcode//sigmoid/inference/test:test_passes Rollback Plan: Reviewed By: hl475 Differential Revision: D79038615 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159216 Approved by: https://github.com/dolpm	2025-07-28 07:39:35 +00:00
cyy	d742a2896c	Remove tensorexpr tests (#158928 ) The tests are not maintained. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158928 Approved by: https://github.com/albanD, https://github.com/malfet	2025-07-27 07:13:27 +00:00
PyTorch MergeBot	f62772f365	Revert "Remove tensorexpr tests (#158928 )" This reverts commit `517eebc1dd`. Reverted https://github.com/pytorch/pytorch/pull/158928 on behalf of https://github.com/ZainRizvi due to Sorry but this breaks trunk test_jit_fuser_te.py::TestNNCOpInfoCPU::test_nnc_correctness_frac_cpu_bfloat16 [GH job link](https://github.com/pytorch/pytorch/actions/runs/16534544469/job/46768022799) [HUD commit link](`517eebc1dd`) ([comment](https://github.com/pytorch/pytorch/pull/158928#issuecomment-3122158944))	2025-07-26 17:01:54 +00:00
PyTorch MergeBot	3db8623dcb	Revert "[NativeRT] Apply Device placement once when loading the graph (#158996 )" This reverts commit `28ee8be5bf`. Reverted https://github.com/pytorch/pytorch/pull/158996 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/158996#issuecomment-3121540050))	2025-07-26 09:05:26 +00:00
cyy	517eebc1dd	Remove tensorexpr tests (#158928 ) The tests are not maintained. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158928 Approved by: https://github.com/albanD, https://github.com/malfet	2025-07-26 01:21:01 +00:00
Yidi Wu	0f31e9a656	[torchbind] fix fakifying a static tensor returns dynamic accidentally (#158607 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158607 Approved by: https://github.com/zou3519 ghstack dependencies: #158583, #158606	2025-07-25 20:55:41 +00:00
Yidi Wu	0427e439aa	[test][torchbind] turn on inductor backend for compile torchbind tests (#158606 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158606 Approved by: https://github.com/zou3519 ghstack dependencies: #158583	2025-07-25 20:55:41 +00:00
Yidi Wu	4aa69ae336	[torchbind] support register_autocast for torchbind custom op (#158583 ) Fix https://github.com/pytorch/pytorch/issues/158414 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158583 Approved by: https://github.com/zou3519	2025-07-25 20:55:41 +00:00
Sherlock Huang	28ee8be5bf	[NativeRT] Apply Device placement once when loading the graph (#158996 ) Summary: Placement is leaked to too many classes! In this diff, we consolidate all placement lookup into one place: Graph::ApplyDevicePlacement. After applying placement, the in-memory graph, tensorMeta, weightMeta would already have the re-mapped device. The subsequent weight loading, sample input loading, and target device inference would look up the re-mapped device from the graph's tensorMeta. The graph's tensorMeta becomes the only ground truth! Test Plan: Need to add some tests before landing. This is a big change. Rollback Plan: Differential Revision: D78841818 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158996 Approved by: https://github.com/henryoier	2025-07-25 20:11:35 +00:00
PyTorch MergeBot	9535995bbc	Revert "Remove tensorexpr tests (#158928 )" This reverts commit `a0bc865123`. Reverted https://github.com/pytorch/pytorch/pull/158928 on behalf of https://github.com/clee2000 due to broke cpp static runtime test? [GH job link](https://github.com/pytorch/pytorch/actions/runs/16517697273/job/46715871457) [HUD commit link](`a0bc865123`) ([comment](https://github.com/pytorch/pytorch/pull/158928#issuecomment-3118554478))	2025-07-25 15:22:51 +00:00
cyy	a0bc865123	Remove tensorexpr tests (#158928 ) The tests are not maintained. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158928 Approved by: https://github.com/albanD	2025-07-25 08:37:51 +00:00
PyTorch MergeBot	751285cb22	Revert "Move some of vec into headeronly in preparation for Half.h (#158976 )" This reverts commit `5564f2ca2e`. Reverted https://github.com/pytorch/pytorch/pull/158976 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. See D78924504 for details. To validate your fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/158976#issuecomment-3115198443))	2025-07-24 22:31:49 +00:00
PyTorch MergeBot	13398dab79	Revert "Remove tensorexpr tests (#158928 )" This reverts commit `a3f9f79f59`. Reverted https://github.com/pytorch/pytorch/pull/158928 on behalf of https://github.com/clee2000 due to There's still some references to the things removed in this PR in test.sh, the jobs on this PR are failing because of that but log classifier is probably pointing to a wrong line, should be an easy fix tho ([comment](https://github.com/pytorch/pytorch/pull/158928#issuecomment-3114873706))	2025-07-24 20:45:30 +00:00
Jane Xu	5564f2ca2e	Move some of vec into headeronly in preparation for Half.h (#158976 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158976 Approved by: https://github.com/albanD, https://github.com/desertfire	2025-07-24 20:32:33 +00:00
cyy	a3f9f79f59	Remove tensorexpr tests (#158928 ) The tests are not maintained. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158928 Approved by: https://github.com/albanD	2025-07-24 15:38:36 +00:00
Sherlock Huang	fb067de550	[NativeRT] Remove device_ member from OpKernel base class (#158944 ) Summary: In general, device_ is not very useful in OpKernel. Remove it to avoid misuse. The meaning of `device_` is also ambiguous in the OpKernel. For StaticDispatch kernels, we always call the cpu kernel. For C10Kernel, we rely on the input tensor's device and the dispatcher to determine which device to run on. For ops that involve multiple devices, e.g. aten._to_copy(device), the meaning of device is ill-defined. Test Plan: CI Rollback Plan: Reviewed By: henryoier, dolpm, kqfu, zhxchen17 Differential Revision: D78704840 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158944 Approved by: https://github.com/dolpm	2025-07-24 09:21:37 +00:00
Benjamin Glass	4060f30042	[AOTI] Convert C-struct zip handling to RAII container (#158687 ) Attempts to fix a memory leak reported in #158614 by wrapping manually managed MiniZ C-structs in an RAII container. I have been unable to reproduce the reported leak, but this seems like the most likely candidate. Fixes #158614 (hopefully) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158687 Approved by: https://github.com/desertfire	2025-07-22 16:01:51 +00:00
cyy	3639d29ea1	Fix warnings of unused-variable (#158627 ) Fixes ``` /var/lib/jenkins/workspace/test/cpp/tensorexpr/test_kernel.cpp:42:22: error: unused variable 'verification_pattern' [-Werror,-Wunused-variable] ``` and also extra semicolons. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158627 Approved by: https://github.com/albanD	2025-07-22 02:49:06 +00:00
PyTorch MergeBot	97d7dc197f	Revert "[AOTI] Convert C-struct zip handling to RAII container (#158687 )" This reverts commit `8ed5e1844c`. Reverted https://github.com/pytorch/pytorch/pull/158687 on behalf of https://github.com/ZainRizvi due to Sorry but I had to revert this PR in order to revert https://github.com/pytorch/pytorch/pull/158671 ([comment](https://github.com/pytorch/pytorch/pull/158687#issuecomment-3099515618))	2025-07-21 22:13:26 +00:00
Benjamin Glass	8ed5e1844c	[AOTI] Convert C-struct zip handling to RAII container (#158687 ) Attempts to fix a memory leak reported in #158614 by wrapping manually managed MiniZ C-structs in an RAII container. I have been unable to reproduce the reported leak, but this seems like the most likely candidate. Fixes #158614 (hopefully) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158687 Approved by: https://github.com/desertfire	2025-07-21 18:53:14 +00:00
Tristan Rice	ab557421a4	[cca] [c10d] Refactor CUDAEventCache into separate files (#158616 ) Summary: Refactored CUDAEventCache from ProcessGroupNCCL.hpp/.cpp into dedicated header and implementation files for better code organization and maintainability. Split out CUDAEventCache into: - New header file: CUDAEventCache.hpp - New implementation file: CUDAEventCache.cpp - Updated build_variables.bzl to include the new file This change improves code maintainability, readability, and follows better code organization practices. --- > Generated by [Confucius Code Assist (CCA)](https://www.internalfb.com/wiki/Confucius/Analect/Shared_Analects/Confucius_Code_Assist_(CCA)/) [Session](https://www.internalfb.com/confucius?session_id=61b9029a-636b-11f0-9d9a-f1bcc55be1ce&tab=Chat), [Trace](https://www.internalfb.com/confucius?session_id=61b9029a-636b-11f0-9d9a-f1bcc55be1ce&tab=Trace) Test Plan: Verified build with: ``` buck build //caffe2/test/distributed:c10d ``` --- > Generated by [Confucius Code Assist (CCA)](https://www.internalfb.com/wiki/Confucius/Analect/Shared_Analects/Confucius_Code_Assist_(CCA)/) [Session](https://www.internalfb.com/confucius?session_id=61b9029a-636b-11f0-9d9a-f1bcc55be1ce&tab=Chat), [Trace](https://www.internalfb.com/confucius?session_id=61b9029a-636b-11f0-9d9a-f1bcc55be1ce&tab=Trace) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158616 Approved by: https://github.com/fduwjj	2025-07-19 02:51:28 +00:00
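
Commit 4060f30042 (#158687) above describes wrapping manually managed MiniZ C-structs in an RAII container. The following is only a minimal sketch of that general technique, not the code from the PR: `fake_archive`, `fake_archive_init`, `fake_archive_end`, `live_archives`, `ArchiveGuard` and `process_archive` are all hypothetical names invented here to stand in for a C-style init/end API.

```cpp
#include <cassert>

// Hypothetical stand-in for a C library handle that needs paired
// init/end calls. None of these names come from MiniZ or PyTorch.
struct fake_archive {
  bool open = false;
};

// Counts handles that were initialised but not yet ended, so that
// cleanup can be observed from outside.
inline int& live_archives() {
  static int count = 0;
  return count;
}

inline bool fake_archive_init(fake_archive* a) {
  assert(a != nullptr);
  a->open = true;
  ++live_archives();
  return true;
}

inline void fake_archive_end(fake_archive* a) {
  assert(a != nullptr);
  if (a->open) {
    a->open = false;
    --live_archives();
  }
}

// RAII owner: initialises the struct in the constructor and ends it
// in the destructor, so every exit path releases the handle.
class ArchiveGuard {
 public:
  ArchiveGuard() : ok_(fake_archive_init(&archive_)) {}
  ~ArchiveGuard() {
    if (ok_) {
      fake_archive_end(&archive_);
    }
  }

  // Copying would end the same handle twice, so it is disabled.
  ArchiveGuard(const ArchiveGuard&) = delete;
  ArchiveGuard& operator=(const ArchiveGuard&) = delete;

  bool ok() const { return ok_; }
  fake_archive* get() { return &archive_; }

 private:
  fake_archive archive_{};
  bool ok_ = false;
};

// An early return that would leak with hand-written init/end pairs.
inline bool process_archive(bool fail_early) {
  ArchiveGuard guard;
  if (!guard.ok() || fail_early) {
    return false;  // the destructor still ends the handle
  }
  return true;
}
```

With this shape, `live_archives()` returns to zero after `process_archive` regardless of which branch is taken, which is the property a leak fix of this kind is after.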


2459 Commits
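
Several of the headeronly commits listed above (for example 259e79e3ff, #159172) say the APIs stay in `c10` for BC reasons and are re-exposed in `torch::headeronly`. A rough sketch of that re-exposing pattern, using `using`-declarations, might look like the following. The namespaces `demo_c10` and `demo_torch::headeronly`, the simplified `Half` struct, and the helpers `is_zero`, `old_and_new_agree` and `self_check` are assumptions made for illustration and are not the real PyTorch definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical "legacy" namespace that keeps owning the definitions.
namespace demo_c10 {

// Simplified stand-in for a 16-bit float wrapper (raw bits only).
struct Half {
  std::uint16_t bits;
};

// True for +0 and -0: every bit except the sign bit is clear.
inline bool is_zero(Half h) {
  return (h.bits & 0x7FFF) == 0;
}

}  // namespace demo_c10

// Hypothetical "new" namespace that re-exposes the same entities.
namespace demo_torch {
namespace headeronly {

using demo_c10::Half;
using demo_c10::is_zero;

}  // namespace headeronly
}  // namespace demo_torch

// Both names refer to one type, so existing callers keep compiling.
static_assert(
    std::is_same<demo_c10::Half, demo_torch::headeronly::Half>::value,
    "re-exposed Half must be the same type as the original");

// Calls the helper through both namespaces and compares the results.
inline bool old_and_new_agree(std::uint16_t bits) {
  demo_c10::Half a{bits};
  demo_torch::headeronly::Half b{bits};
  return demo_c10::is_zero(a) == demo_torch::headeronly::is_zero(b);
}

inline void self_check() {
  assert(old_and_new_agree(0x0000));
  assert(old_and_new_agree(0x3C00));
}
```

Because a `using`-declaration introduces an alias rather than a copy, code written against either namespace sees the same type and the same function, which is why tests can be switched from one spelling to the other without changing behavior.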