albanD
1913ee1aec
Assert in github, docs, setup and top ( #170598 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170598
Approved by: https://github.com/ezyang , https://github.com/cyyever
2026-01-01 05:27:53 +00:00
Masaki Kozuki
28f22d94eb
Touch __init__.py in vendored_templates for CuTeDSL Grouped MM template ( #170566 )
...
This pull request makes a small improvement to the `mirror_inductor_external_kernels` function in `setup.py` to ensure that newly created directories are recognized as Python packages by `find_packages`.
* When creating a new directory for mirrored files, the code now adds an empty `__init__.py` file to the directory so that `find_packages` treats it as a submodule. Also added inclusion of `vendored_templates` for CuTeDSL.
This fixes the `ModuleNotFoundError` for `torch._inductor.kernel.vendored_templates`:
```
$ pytest -v -s -x test/inductor/test_cutedsl_grouped_mm.py
...
FAILED [2.2725s] test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_contiguous_layout_B_broadcasted - torch._inductor.exc.InductorError: ModuleNotFoundError: No module named 'torch._inductor.kernel.vendored_templates'
```
This error does not occur if PyTorch is installed in development mode.
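The fix described above can be sketched as follows. This is a minimal illustration of the technique, not the actual `setup.py` code; `mirror_kernel_dir` is a hypothetical helper name.

```python
import os


def mirror_kernel_dir(dst_dir: str) -> None:
    """Hypothetical sketch: when creating a directory for mirrored files,
    also touch an empty ``__init__.py`` so that setuptools'
    ``find_packages`` discovers the directory as a package/submodule."""
    os.makedirs(dst_dir, exist_ok=True)
    init_py = os.path.join(dst_dir, "__init__.py")
    if not os.path.exists(init_py):
        # "touch" the file: find_packages() skips directories without it
        open(init_py, "a").close()
```

Without the `__init__.py`, `find_packages` omits the directory from the wheel, which is exactly why the vendored templates were missing from non-development installs.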
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170566
Approved by: https://github.com/Skylion007
2025-12-20 03:09:07 +00:00
Puneet Matharu
7a38744ffa
[AArch64][Build] allow missing cutlass file if CUDA disabled ( #167720 )
...
In a CUDA-disabled build of PyTorch, you may well want to wipe the `third_party/cutlass` directory. However, this can produce an error in `setup.py:mirror_inductor_external_kernels()`. This patch ignores the missing file if `USE_CUDA=0` is set before calling `setup.py`.
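The tolerance described above can be sketched like this. This is an assumed shape of the logic, not the actual patch; `maybe_copy` and the exact env-var parsing are illustrative.

```python
import os
import shutil


def maybe_copy(src: str, dst: str) -> None:
    """Hypothetical sketch: copy a cutlass file, but tolerate it being
    missing when the user disabled CUDA (e.g. USE_CUDA=0) and may have
    wiped third_party/cutlass entirely."""
    cuda_enabled = os.environ.get("USE_CUDA", "1") not in ("0", "OFF", "False")
    if not os.path.exists(src):
        if cuda_enabled:
            raise FileNotFoundError(src)
        return  # CUDA disabled: silently skip the missing cutlass file
    shutil.copyfile(src, dst)
```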
Pull Request resolved: https://github.com/pytorch/pytorch/pull/167720
Approved by: https://github.com/NikhilAPatel , https://github.com/fadara01 , https://github.com/aditew01
2025-12-08 13:02:31 +00:00
Mikayla Gawarecki
892640e25a
Hide all symbols (except stable/headeronly/shim) if TORCH_STABLE_ONLY is defined ( #167496 )
...
Fixes https://github.com/pytorch/pytorch/issues/161660
This extends the `TORCH_STABLE_ONLY` stopgap added in https://github.com/pytorch/pytorch/pull/161658
Pull Request resolved: https://github.com/pytorch/pytorch/pull/167496
Approved by: https://github.com/janeyx99 , https://github.com/malfet , https://github.com/atalman
2025-12-02 13:10:20 +00:00
PyTorch MergeBot
acf5b204b0
Revert "Hide all symbols (except stable/headeronly/shim) if TORCH_STABLE_ONLY is defined ( #167496 )"
...
This reverts commit 8f4dc30453 .
Reverted https://github.com/pytorch/pytorch/pull/167496 on behalf of https://github.com/atalman due to Failing validations - https://github.com/pytorch/test-infra/actions/runs/19513141127/job/55857898996 ([comment](https://github.com/pytorch/pytorch/pull/167496#issuecomment-3554287955 ))
2025-11-19 19:26:12 +00:00
PyTorch MergeBot
a097e166db
Revert "Error when non stable/headeronly/shim headers are included by stable extension ( #167855 )"
...
This reverts commit a0ccd3e5ff .
Reverted https://github.com/pytorch/pytorch/pull/167855 on behalf of https://github.com/atalman due to Failing validations ([comment](https://github.com/pytorch/pytorch/pull/167855#issuecomment-3553987894 ))
2025-11-19 17:59:50 +00:00
Mikayla Gawarecki
a0ccd3e5ff
Error when non stable/headeronly/shim headers are included by stable extension ( #167855 )
...
Address Nikita's offline comment on #167496
Pull Request resolved: https://github.com/pytorch/pytorch/pull/167855
Approved by: https://github.com/janeyx99
ghstack dependencies: #167496
2025-11-19 14:13:45 +00:00
Mikayla Gawarecki
8f4dc30453
Hide all symbols (except stable/headeronly/shim) if TORCH_STABLE_ONLY is defined ( #167496 )
...
Fixes https://github.com/pytorch/pytorch/issues/161660
This extends the `TORCH_STABLE_ONLY` stopgap added in https://github.com/pytorch/pytorch/pull/161658
Pull Request resolved: https://github.com/pytorch/pytorch/pull/167496
Approved by: https://github.com/janeyx99 , https://github.com/malfet
2025-11-19 14:13:45 +00:00
Tristan Rice
f6b54d8899
flight_recorder: move to torch.distributed ( #167782 )
...
Summary: This moves torchfrtrace to be under `torch.distributed.flight_recorder` instead of `tools.flight_recorder` as the `tools` package is not included in the torch wheels. This makes it so you can use fr trace analyze without using it from a source checkout
Test Plan:
```
buck run //caffe2/fb/flight_recorder:fr_trace
```
CI
Differential Revision: D87022129
Pull Request resolved: https://github.com/pytorch/pytorch/pull/167782
Approved by: https://github.com/fduwjj
2025-11-15 01:16:59 +00:00
PyTorch MergeBot
602102be50
Revert "Hide all symbols (except stable/headeronly/shim) if TORCH_STABLE_ONLY is defined ( #167496 )"
...
This reverts commit bc09a84150 .
Reverted https://github.com/pytorch/pytorch/pull/167496 on behalf of https://github.com/jeanschmidt due to trying to revert 165139, my intention is to land it again, so, will land this once both are reverted ([comment](https://github.com/pytorch/pytorch/pull/167496#issuecomment-3534641209 ))
2025-11-14 21:33:02 +00:00
Mikayla Gawarecki
bc09a84150
Hide all symbols (except stable/headeronly/shim) if TORCH_STABLE_ONLY is defined ( #167496 )
...
Fixes https://github.com/pytorch/pytorch/issues/161660
This extends the `TORCH_STABLE_ONLY` stopgap added in https://github.com/pytorch/pytorch/pull/161658
Pull Request resolved: https://github.com/pytorch/pytorch/pull/167496
Approved by: https://github.com/janeyx99
ghstack dependencies: #167495
2025-11-12 19:15:52 +00:00
Nikhil Patel
a4c7856112
[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel ( #167340 )
...
Summary: This is a reland of https://github.com/pytorch/pytorch/pull/165036 , which previously contained a minor bug in the logic that determined whether the kernel should be enabled. As a result, it was incorrectly activated on non-Blackwell GPUs.
Test Plan:
Inductor test (fbcode):
`INDUCTOR_TEST_DISABLE_FRESH_CACHE=1 TORCHINDUCTOR_CACHE_DIR=~/cutetest buck2 run mode/opt //caffe2/test/inductor:cutedsl_grouped_mm -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -m "ovr_config//third-party/pypi/nvidia-cutlass-dsl/constraints:4.2.1"`
Tritonbench (fbcode):
`clear; CUDA_VISIBLE_DEVICES=7 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 buck2 run mode/opt //pytorch/tritonbench:run -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -m "ovr_config//third-party/pypi/nvidia-cutlass-dsl/constraints:4.2.1" -- --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_cute_grouped_mm --precision bf16 --num-inputs 1 --metrics tflops,accuracy`
Tritonbench(oss):
`clear; CUDA_VISIBLE_DEVICES=2 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 python run.py --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_triton_grouped_mm --precision bf16 --num-inputs 1 --metrics tflops,accuracy`
Unit Tests(oss):
`clear; python test/inductor/test_cutedsl_grouped_mm.py`
Differential Revision: D86537373
Pull Request resolved: https://github.com/pytorch/pytorch/pull/167340
Approved by: https://github.com/jananisriram
2025-11-10 00:29:07 +00:00
PyTorch MergeBot
12860892f8
Revert "[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel ( #167182 )"
...
This reverts commit 77b70970f7 .
Reverted https://github.com/pytorch/pytorch/pull/167182 on behalf of https://github.com/NikhilAPatel due to breaks local source build ([comment](https://github.com/pytorch/pytorch/pull/167182#issuecomment-3503598156 ))
2025-11-07 16:45:23 +00:00
Nikhil Patel
77b70970f7
[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel ( #167182 )
...
Summary: This is a reland of https://github.com/pytorch/pytorch/pull/165036 , which previously contained a minor bug in the logic that determined whether the kernel should be enabled. As a result, it was incorrectly activated on non-Blackwell GPUs.
Test Plan:
Inductor test (fbcode):
`INDUCTOR_TEST_DISABLE_FRESH_CACHE=1 TORCHINDUCTOR_CACHE_DIR=~/cutetest buck2 run mode/opt //caffe2/test/inductor:cutedsl_grouped_mm -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -m "ovr_config//third-party/pypi/nvidia-cutlass-dsl/constraints:4.2.1"`
Tritonbench (fbcode):
`clear; CUDA_VISIBLE_DEVICES=7 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 buck2 run mode/opt //pytorch/tritonbench:run -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -m "ovr_config//third-party/pypi/nvidia-cutlass-dsl/constraints:4.2.1" -- --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_cute_grouped_mm --precision bf16 --num-inputs 1 --metrics tflops,accuracy`
Tritonbench(oss):
`clear; CUDA_VISIBLE_DEVICES=2 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 python run.py --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_triton_grouped_mm --precision bf16 --num-inputs 1 --metrics tflops,accuracy`
Unit Tests(oss):
`clear; python test/inductor/test_cutedsl_grouped_mm.py`
Differential Revision: D86376880
Pull Request resolved: https://github.com/pytorch/pytorch/pull/167182
Approved by: https://github.com/mlazos , https://github.com/jananisriram
2025-11-06 19:55:38 +00:00
PyTorch MergeBot
5c639466f7
Revert "[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel ( #167003 )"
...
This reverts commit 658c5f879c .
Reverted https://github.com/pytorch/pytorch/pull/167003 on behalf of https://github.com/atalman due to regressed vllm signal: [GH job link](https://github.com/pytorch/pytorch/actions/runs/19093785744/job/54553796743 ) [HUD commit link](658c5f879c ) ([comment](https://github.com/pytorch/pytorch/pull/167003#issuecomment-3491527704 ))
2025-11-05 14:30:15 +00:00
Nikhil Patel
658c5f879c
[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel ( #167003 )
...
Summary: This is a reland of https://github.com/pytorch/pytorch/pull/165036 , which previously contained a minor bug in the logic that determined whether the kernel should be enabled. As a result, it was incorrectly activated on non-Blackwell GPUs.
Test Plan:
Inductor test (fbcode):
`INDUCTOR_TEST_DISABLE_FRESH_CACHE=1 TORCHINDUCTOR_CACHE_DIR=~/cutetest buck2 run mode/opt //caffe2/test/inductor:cutedsl_grouped_mm -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -m "ovr_config//third-party/pypi/nvidia-cutlass-dsl/constraints:4.2.1"`
Tritonbench (fbcode):
`clear; CUDA_VISIBLE_DEVICES=7 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 buck2 run mode/opt //pytorch/tritonbench:run -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -m "ovr_config//third-party/pypi/nvidia-cutlass-dsl/constraints:4.2.1" -- --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_cute_grouped_mm --precision bf16 --num-inputs 1 --metrics tflops,accuracy`
Tritonbench(oss):
`clear; CUDA_VISIBLE_DEVICES=2 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 python run.py --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_triton_grouped_mm --precision bf16 --num-inputs 1 --metrics tflops,accuracy`
Unit Tests(oss):
`clear; python test/inductor/test_cutedsl_grouped_mm.py`
Differential Revision: D86231180
Pull Request resolved: https://github.com/pytorch/pytorch/pull/167003
Approved by: https://github.com/jananisriram
2025-11-05 06:51:30 +00:00
PyTorch MergeBot
d77c24caac
Revert "[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel ( #165036 )"
...
This reverts commit 0e1a88904f .
Reverted https://github.com/pytorch/pytorch/pull/165036 on behalf of https://github.com/atalman due to regressed vllm signal: [GH job link](https://github.com/pytorch/pytorch/actions/runs/19059329909/job/54439919668 ) [HUD commit link](0e1a88904f ) ([comment](https://github.com/pytorch/pytorch/pull/165036#issuecomment-3487846555 ))
2025-11-04 20:13:33 +00:00
Nikhil Patel
0e1a88904f
[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel ( #165036 )
...
Make sure you're on cutlass 4.2.0+
Test Plan:
Tritonbench(oss):
`clear; CUDA_VISIBLE_DEVICES=2 TRITON_PRINT_AUTOTUNING=1 TRITON_ALWAYS_COMPILE=1 TORCH_LOGS=+inductor TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 python run.py --op grouped_gemm --only aten_grouped_mm,preprocessed_pt2_triton_grouped_mm --precision bf16 --num-inputs 1 --metrics tflops,accuracy`
Unit Tests(oss):
`clear; python test/inductor/test_cutedsl_grouped_mm.py`
Differential Revision: D82010227
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165036
Approved by: https://github.com/alexsamardzic , https://github.com/drisspg , https://github.com/mlazos
2025-11-04 05:58:58 +00:00
linhaifeng
369f2d6951
[3/N] fix typo in other folders ( #166606 )
...
fix typo in other folders
#166374
#166126
_typos.toml
```bash
[files]
extend-exclude = ["tools/linter/dictionary.txt"]
[default.extend-words]
nd = "nd"
arange = "arange"
Nd = "Nd"
GLOBALs = "GLOBALs"
hte = "hte"
iy = "iy"
PN = "PN"
Dout = "Dout"
optin = "optin"
gam = "gam"
PTD = "PTD"
Sur = "Sur"
nin = "nin"
tme = "tme"
inpt = "inpt"
mis = "mis"
Raison = "Raison"
ouput = "ouput"
nto = "nto"
Onwer = "Onwer"
callibrate = "callibrate"
ser = "ser"
Metdata = "Metdata"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166606
Approved by: https://github.com/ezyang
2025-10-30 10:30:40 +00:00
Jerry Mannil
202f83dc4e
[ROCm][layer_norm] Use __builtin_amdgcn_rcpf(x) instead of 1.f/x ( #165589 )
...
Replace (more) exact calculation with hardware approximation.
Benefits:
Reduced code size.
Improved performance for certain scenarios.
Experiments show only a small reduction in precision.
Experiments show no significant performance regressions. bfloat16- and float16-related calculations may benefit significantly from this change.
Co-author: @mhalk @amd-hhashemi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165589
Approved by: https://github.com/jeffdaily
2025-10-17 09:12:30 +00:00
Murray Steele
0fd976b65c
Enable mimalloc on non-Windows platforms and make default for AArch64 builds ( #164741 )
...
This change removes the Windows requirement for mimalloc builds, and makes mimalloc the default c10 system allocator for AArch64 builds. This significantly improves the performance of AArch64 builds of PyTorch as large allocations are better cached by mimalloc than glibc.
**Updated Results**
Torchbench FP32 eager Inference, 16 threads:
<img width="1510" height="733" alt="mimalloc-v2-fp32-diff" src="https://github.com/user-attachments/assets/7fe3ea0c-3b52-42e7-879b-612444479c90 " />
Torchbench BF16 eager Inference, 16 threads:
<img width="1510" height="733" alt="mimalloc-v2-bf16-diff" src="https://github.com/user-attachments/assets/56469a72-9e06-4d57-ae2a-aeb139ca79a3 " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164741
Approved by: https://github.com/fadara01 , https://github.com/aditew01 , https://github.com/malfet
2025-10-09 20:49:46 +00:00
PyTorch MergeBot
688efd9741
Revert "Enable mimalloc on non-Windows platforms and make default for AArch64 builds ( #164741 )"
...
This reverts commit 87eccf10e8 .
Reverted https://github.com/pytorch/pytorch/pull/164741 on behalf of https://github.com/malfet due to But it breaks MacOS builds, see https://github.com/pytorch/pytorch/actions/runs/18382886648/job/52373781138 ([comment](https://github.com/pytorch/pytorch/pull/164741#issuecomment-3386859778 ))
2025-10-09 17:30:25 +00:00
Murray Steele
87eccf10e8
Enable mimalloc on non-Windows platforms and make default for AArch64 builds ( #164741 )
...
This change removes the Windows requirement for mimalloc builds, and makes mimalloc the default c10 system allocator for AArch64 builds. This significantly improves the performance of AArch64 builds of PyTorch as large allocations are better cached by mimalloc than glibc.
**Updated Results**
Torchbench FP32 eager Inference, 16 threads:
<img width="1510" height="733" alt="mimalloc-v2-fp32-diff" src="https://github.com/user-attachments/assets/7fe3ea0c-3b52-42e7-879b-612444479c90 " />
Torchbench BF16 eager Inference, 16 threads:
<img width="1510" height="733" alt="mimalloc-v2-bf16-diff" src="https://github.com/user-attachments/assets/56469a72-9e06-4d57-ae2a-aeb139ca79a3 " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164741
Approved by: https://github.com/fadara01 , https://github.com/aditew01 , https://github.com/malfet
2025-10-09 16:45:31 +00:00
atalman
98c4e35f14
[CD] Add statically linked windows libraries to exclude list ( #163768 )
...
Fixes: https://github.com/pytorch/pytorch/issues/159514
Seeing following in the Wheel build logs:
```
Linking CXX static library lib\kineto.lib
Linking CXX static library lib\dnnl.lib
....
```
These files are around 800 MB uncompressed and 109 MB compressed, so excluding them provides a ~50% size reduction for Windows CPU builds.
Test Plan: Build Pytorch Windows binary. Build vision, audio and torchcodec with this binary. Smoke test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163768
Approved by: https://github.com/albanD , https://github.com/malfet
2025-09-25 14:03:14 +00:00
Edward Yang
2c5a3d7e60
Delete functorch C extension entirely. ( #163340 )
...
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163340
Approved by: https://github.com/aorenste , https://github.com/wdvr , https://github.com/albanD , https://github.com/malfet
2025-09-24 06:08:58 +00:00
Nikita Shulga
5e7be98800
[BE] Update Python min version to 3.10 ( #162310 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162310
Approved by: https://github.com/atalman , https://github.com/Skylion007 , https://github.com/ZainRizvi
2025-09-22 17:04:21 +00:00
PyTorch MergeBot
10adeb9044
Revert "[BE] Update Python min version to 3.10 ( #162310 )"
...
This reverts commit 9f5a644f07 .
Reverted https://github.com/pytorch/pytorch/pull/162310 on behalf of https://github.com/malfet due to Broke lint, but to the best of my knowledge it's no longer possible to run lint for all files on PRs ([comment](https://github.com/pytorch/pytorch/pull/162310#issuecomment-3319289031 ))
2025-09-22 14:13:59 +00:00
Nikita Shulga
9f5a644f07
[BE] Update Python min version to 3.10 ( #162310 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162310
Approved by: https://github.com/atalman , https://github.com/Skylion007 , https://github.com/ZainRizvi
2025-09-22 13:37:02 +00:00
PyTorch MergeBot
ae5be038a6
Revert "Delete functorch C extension entirely. ( #163340 )"
...
This reverts commit 1faf6367e3 .
Reverted https://github.com/pytorch/pytorch/pull/163340 on behalf of https://github.com/wdvr due to temporary revert to pull out #162659 ([comment](https://github.com/pytorch/pytorch/pull/163340#issuecomment-3317105243 ))
2025-09-22 06:20:04 +00:00
Edward Yang
1faf6367e3
Delete functorch C extension entirely. ( #163340 )
...
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163340
Approved by: https://github.com/aorenste
ghstack dependencies: #160236
2025-09-21 06:02:21 +00:00
PyTorch MergeBot
578047838c
Revert "[BE] Update Python min version to 3.10 ( #162310 )"
...
This reverts commit 3016616ccb .
Reverted https://github.com/pytorch/pytorch/pull/162310 on behalf of https://github.com/malfet due to Breaks some windows tests ([comment](https://github.com/pytorch/pytorch/pull/162862#issuecomment-3310606135 ))
2025-09-19 05:16:49 +00:00
Nikita Shulga
3016616ccb
[BE] Update Python min version to 3.10 ( #162310 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162310
Approved by: https://github.com/atalman , https://github.com/Skylion007 , https://github.com/ZainRizvi
ghstack dependencies: #162862
2025-09-19 04:28:56 +00:00
Robert Hardwick
1aeac304b8
Move prioritized text linker optimization code from setup.py to cmake ( #160078 )
...
Note. This is a replica PR of #155901 which will be closed. I had to create a new PR in order to add it into my ghstack as there are some later commits which depend on it.
### Summary
🚀 This PR moves the prioritized text linker optimization from setup.py to cmake ( and enables by default on Linux aarch64 systems )
This change consolidates what was previously manual CI logic into a single location (cmake), ensuring consistent behavior across local builds, CI pipelines, and developer environments.
### Motivation
Prioritized text layout has measurable performance benefits on Arm systems by reducing code padding and improving cache utilization. This optimization was previously triggered manually via CI scripts (.ci/aarch64_linux/aarch64_ci_build.sh) or user-set environment variables. By detecting the target architecture within setup.py, this change enables the optimization automatically where applicable, improving maintainability and usability.
Note:
Due to ninja/cmake graph generation issues we cannot apply the linker file globally to all targets, so the targets must be manually defined. See CMakeLists.txt: the main libraries torch_python, torch, torch_cpu, torch_cuda, and torch_xpu have been targeted, which should be enough to maintain the performance benefits outlined above.
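The automatic architecture detection described above can be sketched as follows. The opt-in environment variable name `USE_PRIORITIZED_TEXT_FOR_LD` is assumed from the surrounding discussion; treat it as illustrative.

```python
import os
import platform
import sys


def should_use_prioritized_text_linker() -> bool:
    """Hypothetical sketch of the auto-detection: enable the prioritized-text
    linker script by default on Linux aarch64, or when the user explicitly
    opts in via an (assumed) environment variable."""
    if os.environ.get("USE_PRIORITIZED_TEXT_FOR_LD") == "1":
        return True  # explicit user opt-in on any platform
    # default-on only for Linux aarch64, where the layout benefits are measured
    return sys.platform == "linux" and platform.machine() == "aarch64"
```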
Co-authored-by: Usamah Zaheer <usamah.zaheer@arm.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160078
Approved by: https://github.com/seemethere
2025-09-18 17:09:48 +00:00
PyTorch MergeBot
94db2ad51d
Revert "Move prioritized text linker optimization code from setup.py to cmake ( #160078 )"
...
This reverts commit 26b3ae5890 .
Reverted https://github.com/pytorch/pytorch/pull/160078 on behalf of https://github.com/atalman due to Sorry reverting this broke linux aarch64 CUDA nightlies [pytorch/pytorch/actions/runs/17637486681/job/50146967503](https://github.com/pytorch/pytorch/actions/runs/17637486681/job/50146967503 ) ([comment](https://github.com/pytorch/pytorch/pull/160078#issuecomment-3281426631 ))
2025-09-11 15:29:29 +00:00
Robert Hardwick
26b3ae5890
Move prioritized text linker optimization code from setup.py to cmake ( #160078 )
...
Note. This is a replica PR of #155901 which will be closed. I had to create a new PR in order to add it into my ghstack as there are some later commits which depend on it.
### Summary
🚀 This PR moves the prioritized text linker optimization from setup.py to cmake ( and enables by default on Linux aarch64 systems )
This change consolidates what was previously manual CI logic into a single location (cmake), ensuring consistent behavior across local builds, CI pipelines, and developer environments.
### Motivation
Prioritized text layout has measurable performance benefits on Arm systems by reducing code padding and improving cache utilization. This optimization was previously triggered manually via CI scripts (.ci/aarch64_linux/aarch64_ci_build.sh) or user-set environment variables. By detecting the target architecture within setup.py, this change enables the optimization automatically where applicable, improving maintainability and usability.
Note:
Due to ninja/cmake graph generation issues we cannot apply the linker file globally to all targets, so the targets must be manually defined. See CMakeLists.txt: the main libraries torch_python, torch, torch_cpu, torch_cuda, and torch_xpu have been targeted, which should be enough to maintain the performance benefits outlined above.
Co-authored-by: Usamah Zaheer <usamah.zaheer@arm.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160078
Approved by: https://github.com/seemethere
2025-09-10 09:21:53 +00:00
PyTorch MergeBot
d711f27845
Revert "[ROCm] [CK] Composable Kernel integration for inductor backend ( #158747 )"
...
This reverts commit 019fed39aa .
Reverted https://github.com/pytorch/pytorch/pull/158747 on behalf of https://github.com/jithunnair-amd due to Broke linux-binary-manywheel-rocm / manywheel-py3_9-rocm6_4-test: 019fed39aa/1 ... PR didn't have this job run successfully due to CI outage ([comment](https://github.com/pytorch/pytorch/pull/158747#issuecomment-3259212343 ))
2025-09-05 17:27:45 +00:00
iupaikov-amd
019fed39aa
[ROCm] [CK] Composable Kernel integration for inductor backend ( #158747 )
...
This is a part of our effort for integrating Composable Kernel library for Inductor backend. Currently we have a submodule, but would prefer to have commit pin control over the library as with Triton. We intentionally avoid putting all installation logic in CI scripts to allow locally built versions to have this functionality.
The idea is to have CK as a pytorch dependency in pytorch 2.9 release to allow people to use it with inductor and AOT inductor and then gradually step away from submodule usage. Right now CK usage in SDPA/Gemm is tied to submodule files.
This PR is a remake of due to branch error: https://github.com/pytorch/pytorch/pull/156192
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158747
Approved by: https://github.com/jeffdaily
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com >
Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com >
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-09-04 16:51:06 +00:00
Chris Thi
69a25f6888
[ROCm] Enable USE_FBGEMM_GENAI ( #160676 )
...
Summary:
X-link: https://github.com/pytorch/FBGEMM/pull/4703
X-link: https://github.com/facebookresearch/FBGEMM/pull/1728
In this diff we enable the support for the new FBGEMM backed FP8 _scaled_grouped_mm on ROCm. For now we only enable support for `gfx942` as that is what we have thoroughly tested performance and correctness on.
Rollback Plan:
Differential Revision: D79564024
Test Plan:
Ensure builds with:
- `USE_FBGEMM_GENAI=1` and without gfx942
- `USE_FBGEMM_GENAI=1` and with gfx942
- `USE_FBGEMM_GENAI=1` and all current [`PYTORCH_ROCM_ARCH`](9491d289b3/.ci/docker/libtorch/build.sh (L48) )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160676
Approved by: https://github.com/drisspg
2025-09-04 07:13:17 +00:00
Eli Uriegas
0447f2d99b
build: Add fallback commands to setup.py ( #162009 )
...
Adds fallback commands for the following:
* python setup.py install
* python setup.py develop
Ideally these should just work and should provide backwards compat.
The thought process here is that many people rely on these commands, and just because setuptools wants to drop support for them, I don't think our downstream users who build from source expect them to disappear.
This should give developers some room to move away from these commands until we have a unified frontend that abstracts most of this away.
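The fallback mechanism described above can be sketched as a dispatch table consulted before handing `sys.argv` to setuptools. This is an assumed shape, not the actual patch; the common modern fallbacks for `install`/`develop` are pip's regular and editable installs.

```python
import sys

# Hypothetical mapping from legacy setup.py commands to pip equivalents.
LEGACY_FALLBACKS = {
    "install": [sys.executable, "-m", "pip", "install", "."],
    "develop": [sys.executable, "-m", "pip", "install", "-e", "."],
}


def resolve_command(argv):
    """Return the pip-based fallback command for a legacy invocation,
    or None to let setuptools handle the arguments unchanged."""
    if len(argv) >= 2 and argv[1] in LEGACY_FALLBACKS:
        return LEGACY_FALLBACKS[argv[1]]
    return None
```

A caller would run the returned command via `subprocess` when it is not `None`, preserving `python setup.py install` / `python setup.py develop` behavior for downstream users.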
Signed-off-by: Eli Uriegas <eliuriegas@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162009
Approved by: https://github.com/clee2000 , https://github.com/atalman
2025-09-03 02:56:10 +00:00
PyTorch MergeBot
ab7787fb82
Revert "[inductor] Windows inductor use intel-openmp. ( #160258 )"
...
This reverts commit 41673110cd .
Reverted https://github.com/pytorch/pytorch/pull/160258 on behalf of https://github.com/malfet due to Reverting to fix https://github.com/pytorch/pytorch/issues/160898 and https://github.com/pytorch/pytorch/issues/160962 ([comment](https://github.com/pytorch/pytorch/pull/160258#issuecomment-3220158145 ))
2025-08-25 12:57:47 +00:00
PyTorch MergeBot
1eccfb157a
Revert "[BE] Remove intel-openmp dependency in setup.py ( #160976 )"
...
This reverts commit e483947047 .
Reverted https://github.com/pytorch/pytorch/pull/160976 on behalf of https://github.com/malfet due to This PR is doing something strange ([comment](https://github.com/pytorch/pytorch/pull/160976#issuecomment-3220120462 ))
2025-08-25 12:46:12 +00:00
Wang, Chuanqi
e483947047
[BE] Remove intel-openmp dependency in setup.py ( #160976 )
...
Fixes #160962
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160976
Approved by: https://github.com/xuhancn , https://github.com/atalman
2025-08-20 16:33:16 +00:00
FFFrog
39aa3d1471
Remove the dead code in setup.py ( #160515 )
...
The following line has no effect.
34ec5ed275/setup.py (L1205)
This code was originally introduced in this PR: dd7cec680c ,
and clang11 and later now support `-fstack-clash-protection`. Can we remove this line?
@malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160515
Approved by: https://github.com/isuruf , https://github.com/albanD
2025-08-14 06:02:11 +00:00
drisspg
15e49f6164
Factor out the strings to templates for better editor integration ( #160357 )
...
# Summary
More code motion, tldr is that install 'Better Jinja' in vscode and now you can get highlighting
Before
<img width="776" height="926" alt="Screenshot 2025-08-11 at 2 41 08 PM" src="https://github.com/user-attachments/assets/10868b31-f8ac-4cf5-99fe-19b8789ce06b " />
After:
<img width="1184" height="1299" alt="Screenshot 2025-08-11 at 2 40 27 PM" src="https://github.com/user-attachments/assets/45203765-589e-4d76-8196-d895a2f2fbf6 " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160357
Approved by: https://github.com/eellison
2025-08-14 01:07:53 +00:00
PyTorch MergeBot
c656334120
Revert "Factor out the strings to templates for better editor integration ( #160357 )"
...
This reverts commit cbffde7745 .
Reverted https://github.com/pytorch/pytorch/pull/160357 on behalf of https://github.com/clee2000 due to broke a bunch of internal builds due to not being able to find the file No such file or directory: torch/_inductor/kernel/flex/templates/flex_decode.py.jinja D80145761, might need a buck targets change? ([comment](https://github.com/pytorch/pytorch/pull/160357#issuecomment-3184435581 ))
2025-08-13 15:40:50 +00:00
Xu Han
41673110cd
[inductor] Windows inductor use intel-openmp. ( #160258 )
...
After some debugging, I found that PyTorch's torch_cpu.dll uses intel-openmp rather than MSVC OpenMP.
So, switch Windows inductor to intel-openmp as well.
It fixed: c8205cb354/test/inductor/test_aot_inductor.py (L2405-L2408)
<img width="896" height="230" alt="image" src="https://github.com/user-attachments/assets/273b00f8-7dc1-43c9-9b7f-752e16355a80 " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160258
Approved by: https://github.com/ezyang
2025-08-13 02:36:19 +00:00
drisspg
cbffde7745
Factor out the strings to templates for better editor integration ( #160357 )
...
# Summary
More code motion, tldr is that install 'Better Jinja' in vscode and now you can get highlighting
Before
<img width="776" height="926" alt="Screenshot 2025-08-11 at 2 41 08 PM" src="https://github.com/user-attachments/assets/10868b31-f8ac-4cf5-99fe-19b8789ce06b " />
After:
<img width="1184" height="1299" alt="Screenshot 2025-08-11 at 2 40 27 PM" src="https://github.com/user-attachments/assets/45203765-589e-4d76-8196-d895a2f2fbf6 " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160357
Approved by: https://github.com/eellison
2025-08-12 21:59:54 +00:00
Scott Todd
bfc873d02e
[ROCm][Windows] Revert copying hipblaslt and rocblas dirs. ( #159083 )
...
This reverts the changes from b367e5f6a6 . This will also close https://github.com/pytorch/pytorch/pull/158922 .
Since 30387ab2e4 , ROCm is bootstrapped using the 'rocm' Python module which contains these files (see https://github.com/ROCm/TheRock/blob/main/docs/packaging/python_packaging.md ), so they do not need to be bundled into torch/lib.
There was also a bug in here - if `ROCM_DIR` is unset, the code crashes:
```
File "D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\setuptools\_distutils\dist.py", line 1002, in run_command
cmd_obj.run()
File "D:\b\pytorch_main\setup.py", line 853, in run
rocm_dir_path = Path(os.environ["ROCM_DIR"])
~~~~~~~~~~^^^^^^^^^^^^
File "<frozen os>", line 714, in __getitem__
KeyError: 'ROCM_DIR'
```
The code could have checked for `ROCM_PATH` too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159083
Approved by: https://github.com/jeffdaily
2025-08-12 02:45:49 +00:00
Andres Lugo
5f5f508aa8
[ROCm] Ck backend UX refactor ( #152951 )
...
Refactors how the enablement/disablement of CK Gemms and SDPA works.
- Adds USE_ROCM_CK_GEMM compile flag for enabling CK gemms.
- USE_ROCM_CK_GEMM is set to True by default on Linux
- Updates USE_CK_FLASH_ATTENTION to USE_ROCM_CK_SDPA.
- USE_ROCM_CK_SDPA is set to False by default
- (USE_CK_FLASH_ATTENTION still works for now, but will be deprecated in a future release)
- Prevents these CK libraries from being used unless pytorch has been built specifically with the functionality AND is running on a system architecture that supports it.
- the getters for these library backends will also do some validity checking in case the user used an environment variable to change the backend. If invalid (i.e., one of the cases mentioned above is false), the backend will be set to the current non-CK default
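The getter-side validation described in the last bullet can be sketched as below. All names here (the environment variable, function, and backend strings) are illustrative, not the actual PyTorch API.

```python
import os

DEFAULT_BACKEND = "default"  # the non-CK default mentioned above


def get_gemm_backend(built_with_ck: bool, arch_supports_ck: bool) -> str:
    """Hypothetical sketch: honor an env-var backend override only when the
    build includes CK support AND the running architecture supports it;
    otherwise silently fall back to the non-CK default."""
    requested = os.environ.get("TORCH_ROCM_GEMM_BACKEND", DEFAULT_BACKEND)
    if requested == "ck" and not (built_with_ck and arch_supports_ck):
        return DEFAULT_BACKEND  # invalid request: fall back
    return requested
```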
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152951
Approved by: https://github.com/eqy , https://github.com/jeffdaily , https://github.com/m-gallus
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
Co-authored-by: Jithun Nair <jithun.nair@amd.com >
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com >
2025-08-08 18:40:17 +00:00
Edward Yang
38d65c6465
Add a USE_NIGHTLY option to setup.py ( #159965 )
...
If you run python setup.py develop with USE_NIGHTLY, instead of actually building PyTorch we will just go ahead and download the corresponding nightly version you specified and dump its binaries. This is intended to obsolete tools/nightly.py. There's some UX polish for detecting what the latest nightly is if you pass in a blank string. I only tested on OS X.
Coded with claude code.
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159965
Approved by: https://github.com/malfet
2025-08-07 01:44:20 +00:00