A few fixes:
- We weren't partitioning around index_put with boolean inputs (a minimal sketch follows below).
- Graph partitioning was skipping the whole graph any time we set V.graph.disable_cudagraphs_reason. There's no reason to use that for partitioning, so I've updated the skip logic to cover each of the reasons we would set disable_cudagraphs_reason.
- Pruned the deterministic disable-cudagraphs reason. I'm not sure how this list of ops got there originally, but I've added OpInfo tests showing they're cudagraphable.
We run into unrelated errors with only part of these fixes applied, so I'm doing this as one PR.
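A minimal sketch of the op pattern from the first bullet (hypothetical shapes and values, not the PR's test case; it assumes a CUDA device since partitioning only matters under cudagraphs):

```python
import torch

@torch.compile(mode="reduce-overhead")  # cudagraph-friendly compile mode
def masked_zero(x, mask):
    # index_put_ with a boolean index tensor: the pattern the partitioner
    # change splits the graph around
    return x.index_put_((mask,), torch.zeros((), device=x.device))

x = torch.randn(8, device="cuda")
mask = x > 0
masked_zero(x, mask)
```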
Fixes #169951
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170103
Approved by: https://github.com/BoyuanFeng
Multiple TorchBench models on XPU fail accuracy tests because the numeric tolerances are too strict. Two contributing factors were identified:
1. A measurement methodology change (PyTorch 2.6.0 enforcing cosine_similarity, https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/common.py#L2227) surfaced limitations and increased the sensitivity of the error checks for phlippe_resnet.
2. BatchNorm decomposition noise (~1e-5 RMSE per BN in fp16) accumulates through the iterations in botnet26t_256, pushing the aggregate diffs beyond current thresholds (see the sketch after the analysis below).
**Analysis**
- phlippe_resnet failures reproduce across CPU and XPU; fp16 already uses a higher tolerance, implying the bf16 thresholds are misaligned.
- Disabling BN decomposition brings botnet26t_256 outputs within tolerance; with decomposition enabled, cumulative numeric error is expected.
- CI health indicates changes are non-disruptive; failures, where present, are unrelated to these PRs.
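A back-of-the-envelope sketch of the accumulation argument in point 2 (the ~1e-5 per-BN RMSE comes from the description above; the layer count and the independence/quadrature assumption are mine):

```python
import math

per_bn_rmse = 1e-5    # approximate fp16 noise per decomposed BatchNorm (from above)
num_bn_layers = 50    # hypothetical depth for a botnet26t_256-like model
# if per-layer errors are roughly independent, they add in quadrature
aggregate_rmse = per_bn_rmse * math.sqrt(num_bn_layers)
print(f"estimated aggregate RMSE ~ {aggregate_rmse:.1e}")  # ~7.1e-05
```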
Fixes https://github.com/intel/torch-xpu-ops/issues/1799
Fixes https://github.com/intel/torch-xpu-ops/issues/1305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170552
Approved by: https://github.com/EikanWang, https://github.com/desertfire
Co-authored-by: Tomasz Bohutyn <tbohutyn@habana.ai>
`python benchmarks/dynamo/torchbench.py --performance --inference -k vision_maskrcnn`
was failing with:
```
Traceback (most recent call last):
File "/home/jansel/pytorch/benchmarks/dynamo/torchbench.py", line 490, in <module>
torchbench_main()
File "/home/jansel/pytorch/benchmarks/dynamo/torchbench.py", line 486, in torchbench_main
main(TorchBenchmarkRunner(), original_dir)
File "/home/jansel/pytorch/benchmarks/dynamo/common.py", line 3730, in main
process_entry(0, runner, original_dir, args)
File "/home/jansel/pytorch/benchmarks/dynamo/common.py", line 3655, in process_entry
result = run(runner, args, original_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jansel/pytorch/benchmarks/dynamo/common.py", line 4387, in run
runner.run_one_model(
File "/home/jansel/pytorch/benchmarks/dynamo/common.py", line 2966, in run_one_model
status = self.run_performance_test(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jansel/pytorch/benchmarks/dynamo/common.py", line 2873, in run_performance_test
experiment(
TypeError: coverage_experiment() got an unexpected keyword argument 'batch_size'
```
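For context, a hedged illustration of this kind of mismatch (simplified, hypothetical function names; not the actual benchmark code):

```python
# The runner forwards batch_size to the experiment function, so an
# experiment whose signature doesn't accept it raises the TypeError above.
def experiment_strict(model, example_inputs):
    return "ok"

def experiment_tolerant(model, example_inputs, **kwargs):
    # extra keyword args such as batch_size are simply ignored
    return "ok"

# experiment_strict(model, inputs, batch_size=4)    # -> TypeError
# experiment_tolerant(model, inputs, batch_size=4)  # -> fine
```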
Pull Request resolved: https://github.com/pytorch/pytorch/pull/170009
Approved by: https://github.com/Lucaskabela
ghstack dependencies: #170004
This pull request adds comprehensive documentation to the operator benchmark suite, detailing how CI regression tracking is performed for both CPU and GPU devices. The new section in the `README.md` explains the workflows, devices, operators tracked, schedules, triggers, and instructions for manually running benchmarks. This update will help contributors understand how performance regressions are monitored and how to interact with the CI workflows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/168145
Approved by: https://github.com/malfet, https://github.com/huydhn
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Joel Schlosser <75754324+jbschlosser@users.noreply.github.com>
Summary:
## Tests
Standalone: `python -m torchbenchmark.models.modded_nanogpt.main`
Through dynamo benchmarks: `python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --only modded_nanogpt --disable-cudagraphs`
This PR adds a tweaked version of the Aug 23rd record for the nanogpt speedrun (GPT-2 small variant): 9d9dc969c4/train_gpt.py.
The later records cannot be run without building FA3 from source, so we will omit them until the dynamo FA3 PR is merged.
The tweaks library-ify the script by commenting out everything other than the model class definitions, change the pg initialization to use a fake pg, and constant-ify some hyperparameters.
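A hedged sketch of the fake process-group pattern (based on PyTorch's test utilities; the rank/world-size values are illustrative, not the exact code added here):

```python
import torch.distributed as dist
# Importing fake_pg registers the "fake" backend; its collectives are no-ops,
# so a distributed training script can run in a single process.
from torch.testing._internal.distributed.fake_pg import FakeStore

dist.init_process_group(backend="fake", rank=0, world_size=8, store=FakeStore())
```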
The tests run locally, but this model specifically requires H100. I wasn't sure how to filter for that, so I skipped all the tests. This will be tested on the dynamo benchmark side: https://github.com/pytorch/pytorch/pull/169449.
X-link: https://github.com/pytorch/benchmark/pull/2660
Differential Revision: D88233265
Pulled By: xmfan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/169505
Approved by: https://github.com/BoyuanFeng
Following the example from #149932 and the doc in [README.md](benchmarks/dynamo/pr_time_benchmarks/README.md):
`cd benchmarks/dynamo/pr_time_benchmarks`
`PYTHONPATH=./:../../../ python benchmarks/dtensor.py a`
Currently outputs:
```
collecting instruction count for dtensor_dispatch_detach
instruction count for iteration 0 is 14919468
instruction count for iteration 1 is 136283
instruction count for iteration 2 is 133750
instruction count for iteration 3 is 133757
instruction count for iteration 4 is 133751
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/167394
Approved by: https://github.com/laithsakka
It's nice to add a curve with customized compilation options so that we can compare the perf improvement of new features side-by-side.
E.g., for mix-order-reduction, by running the following command
```
python benchmarks/dynamo/genai_layers/benchmark.py --tolerance=1e-2 --exit-on-accuracy-failure --visualize rmsnorm_backward --custom-compile-name="compiled-no-fusion" --custom-compile-options='{"triton.mix_order_reduction":false}'
```
I get the following output:
```
Geomean speedup for benchmark RMSNormBackward
eager 11 data points
compiled 11 data points, 15.82x speedup
quack 11 data points, 15.45x speedup
liger 11 data points, 14.06x speedup
compiled-no-fusion 11 data points, 10.26x speedup
```
The output shows that the feature improves perf by `15.82 / 10.26 = 1.54x` on average across all the shapes tested. (I removed the (32768, 32768) shape, whose rnumel is too large to be representative.)
The new curve also shows up in the figure:
<img width="3564" height="2368" alt="RMSNormBackward_bench" src="https://github.com/user-attachments/assets/1ffac2bc-e726-4f1e-806d-e9e5de711492" />
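For reference, the same knob can be flipped directly through torch.compile's options dict outside the benchmark script (a hedged sketch: the workload function and shapes are stand-ins; only the "triton.mix_order_reduction" key comes from the command above):

```python
import torch

def rmsnorm_backward_proxy(x, w):
    # stand-in workload; the benchmark's real kernel is RMSNorm backward
    return torch.nn.functional.rms_norm(x, (x.shape[-1],), weight=w).sum()

# default inductor config
compiled = torch.compile(rmsnorm_backward_proxy)

# same workload with the option disabled, matching the "compiled-no-fusion" curve
compiled_no_fusion = torch.compile(
    rmsnorm_backward_proxy,
    options={"triton.mix_order_reduction": False},
)

x = torch.randn(1024, 2048, device="cuda", requires_grad=True)
w = torch.randn(2048, device="cuda", requires_grad=True)
compiled_no_fusion(x, w).backward()  # backward is the pass the benchmark measures
```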
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166697
Approved by: https://github.com/BoyuanFeng
ghstack dependencies: #166053, #166382, #166461, #166585, #166675