Raise XPU tolerances for bf16 ResNet & BotNet TorchBench (#170552)

Multiple TorchBench models on XPU fail accuracy tests because the numeric tolerances are too strict, not because of genuine correctness regressions. Two contributing factors were identified:

1. A measurement-methodology change (PyTorch 2.6.0 enforces cosine_similarity; see https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/common.py#L2227) increased the sensitivity of the error check and surfaced its limitations for phlippe_resnet. A minimal sketch of the two check styles follows this list.
2. BatchNorm decomposition noise (~1e-5 RMSE per BN in fp16) accumulates across the network in botnet26t_256, pushing the aggregate difference beyond the current threshold.
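For context, here is a minimal sketch of the two accuracy-check styles involved in factor 1. This is illustrative only, not the harness code; the helper names and the simulated noise magnitude are assumptions:

```python
# Minimal sketch of the two accuracy-check styles (illustrative helpers,
# not the benchmark-harness implementation).
import torch

def elementwise_check(ref: torch.Tensor, res: torch.Tensor, tol: float) -> bool:
    # Elementwise check: every element must lie within rtol/atol of the reference.
    return torch.allclose(ref, res, rtol=tol, atol=tol)

def cosine_check(ref: torch.Tensor, res: torch.Tensor, tol: float) -> bool:
    # Global check: the flattened outputs' cosine similarity must be within tol of 1.0.
    cos = torch.nn.functional.cosine_similarity(ref.flatten(), res.flatten(), dim=0)
    return bool(cos > 1.0 - tol)

ref = torch.randn(1000)
res = ref + 1e-3 * torch.randn(1000)  # stand-in for low-precision numeric noise
print(elementwise_check(ref, res, 1e-2), cosine_check(ref, res, 1e-2))
```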

**Analysis**

- phlippe_resnet failures reproduce on both CPU and XPU; fp16 already uses a higher tolerance, implying the bf16 thresholds are misaligned.
- Disabling the BN decomposition brings botnet26t_256 outputs within tolerance; with the decomposition enabled, cumulative numeric error is expected (illustrated in the sketch after this list).
- CI health indicates the changes are non-disruptive; failures, where present, are unrelated to this PR.
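The per-BN noise behind factor 2 can be reproduced with a self-contained sketch. This is an illustration under simplifying assumptions (eval-style statistics, identity affine parameters), not the decomposition the compiler actually emits:

```python
# Illustrative per-BatchNorm noise in reduced precision: eval-style BN with
# identity affine parameters, written out in primitive ops. Not the
# compiler's actual decomposition.
import torch

def decomposed_bn(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # (x - mean) / sqrt(var + eps), per channel, built from primitive ops.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

torch.manual_seed(0)
x = torch.randn(8, 32, 16, 16)
ref = decomposed_bn(x)                 # float32 reference
low = decomposed_bn(x.half()).float()  # the same primitives in fp16
rmse = (ref - low).pow(2).mean().sqrt()
print(f"per-BN RMSE in fp16: {rmse.item():.1e}")  # small per layer; compounds across stacked BNs
```

Each BN in botnet26t_256 contributes an error of roughly this magnitude, so the aggregate diff grows with depth even though every individual layer is accurate.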

Fixes https://github.com/intel/torch-xpu-ops/issues/1799
Fixes https://github.com/intel/torch-xpu-ops/issues/1305

Pull Request resolved: https://github.com/pytorch/pytorch/pull/170552
Approved by: https://github.com/EikanWang, https://github.com/desertfire

Co-authored-by: Tomasz Bohutyn <tbohutyn@habana.ai>

```diff
@@ -71,6 +71,10 @@ REQUIRE_HIGHER_TOLERANCE = {
     "mobilenetv3_large_100",
 }
+REQUIRE_HIGHER_TOLERANCE_FP16_XPU = {
+    "botnet26t_256",
+}
 REQUIRE_HIGHER_TOLERANCE_AMP = {}
 REQUIRE_EVEN_HIGHER_TOLERANCE = {
@@ -366,6 +370,12 @@ class TimmRunner(BenchmarkRunner):
             self.args.amp and name in REQUIRE_HIGHER_TOLERANCE_AMP
         ):
             tolerance = 4 * 1e-2
+        elif (
+            name in REQUIRE_HIGHER_TOLERANCE_FP16_XPU
+            and self.args.float16
+            and current_device == "xpu"
+        ):
+            tolerance = 4 * 1e-2
         else:
             tolerance = 1e-2
         return tolerance, cosine
```
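The new branch deliberately mirrors the existing AMP special case: it reuses the 4 * 1e-2 tolerance but scopes it to fp16 runs of the listed models on XPU devices, leaving the default 1e-2 untouched for every other configuration.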

```diff
@@ -52,6 +52,7 @@ tolerance:
   # These models need higher tolerance for xpu devices with bf16
   higher_bf16_xpu:
     - squeezenet1_1
+    - phlippe_resnet
   freezing:
     # Similar logic to timm_models.py:get_tolerance_and_cosine_flag
```
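For completeness, a hedged sketch of how a tolerance table like this could be consumed. The reader below is hypothetical: the file path, `pick_tolerance` helper, and lookup are assumptions, and the real logic lives in the benchmark runners per the comment above:

```python
# Hypothetical consumer of a tolerance table like the one above; the real
# lookup lives in the TorchBench runner, per the YAML comment.
import yaml  # requires PyYAML

def pick_tolerance(cfg: dict, name: str, dtype: str, device: str) -> float:
    models = cfg.get("tolerance", {}).get("higher_bf16_xpu", []) or []
    if dtype == "bfloat16" and device == "xpu" and name in models:
        return 4e-2  # raised threshold, mirroring timm_models.py above
    return 1e-2      # default threshold

with open("config.yaml") as f:  # hypothetical path
    cfg = yaml.safe_load(f)
print(pick_tolerance(cfg, "phlippe_resnet", "bfloat16", "xpu"))
```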