## Summary

This PR adds multi-architecture kernel compilation support for ROCm in PyTorch's AOT Inductor module, enabling a single compiled model to run across multiple AMD GPU architectures (MI200, MI300, MI350, etc.) without recompilation.

## Implementation

- **Multi-arch compilation pipeline**: Compiles LLVM IR to multiple GPU architectures and bundles the results using `clang-offload-bundler`
- **Architecture detection**: Automatically detects target architectures from `torch.cuda.get_arch_list()`, with overrides via the `PYTORCH_ROCM_ARCH` environment variable
- **ROCm-specific utilities**: A new `rocm_multiarch_utils.py` module handles ROCm toolchain integration
- **Test infrastructure**: Adapted AOT Inductor tests to support both the CUDA and ROCm compilation paths

## Testing

Successfully tested on:
- MI200
- MI300

**Enabled tests:**
- `test_simple_multi_arch`
- `test_compile_after_package_multi_arch`
- `test_compile_with_exporter`
- `test_compile_with_exporter_weights`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166357
Approved by: https://github.com/jeffdaily
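The architecture-detection behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `resolve_rocm_archs` is hypothetical, and the fallback list stands in for what `torch.cuda.get_arch_list()` would report at runtime. `PYTORCH_ROCM_ARCH` is a real PyTorch environment variable that takes a semicolon-separated list of gfx targets.

```python
import os


def resolve_rocm_archs(detected_archs):
    """Return the AMD GPU architectures to compile for.

    If PYTORCH_ROCM_ARCH is set (e.g. "gfx90a;gfx942"), it overrides
    the runtime-detected list; otherwise fall back to the architectures
    the runtime reports (in the real code, torch.cuda.get_arch_list()).
    """
    override = os.environ.get("PYTORCH_ROCM_ARCH", "").strip()
    if override:
        # Split the semicolon-separated override, dropping empty entries.
        return [arch for arch in override.split(";") if arch]
    return list(detected_archs)


# Example: with the override set, the detected list is ignored.
os.environ["PYTORCH_ROCM_ARCH"] = "gfx90a;gfx942"
print(resolve_rocm_archs(["gfx908"]))  # ['gfx90a', 'gfx942']
```

Each resolved architecture is then compiled separately and the resulting code objects are bundled into a single artifact with `clang-offload-bundler`, so one packaged model can dispatch to whichever architecture the host GPU matches.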