pytorch/.github/workflows/periodic-rocm-mi300.yml
Ivan Zaitsev 49f600e864 Remove concurrency limits in workflows for workflow_dispatches (#171132)
Autorevert can issue multiple dispatches without waiting for the last one to finish:
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2Faadd016020d718ae862361d23d98f61a5e6e3903
(this is expected in certain cases, e.g. when the specific job has already finished but the workflow as a whole has not)

However, the concurrency policy in the pytorch workflows currently cancels concurrent workflow runs, even when they are dispatches.

This PR:
1. removes the concurrency limit for dispatches (in the workflows that are monitored by autorevert). Note: there is still a hard cap on the total number of dispatches on the autorevert side.

2. adds logging so that, in the future, we can restrict the concurrency policy to autorevert dispatches only (the logs will tell us the correct `actor` value to use)

3. removes garbage from the key in the linux-aarch64.yml workflow
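To see why dispatches no longer cancel each other, it can help to evaluate the concurrency `group` expression from periodic-rocm-mi300.yml by hand. The following is a minimal Python sketch, assuming GitHub Actions' expression rules: `&&` and `||` return one of their operands rather than a plain boolean, and `${{ }}` interpolation renders `null` as an empty string and `false` as `false`. The helper names (`gha_and`, `gha_or`, `gha_str`, `concurrency_group`) and all context values (branch `main`, the SHAs, run IDs and the tag name) are hypothetical and only for illustration.

```python
from typing import Any, Optional


def gha_and(left: Any, right: Any) -> Any:
    """Mimic `left && right`: return `left` if it is falsy, else `right`."""
    return right if left else left


def gha_or(left: Any, right: Any) -> Any:
    """Mimic `left || right`: return `left` if it is truthy, else `right`."""
    return left if left else right


def gha_str(value: Any) -> str:
    """Mimic `${{ }}` interpolation: null -> '', booleans -> 'true'/'false'."""
    if value is None:
        return ""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def concurrency_group(
    event_name: str,
    ref_name: str,
    ref_type: str,
    sha: str,
    run_id: str,
    pr_number: Optional[int] = None,  # this workflow has no pull_request trigger
    schedule: Optional[str] = None,  # only set for `schedule` events
    workflow: str = "periodic-rocm-mi300",
) -> str:
    """Evaluate the `concurrency.group` expression of periodic-rocm-mi300.yml."""
    segments = [
        workflow,
        gha_or(pr_number, ref_name),
        gha_and(ref_type == "branch", sha),
        gha_and(event_name == "workflow_dispatch", run_id),
        event_name == "schedule",
        schedule,
    ]
    return "-".join(gha_str(s) for s in segments)


# Two dispatches of the same commit: the run_id segment puts them in separate
# groups, so cancel-in-progress does not cancel the first one.
d1 = concurrency_group("workflow_dispatch", "main", "branch", "abc123", "1001")
d2 = concurrency_group("workflow_dispatch", "main", "branch", "abc123", "1002")

# Re-pushing the same ciflow tag at a new commit: both runs land in one group,
# so the older, still-running run is cancelled.
t1 = concurrency_group("push", "ciflow/periodic/171132", "tag", "abc123", "2001")
t2 = concurrency_group("push", "ciflow/periodic/171132", "tag", "def456", "2002")

print(d1)  # periodic-rocm-mi300-main-abc123-1001-false-
print(d1 != d2, t1 == t2)  # True True
```

The `github.sha` segment only appears for branch refs, so for tag pushes the key depends on the tag name alone; that is what lets a newer push of the same ciflow tag replace an older run.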

----

Testing:

See my two manual concurrent dispatches here:
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Aunlimited-dispatches++
(also notice that concurrency still correctly cancels the workflow on a PR update)

new logging:
https://github.com/pytorch/pytorch/actions/runs/20444849087/job/58745963215#step:2:20
Pull Request resolved: https://github.com/pytorch/pytorch/pull/171132
Approved by: https://github.com/clee2000, https://github.com/jeanschmidt
2025-12-22 22:24:02 +00:00


name: periodic-rocm-mi300

on:
  schedule:
    # We have several schedules so jobs can check github.event.schedule to activate only for a fraction of the runs.
    # Also run less frequently on weekends.
    - cron: 45 0,8,16 * * 1-5
    - cron: 45 4 * * 0,6
    - cron: 45 4,12,20 * * 1-5
    - cron: 45 12 * * 0,6
    - cron: 29 8 * * *  # about 1:29am PDT, for mem leak check and rerun disabled tests
  push:
    tags:
      - ciflow/periodic/*
      - ciflow/periodic-rocm-mi300/*
    branches:
      - release/*
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' && github.run_id }}-${{ github.event_name == 'schedule' }}-${{ github.event.schedule }}
  cancel-in-progress: true

permissions: read-all

jobs:
  llm-td:
    if: github.repository_owner == 'pytorch'
    name: before-test
    uses: ./.github/workflows/llm_td_retrieval.yml
    permissions:
      id-token: write
      contents: read

  target-determination:
    name: before-test
    uses: ./.github/workflows/target_determination.yml
    needs: llm-td
    permissions:
      id-token: write
      contents: read

  get-label-type:
    name: get-label-type
    uses: pytorch/pytorch/.github/workflows/_runner-determinator.yml@main
    if: (github.event_name != 'schedule' || github.repository == 'pytorch/pytorch') && github.repository_owner == 'pytorch'
    with:
      triggering_actor: ${{ github.triggering_actor }}
      issue_owner: ${{ github.event.pull_request.user.login || github.event.issue.user.login }}
      curr_branch: ${{ github.head_ref || github.ref_name }}
      curr_ref_type: ${{ github.ref_type }}

  linux-noble-rocm-py3_12-build:
    name: linux-noble-rocm-py3.12-mi300
    uses: ./.github/workflows/_linux-build.yml
    needs: get-label-type
    with:
      runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
      build-environment: linux-noble-rocm-py3.12-mi300
      docker-image-name: ci-image:pytorch-linux-noble-rocm-n-py3
      test-matrix: |
        { include: [
          { config: "distributed", shard: 1, num_shards: 3, runner: "linux.rocm.gpu.gfx942.4.b", owners: ["module:rocm", "oncall:distributed"] },
          { config: "distributed", shard: 2, num_shards: 3, runner: "linux.rocm.gpu.gfx942.4.b", owners: ["module:rocm", "oncall:distributed"] },
          { config: "distributed", shard: 3, num_shards: 3, runner: "linux.rocm.gpu.gfx942.4.b", owners: ["module:rocm", "oncall:distributed"] },
        ]}
    secrets: inherit

  linux-noble-rocm-py3_12-test:
    permissions:
      id-token: write
      contents: read
    name: linux-noble-rocm-py3.12-mi300
    uses: ./.github/workflows/_rocm-test.yml
    needs:
      - linux-noble-rocm-py3_12-build
      - target-determination
    with:
      build-environment: ${{ needs.linux-noble-rocm-py3_12-build.outputs.build-environment }}
      docker-image: ${{ needs.linux-noble-rocm-py3_12-build.outputs.docker-image }}
      test-matrix: ${{ needs.linux-noble-rocm-py3_12-build.outputs.test-matrix }}
    secrets: inherit
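The `schedule` trigger in this workflow combines five cron lines so that weekdays get more runs than weekends. The following is a minimal Python sketch that lists, for a given date, which cron lines fire and at what UTC time. It handles only the cron syntax used in this file (`*`, comma lists and `a-b` ranges) and ignores the day-of-month and month fields because they are `*` here. The names `CRONS`, `expand` and `runs_on` are hypothetical, and GitHub may start scheduled runs later than the nominal time.

```python
from datetime import date

# Cron strings copied from the `schedule` trigger of periodic-rocm-mi300.yml.
CRONS = [
    "45 0,8,16 * * 1-5",
    "45 4 * * 0,6",
    "45 4,12,20 * * 1-5",
    "45 12 * * 0,6",
    "29 8 * * *",
]


def expand(field: str, lo: int, hi: int) -> set[int]:
    """Expand a cron field made of '*', single values, comma lists and a-b ranges."""
    if field == "*":
        return set(range(lo, hi + 1))
    values: set[int] = set()
    for part in field.split(","):
        if "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return values


def runs_on(day: date) -> list[tuple[str, str]]:
    """Return sorted (HH:MM UTC, cron line) pairs for every schedule firing on `day`."""
    cron_dow = (day.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    runs = []
    for cron in CRONS:
        minute, hour, _dom, _month, dow = cron.split()
        if cron_dow in expand(dow, 0, 6):
            for h in sorted(expand(hour, 0, 23)):
                runs.append((f"{h:02d}:{int(minute):02d}", cron))
    return sorted(runs)


for day in (date(2025, 12, 22), date(2025, 12, 20)):  # a Monday and a Saturday
    print(day.strftime("%A"), [t for t, _ in runs_on(day)])
```

With these crons a weekday gets seven runs (00:45, 04:45, 08:29, 08:45, 12:45, 16:45 and 20:45 UTC) and a weekend day gets three (04:45, 08:29 and 12:45 UTC). Because the concurrency group includes `github.event.schedule`, runs started by different cron lines fall into different groups and do not cancel each other.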