pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2026-01-15 12:15:51 +00:00

Files

fduwjj ca4df16fdd [c10d] Make DebugInfoWriter Singleton across all PG objects (#116489)

Previously, the writer was registered on each NCCL PG (backend). Every PG has its own NCCL PG instance, so a user with a customized writer and multiple sub-PGs had to register that writer on every backend, which is bad UX. Furthermore, the debug info is global, so a per-instance writer does not make sense. There is even a static mutex in `dumpDebuggingInfo` that serializes the write, which makes it more obvious that the writer can be a singleton: one writer instance shared by all PG instances.

Although the rationale is clear, the implementation could take many forms, so this PR is an RFC for now to check whether this implementation makes sense.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116489
Approved by: https://github.com/kwen2501

2024-01-03 03:42:54 +00:00

aot_inductor

[AOTI][refactor] Refactor model runner API (#116047)

2023-12-21 01:05:37 +00:00

api

[CI] Update clang-format (#116002)

2023-12-18 14:58:46 +00:00

c10d

[c10d] Make DebugInfoWriter Singleton across all PG objects (#116489)

2024-01-03 03:42:54 +00:00

common

…