mirror of
https://github.com/zebrajr/pytorch.git
synced 2026-01-15 12:15:51 +00:00
[RESUBMIT] Cleanup error reporting for ProcessGroupNCCL (#112419)
Continuing some of the work from https://github.com/pytorch/pytorch/pull/108191, I realized that the majority of errors raised from ProcessGroupNCCL were just generic RuntimeErrors. This PR assigns an appropriate error type to every exception raised from ProcessGroupNCCL.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112419
Approved by: https://github.com/fduwjj
Committed by: PyTorch MergeBot
Parent: cb942ef2b1
Commit: e66ec5843f
@@ -224,7 +224,7 @@ TEST_F(ProcessGroupNCCLErrorsTest, testNCCLTimedoutErrorsBlocking) {
   // Now run all reduce with errors.
   pg.set_timedout_error();
   work = pg.allreduce(tensors_);
-  EXPECT_THROW(work->wait(), std::runtime_error);
+  EXPECT_THROW(work->wait(), c10::DistBackendError);
 
   // Communicators might be aborted here, further operations would fail.
 }
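The test change above swaps a generic exception expectation for a specific one. As a rough illustration of why a dedicated error type helps callers, here is a minimal, self-contained sketch. Everything in the `sketch` namespace is a hypothetical stand-in written for this example: the class hierarchy, `wait_with_timeout_error`, and `classify` are assumptions and do not reproduce the real `c10` definitions or the ProcessGroupNCCL code.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical stand-in for a distributed error base class.
// Assumption: the real c10 hierarchy may be laid out differently.
class DistError : public std::runtime_error {
 public:
  explicit DistError(const std::string& msg) : std::runtime_error(msg) {}
};

// Hypothetical stand-in for a backend-specific error type.
class DistBackendError : public DistError {
 public:
  explicit DistBackendError(const std::string& msg) : DistError(msg) {}
};

// Hypothetical stand-in for a wait() call that fails with a timeout.
inline void wait_with_timeout_error() {
  throw DistBackendError("operation timed out");
}

// Shows how a caller can tell a backend failure apart from any other
// exception once the throw site uses a specific type.
inline std::string classify() {
  try {
    wait_with_timeout_error();
  } catch (const DistBackendError& e) {
    // The most specific handler comes first, so it wins here.
    return std::string("backend: ") + e.what();
  } catch (const std::exception& e) {
    // Fallback for everything that is not a backend error.
    return std::string("generic: ") + e.what();
  }
  return "no error";
}

}  // namespace sketch
```

With a single generic type at the throw site, only the fallback handler could ever match, and callers would have to inspect the message text to tell a backend failure from anything else.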