pytorch/caffe2/python
Aapo Kyrola 89c08334bb data_parallel_model support for sparse gradients and CPU ops
Summary:
The data parallel model did not support sparse operations or gradients computed by CPU ops.

Currently sparse operations are done on the CPU, so there is no point in "data parallelizing" them. I had to make a few changes to data_parallel_model to support this:
 1. The model can have params that are added prior to adding the data parallel part. For example, a lookup table of word vectors would be a non-parallel parameter.
 2. Thus, when the data parallel model is called, it separates out the non-parallel params and avoids working on them (see the sketch after this list). Note: when we add the distributed version, we will need to handle them explicitly with AllGather!
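
Roughly, the intended usage looks like the following sketch. This is not code from the diff: the model, the blob names (word_vecs, indices), the sizes, and the plain SGD update are illustrative, and it assumes the Parallelize_GPU signature in data_parallel_model at this revision.

from caffe2.proto import caffe2_pb2
from caffe2.python import cnn, core, data_parallel_model

CPU = core.DeviceOption(caffe2_pb2.CPU)

model = cnn.CNNModelHelper(name="sparse_example")

# Non-parallel param: created BEFORE Parallelize_GPU and outside any "gpu_X"
# namescope, so it lives once on the CPU instead of once per device.
with core.DeviceScope(CPU):
    word_vecs = model.param_init_net.UniformFill(
        [], "word_vecs", shape=[10000, 16], min=-0.1, max=0.1)
    model.params.append(word_vecs)


def add_inputs(model):
    # Dummy word ids; a real trainer would read per-device inputs here.
    with core.DeviceScope(CPU):
        model.param_init_net.GivenTensorIntFill(
            [], "indices", shape=[4], values=[1, 2, 3, 4])


def build_forward(model, loss_scale):
    # The Gather on the shared lookup table runs on the CPU; its gradient is
    # sparse (GradientSlices) and is accumulated across devices.
    with core.DeviceScope(CPU):
        embed_cpu = model.net.Gather([word_vecs, "indices"], "embed_cpu")
    # The rest of the per-device model runs on its GPU.
    embed = model.net.CopyCPUToGPU(embed_cpu, "embed")
    fc = model.FC(embed, "fc", 16, 1)
    flat = model.net.FlattenToVec(fc, "flat")
    loss = model.net.AveragedLoss(flat, "loss")
    return [model.net.Scale(loss, "scaled_loss", scale=loss_scale)]


def update_params(model):
    # Plain SGD for the per-device (dense) params. The sparse update for
    # word_vecs is not handled here; it would be applied separately on the
    # CPU (e.g. with ScatterWeightedSum).
    LR = model.param_init_net.ConstantFill([], "lr", shape=[1], value=-0.1)
    ONE = model.param_init_net.ConstantFill([], "one", shape=[1], value=1.0)
    for param in model.GetParams():
        grad = model.param_to_grad[param]
        model.net.WeightedSum([param, ONE, grad, LR], param)


# Parallelize_GPU separates out word_vecs and only replicates / reduces the
# params created inside the builder functions across the listed devices.
data_parallel_model.Parallelize_GPU(
    model,
    input_builder_fun=add_inputs,
    forward_pass_builder_fun=build_forward,
    param_update_builder_fun=update_params,
    devices=[0, 1],
)

The key point is that word_vecs is created before Parallelize_GPU and outside any per-GPU namescope, so it is treated as non-parallel, while the params created inside the builder functions are replicated and reduced per device as usual.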

This works nicely since Caffe2 automatically adds the backward concat operator when multiple ops gather from the same blob.

I also added support for data parallel CPU ops, which might be necessary in cases where we don't have a GPU implementation of some ops.
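
For the CPU-op case, one way to use this (again an illustrative sketch, not code from the diff) is to wrap the CPU-bound op in a CPU DeviceScope inside the per-device forward builder and copy the activations across the device boundary; Sigmoid below is only a stand-in for an op that lacks a GPU kernel:

from caffe2.proto import caffe2_pb2
from caffe2.python import core

CPU = core.DeviceOption(caffe2_pb2.CPU)


def build_forward(model, loss_scale):
    # Per-device forward pass; this builder would be passed to Parallelize_GPU
    # as in the previous sketch. "data" is an assumed per-device input blob.
    hidden = model.FC("data", "hidden", 256, 128)
    # Pretend the next op only has a CPU implementation (Sigmoid is just a
    # stand-in): run it under a CPU DeviceScope and copy across the boundary.
    hidden_cpu = model.net.CopyGPUToCPU(hidden, "hidden_cpu")
    with core.DeviceScope(CPU):
        act_cpu = model.net.Sigmoid(hidden_cpu, "act_cpu")
    act = model.net.CopyCPUToGPU(act_cpu, "act")
    flat = model.net.FlattenToVec(act, "flat")
    loss = model.net.AveragedLoss(flat, "loss")
    return [model.net.Scale(loss, "scaled_loss", scale=loss_scale)]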

The test in data_parallel_model_test validates the correctness of the code by running the same trainer on different numbers of GPUs and checking that the end result is the same.
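
The check itself boils down to something like the sketch below, where run_model is a hypothetical helper that builds the trainer on the given devices, runs a fixed number of iterations on identical seeded data, and returns the final parameter values (fetched with workspace.FetchBlob):

import numpy as np

# run_model is hypothetical (see above); it returns {param_name: ndarray}.
params_1gpu = run_model(devices=[0])
params_2gpu = run_model(devices=[0, 1])

for name, value in params_1gpu.items():
    np.testing.assert_allclose(
        value, params_2gpu[name], atol=1e-5,
        err_msg="param {} differs between the 1-GPU and 2-GPU runs".format(name))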

Reviewed By: jhcross

Differential Revision: D4649208

fbshipit-source-id: e3b7ae701ead468dc94c52a976eafec5c9831097
2017-03-09 13:48:41 -08:00