Summary:
(Work in progress). This diff will allow shifting of activations to other GPUs, in case the model does not fit into memory. To see the API, check the code in data_parallel_model_test, which tests shifting two activations from 0 and 1 to gpu 4, and from gpu 2 and 3 to gpu 5.
I will need to further test on ResNets, and probablly add copy operations to handle device change points.
Reviewed By: asaadaldien
Differential Revision: D5591674
fbshipit-source-id: eb12d23651a56d64fa4db91090c6474218705270