pytorch/caffe2/python
Aapo Kyrola 1c7886701e lr_scale to loss_scale
Summary:
As per the discussion in https://www.prod.facebook.com/groups/184236721951559/permalink/354591931582703/, KaimingHe pointed out that scaling the LR is not the same as scaling the loss, since LR scaling also affects the weight decay (which is implemented by modifying the gradient, and is thus not yet correctly 'averaged' at that point). prigoyal actually tried to convince me earlier that loss scaling is the way to go, but I was not convinced at the time :/.
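A minimal sketch of the argument, in plain Python (not the Caffe2 API; the numbers and variable names are hypothetical): when weight decay is added to the already-summed gradient, dividing the LR by the number of GPUs also divides the weight-decay term, whereas pre-scaling each replica's loss leaves it intact.

```python
# Hypothetical numbers to illustrate why LR scaling != loss scaling
# when weight decay modifies the gradient after the cross-GPU sum.
num_gpus = 4
lr, wd, w = 0.1, 0.0001, 2.0
per_gpu_grads = [0.5, 0.7, 0.4, 0.6]  # made-up per-replica gradients

# LR scaling: sum gradients, add weight decay, then divide LR by num_gpus.
# The weight-decay term ends up (incorrectly) divided by num_gpus too.
g = sum(per_gpu_grads) + wd * w
update_lr_scaled = (lr / num_gpus) * g

# Loss scaling: each replica's loss (hence gradient) is pre-scaled by
# 1/num_gpus, so the summed gradient is already the average; weight
# decay is then applied at full strength.
g = sum(gr / num_gpus for gr in per_gpu_grads) + wd * w
update_loss_scaled = lr * g

# The two updates differ by exactly lr * wd * w * (1 - 1/num_gpus).
print(update_lr_scaled, update_loss_scaled)
```

The gap is tiny per step (it scales with `wd`), but it means the effective weight decay silently depends on the number of GPUs under LR scaling.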

So this diff removes the LR scaling parameter passed by data_parallel_model and instead passes a loss_scale parameter to the model creation function. Unfortunately, this will break all existing code that uses the data parallel model. But that is not entirely a bad thing, since it will bring awareness to the change. I will announce it in the FB groups.

In this diff I modified all my models to work correctly.

Reviewed By: Yangqing

Differential Revision: D4507002

fbshipit-source-id: 16c7221663282f71a1b754b34de0c8ccd5c2ca90
2017-02-03 07:44:40 -08:00