pytorch/caffe2/python
Aapo Kyrola 1c7886701e lr_scale to loss_scale
Summary:
As per the discussion in https://www.prod.facebook.com/groups/184236721951559/permalink/354591931582703/, KaimingHe pointed out that scaling the LR is not the same as scaling the loss, since LR scaling also affects the weight decay (which is implemented by modifying the gradient, and is thus not yet correctly 'averaged' at that point). prigoyal actually tried to convince me earlier that loss scaling is the way to go, but I was not convinced at the time :/.
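A minimal sketch of the argument, in plain Python (not the Caffe2 API; the numbers and variable names are hypothetical): when weight decay is added to the already-summed gradient, dividing the LR by the number of GPUs also divides the weight-decay term, whereas pre-scaling each replica's loss leaves it intact.

```python
# Hypothetical numbers to illustrate why LR scaling != loss scaling
# when weight decay modifies the gradient after the cross-GPU sum.
num_gpus = 4
lr, wd, w = 0.1, 0.0001, 2.0
per_gpu_grads = [0.5, 0.7, 0.4, 0.6]  # made-up per-replica gradients

# LR scaling: sum gradients, add weight decay, then divide LR by num_gpus.
# The weight-decay term ends up (incorrectly) divided by num_gpus too.
g = sum(per_gpu_grads) + wd * w
update_lr_scaled = (lr / num_gpus) * g

# Loss scaling: each replica's loss (hence gradient) is pre-scaled by
# 1/num_gpus, so the summed gradient is already the average; weight
# decay is then applied at full strength.
g = sum(gr / num_gpus for gr in per_gpu_grads) + wd * w
update_loss_scaled = lr * g

# The two updates differ by exactly lr * wd * w * (1 - 1/num_gpus).
print(update_lr_scaled, update_loss_scaled)
```

The gap is tiny per step (it scales with `wd`), but it means the effective weight decay silently depends on the number of GPUs under LR scaling.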

So this diff removes the LR scaling parameter passed by data_parallel_model and instead passes a loss_scale parameter to the model creation function. Unfortunately, this will break all existing code that uses the data parallel model. But that is not entirely a bad thing, since it will bring awareness to the change. I will announce it in the FB groups.

In this diff I modified all my models to work correctly.

Reviewed By: Yangqing

Differential Revision: D4507002

fbshipit-source-id: 16c7221663282f71a1b754b34de0c8ccd5c2ca90
2017-02-03 07:44:40 -08:00