Luke Yeager 82e318cf8b Optimizer: one LR op per (device, optimizer)
Summary:
Try running this script through `nvprof`:
```py
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import brew, core, optimizer, workspace
from caffe2.python.model_helper import ModelHelper

do = core.DeviceOption(caffe2_pb2.CUDA, 0)
with core.DeviceScope(do):
    model = ModelHelper(arg_scope={'order': 'NCHW'})
    conv1 = brew.conv(model, 'data', 'conv1', 1, 20, 5)
    pool1 = brew.max_pool(model, conv1, 'pool1', kernel=2, stride=2)
    conv2 = brew.conv(model, pool1, 'conv2', 20, 50, 5)
    pool2 = brew.max_pool(model, conv2, 'pool2', kernel=2, stride=2)
    fc3 = brew.fc(model, pool2, 'fc3', 50 * 4 * 4, 500)
    fc3 = brew.relu(model, fc3, fc3)
    pred = brew.fc(model, fc3, 'pred', 500, 10)
    softmax, loss = model.SoftmaxWithLoss([pred, 'label'], ['softmax', 'loss'])
    model.AddGradientOperators([loss])
    optimizer.build_sgd(model, 0.01,
                        policy='step', stepsize=1, gamma=0.999,
                        momentum=0.9, nesterov=False)
    workspace.FeedBlob('data', np.zeros((1, 1, 28, 28), dtype=np.float32))
    workspace.FeedBlob('label', np.zeros((1, 1), dtype=np.int32))

workspace.RunNetOnce(model.param_init_net)
workspace.CreateNet(model.net)

for _ in range(100):
    workspace.RunNet(model.net)
```
Before this change:
```
                    1.55%  1.4185ms       837  1.6940us  1.6630us  2.4000us  [CUDA memcpy HtoD]
                    0.72%  656.03us       200  3.2800us  3.1350us  3.5840us  [CUDA memcpy DtoD]
                    0.39%  7.1574ms      1034  6.9220us  3.8300us  18.677us  cudaMemcpyAsync
                    0.00%  34.180us         3  11.393us  9.0960us  12.910us  cudaMemcpy
```
And after it (look at the Calls column — host-to-device copies drop from 837 to 137, and `cudaMemcpyAsync` calls from 1034 to 334):
```
                    0.73%  657.15us       200  3.2850us  3.1040us  3.6160us  [CUDA memcpy DtoD]
                    0.26%  235.07us       137  1.7150us  1.6640us  2.3680us  [CUDA memcpy HtoD]
                    0.20%  3.4493ms       334  10.327us  6.4220us  16.958us  cudaMemcpyAsync
                    0.00%  37.376us         3  12.458us  9.4120us  15.412us  cudaMemcpy
```
That makes a pretty big difference in performance: the 700-copy drop in both rows matches this net exactly — it has eight trainable blobs (weights and biases for `conv1`, `conv2`, `fc3`, `pred`), so collapsing eight per-parameter `LearningRate` ops into one removes seven copies per iteration, times 100 iterations. Is there any particular reason you decided to have a separate `LearningRate` op for every parameter in 1317e3498c?
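The caching idea behind the change can be sketched roughly as follows. This is a simplified stand-in, not the actual caffe2 `optimizer.py` code: the class and blob names here are hypothetical, and the real implementation builds a `LearningRate` operator into the net rather than returning a string.

```py
# Sketch of "one LR op per (device, optimizer)": the optimizer memoizes the
# learning-rate blob it builds for each device, so every parameter on that
# device reuses the same blob instead of getting its own LearningRate op.
class SgdOptimizerSketch:
    def __init__(self, base_lr):
        self.base_lr = base_lr
        self._lr_blobs = {}   # device -> shared learning-rate blob name
        self.ops_built = 0    # how many LearningRate ops were actually created

    def get_lr_blob(self, device):
        # Build the LR op only the first time this (device, optimizer)
        # pair is seen; afterwards, hand back the cached blob.
        if device not in self._lr_blobs:
            self._lr_blobs[device] = 'lr_%s' % device
            self.ops_built += 1
        return self._lr_blobs[device]

opt = SgdOptimizerSketch(0.01)
# Ten parameters on one GPU share a single learning-rate blob:
blobs = [opt.get_lr_blob('cuda_0') for _ in range(10)]
assert opt.ops_built == 1
assert len(set(blobs)) == 1
```

A second device (or a second optimizer instance) still gets its own blob, so per-device learning-rate schedules keep working.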
Closes https://github.com/caffe2/caffe2/pull/893

Reviewed By: kennyhorror

Differential Revision: D5372541

Pulled By: asaadaldien

fbshipit-source-id: 57357e1be2d58ce294058e9422fb3b1eddfca24d
2017-07-12 21:17:49 -07:00