pytorch/caffe2/python at 89c08334bb5439bea4742e97847523459b6a97f8 - pytorch - Carlos Sousa's Git

OSSForks/pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2026-01-15 12:15:51 +00:00


Aapo Kyrola 89c08334bb data_parallel_model support for sparse gradients and CPU ops

Summary:
data_parallel_model did not support sparse operations or gradients computed by CPU ops.

Currently, sparse operations run on the CPU, so there is no point in "data parallelizing" them. I had to make a few changes to data_parallel_model to support this:
 1. The model can have params that are added before the data-parallel part. For example, a lookup table of word vectors would be a non-parallel parameter.
 2. Therefore, when data_parallel_model is called, it separates out the non-parallel params and does not operate on them. Note: when we add a distributed version, we will need to handle them explicitly with AllGather!
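A minimal sketch of the parameter split described in the list above, in plain Python rather than Caffe2. It assumes (hypothetically) that per-device replicas are named with a name-scope prefix such as `gpu_0/`, and that a param without such a prefix was added before the data-parallel part. `split_params` and `group_by_device` are illustrative helpers, not functions from data_parallel_model.

```python
import re

# Assumed naming convention (hypothetical): per-device replicas live under "gpu_<id>/".
_DEVICE_PREFIX = re.compile(r"^gpu_\d+/")


def split_params(param_names):
    """Partition parameter blob names into (parallel, non_parallel) lists.

    Parallel params were created inside a per-device name scope and are
    replicated and synchronized; non-parallel params (e.g. a word embedding
    table added before the data-parallel part) are left alone.
    """
    parallel, non_parallel = [], []
    for name in param_names:
        (parallel if _DEVICE_PREFIX.match(name) else non_parallel).append(name)
    return parallel, non_parallel


def group_by_device(parallel):
    """Map each unscoped name to its per-device copies, e.g. fc_w -> [gpu_0/fc_w, gpu_1/fc_w]."""
    groups = {}
    for name in parallel:
        base = _DEVICE_PREFIX.sub("", name)
        groups.setdefault(base, []).append(name)
    return groups


params = ["word_embeddings", "gpu_0/fc_w", "gpu_1/fc_w", "gpu_0/fc_b", "gpu_1/fc_b"]
parallel, non_parallel = split_params(params)
print(non_parallel)               # ['word_embeddings']
print(group_by_device(parallel))  # {'fc_w': ['gpu_0/fc_w', 'gpu_1/fc_w'], 'fc_b': ['gpu_0/fc_b', 'gpu_1/fc_b']}
```

Only the grouped, per-device params would take part in gradient synchronization; the non-parallel list is what a future distributed version would have to handle separately.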

This works nicely because Caffe2 automatically adds a concat operator in the backward pass when multiple ops gather from the same blob.
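To illustrate why concatenation is enough, here is a pure-Python sketch (not Caffe2 operators; all helper names are hypothetical). Each gather from the shared table yields a sparse gradient as an (indices, values) pair, mirroring the layout of Caffe2's `GradientSlice`. Concatenating the slices keeps duplicate row indices, and for plain SGD applying them one by one gives the same result as summing them into a dense gradient first.

```python
# Hypothetical helpers: a pure-Python model of sparse gradients, not Caffe2 operators.

def concat_sparse_grads(grads):
    """Concatenate sparse gradient slices given as (indices, values) pairs.

    Duplicate row indices are kept; they accumulate when the update is applied.
    """
    indices, values = [], []
    for idx, vals in grads:
        indices.extend(idx)
        values.extend(vals)
    return indices, values


def apply_sparse_sgd(table, indices, values, lr):
    """In-place SGD step that touches only the listed rows of `table`."""
    for i, row_grad in zip(indices, values):
        table[i] = [w - lr * g for w, g in zip(table[i], row_grad)]
    return table


def apply_dense_sgd(table, indices, values, lr):
    """Reference: scatter-add the slices into a dense gradient, then update every row."""
    dense = [[0.0] * len(row) for row in table]
    for i, row_grad in zip(indices, values):
        dense[i] = [d + g for d, g in zip(dense[i], row_grad)]
    return [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(table, dense)]


# Two lookups into the same 4x2 embedding table; both touch row 2.
grad_a = ([0, 2], [[1.0, 1.0], [0.5, 0.5]])
grad_b = ([2, 3], [[0.25, 0.25], [2.0, 2.0]])
indices, values = concat_sparse_grads([grad_a, grad_b])

table = [[1.0, 1.0] for _ in range(4)]
sparse_result = apply_sparse_sgd([row[:] for row in table], indices, values, lr=1.0)
dense_result = apply_dense_sgd(table, indices, values, lr=1.0)
print(indices)                        # [0, 2, 2, 3]
print(sparse_result == dense_result)  # True
```

The equivalence relies on the SGD update being linear in the gradient; stateful optimizers may treat duplicate indices differently.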

I also added support for data-parallel CPU ops, which may be necessary when some ops lack a GPU implementation.

The test in data_parallel_model_test validates the correctness of the code by running the same trainer on different numbers of GPUs and checking that the end result is the same.
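The reasoning behind that test can be sketched with a toy model; this is not the actual data_parallel_model_test. For a mean loss and equal-sized shards, averaging the per-device gradients equals the full-batch gradient, so training on 1, 2, 4 or 8 "devices" should end at the same weights. The sketch uses a hypothetical scalar linear model and `Fraction` arithmetic so the equality is exact; a comparison on real GPUs, where floating-point summation order differs, would likely need a tolerance.

```python
from fractions import Fraction


def grad(w, xs, ys):
    """Gradient w.r.t. scalar w of the mean squared error (w*x - y)^2 over a batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, num_devices, steps=5, lr=Fraction(1, 100)):
    """SGD where each step splits the batch evenly across `num_devices`
    and averages the per-device gradients (an all-reduce mean)."""
    assert len(xs) % num_devices == 0, "equal-sized shards assumed"
    shard = len(xs) // num_devices
    w = Fraction(0)
    for _ in range(steps):
        device_grads = [
            grad(w, xs[d * shard:(d + 1) * shard], ys[d * shard:(d + 1) * shard])
            for d in range(num_devices)
        ]
        w -= lr * sum(device_grads) / num_devices
    return w


xs = [Fraction(v) for v in range(1, 9)]
ys = [3 * x + 1 for x in xs]
results = {n: train(xs, ys, n) for n in (1, 2, 4, 8)}
assert len(set(results.values())) == 1  # same end result for every device count
```

With unequal shard sizes the plain average would weight examples differently, which is one reason to keep per-device batch sizes equal in such a comparison.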

Reviewed By: jhcross

Differential Revision: D4649208

fbshipit-source-id: e3b7ae701ead468dc94c52a976eafec5c9831097

2017-03-09 13:48:41 -08:00


Entries whose names were not captured:
Documenation generation to wiki (2017-02-15 16:00:44 -08:00)
Distrubited Multi-GPU resnet50 (2017-03-08 11:39:29 -08:00)
clean old unit test, add sum processor and sqrt pooling (2017-03-08 23:04:19 -08:00)
goodbye old brewery (2017-01-04 20:58:35 -08:00)
Added model downloader (2017-02-22 12:47:15 -08:00)
Implement recurrent attention in C2 (2017-03-08 11:21:28 -08:00)

_import_c_extension.py: …
attention.py: Implement recurrent attention in C2 (2017-03-08 11:21:28 -08:00)
caffe_translator_test.py: …
caffe_translator.py: translator fix to solve Aaron's issue (2017-02-13 11:19:13 -08:00)
checkpoint_test.py: Fix issues pickling jobs (2017-02-21 20:47:27 -08:00)
checkpoint.py: Fix issues pickling jobs (2017-02-21 20:47:27 -08:00)
CMakeLists.txt: CMake completions work (2017-01-11 16:59:22 -08:00)
cnn.py: Do not initialize BN params if init_params is false. (2017-02-27 20:19:03 -08:00)
context_test.py: Make ContextManager thread-safe (2017-02-13 19:45:35 -08:00)
context.py: Make ContextManager thread-safe (2017-02-13 19:45:35 -08:00)
control_test.py: …
control.py: Better visualization for gpu training plan (2016-12-21 09:29:43 -08:00)
convnet_benchmarks_test.py: …
convnet_benchmarks.py: Convnet benchmark cudnn_ws (2017-03-02 15:32:37 -08:00)
core_gradients_test.py: add inference for gradient ops + a couple of missing shape inference functions + fix to scalars (2017-02-28 23:33:32 -08:00)
core_test.py: NextScopedBlob with well-defined behavior and respect namescope (2017-02-16 17:16:36 -08:00)
core.py: New approach to metrics. (2017-03-06 14:48:16 -08:00)
data_parallel_model_test.py: data_parallel_model support for sparse gradients and CPU ops (2017-03-09 13:48:41 -08:00)
data_parallel_model.py: data_parallel_model support for sparse gradients and CPU ops (2017-03-09 13:48:41 -08:00)
data_workers_test.py: close blobs queues when stopping + test (2017-02-27 10:07:57 -08:00)
data_workers.py: Remove use of logging module and np.random.randint() due to deadlocks with forks (2017-03-01 03:32:56 -08:00)
dataio_test.py: NextScopedBlob with well-defined behavior and respect namescope (2017-02-16 17:16:36 -08:00)
dataio.py: fix typo in TextFileReader (2017-02-21 14:02:48 -08:00)
dataset.py: …
db_test.py: …
device_checker.py: …
dyndep.py: …
experiment_util.py: …
extension_loader.py: …
gradient_check_test.py: Fix test cases: tensor of size 0 not supported by GPU ops yet. (2016-12-15 19:59:24 -08:00)
gradient_checker.py: …
hsm_util.py: Generate huffman tree (2017-01-19 16:14:23 -08:00)
hypothesis_test_util.py: CUDA version of elementwise power + rename to Pow + gradient (2017-03-07 10:20:40 -08:00)
hypothesis_test.py: add AccumulateHistogramOp (2017-03-08 19:37:32 -08:00)
introspect_vis.py: User input (Conv out, etc.) (2017-03-08 13:49:45 -08:00)
layer_model_helper.py: Use new metric intefaces in trainer workflows. (2017-03-07 12:46:52 -08:00)
layer_model_instantiator.py: Migrate realtime training workflows to use new metrics. (2017-03-08 23:49:41 -08:00)
layers_test.py: Add a way do describe layers in a more AdHoc manner. (2017-02-27 23:30:39 -08:00)
load_save_test.py: Add validation checks to load op (2017-03-06 09:46:35 -08:00)
lstm_benchmark.py: LSTM benchmark (Caffe2 RNN based) (2017-02-28 23:17:26 -08:00)
memonger_test.py: Gradient Input memory sharing using memonger blob sharing (2017-01-09 19:44:23 -08:00)
memonger.py: Fixes to topological sort, canonical blob naming, sharing final blob (2017-01-25 15:14:26 -08:00)
mkl_test_util.py: MKLDevice and MKLOperator (2016-12-15 19:59:24 -08:00)
model_device_test.py: Comment out NHWC Alexnet test for now (2017-01-23 13:59:29 -08:00)
model_helper.py: Added editDistance helper to caffe2 operators (2017-02-28 13:31:56 -08:00)
mpi_python.cc: Move mpi_python.cc to the python folder to be more consistent about source file locations. (2017-01-09 10:59:39 -08:00)
muji_test.py: …
muji.py: …
net_builder_test.py: Improvements+fixes for NetBuilder (2017-01-03 16:59:24 -08:00)
net_builder.py: Improve "reporter net" design (2017-02-21 20:17:40 -08:00)
net_drawer.py: Add model graph to dper_example (2017-02-07 13:03:54 -08:00)
net_printer_test.py: Debug/Analysis tools for Jobs/ExecutionSteps (2017-02-06 17:31:20 -08:00)
net_printer.py: Add task outputs and stop signals to net_printer (2017-03-07 01:21:40 -08:00)
optimizer_test_util.py: refactor and modulize optimizers (2017-03-07 18:46:47 -08:00)
optimizer_test.py: refactor and modulize optimizers (2017-03-07 18:46:47 -08:00)
optimizer.py: refactor and modulize optimizers (2017-03-07 18:46:47 -08:00)
pipeline.py: Better names for nets, steps and tasks (2017-02-09 16:33:54 -08:00)
pybind_state_gpu.cc: Cudnn v6 (2017-02-28 17:46:33 -08:00)
pybind_state_mkl.cc: …
pybind_state.cc: Make ModelExporter.load_from_db() load to specific workspace (2017-03-08 09:31:42 -08:00)
pybind_state.h: …
python_op_test.py: …
queue_util.py: Better names for nets, steps and tasks (2017-02-09 16:33:54 -08:00)
record_queue.py: …
recurrent.py: Implement recurrent attention in C2 (2017-03-08 11:21:28 -08:00)
schema_test.py: schema.Struct.__add__ (2017-02-06 13:47:58 -08:00)
schema.py: Add a way do describe layers in a more AdHoc manner. (2017-02-27 23:30:39 -08:00)
scope_test.py: …
scope.py: …
session_test.py: NextScopedBlob with well-defined behavior and respect namescope (2017-02-16 17:16:36 -08:00)
session.py: Default LocalSession to current workspace. (2017-03-01 16:03:18 -08:00)
sparse_to_dense_mask_test.py: …
task.py: Gather perf counters for distributed jobs (2017-02-21 22:06:25 -08:00)
test_util.py: MKL convolution operator (2017-01-23 09:59:30 -08:00)
text_file_reader.py: fix typo in TextFileReader (2017-02-21 14:02:48 -08:00)
timeout_guard.py: Euthanize a process with timeout (2017-03-01 11:38:11 -08:00)
toy_regression_test.py: …
tt_core_test.py: …
tt_core.py: …
utils.py: Add a create your own dataset tutorial (2017-02-22 03:31:47 -08:00)
visualize.py: …
workspace_test.py: Remove redundant and failing test of FeedBlob asserts (2016-12-22 14:59:28 -08:00)
workspace.py: backup functions for non-cuda cases (2017-02-28 22:07:54 -08:00)