Commit Graph

971 Commits

Ahmed Taei
804ebf7c41 Populate learning rate blob name into data_parallel_model and fix resnet50_trainer example.
Reviewed By: akyrola

Differential Revision: D5463772

fbshipit-source-id: 10b8963af778503a3de6edbabb869747bd1e986d
2017-07-21 16:24:10 -07:00
Alisson Gusatti Azzolini
8e80ef7e6d s/CopyGPUToGPU/Copy
Summary: CopyGPUToGPU does not exist. Copy seems to do the trick. Didn't go into details of how copy works, not sure if it ends up triggering UVA.

Reviewed By: akyrola

Differential Revision: D5471014

fbshipit-source-id: d8bc1aed9b19070c92f3ffc76f5617bdd0054563
2017-07-21 13:51:11 -07:00
Junjie Bai
efe2d01a3e Fix some bugs in CPU version of BooleanMask and add GPU version
Reviewed By: akyrola

Differential Revision: D5397208

fbshipit-source-id: 0314cc181e315f3b6cda846292b2e2ea73bb015b
2017-07-21 11:38:49 -07:00
Aapo Kyrola
cbb85545ec warn about orphan StopGradient output
Summary: A quite common confusion is how to use StopGradient, and a typical bug is forgetting to specify input=output. This adds a sanity check to the gradient builder that checks whether any StopGradient outputs are orphaned.
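
A minimal sketch of the usage the check is aimed at, assuming the usual caffe2 core.Net API (the blob names here are hypothetical):
```py
from caffe2.python import core

net = core.Net("stopgrad_example")
x = net.AddExternalInput("x")

# Correct usage: StopGradient is applied in place (input == output), so the
# gradient builder sees the same blob that downstream ops consume.
net.StopGradient(x, x)

# Typical bug: the output "x_no_grad" is never consumed by any other op, so
# its gradient is orphaned -- the new sanity check warns about this case.
net.StopGradient(x, "x_no_grad")
```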

Reviewed By: dzhulgakov

Differential Revision: D5458341

fbshipit-source-id: 056fef4f0ee53eb10e66e9be0ecb55b55f9cc3d7
2017-07-20 21:41:41 -07:00
Ahmed Taei
bcce1bd04a Fix optimizer_context OSS test
Summary:
This will fix the test by querying how many instances of the optimizer have already been created.
Because OSS tests don't run in isolation, the number of previously created optimizer instances may already be greater than zero.

Reviewed By: akyrola

Differential Revision: D5462433

Tags: easy

fbshipit-source-id: 7a9ab4fe5345f5d5138abb461ba7a990d9ace840
2017-07-20 12:21:09 -07:00
Honghao Wei
290acab2c7 implement drelu and unittest
Summary:
In this revision, I mainly implemented the DRelu activation. See https://arxiv.org/pdf/1706.06978v1.pdf for details.
To sum up, unlike standard relu and prelu, which divide the input range into two parts with the boundary at zero, DRelu computes another value p to divide the activation into two parts. p is the softmax value of the output of Batch Normalization. For the f(x)=x part of relu there is the analogous part f(x)=px, and for the f(x)=0 part of relu there is the analogous part f(x)=a(1-p)x, where a is a parameter to tune. The DRelu activation is the sum of these two parts: f(x) = a(1-p)x + px.

To implement DRelu, I take BatchNormalization as the superclass and then use the above formula for the computation. To allow users to choose the activation method, which usually happens when calling the add_mlp function in processor_util.py, I pass the parameter through model_option from the UI down to the implementation details, just as dropout does. Currently I place it in extra_option, but can move it if the AML team needs to redesign the UI.

I also add unit tests for DRelu. We check the shape of the output and also do numeric unit tests.
For the unit tests, I first check the numeric value of BatchNormalization, since there was no similar test before. I then compute the value of the DRelu outputs and compare the results with the current DRelu layer.
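
A minimal NumPy sketch of the formula above (illustrative only, not the actual layer code); here p stands in for the gate value computed from the Batch Normalization output:
```py
import numpy as np

def drelu(x, p, a=0.1):
    # f(x) = a * (1 - p) * x + p * x, where a is the tunable slope
    return a * (1.0 - p) * x + p * x

x = np.array([-2.0, -0.5, 0.5, 2.0])
p = 1.0 / (1.0 + np.exp(-x))  # stand-in gate value for illustration
print(drelu(x, p))            # close to a*x for very negative x, close to x for large x
```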

Reviewed By: chocjy

Differential Revision: D5341464

fbshipit-source-id: 896b4dcc49cfd5493d97a8b448401b19e9c80630
2017-07-20 11:50:08 -07:00
Tao Wu
4a81b0f24a make SparseLookup support None pooling
Summary: Adding pooling option as None, and SparseLookup will gather the embedding for each id.
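
A rough NumPy illustration of the difference (hypothetical, not the layer implementation): with pooling set to None the lookup gathers one embedding row per id instead of reducing them:
```py
import numpy as np

embedding = np.random.randn(10, 4).astype(np.float32)  # 10 ids, dim 4
ids = np.array([3, 7, 7, 0])

gathered = embedding[ids]            # pooling=None: shape (4, 4), one row per id
pooled = embedding[ids].sum(axis=0)  # a pooled variant (e.g. Sum): shape (4,)
```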

Reviewed By: kittipatv

Differential Revision: D5421667

fbshipit-source-id: 1e8e2b550893ff3869dab12f8eb1fe24a063c3d5
2017-07-18 16:39:55 -07:00
Geet Sethi
11c4647447 Allow CPU device scope in data_parallel_model and data_parallel_rendevous device scope checks
Summary: Allowing CPU device scope instead of enforcing no device scope in data_parallel_model and data_parallel_rendevous.

Reviewed By: akyrola

Differential Revision: D5440492

fbshipit-source-id: bcd4344d64c710ea50ec8a65e3e9d102e35c66ea
2017-07-18 15:47:41 -07:00
Jacqueline Xu
3cc03568da Fixing error message for layer model helper
Summary: - Minor fix for error message in layer model helper file

Reviewed By: chocjy

Differential Revision: D5440768

fbshipit-source-id: df47bfe68a0caa750f0d3c8def28a5585e465ee0
2017-07-18 09:52:45 -07:00
Bangsheng Tang
e5a7891038 dot product using matmul
Summary:
1. PairwiseDotProduct in layers
2. add_axis argument in Concat and Split (just for backward propagation)

Reviewed By: xianjiec

Differential Revision: D5383208

fbshipit-source-id: 8e18ce371fff2da2da77b1a728142d69cd48e9c3
2017-07-17 23:20:37 -07:00
Tao Wu
427cc68ba2 added TensorInferenceFunction for ExpandDims operator; deleted Reshape layer.
Summary: The diff added TensorInferenceFunction for ExpandDims operator, so that ExpandDims layer is no longer needed (it can be handled by functional layer)

Reviewed By: kittipatv

Differential Revision: D5430889

fbshipit-source-id: 4f895f2751663c45db4cc4f87e5114c63cda9fbb
2017-07-17 21:03:00 -07:00
Tao Wu
78c4c4f885 handle RecurrentNetwork operator when clone net
Summary: added support of passing remap_funcs to clone_and_bind_net, so that it can pass it to clone method. Added other utils to ensure RecurrentNetwork operator is correctly cloned based on the remap_blob. The reason that RecurrentNetwork operator needs special treatment is that its arguments contain proto and blobs.

Reviewed By: kittipatv

Differential Revision: D5421532

fbshipit-source-id: 5de68365ce97df2de483f02ad260d78c8d35eead
2017-07-17 17:33:21 -07:00
Victor Gao
f7a92145d4 comment out unused parameter in pybind_state.cc
Summary:
This removes/comments out/silences one or more unused parameters in the files.
We are going to enable `-Wunused-parameter` in fbcode and this fixes a case that automated tooling can't handle.

This diff is automatically generated.
Reviewers are added heuristically.

Reviewed By: dzhulgakov

Differential Revision: D5437217

fbshipit-source-id: c2fc5ed30e7ee47b8c40248f89a9f4304ce7c098
2017-07-17 15:57:49 -07:00
Aapo Kyrola
baef769035 add code comments to memonger
Summary: Add some comments to dag-memonger to help asaadaldien with his C++ port.

Reviewed By: asaadaldien

Differential Revision: D5435459

fbshipit-source-id: dd5d482efb017418d22f42ee79fbd4668bd31bdd
2017-07-17 13:07:33 -07:00
Geet Sethi
2dc8851206 RNN Workspace Blob Extraction
Summary:
Added operator RecurrentNetworkBlobFetcherOp that takes as input a scratch workspace name and prefix, and copies over all blobs in the scratch workspace into the global workspace. This essentially extracts all intermediate recurrent network computation for each timestep.

Added a wrapper in recurrent.py - retrieve_step_blobs(net, prefix='rnn') - which, when called after an rnn is run, will return a list of all blobs extracted from the net.
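
A hedged usage sketch based on the description above; only the retrieve_step_blobs call is taken from this commit, the surrounding helper is illustrative:
```py
from caffe2.python import recurrent, workspace

def dump_rnn_step_blobs(net, prefix="rnn"):
    """Assumes `net` contains a RecurrentNetwork op and has already been run.
    Copies the per-step scratch-workspace blobs into the global workspace and
    returns them as {blob_name: value}."""
    names = recurrent.retrieve_step_blobs(net, prefix=prefix)
    return {name: workspace.FetchBlob(name) for name in names}
```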

Reviewed By: akyrola

Differential Revision: D5421926

fbshipit-source-id: 0f35b466d77d3c719fb0e32de7dbcafc6c0d5225
2017-07-17 10:24:18 -07:00
Huazhong Ning
9e2c74cc58 Use scope name for dataset cursor
Summary: Currently the dataset cursor blob uses a fixed name. When we read from multiple input tables, the dataset cursor of each table uses the same blob. This messed up the split queue and crashed the reader pipelines (see the errors and failures in https://fb.quip.com/uzbIA7K0PgVe)

Reviewed By: dragonxlwang, rayleichen

Differential Revision: D5419863

fbshipit-source-id: 5983a3d8d2e286dc47c2ec38ed1dbbe30c7c9b49
2017-07-15 19:22:32 -07:00
Yangqing Jia
b6691277f5 binary size util
Summary: This would allow us to inspect the binary size of the builds more easily.

Reviewed By: jonmorton

Differential Revision: D4553515

fbshipit-source-id: 95371bf67e66490a8653b874e1ff79cc987805e6
2017-07-14 17:49:24 -07:00
Honghao Wei
b68adec7bb adding model loss logic
Summary: Add the API model.add_loss(), which allows adding losses such as optimization and regularization terms. See the change in sparse_nn.py, where 'model.loss = loss' is changed to 'model.add_loss(loss)'.
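
An illustrative sketch of the accumulate-instead-of-overwrite behaviour this API enables (a toy stand-in, not the actual layer model helper code):
```py
class ToyModel(object):
    def __init__(self):
        self._losses = []

    def add_loss(self, loss):
        # accumulate, so a task loss and e.g. a regularization term can both contribute
        self._losses.append(loss)

    @property
    def loss(self):
        return sum(self._losses)

model = ToyModel()
model.add_loss(0.75)  # task loss
model.add_loss(0.01)  # regularization term
print(model.loss)     # 0.76
```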

Reviewed By: xianjiec

Differential Revision: D5399056

fbshipit-source-id: 13b2ced4b75d129a5ee4a9b0e989606c04d2ca8b
2017-07-14 16:25:23 -07:00
Alexander Sidorov
bd29260f47 hypothesis_test grad_reference bug fixes
Summary:
1. it was easy to pass a grad_reference that was simply ignored due to a missing output_to_grad
2. the threshold was not passed to the gradient checking logic

Reviewed By: dzhulgakov

Differential Revision: D5425226

fbshipit-source-id: 2eb41f2601d5e356f7872e57724d08ab2e742329
2017-07-14 14:41:23 -07:00
Jacqueline Xu
2aa8fc7e8d Implementing Semi-Random Features Layer
Summary:
- (Split diff from Arc Cosine)
- Implemented [[ https://arxiv.org/pdf/1702.08882.pdf | Semi-Random Features ]] Layer
- Created a buck unit test for SRF Layer

Reviewed By: chocjy

Differential Revision: D5374803

fbshipit-source-id: 0293fd91ed5bc19614d418c2fce9c1cfdd1128ae
2017-07-14 13:15:50 -07:00
Junjie Bai
a305ce3ece Fix broken seq2seq example
Reviewed By: harouwu

Differential Revision: D5423060

fbshipit-source-id: 4537b020546503a1f9cb237257ab3c42665ae07f
2017-07-13 23:31:54 -07:00
Aapo Kyrola
f44991b398 add timeout argument to DequeueBlobs; use 10 min timeout for data workers
Summary: As title. This helps with (quite common) cases where data input is stuck for one reason or another, and net execution never proceeds and is stuck forever.

Reviewed By: andrewwdye

Differential Revision: D5409885

fbshipit-source-id: 840261fd5964408f788fc0f50ece0d74193694ac
2017-07-13 18:52:03 -07:00
Honghao Wei
34f7acbedf Report bugs in BatchNormalization, the dimension is wrong for second order
Summary: The input channel dimension for NHWC should be the last dimension, C. Since the batch dimension is omitted, the axis index should be 2 instead of 3.

Reviewed By: chocjy

Differential Revision: D5418538

fbshipit-source-id: a6939a863817b7566198ea2a665a1d236a2cf63d
2017-07-13 18:31:18 -07:00
Ahmed Taei
13980d2bb5 Set device to the default device(CPU) when DeviceContext is None.
Summary:
Fix case when optimizer isn't called within a device scope context.
Fix OptimizerContext lr blob names

Reviewed By: volkhin

Differential Revision: D5421046

fbshipit-source-id: 186a0d05f40d4442c5ba5736084626da73a0c0f1
2017-07-13 17:54:36 -07:00
Geet Sethi
ab0d631d6d Adding AllCompare-like function to data_parallel_model
Summary: Added a function _RunComparison to data_parallel_model that checks whether all shards in a given rendezvous have the same value for a given blob_name

Reviewed By: wesolwsk

Differential Revision: D5394164

fbshipit-source-id: c2b07d0f8d5846fa9887d53b0be091a8c057f106
2017-07-13 13:03:57 -07:00
Aapo Kyrola
59c0bb9e5a fix for duplicate input case
Summary: Fix a bug reported by dzhulgakov that occurs when an input blob is used twice in the same op --> it was released to the recycled-blobs pool twice.

Reviewed By: dzhulgakov, volkhin

Differential Revision: D5414023

fbshipit-source-id: 861bb46fe901023cb9a496401736e6ecb77d5fae
2017-07-13 01:51:30 -07:00
Jiyan Yang
043640c3eb Return top K classes
Reviewed By: kittipatv

Differential Revision: D5363481

fbshipit-source-id: 27ce37878434917c1a7c5f325ed77c989a1448af
2017-07-13 00:20:00 -07:00
Ahmed Taei
3faca65adf Add a unit-test to validate sharing learning rate between
Reviewed By: kennyhorror

Differential Revision: D5413387

fbshipit-source-id: ff4022375183394ca9cee6faea5ac46e56079b86
2017-07-12 21:53:25 -07:00
Luke Yeager
82e318cf8b Optimizer: one LR op per (device, optimizer)
Summary:
Try running this script through `nvprof`:
```py
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import brew, core, optimizer, workspace
from caffe2.python.model_helper import ModelHelper

do = core.DeviceOption(caffe2_pb2.CUDA, 0)
with core.DeviceScope(do):
    model = ModelHelper(arg_scope={'order': 'NCHW'})
    conv1 = brew.conv(model, 'data', 'conv1', 1, 20, 5)
    pool1 = brew.max_pool(model, conv1, 'pool1', kernel=2, stride=2)
    conv2 = brew.conv(model, pool1, 'conv2', 20, 50, 5)
    pool2 = brew.max_pool(model, conv2, 'pool2', kernel=2, stride=2)
    fc3 = brew.fc(model, pool2, 'fc3', 50 * 4 * 4, 500)
    fc3 = brew.relu(model, fc3, fc3)
    pred = brew.fc(model, fc3, 'pred', 500, 10)
    softmax, loss = model.SoftmaxWithLoss([pred, 'label'], ['softmax', 'loss'])
    model.AddGradientOperators([loss])
    optimizer.build_sgd(model, 0.01,
                        policy='step', stepsize=1, gamma=0.999,
                        momentum=0.9, nesterov=False)
    workspace.FeedBlob('data', np.zeros((1, 1, 28, 28), dtype=np.float32))
    workspace.FeedBlob('label', np.zeros((1, 1), dtype=np.int32))

workspace.RunNetOnce(model.param_init_net)
workspace.CreateNet(model.net)

for _ in range(100):
    workspace.RunNet(model.net)
```
Before this change:
```
                    1.55%  1.4185ms       837  1.6940us  1.6630us  2.4000us  [CUDA memcpy HtoD]
                    0.72%  656.03us       200  3.2800us  3.1350us  3.5840us  [CUDA memcpy DtoD]
                    0.39%  7.1574ms      1034  6.9220us  3.8300us  18.677us  cudaMemcpyAsync
                    0.00%  34.180us         3  11.393us  9.0960us  12.910us  cudaMemcpy
```
And after it (look at the third column):
```
                    0.73%  657.15us       200  3.2850us  3.1040us  3.6160us  [CUDA memcpy DtoD]
                    0.26%  235.07us       137  1.7150us  1.6640us  2.3680us  [CUDA memcpy HtoD]
                    0.20%  3.4493ms       334  10.327us  6.4220us  16.958us  cudaMemcpyAsync
                    0.00%  37.376us         3  12.458us  9.4120us  15.412us  cudaMemcpy
```
That makes a pretty big difference in performance. Is there any particular reason you decided to have a separate `LearningRate` op for every parameter in 1317e3498c?
Closes https://github.com/caffe2/caffe2/pull/893

Reviewed By: kennyhorror

Differential Revision: D5372541

Pulled By: asaadaldien

fbshipit-source-id: 57357e1be2d58ce294058e9422fb3b1eddfca24d
2017-07-12 21:17:49 -07:00
Jiyan Yang
d6f5452240 Allow to import subclasses of layers
Summary:
We want to be able to register subclasses of layers that are not direct children of ModelLayer.
This requires us to find subclasses of ModelLayer recursively.
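
A minimal sketch of the recursive subclass lookup (illustrative stand-ins, not the actual layers registry code):
```py
def all_subclasses(cls):
    # collect direct and indirect subclasses, so grandchildren are registered too
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found

class ModelLayer(object): pass          # stand-in for the real ModelLayer base class
class FCLayer(ModelLayer): pass
class SparseFCLayer(FCLayer): pass      # not a direct child of ModelLayer

print(all_subclasses(ModelLayer))       # contains both FCLayer and SparseFCLayer
```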

Reviewed By: kittipatv, kennyhorror

Differential Revision: D5397120

fbshipit-source-id: cb1e03d72e3bedb960b1b865877a76e413218a71
2017-07-12 20:19:47 -07:00
Tao Wu
02aa5ad9fb make functional layer return scalar if only one output
Summary: This diff makes the functional layer return a scalar if there is only one output, and corrects all other corresponding implementations.

Reviewed By: kittipatv

Differential Revision: D5386853

fbshipit-source-id: 1f00582f6ec23384b2a6db94e19952836755ef42
2017-07-12 11:34:31 -07:00
Geet Sethi
a68bb5e3f9 Added device scope checks to data_parallel_model and data_parallel_rendevous
Summary:
Added device scope checks to data_parallel_model and data_parallel_rendevous

Added test to check that checks are working correctly to data_parallel_model_test

Fixed device_scope error in test_synchronization_barrier

Reviewed By: akyrola

Differential Revision: D5403936

fbshipit-source-id: 849c1cd7452692efbc5ef74d2d60ede090c9c017
2017-07-12 10:47:28 -07:00
Tao Wu
74fd4bf9e4 quick fix for model_helper __init__
Summary: the __init__ method should also make _parameters_info shared between self and param_model, since params is shared. Otherwise it can cause an inconsistency between _parameters_info and params. Examples of using param_model can be found in rnn_cell.py.

Reviewed By: kennyhorror

Differential Revision: D5405327

fbshipit-source-id: ca8079058e898f529906452163cda234cb30a7df
2017-07-12 08:49:48 -07:00
Tao Wu
b9e64ecef1 allow param_info to set optimizer
Summary: this diff adds optimizer into param_info, and the associated implementations for modelhelper and brew to set optimizer for each individual parameter.

Reviewed By: kennyhorror

Differential Revision: D5385432

fbshipit-source-id: 5d682f9d1ab077e04a5d76a24d71470f4e64fc92
2017-07-12 08:49:48 -07:00
Mitchell Wortsman
823869ba79 Adding tanh to brew
Summary: Added tanh to brew.

Reviewed By: harouwu

Differential Revision: D5395358

fbshipit-source-id: 8eb5303f503e10aec4c59b42055933198d67e9b3
2017-07-11 18:17:52 -07:00
Dmytro Dzhulgakov
67d2f45e2f Fix net_printer.py
Summary: Fix the unprintable characters fix :)

Reviewed By: akyrola

Differential Revision: D5398914

fbshipit-source-id: 2c607c497f15e324e863ff1dae7bb16199d4074e
2017-07-11 15:26:52 -07:00
Aapo Kyrola
192e0546bf fix for back-and-forth models, pass reference instead of copy
Summary:
akirillov again presented me with a memonger bug: his model, which has a kind of 'back-and-forth' structure where blobs are passed left and right in a ladder-like pattern, revealed a bug in memonger: the set of free blobs should be passed as a reference, not a copy, so that the recyclings are properly accounted for. Hard to explain.

Since we have the graph verifier, we can be more confident with these changes.

I also added some helpful debug to the graph verifier.

Differential Revision: D5396925

fbshipit-source-id: 0bffb3a0bf8532afcd6b5bc9331c779768a8c5c5
2017-07-11 10:52:14 -07:00
Jacqueline Xu
e89e71c595 Simplifying Random Fourier Features and layer test
Summary:
- Condensed operators in RFF layer
- Adjusted RFF layer test; made test code more concise

Reviewed By: chocjy

Differential Revision: D5391436

fbshipit-source-id: 08748861cd6fb4a9e4cc9c8762996371492020a1
2017-07-11 00:40:53 -07:00
Robert Verkuil
97193478c7 Implemented GRUCell
Summary: Implemented python logic and tests to create an RNNCell for GRU.  Uses the preexisting GRU Unit Op code.

Reviewed By: salexspb

Differential Revision: D5364893

fbshipit-source-id: 2451d7ec8c2eacb8d8c9b7c893bfd21b65fb9d18
2017-07-10 17:52:25 -07:00
Robert Verkuil
2409c2e359 GRUUnit Op Backwards Pass
Summary:
Just an implementation of the backward pass of the GRU Unit Op, not the full RNNCell.
Functions were created to mimic the LSTM implementation as closely as possible.
Backward-pass implementations are defined in GRU_unit_op.{h, cc}.
An assertGradientChecks call was added to gru_cell_test.py.

Reviewed By: salexspb

Differential Revision: D5364856

fbshipit-source-id: 09cff4478091827763b40cc331e4e0abf0ec258f
2017-07-10 17:52:24 -07:00
Robert Verkuil
279f3f095e Implemented Gated Recurrent Unit (GRU) c++ operator forward pass
Summary:
Just an implementation of the forward pass of the GRU Unit Op, not the full RNNCell.
Functions were created to mimic the LSTM implementation as closely as possible.
The implementation is defined in GRU_unit_op.{h, cc}.
Tests are in gru_cell_test.py, which imports rnn_cell_test_util.py for the sigmoid, tanh, and _prepare_rnn functions.
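
For reference, a NumPy sketch of the standard GRU unit update such an op computes (textbook formulation, not the code in GRU_unit_op.{h, cc}):
```py
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_unit(h_prev, x_r, x_z, x_h, U_r, U_z, U_h):
    # x_* are the precomputed input projections (plus biases) for the reset,
    # update and candidate gates; U_* are the recurrent weight matrices
    r = sigmoid(x_r + np.dot(h_prev, U_r))               # reset gate
    z = sigmoid(x_z + np.dot(h_prev, U_z))               # update gate
    h_tilde = np.tanh(x_h + np.dot(r * h_prev, U_h))     # candidate state
    return (1.0 - z) * h_prev + z * h_tilde              # new hidden state
```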

Reviewed By: jamesr66a

Differential Revision: D5363697

fbshipit-source-id: f9ba9fe0be01ffc868dd22027be8be4975b84998
2017-07-10 17:52:23 -07:00
Robert Verkuil
48bd102b95 Moved sigmoid, tanh, and _prepare_lstm (renamed) to a util file.
Summary:
Moved sigmoid, tanh, and _prepare_lstm (renamed) to a util file.
Also renamed _prepare_lstm to _prepare_rnn since it is used for setting up both LSTM and GRU models.

The reason for this commit is to allow the creation of the GRU op and testing code without copying and pasting the code for sigmoid, tanh, and setting up an RNN unit op model.

Reviewed By: jamesr66a

Differential Revision: D5363675

fbshipit-source-id: 352bd70378031f1d81606c9267e625c6728b18fd
2017-07-10 17:52:22 -07:00
Kevin Matzen
4b1ebd2f65 Fast path for serializing large floating-point tensors to protobuf
Summary: Our existing serialization routines verify the type of each element in a numpy array and convert each element to a canonical type. For large floating-point tensors, such as model parameters, this checking and converting takes a significant amount of time. Adding a fast-track path for just float32 arrays, as this is the most common use case to worry about.
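
A hedged illustration of the idea (hypothetical helper, not the actual caffe2 serializer): the slow path converts element by element, while the fast path bulk-copies when the array is already float32:
```py
import numpy as np

def serialize_floats(arr):
    # Illustrative only: returns a flat list of floats, standing in for
    # filling a repeated float field in the tensor protobuf.
    if arr.dtype == np.float32:
        return arr.ravel().tolist()       # fast path: no per-element checking
    out = []
    for v in arr.ravel():
        out.append(float(v))              # slow path: canonicalize each element
    return out
```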

Reviewed By: akyrola

Differential Revision: D5389953

fbshipit-source-id: 26f44cb2426ea3efb849e7707b27d5485f69956c
2017-07-10 17:52:22 -07:00
Kevin Matzen
c096c188c3 minor leaky relu bug fixes
Summary:
numpy.random.rand generates samples from [0, 1), and therefore the leaky relu test cases weren't testing negative inputs. Tests still pass after the change.

Leaky relu can be used in-place, but the gradient took X rather than Y. Technically the result is no different, since the value is only used for a sign test in the gradient, but updated it to take Y to reduce confusion.
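
A NumPy sketch of the sign-test point (illustrative): since Y = LeakyReLU(X) preserves the sign of X, the gradient mask is the same whether it reads X or Y, which is why switching to Y is safe for the in-place case:
```py
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_grad(dy, ref, alpha=0.01):
    # `ref` may be either X or Y: both have the same sign, so the mask agrees
    return np.where(ref >= 0, dy, alpha * dy)

x = np.random.randn(5).astype(np.float32)  # randn (unlike rand) produces negatives
y = leaky_relu(x)
dy = np.ones_like(x)
assert np.allclose(leaky_relu_grad(dy, x), leaky_relu_grad(dy, y))
```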

Differential Revision: D5390126

fbshipit-source-id: d0c428abbb2797eb33902a7d2a2f59d5e85daaa6
2017-07-10 16:04:45 -07:00
Kevin Matzen
720db19fa2 make GetComputedParams work like GetParams
Summary: GetComputedParams tests namescopes with equality while GetParams tests with a prefix.  Switching GetComputedParams to also use a prefix so that both functions have similar usages.

Reviewed By: akyrola

Differential Revision: D5389816

fbshipit-source-id: 0e43e4b491fccbad3b855b6b735dc2b91d7626c9
2017-07-10 12:30:44 -07:00
Junjie Bai
ff3996acb9 Add NormalizeL1Op for doing L1 normalization along a given axis
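
A NumPy sketch of L1 normalization along an axis, illustrating the op's semantics as described by the title (not its implementation):
```py
import numpy as np

def normalize_l1(x, axis):
    # scale x so that the absolute values along `axis` sum to 1
    norm = np.sum(np.abs(x), axis=axis, keepdims=True)
    return x / np.maximum(norm, 1e-12)   # guard against all-zero slices

x = np.array([[1.0, -3.0], [0.0, 2.0]])
print(normalize_l1(x, axis=1))           # each row's absolute values now sum to 1
```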
Reviewed By: salexspb

Differential Revision: D5380220

fbshipit-source-id: 38fc56a1013c25b0c8b0fc161ca54fea412fb8b2
2017-07-10 10:10:36 -07:00
Jacqueline Xu
6ea71155c1 Implementing Arc Cosine Layer
Summary:
- Implemented the [[ http://cseweb.ucsd.edu/~saul/papers/nips09_kernel.pdf | Arc Cosine ]] layer
  - Developed buck unit test for Arc Cosine

Reviewed By: chocjy

Differential Revision: D5367604

fbshipit-source-id: ffd3ee081bc055b06c075c34aa6ce329b62ce2e0
2017-07-10 10:10:36 -07:00
Jiyan Yang
3598bdd044 Modify samplingTrain layer to take more general inputs
Summary: As desc.

Reviewed By: kittipatv

Differential Revision: D5363486

fbshipit-source-id: cb8fa65d750e80d2bf3e9909ca9b2d83a5548099
2017-07-08 22:19:55 -07:00
Guillaume Dumont
dc13345eb3 Read pretrained weights using binary mode in caffe_translator.py
Summary:
Binary mode must be explicitly specified when reading binary files under windows.
Closes https://github.com/caffe2/caffe2/pull/883

Differential Revision: D5373073

Pulled By: Yangqing

fbshipit-source-id: afedebdc74c954dbb6d24c0bccc192c8712c4c88
2017-07-08 10:17:57 -07:00
Bangsheng Tang
5f63f5697a IndexHash
Summary:
1. IndexHashOp
2. Helper class SparseFeatureHash
3. FeatureSpec changes to add desired_hash_size

Reviewed By: kennyhorror

Differential Revision: D5361370

fbshipit-source-id: bf02e3ca12b3654f1d291f77c8af9248b6c4ac55
2017-07-07 23:06:11 -07:00