Commit Graph

945 Commits

Jiyan Yang
043640c3eb Return top K classes
Reviewed By: kittipatv

Differential Revision: D5363481

fbshipit-source-id: 27ce37878434917c1a7c5f325ed77c989a1448af
2017-07-13 00:20:00 -07:00
Ahmed Taei
3faca65adf Add a unit-test to validate sharing learning rate between
Reviewed By: kennyhorror

Differential Revision: D5413387

fbshipit-source-id: ff4022375183394ca9cee6faea5ac46e56079b86
2017-07-12 21:53:25 -07:00
Luke Yeager
82e318cf8b Optimizer: one LR op per (device, optimizer)
Summary:
Try running this script through `nvprof`:
```py
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import brew, core, optimizer, workspace
from caffe2.python.model_helper import ModelHelper

do = core.DeviceOption(caffe2_pb2.CUDA, 0)
with core.DeviceScope(do):
    model = ModelHelper(arg_scope={'order': 'NCHW'})
    conv1 = brew.conv(model, 'data', 'conv1', 1, 20, 5)
    pool1 = brew.max_pool(model, conv1, 'pool1', kernel=2, stride=2)
    conv2 = brew.conv(model, pool1, 'conv2', 20, 50, 5)
    pool2 = brew.max_pool(model, conv2, 'pool2', kernel=2, stride=2)
    fc3 = brew.fc(model, pool2, 'fc3', 50 * 4 * 4, 500)
    fc3 = brew.relu(model, fc3, fc3)
    pred = brew.fc(model, fc3, 'pred', 500, 10)
    softmax, loss = model.SoftmaxWithLoss([pred, 'label'], ['softmax', 'loss'])
    model.AddGradientOperators([loss])
    optimizer.build_sgd(model, 0.01,
                        policy='step', stepsize=1, gamma=0.999,
                        momentum=0.9, nesterov=False)
    workspace.FeedBlob('data', np.zeros((1, 1, 28, 28), dtype=np.float32))
    workspace.FeedBlob('label', np.zeros((1, 1), dtype=np.int32))

workspace.RunNetOnce(model.param_init_net)
workspace.CreateNet(model.net)

for _ in range(100):
    workspace.RunNet(model.net)
```
Before this change:
```
                    1.55%  1.4185ms       837  1.6940us  1.6630us  2.4000us  [CUDA memcpy HtoD]
                    0.72%  656.03us       200  3.2800us  3.1350us  3.5840us  [CUDA memcpy DtoD]
                    0.39%  7.1574ms      1034  6.9220us  3.8300us  18.677us  cudaMemcpyAsync
                    0.00%  34.180us         3  11.393us  9.0960us  12.910us  cudaMemcpy
```
And after it (look at the third column):
```
                    0.73%  657.15us       200  3.2850us  3.1040us  3.6160us  [CUDA memcpy DtoD]
                    0.26%  235.07us       137  1.7150us  1.6640us  2.3680us  [CUDA memcpy HtoD]
                    0.20%  3.4493ms       334  10.327us  6.4220us  16.958us  cudaMemcpyAsync
                    0.00%  37.376us         3  12.458us  9.4120us  15.412us  cudaMemcpy
```
That makes a pretty big difference in performance. Is there any particular reason you decided to have a separate `LearningRate` op for every parameter in 1317e3498c?
Closes https://github.com/caffe2/caffe2/pull/893

Reviewed By: kennyhorror

Differential Revision: D5372541

Pulled By: asaadaldien

fbshipit-source-id: 57357e1be2d58ce294058e9422fb3b1eddfca24d
2017-07-12 21:17:49 -07:00
Jiyan Yang
d6f5452240 Allow to import subclasses of layers
Summary:
We want to be able to register layer subclasses that are
not direct children of ModelLayer.
This requires finding subclasses of ModelLayer recursively.
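
For illustration, recursive subclass discovery can be done with a small helper like this (a sketch of the general technique, not the actual layer-registry code):
```py
def all_subclasses(cls):
    # Collect direct subclasses, then recurse so grandchildren
    # (indirect subclasses) are found as well.
    subs = list(cls.__subclasses__())
    for sub in cls.__subclasses__():
        subs.extend(all_subclasses(sub))
    return subs
```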

Reviewed By: kittipatv, kennyhorror

Differential Revision: D5397120

fbshipit-source-id: cb1e03d72e3bedb960b1b865877a76e413218a71
2017-07-12 20:19:47 -07:00
Tao Wu
02aa5ad9fb make functional layer return scalar if only one output
Summary: This diff makes the functional layer return a scalar when there is only one output, and corrects all other corresponding implementations accordingly.
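
The return-value convention amounts to something like this (an illustrative sketch, not the layer's actual code):
```py
def _wrap_outputs(outputs):
    # Return the lone output directly instead of a 1-tuple;
    # multi-output layers still return the full tuple.
    return outputs[0] if len(outputs) == 1 else outputs
```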

Reviewed By: kittipatv

Differential Revision: D5386853

fbshipit-source-id: 1f00582f6ec23384b2a6db94e19952836755ef42
2017-07-12 11:34:31 -07:00
Geet Sethi
a68bb5e3f9 Added device scope checks to data_parallel_model and data_parallel_rendevous
Summary:
Added device scope checks to data_parallel_model and data_parallel_rendevous.

Added a test to data_parallel_model_test to check that the new checks work correctly.

Fixed a device_scope error in test_synchronization_barrier.

Reviewed By: akyrola

Differential Revision: D5403936

fbshipit-source-id: 849c1cd7452692efbc5ef74d2d60ede090c9c017
2017-07-12 10:47:28 -07:00
Tao Wu
74fd4bf9e4 quick fix for model_helper __init__
Summary: the init method should also make _parameters_info shared between self and param_model, since params is shared. Otherwise it can cause a inconsistence between _parameters_info and params. Examples of using param_model can be find in rnn_cell.py.

Reviewed By: kennyhorror

Differential Revision: D5405327

fbshipit-source-id: ca8079058e898f529906452163cda234cb30a7df
2017-07-12 08:49:48 -07:00
Tao Wu
b9e64ecef1 allow param_info to set optimizer
Summary: This diff adds an optimizer field to param_info, along with the associated implementations in ModelHelper and brew to set an optimizer for each individual parameter.

Reviewed By: kennyhorror

Differential Revision: D5385432

fbshipit-source-id: 5d682f9d1ab077e04a5d76a24d71470f4e64fc92
2017-07-12 08:49:48 -07:00
Mitchell Wortsman
823869ba79 Adding tanh to brew
Summary: Added tanh to brew.

Reviewed By: harouwu

Differential Revision: D5395358

fbshipit-source-id: 8eb5303f503e10aec4c59b42055933198d67e9b3
2017-07-11 18:17:52 -07:00
Dmytro Dzhulgakov
67d2f45e2f Fix net_printer.py
Summary: Fix the unprintable characters fix :)

Reviewed By: akyrola

Differential Revision: D5398914

fbshipit-source-id: 2c607c497f15e324e863ff1dae7bb16199d4074e
2017-07-11 15:26:52 -07:00
Aapo Kyrola
192e0546bf fix for back-and-forth models, pass reference instead of copy
Summary:
akirillov again presented me with a memonger bug: his model has a kind of 'back-and-forth' structure where blobs are passed left and right in a ladder-like pattern, and it revealed a bug in memonger: the set of free blobs must be passed as a reference, not a copy, so that recyclings are properly accounted for. Hard to explain.
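
The copy-vs-reference pitfall in miniature (plain Python, not the memonger code itself):
```py
def recycle_one(free_blobs):
    # The mutation is only visible to the caller if the same set is shared.
    return free_blobs.pop()

free = {"b1", "b2"}
recycle_one(free)       # correct: the caller's set shrinks
recycle_one(set(free))  # bug: the caller's set is unchanged,
                        # so the same blob can be handed out twice
```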

Since we have the graph verifier, we can be more confident with these changes.

I also added some helpful debug to the graph verifier.

Differential Revision: D5396925

fbshipit-source-id: 0bffb3a0bf8532afcd6b5bc9331c779768a8c5c5
2017-07-11 10:52:14 -07:00
Jacqueline Xu
e89e71c595 Simplifying Random Fourier Features and layer test
Summary:
- Condensed operators in RFF layer
- Adjusted RFF layer test; made test code more concise

Reviewed By: chocjy

Differential Revision: D5391436

fbshipit-source-id: 08748861cd6fb4a9e4cc9c8762996371492020a1
2017-07-11 00:40:53 -07:00
Robert Verkuil
97193478c7 Implemented GRUCell
Summary: Implemented python logic and tests to create an RNNCell for GRU.  Uses the preexisting GRU Unit Op code.

Reviewed By: salexspb

Differential Revision: D5364893

fbshipit-source-id: 2451d7ec8c2eacb8d8c9b7c893bfd21b65fb9d18
2017-07-10 17:52:25 -07:00
Robert Verkuil
2409c2e359 GRUUnit Op Backwards Pass
Summary:
An implementation of the backward pass of the GRU Unit Op, not the full RNNCell.
Functions were created to mimic the LSTM implementation as closely as possible.
The backward pass implementations are defined in GRU_unit_op.{h, cc}.
An assertGradientChecks call was added to gru_cell_test.py.

Reviewed By: salexspb

Differential Revision: D5364856

fbshipit-source-id: 09cff4478091827763b40cc331e4e0abf0ec258f
2017-07-10 17:52:24 -07:00
Robert Verkuil
279f3f095e Implemented Gated Recurrent Unit (GRU) c++ operator forward pass
Summary:
An implementation of the forward pass of the GRU Unit Op, not the full RNNCell.
Functions were created to mimic the LSTM implementation as closely as possible.
The implementation is defined in GRU_unit_op.{h, cc}.
Tests are in gru_cell_test.py, which imports rnn_cell_test_util.py for the sigmoid, tanh, and _prepare_rnn functions.
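
For reference, the textbook GRU recurrence that this op family implements looks like this in numpy (weight names and the omission of biases are simplifications for illustration; gru_cell_test.py has the op's exact conventions):
```py
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Wr, Wh, Uz, Ur, Uh):
    z = sigmoid(x.dot(Wz) + h_prev.dot(Uz))              # update gate
    r = sigmoid(x.dot(Wr) + h_prev.dot(Ur))              # reset gate
    h_tilde = np.tanh(x.dot(Wh) + (r * h_prev).dot(Uh))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde              # interpolate old/new
```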

Reviewed By: jamesr66a

Differential Revision: D5363697

fbshipit-source-id: f9ba9fe0be01ffc868dd22027be8be4975b84998
2017-07-10 17:52:23 -07:00
Robert Verkuil
48bd102b95 Moved sigmoid, tanh, and _prepare_lstm (renamed) to a util file.
Summary:
Moved sigmoid, tanh, and _prepare_lstm (renamed) to a util file.
Also renamed _prepare_lstm to _prepare_rnn since it is used for setting up both LSTM and GRU models.

The reason for this commit is to allow the creation of the GRU Op and testing code without copying and pasting the code for sigmoid, tanh, and setting up an RNN unit op model.

Reviewed By: jamesr66a

Differential Revision: D5363675

fbshipit-source-id: 352bd70378031f1d81606c9267e625c6728b18fd
2017-07-10 17:52:22 -07:00
Kevin Matzen
4b1ebd2f65 Fast path for serializing large floating-point tensors to protobuf
Summary: Our existing serialization routines spend a significant amount of time on large numpy arrays, verifying the type of each element and converting each element to a canonical type. For large floating-point tensors, such as model parameters, this checking and converting dominates serialization time. This adds a fast-track path for float32 arrays, as that is the most common case to worry about.
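
The idea of the fast path, sketched in Python (a hypothetical illustration; the general branch stands in for the existing per-element route):
```py
import numpy as np

def serialize_tensor(arr):
    if isinstance(arr, np.ndarray) and arr.dtype == np.float32:
        # Fast path: the dtype is already canonical, so skip per-element
        # checking/conversion and do one bulk copy.
        return arr.tobytes()
    # General path (stand-in for the per-element check-and-convert route):
    return np.asarray(arr, dtype=np.float64).tobytes()
```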

Reviewed By: akyrola

Differential Revision: D5389953

fbshipit-source-id: 26f44cb2426ea3efb849e7707b27d5485f69956c
2017-07-10 17:52:22 -07:00
Kevin Matzen
c096c188c3 minor leaky relu bug fixes
Summary:
numpy.random.rand generates samples from [0, 1), so the leaky relu test cases weren't testing negative inputs. Tests still pass after the change.

Leaky relu can be used in-place, but the gradient took X rather than Y. Technically the result is no different, since it is only used for a sign test in the gradient, but it now takes Y to reduce confusion.
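
The sampling issue is easy to see; one common fix is to shift and scale the samples so the inputs cover negative values too (a sketch, not the test's exact code):
```py
import numpy as np

X_old = np.random.rand(5, 5).astype(np.float32)                 # in [0, 1): never negative
X_new = (np.random.rand(5, 5).astype(np.float32) - 0.5) * 2.0   # in [-1, 1)
```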

Differential Revision: D5390126

fbshipit-source-id: d0c428abbb2797eb33902a7d2a2f59d5e85daaa6
2017-07-10 16:04:45 -07:00
Kevin Matzen
720db19fa2 make GetComputedParams work like GetParams
Summary: GetComputedParams tests namescopes with equality, while GetParams tests with a prefix. This switches GetComputedParams to also use a prefix so that both functions have similar usage.
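
The difference between the two matching rules, in a toy example (illustrative only, not the caffe2 code):
```py
params = ["m/conv1/w", "m/sub/fc1/w"]

# Prefix test (GetParams behavior): finds params in nested scopes too.
print([p for p in params if p.startswith("m/")])      # both blobs
print([p for p in params if p.startswith("m/sub/")])  # ["m/sub/fc1/w"]
```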

Reviewed By: akyrola

Differential Revision: D5389816

fbshipit-source-id: 0e43e4b491fccbad3b855b6b735dc2b91d7626c9
2017-07-10 12:30:44 -07:00
Junjie Bai
ff3996acb9 Add NormalizeL1Op for doing L1 nomalization along given axis
Reviewed By: salexspb

Differential Revision: D5380220

fbshipit-source-id: 38fc56a1013c25b0c8b0fc161ca54fea412fb8b2
2017-07-10 10:10:36 -07:00
Jacqueline Xu
6ea71155c1 Implementing Arc Cosine Layer
Summary:
- Implemented the [[ http://cseweb.ucsd.edu/~saul/papers/nips09_kernel.pdf | Arc Cosine ]] layer (feature map sketched below)
  - Developed buck unit test for Arc Cosine
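
Per the Cho & Saul paper linked above, the random feature map for the order-n arc-cosine kernel thresholds Gaussian projections at zero (a numpy sketch; names and the degree handling are illustrative, for degree >= 1):
```py
import numpy as np

def arc_cosine_features(X, num_features, degree=1, rng=np.random):
    # Draw Gaussian directions, threshold at zero, raise to the degree.
    W = rng.randn(num_features, X.shape[1])
    return np.maximum(X.dot(W.T), 0.0) ** degree   # degree=1 is ReLU-like
```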

Reviewed By: chocjy

Differential Revision: D5367604

fbshipit-source-id: ffd3ee081bc055b06c075c34aa6ce329b62ce2e0
2017-07-10 10:10:36 -07:00
Jiyan Yang
3598bdd044 Modify samplingTrain layer to take more general inputs
Summary: As desc.

Reviewed By: kittipatv

Differential Revision: D5363486

fbshipit-source-id: cb8fa65d750e80d2bf3e9909ca9b2d83a5548099
2017-07-08 22:19:55 -07:00
Guillaume Dumont
dc13345eb3 Read pretrained weights using binary mode in caffe_translator.py
Summary:
Binary mode must be explicitly specified when reading binary files under windows.
Closes https://github.com/caffe2/caffe2/pull/883

Differential Revision: D5373073

Pulled By: Yangqing

fbshipit-source-id: afedebdc74c954dbb6d24c0bccc192c8712c4c88
2017-07-08 10:17:57 -07:00
Bangsheng Tang
5f63f5697a IndexHash
Summary:
1. IndexHashOp
2. Helper class SparseFeatureHash
3. FeatureSpec changes to add desired_hash_size

Reviewed By: kennyhorror

Differential Revision: D5361370

fbshipit-source-id: bf02e3ca12b3654f1d291f77c8af9248b6c4ac55
2017-07-07 23:06:11 -07:00
Geet Sethi
86b6a6e2f8 Added PiecewiseLinearTransform CUDA Op
Summary: Added a CUDA implementation of the PiecewiseLinearTransformOp.

Differential Revision: D5378537

fbshipit-source-id: 38857f59f5cc52e16e1ecc97983a0b0b82a46c74
2017-07-07 15:20:00 -07:00
Clément Godard
cb7f17ab64 added gradients for ResizeNearest (CPU + CUDA) and ref
Summary:
- Added the gradients of the operation for both CPU and CUDA kernels.
- Unified variable names across all ops.
- Added a reference implementation in numpy (sketched below).
- The gradient check needs a larger stepsize to succeed; is that normal?
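
A numpy reference for nearest-neighbor resizing might look like this (the exact index-rounding rule here is an assumption, not necessarily the op's):
```py
import numpy as np

def resize_nearest(x, scale):
    # x: NCHW tensor; pick the nearest source row/column for each output pixel.
    n, c, h, w = x.shape
    rows = np.minimum((np.arange(int(h * scale)) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(int(w * scale)) / scale).astype(int), w - 1)
    return x[:, :, rows][:, :, :, cols]
```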

Reviewed By: akyrola

Differential Revision: D5313682

fbshipit-source-id: aceb92649e01c5caeba8774e678f9095502d396c
2017-07-07 14:19:42 -07:00
Ralph Mao
febae7b20b fix a bug in the report function of Data_Parallel
Summary: Replace params with sp; otherwise it reports an empty list.

Reviewed By: akyrola

Differential Revision: D5382716

fbshipit-source-id: 34d8e6ee00cbe1718702e3d1f23ea12f8d65063e
2017-07-07 13:03:46 -07:00
Jacqueline Xu
8cedf35d55 Adding Random Fourier Features to SparseNN Model and Flow
Summary:
- Integrated RFF into the preprocessing workflow for dense features
- Developed Flow interface to input RFF parameters
- Created unit test for using RFF with sparseNN

Reviewed By: chocjy

Differential Revision: D5367534

fbshipit-source-id: 07307259c501a614d9ee68a731f0cc8ecd17db68
2017-07-07 09:39:32 -07:00
Aapo Kyrola
ad62e82179 fast simple-net memonger for C++
Summary:
To be used with predictor "online": a C++ version of memonger for simple nets, using a very simple greedy algorithm (sketched below). Works well at least on the Resnet-50 inference graph: only 3 shared blobs are used.

Next I will integrate this with predictor and run canary (separate diff).
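
The greedy idea in a few lines of Python (a sketch for a linear net, not the actual C++ implementation):
```py
def greedy_share(ops):
    # ops: list of (inputs, outputs) blob-name lists, in execution order.
    # Returns a blob -> shared-buffer mapping (illustration only).
    last_use = {}
    for i, (ins, _) in enumerate(ops):
        for b in ins:
            last_use[b] = i
    free, mapping, counter = [], {}, 0
    for i, (ins, outs) in enumerate(ops):
        for b in outs:
            if free:
                mapping[b] = free.pop()           # recycle a dead buffer
            else:
                mapping[b] = "__shared_%d" % counter
                counter += 1
        for b in ins:
            if last_use[b] == i and b in mapping:
                free.append(mapping[b])           # blob dead after this op
    return mapping
```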

Reviewed By: asaadaldien

Differential Revision: D5375392

fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
2017-07-06 15:17:07 -07:00
Guillaume Dumont
e8689dda8f Python 3 compatible integer division
Summary:
As the title says.
Closes https://github.com/caffe2/caffe2/pull/879
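
The underlying incompatibility, for reference:
```py
# Python 2's '/' on two ints truncates; Python 3's returns a float.
print(7 / 2)   # 3 on Py2, 3.5 on Py3
print(7 // 2)  # 3 on both -- use '//' where integer division is intended
```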

Differential Revision: D5372787

Pulled By: akyrola

fbshipit-source-id: 0ff469c0d227f1b2252c1a0c4f6f8bebaac5580f
2017-07-06 11:47:12 -07:00
Andrew Dye
31f394f8b3 Add synchronization barrier API to data parallel model
Summary: Add synchronization barrier API with configurable timeout. Users can call Synchronize() to join variable length execution before resuming multi-machine communication steps, i.e., resuming distributed training iterations after validation on a single machine.
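
A hedged usage sketch based on the summary (the module path and the timeout parameter name are assumptions):
```py
from caffe2.python import data_parallel_model

# Block until all hosts reach this point, e.g. after single-machine
# validation, before resuming distributed training iterations.
data_parallel_model.Synchronize(model, timeout_sec=30)
```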

Reviewed By: akyrola

Differential Revision: D5348387

fbshipit-source-id: 5826da10e6a60c50394c36c7cf47624f10191d11
2017-07-06 09:21:19 -07:00
Aapo Kyrola
21ba0ff560 small fix to when input blob is input to multiple ops
Summary: Memonger had a bug where it crashed if an input blob was the input to multiple ops. This fixes that and adds a test.

Reviewed By: asaadaldien

Differential Revision: D5374860

fbshipit-source-id: 1d5044001eacdbe6db43f69727da9297558f5c5c
2017-07-05 22:37:26 -07:00
Aapo Kyrola
2d133d4627 increase concurrency default
Summary: Huge improvement in my tests, and it does not really hurt either.

Reviewed By: wesolwsk

Differential Revision: D5374925

fbshipit-source-id: c96a4ed2ca653120a82233c0037cbfded8a2d2a1
2017-07-05 21:46:31 -07:00
Luke Yeager
be7725b0ba Tests: fix dpm test when only 1 GPU present
Summary:
b33894e95d removed this line:
```py
unittest.skipIf(workspace.NumCudaDevices() < 2, "Need at least 2 GPUs.")
```
but forgot to add it back later.
```
_________________________________ DataParallelModelTest.test_equiv __________________________________
...
            if p2p_access_pattern is not None and not p2p_access_pattern[
>               devices[0], peer
            ]:
E           IndexError: index 1 is out of bounds for axis 1 with size 1
...
WARNING:data_parallel_model:** Only 1 GPUs available, GPUs [0, 1] requested
```

/cc akyrola
Closes https://github.com/caffe2/caffe2/pull/888

Reviewed By: akyrola

Differential Revision: D5341310

Pulled By: harouwu

fbshipit-source-id: 8d7f06913c7b5a42009a4033dbb6a48a8e812822
2017-07-05 14:32:12 -07:00
Yiming Wu
60e4607106 brew API in convnet benchmark
Summary: upgrade convnet_benchmarks to brew api

Reviewed By: salexspb

Differential Revision: D5341829

fbshipit-source-id: f34c6dd4aae5f0c8db51e7600eb1f0e1cdc72ea3
2017-07-05 10:34:48 -07:00
Jacqueline Xu
25bd5dda27 Implementing random fourier features layer
Summary:
- Created the random fourier features layer
- Generated a unit test to test the random fourier features layer is built correctly
- Inspired by the paper [[ https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf |   Random Features for Large-Scale Kernel Machines]] (feature map sketched below)
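
Per the Rahimi & Recht paper linked above, the feature map for the RBF kernel is z(x) = sqrt(2/D) * cos(Wx + b); a numpy sketch (names and the sigma parameterization are illustrative):
```py
import numpy as np

def random_fourier_features(X, D, sigma=1.0, rng=np.random):
    d = X.shape[1]
    W = rng.randn(D, d) / sigma          # frequencies ~ N(0, sigma^-2 I)
    b = rng.uniform(0.0, 2 * np.pi, D)   # phases ~ U[0, 2*pi)
    return np.sqrt(2.0 / D) * np.cos(X.dot(W.T) + b)
```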

Reviewed By: chocjy

Differential Revision: D5318105

fbshipit-source-id: c3885cb5ad1358853d4fc13c780fec3141609176
2017-07-04 23:48:42 -07:00
Jiyan Yang
00e5afea6a Adding dedup aggregator options to sgd optimizer
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D5324671

fbshipit-source-id: 27f3a58f618cd5ea11c2ea2e756df3f73635c2c8
2017-07-04 02:10:18 -07:00
Marat Dukhan
2ac9ff5c96 Cos, Sin, and Abs operators
Summary: add Cos, Sin, and Abs operators

Reviewed By: akyrola

Differential Revision: D5307632

fbshipit-source-id: 743c9d289e4d3fd439e4b5385841cdff87d9247a
2017-07-03 22:18:32 -07:00
Simon Layton
090506ac87 Add NCCLBroadcast to correct net
Summary:
Otherwise it was always added to the main net instead of param_init_net when desired (i.e. initial param sync).
Closes https://github.com/caffe2/caffe2/pull/894

Differential Revision: D5367451

Pulled By: akyrola

fbshipit-source-id: 3d82be6da687c736bd15f4852dbd272266eb4811
2017-07-03 16:54:44 -07:00
Dmytro Dzhulgakov
b6c1c0ac4e Fix communication_schema decoding
Summary: Allows overriding the input/output record as long as the field blobs are the same.

Reviewed By: yangyangyyy

Differential Revision: D5362132

fbshipit-source-id: 3ac2ac22802902b7eed5c226b00a7e1971ad264c
2017-07-02 13:04:20 -07:00
Dmytro Dzhulgakov
c0cebc3578 Added flags to lstm, convnet and sparse_nn_benchmarks to print out operators
Summary: pass flags directly to C2

Reviewed By: salexspb

Differential Revision: D5345869

fbshipit-source-id: 22b0e791526c7b0caf1e6a13dd29900df0db8fe8
2017-06-30 23:47:04 -07:00
Aapo Kyrola
ab0fe0a5f4 add debug information when there is blob version mismatch
Summary:
It is a quite common question when users get some variant of "blob has version 2 but gradient expects version 1" in their backward pass, and the error message is completely unhelpful.
To remedy this, I added proper debug information that tells the user how the version number of a blob was incremented over time, i.e. which ops caused the version to go up. This should help
understand the issue.
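
The added bookkeeping is conceptually a per-blob history of (version, op) pairs (an illustrative sketch, not the actual SSA code):
```py
versions, history = {}, {}

def record_write(blob, op_type):
    versions[blob] = versions.get(blob, 0) + 1
    history.setdefault(blob, []).append((versions[blob], op_type))

# On a "blob has version 2 but gradient expects version 1" error,
# history[blob] now shows exactly which ops bumped the version.
```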

Reviewed By: dzhulgakov

Differential Revision: D5358227

fbshipit-source-id: bc09d048ac33200c35d56460e44e86c2f2888f3f
2017-06-30 16:22:46 -07:00
Tao Wu
5aa147f273 added PackRNNSequence and UnpackRNNSequence operators
Summary: Added two operators that can be used to transfer data into the input format of RNNs and back.
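
One plausible reading of the pair, sketched in numpy (the actual ops' layouts and argument conventions are assumptions here):
```py
import numpy as np

def pack_rnn_sequence(values, lengths):
    # Scatter a flat (sum(lengths), D) array into padded (T, N, D) RNN layout.
    T, N, D = max(lengths), len(lengths), values.shape[1]
    out, offset = np.zeros((T, N, D), values.dtype), 0
    for i, l in enumerate(lengths):
        out[:l, i] = values[offset:offset + l]
        offset += l
    return out

def unpack_rnn_sequence(padded, lengths):
    # Inverse: gather the valid steps back into the flat layout.
    return np.concatenate([padded[:l, i] for i, l in enumerate(lengths)])
```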

Reviewed By: kittipatv

Differential Revision: D5329886

fbshipit-source-id: 07eac29416427b08c49989d4eeed50a6f18493a1
2017-06-30 09:53:31 -07:00
Aapo Kyrola
8c74c36626 fix reducing device option
Summary: This was broken in a previous diff; this fixes it to use the model device type.

Reviewed By: asaadaldien

Differential Revision: D5356005

fbshipit-source-id: a4fcc932bae772076b57625a5fcc0d38eb702cc9
2017-06-30 09:19:57 -07:00
Thomas Dudziak
5355634dac Dict fixes/improvements and unittest targets for Python 3 in caffe2 core
Summary: As title

Reviewed By: salexspb

Differential Revision: D5316104

fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30
2017-06-29 17:05:41 -07:00
Alexander Sidorov
a6dee1da32 Make args.fixed_shape in lstm_benchmark work in a library mode
Summary:
This works as a standalone Python script because args are global. When used from Flow for monitoring purposes it doesn't work. This diff fixes that.

Reviewed By: zem7

Differential Revision: D5349996

fbshipit-source-id: f73842901d975b783e09e9db0565eb81880bbea1
2017-06-29 14:55:26 -07:00
Aapo Kyrola
dd6e170b8d fix LSTM benchmark reporting
Summary:
A couple of fixes for broken reporting in lstm_benchmark:
- last_time must be recorded after warm up
- the entry count was incorrectly removed

Reviewed By: salexspb

Differential Revision: D5349890

fbshipit-source-id: 5dd5bdf46594c520b61bc3b57b153f90a6a17903
2017-06-29 13:53:17 -07:00
Andrew Tulloch
6c67a753c7 Fix test_pair_wise_loss_predictions
Summary: Increase absolute error tolerance.

Reviewed By: tomdz

Differential Revision: D5349604

fbshipit-source-id: 8e04001b0b6a6e83083f341e265ab3c0d2b06918
2017-06-29 12:48:04 -07:00
Andrew Tulloch
912ee4e40a Fix test_sparse_to_dense precision failures
Summary: ..

Reviewed By: tomdz

Differential Revision: D5349561

fbshipit-source-id: 4c510905515eb03a64abc36f33d59a1d998c2ab1
2017-06-29 12:48:03 -07:00
Andrew Tulloch
83765906c6 Add min_satisfying_examples
Summary:
Eliminates failures on overloaded machines that run only a few examples before being timed out.

Reviewed By: tomdz

Differential Revision: D5349555

fbshipit-source-id: 89d1db063f58c72656b37157225a586c9e3f24bc
2017-06-29 12:48:01 -07:00