commit 1ed746df45 (2017-03-28 11:54:09 -07:00)
Author: Aapo Kyrola

BatchMatMulOp: use cuBLAS batched strided gemm for CUDA
Summary:
Instead of doing the gemms in a for-loop (which is not parallelized), it is much better to do the batched matmuls using CUDA 8's new strided batched version of gemm.
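
As an illustration, here is a minimal sketch of the change (not the actual BatchMatMulOp code: the function names, handle setup, and row-major layout are assumptions for the example, and error checking is omitted):

#include <cublas_v2.h>

// Before: one cublasSgemm per batch item, launched serially from the host.
// A, B, C hold `batch` contiguous row-major matrices of sizes m x k, k x n,
// and m x n respectively.
void gemm_loop(cublasHandle_t handle, int batch, int m, int n, int k,
               const float* A, const float* B, float* C) {
  const float alpha = 1.0f, beta = 0.0f;
  for (int i = 0; i < batch; ++i) {
    // cuBLAS is column-major, so compute C^T = B^T * A^T to get row-major C.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, m, k, &alpha,
                B + (size_t)i * k * n, n,
                A + (size_t)i * m * k, k, &beta,
                C + (size_t)i * m * n, n);
  }
}

// After: a single CUDA 8 strided batched call covering every batch item;
// the stride arguments give the element offset between consecutive matrices.
void gemm_strided_batched(cublasHandle_t handle, int batch, int m, int n,
                          int k, const float* A, const float* B, float* C) {
  const float alpha = 1.0f, beta = 0.0f;
  cublasSgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, m, k, &alpha,
                            B, n, (long long)k * n,
                            A, k, (long long)m * k, &beta,
                            C, n, (long long)m * n, batch);
}

The single strided batched call hands all batch items to cuBLAS at once, so it can schedule them together instead of serializing one kernel launch per matrix.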

With the MT team's test, we get a 5-10% improvement in overall walltime, which is significant:

----

Without batched gemm:

I0328 10:46:48.118605 58068 prof_dag_net.cc:136]    424.757 ms/iter (   283.878 ms/iter) RecurrentNetwork
I0328 10:46:48.118609 58068 prof_dag_net.cc:136]    352.603 ms/iter (    265.85 ms/iter) RecurrentNetworkGradient

With batched gemm:
I0328 10:53:48.169996 85617 prof_dag_net.cc:136]    407.438 ms/iter (   269.564 ms/iter) RecurrentNetwork
I0328 10:53:48.169999 85617 prof_dag_net.cc:136]    322.393 ms/iter (   287.625 ms/iter) RecurrentNetworkGradient

Reviewed By: jamesr66a

Differential Revision: D4788272

fbshipit-source-id: 210e8b94c1e036b6ef0f039ce000d455258651f4