Proto commits in apache/singa

These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):

Commit:0b1d73c
Author:zhaojing

Update the proto folder for the hfl example

The documentation is generated from this commit.

Commit:3d68345
Author:Zrealshadow

Add the interface proto file for the bank HFL example

Commit:52a4a45
Author:jixin
Committer:jixin

SINGA-434 Support tensor broadcasting. Broadcasting follows https://github.com/onnx/onnx/blob/master/docs/Broadcasting.md. CUDA supports only unidirectional broadcasting, due to cudnnOpTensor (https://docs.nvidia.com/deeplearning/sdk/cudnn-archived/cudnn_750/cudnn-developer-guide/index.html#cudnnOpTensor); the CPP backend supports multidirectional broadcasting. Passed all tests on cuDNN 7.4.2.
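
For reference, the broadcasting semantics referenced above match numpy's; a small numpy illustration (not SINGA code) of multidirectional versus unidirectional broadcasting:

```
import numpy as np

# Multidirectional broadcasting: both operands may be expanded.
a = np.ones((3, 1, 5))
b = np.ones((1, 4, 5))
print((a + b).shape)   # (3, 4, 5) -- both a and b are broadcast

# Unidirectional broadcasting (the cuDNN restriction mentioned above):
# only the second operand is expanded to match the first.
x = np.ones((2, 3, 4))
y = np.ones((1, 3, 1))
print((x + y).shape)   # (2, 3, 4) -- only y is broadcast towards x
```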

Commit:a88efa0
Author:Vaan Ng

Singa-341 Added stride functionality to tensors for CPP

Commit:2d1dd42
Author:wangwei
Committer:wangwei

Fix a typo for cudnn_prefer and change the default workspace limit size to 1024MB

Commit:e527db9
Author:wangwei

Enable configuration of cudnn workspace size limit in PySINGA

Commit:b2e21c6
Author:wang wei
Committer:Wei Wang

preparing for v1.1-rc1: remove outdated files; fix links

Commit:32b4368
Author:Xiangrui

SINGA-253 Net converter for caffe model. Modify the pooling layer: add a 'ceil' field in the proto to be compatible with Caffe. Trained AlexNet converted from Caffe, achieving 82% accuracy.

Commit:77405ec
Author:XiangruiCAI
Committer:Xiangrui

SINGA-253 Net converter for caffe model. Convert a caffe model into a singa model. It is a very basic implementation; it can currently convert feed-forward nets and supports pysinga. Implementation method: 1. read the proto file of the caffe model using caffe.proto and serialize it to a string; 2. parse the string using singa's model.proto to get the layer config; 3. set up each layer and add it to a feed-forward net. Updates and example: 1. put the conversion functions into layer.py and converter.py; 2. add a 'caffe' option in the cifar example which converts alexnet from caffe to singa.
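
The conversion step relies on a general protobuf property: bytes serialized with one schema can be re-parsed with another schema whose field tags and wire types are compatible. A rough sketch of that idea, with hypothetical module and message names (SerializeToString/ParseFromString are the standard protobuf Python APIs):

```
# Hypothetical generated modules and message names, for illustration only.
import caffe_pb2   # generated from caffe.proto
import model_pb2   # generated from SINGA's model.proto

# 1. read the binary caffe model and serialize it with the caffe schema
caffe_net = caffe_pb2.NetParameter()
with open("alexnet.caffemodel", "rb") as f:
    caffe_net.ParseFromString(f.read())
data = caffe_net.SerializeToString()

# 2. re-parse the same bytes with the SINGA schema; fields whose tag numbers
#    and wire types match are recovered, the rest stay as unknown fields.
singa_net = model_pb2.NetConf()   # hypothetical message name
singa_net.ParseFromString(data)
```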

Commit:c967169
Author:Wei Wang
Committer:Wei Wang

SINGA-259 Add maven pom file for building java classes. This ticket refactors some APIs in the *.i files and creates the pom.xml so that maven can compile the java classes. Java unit testing is also enabled with a simple example. To test this ticket: 1. compile singa with java (cuda and cudnn are optional) in `build/`: cmake -DUSE_JAVA=ON -DUSE_PYTHON=OFF .. then make; 2. go to build/java and run the unit tests: export LD_LIBRARY_PATH=<path to build>/java/lib:$LD_LIBRARY_PATH then mvn test. The folder structure for the java project is: java/pom.xml, java/lib/libsinga_wrap.so, java/src/main/java/org/apache/singa/ (with swig/ and proto/), and java/src/test/org/apache/singa/. Also fixed a bug in tensor.py::from_numpy() which handled int arrays incorrectly.

Commit:7c12f40
Author:WANG Ji

SINGA-252 Use the snapshot methods to dump and load models for pysinga. Turn on the _packed_ option of the tensor protobuf message to make dumped models more compact in size. Now the vgg-16 model reported in SINGA-248 is 528 MB.
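
For context, the saving from the packed option follows from protobuf's wire format: an unpacked repeated fixed32 field pays a tag byte per element, while a packed field pays the tag and a length prefix once. A back-of-the-envelope comparison (generic wire-format arithmetic, not a measurement of the SINGA model):

```
n = 1_000_000            # number of float (fixed32) values in a parameter tensor

unpacked = n * (1 + 4)   # one tag byte + 4 value bytes per element
packed = 1 + 4 + n * 4   # one tag byte + ~4-byte length varint + raw values

print(unpacked, packed)  # 5000000 vs 4000005 bytes, roughly a 20% saving
```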

Commit:32ba40d
Author:aaronwwf

SINGA-244 Separating swig interface and python binding files - add java binding cmake files - todo: add test code

Commit:a54c889
Author:Wei Wang

SINGA-227 Add Split and Merge Layer and add ResNet Implementation Update the resnet implementation by adding Merge and Split layers in layer.py, and enable net.py to process merge/split layers. Update the transpose setting in Dense.cc TODO(wangwei) update test_dense.cc

Commit:7ebea53
Author:jixin
Committer:Wei Wang

SINGA-227 Add Split and Merge Layer and add ResNet Implementation. Add a python resnet implementation and add Split and Merge layers. Discard the split and merge layers in the resnet implementation. Add add_split and add_merge functions for cases where multiple inputs or outputs are involved when creating a net. Change comments in resnet.py.

Commit:dfc422e
Author:Wei Wang
Committer:Wei Wang

SINGA-218 Implementation for RNN CUDNN version. Add an example using the char-rnn model. The trained model (with 2 stacks of lstm) over linux kernel source code could generate source code with some meaningful patterns, e.g., indentation, comments, variable definitions, assignments.

Commit:c51f944
Author:zhaojing
Committer:Wei Wang

SINGA-218 Implementation for RNN CUDNN version - cudnn rnn implementation (cudnn_rnn.h, cudnn_rnn.cc, rnn.cc, rnn.h, test_cudnn_rnn.cc). - The weight shapes are now calculated manually instead of using the API provided by CUDNN. - Test for RNN_cudnn_Tanh (unidirectional, 1 hidden layer).

Commit:8e0b108
Author:Wei Wang
Committer:Wei Wang

SINGA-218 Implementation for RNN CUDNN version. Finish the CudnnRNN layer; pass the test for the tanh rnn. RNN forward accepts a vector of input tensors: <x0, x1, ... x(n-1), hx, cx>, where x(i) is the i-th input tensor and hx is the initial hidden tensor, which could be a dummy tensor. A dummy tensor is a tensor created without shape/device/data_type; during computation, cudnnRNN would use 0s for this tensor. cx is not necessary for relu/tanh/gru rnns; for lstm, it could also be a dummy tensor like hx. The output is: <y0, y1, ... y(n-1), hy, cy>; relu/tanh/gru rnns do not have cy, while lstm has both hy and cy. RNN backward accepts a vector of input gradient tensors: <dy0, dy1, ... dy(n-1), dhy, dcy>. dhy is necessary for all rnns but could be a dummy tensor, in which case a tensor of 0s would be used for dhy during computation; dcy is used only for lstm and could also be a dummy tensor. The output is <dw, <dx0, dx1, ... dx(n-1), dhx, dcx>>, where dhx is the gradient of hx and dcx is only used for lstm. The CudnnRNN layer must be moved onto cuda, otherwise a memory error would happen (the weight is on cpu).

Commit:7e050c1
Author:jixin
Committer:jixin

SINGA-215 Implement Image Transformation for Image Pre-processing. Implement some of the image transformations (i.e., crop, resize, horizontal_mirror) for the pre-processing stage. Image resize needs to use the opencv library. Currently image transformation only supports one sample input each time. Add static functions resize, crop and mirror for image transformation. The training phase uses random crop and random mirror; the evaluation phase uses central crop and no mirror. Cannot test phases for mirror because of a random factor. Fix bugs in test files involving memory allocation and access.
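
The three transformations are standard image operations; roughly, with OpenCV and numpy they look like the following (an illustration of the operations, not SINGA's actual C++ implementation):

```
import numpy as np
import cv2

def resize(img, h, w):
    # cv2.resize takes the target size as (width, height)
    return cv2.resize(img, (w, h))

def crop(img, h, w, random=True):
    H, W = img.shape[:2]
    if random:                       # training: random crop
        top = np.random.randint(0, H - h + 1)
        left = np.random.randint(0, W - w + 1)
    else:                            # evaluation: central crop
        top, left = (H - h) // 2, (W - w) // 2
    return img[top:top + h, left:left + w]

def mirror(img, random=True):
    # horizontal flip; applied randomly during training, never at evaluation
    if random and np.random.rand() < 0.5:
        return cv2.flip(img, 1)
    return img
```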

Commit:7444f0a
Author:jixin
Committer:jixin

SINGA-213 Implement Encoder and Decoder for CSV. TextEncoder encodes a Tensor into a 1D string where values are separated by commas. TextDecoder decodes a string of data into Tensors (data and [optional] label).
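
A minimal sketch of the convention described above, with hypothetical helper names and assuming the optional label occupies the first field (the real TextEncoder/TextDecoder are C++ classes):

```
import numpy as np

def encode_csv(values, label=None):
    # flatten a tensor into one comma-separated line; label (if any) goes first
    fields = ([] if label is None else [str(label)]) + [str(v) for v in np.ravel(values)]
    return ",".join(fields)

def decode_csv(line, has_label=True):
    # parse one CSV line back into (data, label)
    fields = line.split(",")
    if has_label:
        return np.array(fields[1:], dtype=np.float32), int(fields[0])
    return np.array(fields, dtype=np.float32), None

line = encode_csv([0.5, 1.0, 2.5], label=3)
data, label = decode_csv(line)
```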

Commit:dc013f3
Author:Wei Wang
Committer:Wei Wang

SINGA-194 Add a Platform singleton. Add the Platform class, whose methods are all static. It includes methods for querying hardware GPUs, e.g., the number of gpus and the memory of each gpu. It also creates a vector of singa::Device, e.g., CudaGPU. NOTE: if multiple CudaGPU devices are created and they all use CnMemPool, then they must share the same CnMemPool instance (otherwise there would be problems in destructing the CnMemPool). It is preferred to use Platform to create CudaGPU devices, which avoids the above problem. Updated the APIs for Device and DeviceMemPool. GetAllocateMem is not accurate (it is larger than the actually requested memory size). NOTE: swig would report errors if USE_CUDA is OFF, as CudaGPU is not available. TODO(wangwei) resolve this problem in a later PR for python setup.

Commit:62c6603
Author:WANG Ji
Committer:WANG Ji

SINGA-210 Enable checkpoint and resume for v1.0. This ticket adds code for dumping the model parameters as checkpoint files, which could be used for fine-tuning and deployment. Serialize each Tensor into a TensorProto and save it in a BinFile, which is stored as <prefix>.model; generate descriptions of the parameters in <prefix>.desc. Unit test cases passed for the kFloat, kInt and kDouble data types.

Commit:97648a6
Author:Wei Wang
Committer:Wei Wang

SINGA-204 Support the training of feed-forward neural nets. Implement the Alexnet model for Cifar10 (https://code.google.com/p/cuda-convnet/). The accuracy on the test data is 0.82 (the same as reported in the above link). NOTE: 1. do not convert the uint8 data to float directly; instead, cast the data to uint8_t and then to float. 2. it is necessary to subtract the max value for softmax; a numeric error (nan) is likely to happen otherwise.
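
The second note is the usual max-subtraction trick for a numerically stable softmax; a quick numpy illustration (not SINGA code) of why it matters:

```
import numpy as np

logits = np.array([1000.0, 1001.0, 1002.0])

naive = np.exp(logits) / np.exp(logits).sum()       # exp overflows -> nan
shifted = logits - logits.max()                     # subtract the max first
stable = np.exp(shifted) / np.exp(shifted).sum()    # [0.090, 0.245, 0.665]

print(naive, stable)
```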

Commit:71eb059
Author:Wei Wang
Committer:Wei Wang

SINGA-204 Support the training of feed-forward neural nets. Implement the Alexnet model for Cifar10 (https://code.google.com/p/cuda-convnet/). But the test accuracy is low, 0.72 (it should be 0.82).

Commit:dd08f41
Author:Wei Wang
Committer:Wei Wang

Merge PR #165 for CnMeM. Fix bugs from the device type change (Device* -> std::shared_ptr<Device>).

Commit:14d31a4
Author:Wei Wang

SINGA-200 - Implement Encoder and Decoder for data pre-processing Add src/proto/io.proto, which was missed in the previous commit.

Commit:833f461
Author:Wei Wang

SINGA-200 - Implement Encoder and Decoder for data pre-processing. Improve the JPG2ProtoDecoder and Proto2JPGEncoder to consider the image dimension order, "HWC" or "CHW". Change the input image Tensor's data type to kUChar, i.e., unsigned char. Change the Tensor::data() API to const SType* data<DType>() const. TODO: add other encoding functions to accept other inputs, e.g., cv::Mat.
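
For reference, switching between the "HWC" and "CHW" dimension orders is a transpose of the image axes; in numpy (illustration only, not SINGA code):

```
import numpy as np

hwc = np.zeros((32, 32, 3), dtype=np.uint8)   # height, width, channel (e.g., a decoded JPEG)
chw = hwc.transpose(2, 0, 1)                  # -> (3, 32, 32), channel first
back = chw.transpose(1, 2, 0)                 # -> (32, 32, 3) again
```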

Commit:13d60b0
Author:jixin
Committer:jixin

SINGA-200 - Implement Encoder and Decoder for data pre-processing. Add base classes Encoder and Decoder. Add derived classes Image2JPG_Encoder and Image2JPG_Decoder, which use opencv to encode and decode image source files. Add a test which contains partial pre-processing procedures (read_raw_image, encode, decode) for images. Reader and Writer, which use user-specified offline store mechanisms (i.e., kvfile, lmdb, ...), are not implemented in this ticket. Use randomly initialized images for the test. Add the derived encoder/decoder class definitions in the base class header files (decoder.h/encoder.h).

Commit:bef1db0
Author:jixin
Committer:jixin

SINGA-200 - Implement Encoder and Decoder for data pre-processing Add CMake scripts for singa io libs.

Commit:077d13e
Author:liyuchenmike@gmail.com

SINGA-175 Add memory management APIs and implement a subclass using CNMeM. Add a base memory pool class. Implement two subclasses, CnMemPool and CudaMemPool. Add tests for the memory pools. TODO: replace Device* with std::shared_ptr<Device> to avoid memory errors, because the destruction order of devices and tensors is dynamic (a device may be freed before its tensors).

Commit:b91002b
Author:jixin
Committer:Wei Wang

SINGA-180 Add Activation layer and Softmax layer Fix a bug in cudnn softmax and let softmax support 1D or 2D tensor as input.

Commit:74f0214
Author:Wei Wang
Committer:Wei Wang

SINGA-198 - Change Layer::Setup API to include input Tensor shapes. Update the setup function from
```
void Setup(const LayerConf& conf);
```
to
```
void Setup(const Shape& in_sample_shape, const LayerConf& conf);          // for a single input
void Setup(const vector<Shape>& in_sample_shapes, const LayerConf& conf); // for multiple inputs
```
Functions for getting the output sample shape are added:
```
const Shape GetOutputSampleShape() const;       // used for a single output
const Shape GetOutputSampleShape(int k) const;  // used for multiple outputs
```
Pass tests.

Commit:0cd9663
Author:WANG Ji
Committer:Wei Wang

SINGA-192 Implement optimization algorithms for v1. Change the interface of Apply() in the Optimizer class: Apply() now receives a `const Tensor& grad` argument rather than `Tensor* grad`, that is, the method no longer causes side effects on the grad tensor.

Commit:58be3f8
Author:Wei Wang

SINGA-190 - Add prelu layer and flatten layer Format code. Fix warning info from compilation.

Commit:5784bff
Author:Wei Wang
Committer:Wei Wang

SINGA-192 Implement optimization algorithms for v1. Merge branch PR#164 into dev. Fix the bugs in the adagrad and rmsprop tests. Note: EXPECT_NEAR (with a tolerance of 1e-5) is used to avoid numeric issues. Need to run the tests on more machines.

Commit:5afd81b
Author:jixin
Committer:jixin

SINGA-190 - Add prelu layer and flatten layer Implement prelu layer and flatten layer for cpu version. Write gtest for prelu and flatten layer. Pass all tests.

Commit:178db01
Author:WANG Ji
Committer:WANG Ji

SINGA-192 Implement optimization algorithms for v1. Implement optimization algorithms for Singa v1, including nesterov, adagrad and rmsprop, and add unit test cases for these algorithms. However, only nesterov passed its test case; adagrad and rmsprop need a Sqrt() operation for tensors, which has not been implemented yet.
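
For context, the adagrad and rmsprop updates both divide by the square root of accumulated squared gradients, which is why an element-wise Sqrt() on tensors is needed; a textbook numpy sketch of the two rules (not SINGA's implementation):

```
import numpy as np

def adagrad_step(param, grad, hist, lr=0.01, eps=1e-8):
    hist += grad ** 2                              # accumulated squared gradients
    param -= lr * grad / (np.sqrt(hist) + eps)     # needs element-wise sqrt
    return param, hist

def rmsprop_step(param, grad, hist, lr=0.001, rho=0.9, eps=1e-8):
    hist = rho * hist + (1 - rho) * grad ** 2      # exponential moving average
    param -= lr * grad / (np.sqrt(hist) + eps)
    return param, hist
```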

Commit:eadd3f9
Author:WANG Ji
Committer:Wei Wang

SINGA-174 Add Batch Normalization layer and Local Response Normalization layer. Implemented Batch Normalization Layer and Local Response Normalization Layer using CuDNN. Passed test.

Commit:64ea206
Author:Wei Wang

SINGA-188 Add Dense layer Minor change to format code and update IDs of DenseConf fields.

Commit:73d4a34
Author:zhaojing
Committer:zhaojing

SINGA-188 Add Dense layer. Add the implementation of the dense layer and test files for it, both cpp and cuda versions. Pass all tests.

Commit:2dac380
Author:Wei Wang
Committer:Wei Wang

SINGA-183 Add the base classes for optimizer, constraint and regularizer. Draft the base optimizer, constraint and regularizer classes. The API for local all-reduce is also added (in comments). Test sgd with/without momentum using cpp and cuda.

Commit:7d149ec
Author:Wei Wang
Committer:Wei Wang

SINGA-178 Add Convolution layer and Pooling layer Minor update on variable names and InitCudnn arguments. Fix compiling warnings about signed and unsigned number comparison. Format code. Pass all tests.

Commit:152056d
Author:XiangruiCAI
Committer:XiangruiCAI

SINGA-178 Add Convolution layer and Pooling layer Add CudnnConvolution layer and CudnnPooling layer. Add tests for these two layers. Passed tests and cpplint.py. (Add on May 30th) Process the parameters and specs in this way, 1. Each layer is responsible for pushing its own parameters into the param_value vector, and pushing its own param specs into the param_specs vector. 2. Each layer is responsible for deleting the tensors it created including params and buffer tensors.

Commit:3171459
Author:Wei Wang
Committer:Wei Wang

SINGA-176 - Add loss and metric base classes Rename layer.proto to model.proto, which includes proto messages for classes under model/

Commit:d680079
Author:wangwei
Committer:Wei Wang

SINGA-176 - Add loss and metric base classes Add loss and metric base classes, and implement the MSE as a sub-class of Loss and the Accuracy as a subclass of Metric. Add math functions to support the metric/loss classes. Draft test files for MSE and Accuracy.

Commit:9d1bcb4
Author:Wei Wang
Committer:Wei Wang

SINGA-171 - Create CppDevice and CudaDevice. Rename Device subclasses based on the programming language and hardware, e.g., CppCPU indicates the device is a CPU which runs cpp code, CudaGPU indicates the device is an Nvidia GPU which runs cuda code, and CudaCPU indicates the device is a CPU which uses cuda to malloc and free pinned memory for the CudaGPU. Correspondingly, we rename the lib namespace to lang, and Device::type() to lang().

Commit:282712c
Author:Wei Wang
Committer:Wei Wang

SINGA-171 - Create CppDevice and CudaDevice. Add the CppDevice and CudaDevice APIs. Implement CppDevice and add a test for it. There is a link error for cudnn.

Commit:99e0d24
Author:Wei Wang
Committer:wangwei

SINGA-170 Add Dropout layer and CudnnDropout layer. Pass compilation; there is a link error for cudnn.

Commit:02851fa
Author:Wei Wang
Committer:wangwei

SINGA-167 - Add Tensor Math function APIs. Add basic linalg functions for Tensor. Add blas functions for Tensor. Unify gemm and gemv in Tensor::Mult. This commit also contains code for the Param class, which would be removed in the next commit.

Commit:e36bc92
Author:Wei Wang
Committer:wangwei

SINGA-169 Add base Layer class for V1.0. Adapt caffe.proto to create layer.proto. As long as we support a similar set of layers to other systems, e.g., caffe/torch, we will have similar hyper-parameters. If we defined our own layer.proto, we would only be creating new names for the same hyper-parameters. Considering caffe has implemented a wide range of layers, it is convenient to adapt caffe.proto. Moreover, we can then run caffe's model configurations or pre-trained models in SINGA with few changes. We change 'parameter' to 'conf' to distinguish parameters from hyper-parameters (we use 'configuration'). Some fields are not used, but their tags are reserved. We will later write a converter program/script to run caffe models. The base Layer class API is drafted.

Commit:dc5aa6e
Author:Wei Wang
Committer:wangwei

SINGA-164 - Add the base Tensor class Implement Tensor member functions. Test the constructors and member classes. A simple Device implementation is provided to pass the compilation. TODO implement Tensor math functions.

Commit:dd1e4af
Author:Wei Wang
Committer:Wei Wang

SINGA-163 - Reorganize the project folder layout. Remove files in V0.x that will not be used or need major updates. Reorganize the layout to have the following modules in src: core, utils, proto, layer, model, python. The test module is separated into the root folder.

Commit:fde87ed
Author:Wei Wang
Committer:Wei Wang

SINGA-159 Rewrite safe_queue.h to avoid the license issue Rewrite the safe_queue.h. Add a PriorityQueue which is a thread safe priority queue. It is implemented using the base SafeQueue. Fixed a bug from the protobuf extension field ID in rnnlm example. Remove "Copyright 2015 The Apache Software Foundation" from Apache headers in all files.

Commit:991c6ab
Author:Wei Wang

SINGA-130 Data Prefetching. Fix a bug due to thread_ not being joined and deleted cleanly. Always allocate buf_key and buf_val, as they are used whether or not prefetching is enabled.

Commit:a0bdd0b
Author:Anh Dinh
Committer:Anh Dinh

SINGA-130 Data prefetching layer. Extended StoreInputLayer to support prefetching of data. It maintains a buffer for (key, value) pairs read from the storage layer. In Setup(), it launches a new thread for reading data into the buffer. The ComputeFeature() method waits for the thread to finish (join) before parsing the buffer into the data_ and aux_ fields, and finally it launches another thread. In terms of memory consumption, this prefetching uses an extra (batchsize*recordsize) bytes for the buffer. However, we observe no visible runtime improvement, as I/O time is very small compared to CPU time (on the order of milliseconds without prefetching, and tens of microseconds with prefetching).
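
The scheme described above is a standard single-buffer prefetch pattern: a background thread fills the buffer while the main thread consumes the previous batch, then the main thread joins and relaunches it. A generic Python sketch of the idea (not the actual StoreInputLayer code):

```
import threading

class Prefetcher:
    def __init__(self, read_batch):
        self.read_batch = read_batch    # callable that reads one (key, value) batch
        self.buffer = None
        self.thread = None
        self._launch()                  # like Setup(): start the first background read

    def _launch(self):
        def fill():
            self.buffer = self.read_batch()
        self.thread = threading.Thread(target=fill)
        self.thread.start()

    def next(self):
        self.thread.join()              # like ComputeFeature(): wait for the read to finish
        batch = self.buffer
        self._launch()                  # immediately start reading the next batch
        return batch
```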

Commit:efdf6d7
Author:Wei Wang

SINGA-136 Support cuDNN v4 Merge for SINGA-136. Conflicts: Makefile.am src/proto/job.proto

Commit:a91e82f
Author:Wei Wang
Committer:Wei Wang

SINGA-131 Implement and optimize hybrid training using both CPU and GPU. Allow users to set the batchsize of each StoreInputLayer instance manually. We can then assign different workloads to GPU and CPU workers, which have different input layers, e.g.,
```
batchsize: 128
batchsize: 16
```
If the first worker is a GPU worker and the second is a CPU worker, the above setting assigns 128 images per mini-batch to the GPU worker and 16 images to the CPU worker. Internally, StoreInputLayer gets its batchsize based on its partition ID. Currently this works for the MNIST example, which has no Conv layers. For Conv layers, since the GPU and CPU implementations have different layer names, we cannot use a single net config.

Commit:4b4ad05
Author:seaok

SINGA-136 Support cuDNN v4 Add cuDNN BatchNorm Layer

Commit:1c8e0dc
Author:Wei Wang

SINGA-126 Python Binding for Interactive Training 1. Replace 'x != None' with 'x is not None' 2. Fixed the bug from type mismatch: debug should be set to False before passing it to SINGA's loss layer. 3. Set default value for SingaProto's zookeeper endpoint. Then we can ignore `-singa_conf xxx'. SINGA would assume that the zookeeper endpoint is the default one ('localhost:2181'), and glog would use its default logging dir.

Commit:8130b7e
Author:chonho
Committer:chonho

SINGA-126 Python Binding for Interactive Training
- add 2 example python codes for interactive training: train_mnist.py and train_cifar10.py
- add methods/classes in singa/layer.py: ComputeFeature, ComputeGradient, Feed, Setup, GetParams, SetParams, GetData, Dummy()
- add methods in singa/model.py: save_model_parameter, load_model_parameter
- add a Feed function in src/neuralnet/neuron_layer/dummy.cc corresponding to class Dummy() in layer.py (note: DummyInputLayer and kDummyInput are removed)
- add functions in src/worker.cc: Checkpoint, InitNetParams
- add CreateXXX functions to set up singa::XXX from a string proto, where XXX is Layer, Updater or Worker
- update tool/python/singa/driver.i for the wrapper
- include cifar10_mean_image in examples/datasets/

Commit:e32e70c
Author:ijingo
Committer:jinyangturbo

SINGA-145 New SGD based optimization Updaters: AdaDelta, Adam, AdamMax. New updaters: AdaDelta, Adam, AdamMax. To implement AdamMax, add two new operators for Tensor in cxxnet_op.h, i.e., op::abs and op::max.
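
For context, AdamMax (usually written AdaMax) is why the new op::abs and op::max operators are needed: its second-moment term is an element-wise running maximum of absolute gradients rather than an average of squares. A textbook numpy sketch (not SINGA's updater code):

```
import numpy as np

def adamax_step(param, grad, m, u, t, lr=0.002, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad              # first moment, as in Adam
    u = np.maximum(b2 * u, np.abs(grad))      # element-wise max of |grad| -- needs abs and max
    param -= lr / (1 - b1 ** t) * m / (u + eps)
    return param, m, u
```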

Commit:c72ef0f
Author:Wei Wang
Committer:Wei Wang

SINGA-120 - Implemented GRU and BPTT Add configuration fields (vocab_size) for OneHotLayer. Configure gpu to 0 for all examples.

Commit:959ef70
Author:Wei Wang
Committer:Wei Wang

SINGA-120 - Implemented GRU and BPTT. Add input layers for the char rnn example. Fix the bug in worker.cc for flag setting in ComputeGradient. Run with GPU; loss decreases slowly to 3 per unit. TODO: add RNNDummyLayer and train with RMSProp.

Commit:6a4c996
Author:Wei Wang
Committer:Wei Wang

SINGA-120 - Implemented GRU and BPTT Change new memory computation formula following char-rnn (i.e., element-wise multiplication before matrix multiplication)

Commit:473c985
Author:Ju Fan
Committer:Wei Wang

SINGA-120 - Implemented GRU and BPTT: 1) Updated Driver.cc to register GRU; 2) Updated job.proto to include the configuration of GRU; 3) Updated configure.ac to fix some compilation errors.

Commit:bb75a0b
Author:Wei Wang
Committer:Wei Wang

SINGA-98 Add Support for AlexNet ImageNet Classification Model. Update the CudnnActivationLayer to share the data and grad blobs with the conv layer to reduce memory usage; this is controlled by the share_src_blobs field in the job config file. The loss reduces after 3000 iterations using a 256 mini-batch, like Caffe. cpplint check; update job conf for cpu training.

Commit:6e815db
Author:Wei Wang
Committer:Wei Wang

SINGA-98 Add Support for AlexNet ImageNet Classification Model. Update job.conf for alexnet: learning rate, layer order, lr_scale/wd_scale; add cudnn.conf. Fix a bug in image_preprocess.cc which set the dst pointer incorrectly. It led to the observation that the loss and accuracy do not improve after a few iterations (the loss is about 6.90x, tested for about 10k iterations); Caffe's performance starts improving after 3000 iterations (it is around 6.90x during iterations 200-3500). After fixing the bug, training with mini-batch 128 works, but the loss starts reducing only after around 10k steps.

Commit:7d43e27
Author:chonho
Committer:chonho

SINGA-81 Add Python Helper, which enables users to construct a model (JobProto) and run Singa in Python
- Add wrapper API "Driver::Train(const std::string Job_conf)" for python.
- Add wrapper API "Driver::Test(const std::string Job_conf)" for python.
- Python code (1) constructs a model (JobProto), and (2) runs singa.
- Users are supposed to generate 'usermodel.py'; examples are provided, e.g., cifar10_cnn.py, mnist_mlp.py, mnist_rbm.py. 'cluster.conf' is required to maintain cluster information.
- Users are supposed to run it as follows, e.g.,
{code}
cd SINGA_ROOT
bin/singa-run.sh -conf tool/python/examples/cluster.conf -exe tool/python/examples/mnist_mlp.py
{code}
- Note: in job.proto, the 'required' rule of the following fields should be changed to 'optional': JobProto: name, neuralnet, train_one_batch, updater, train_steps; ClusterProto: workspace. The workspace field can be set in either (i) cluster.conf or (ii) python code.
- __init__.py is required in the following directories: singa, singa/utils, singa/datasets, examples.
- Add StoreResult(), which takes care of training results: it is called in SingaRun() by fit() or evaluate(), reads the logfile, and stores accuracy, loss, ppl, se, etc. in dictionary format.
- Parameter initialization: the Parameter class is used internally; weights follow a gaussian distribution by default and biases are constant by default; as an option, users can explicitly specify parameters (e.g., *_parameter.py).
- Removed dataset/ae.py and dataset/rbm.py; the RBM and Autoencoder examples use the Mnist dataset.

Commit:1bc5007
Author:Wei Wang

SINGA-118 Make protobuf LayerType field id easy to assign Add comments for layer specific configurations.

Commit:b8ac2b8
Author:WANG Sheng

SINGA-118 Make protobuf LayerType field id easy to assign. Clean ids in job.conf:
- Input Layer: 100 - 199
- Neuron Layer: 200 - 299
- Loss Layer: 300 - 399
- Output Layer: 400 - 499
- Connection Layer: 500 - 599

Commit:8af565c
Author:WANG Sheng
Committer:Wei Wang

SINGA-113 Model/Hybrid Partition Support. Minor changes after rebasing onto the master branch: * pass compilation * change Blob A.CopyFrom(B) to Copy(A, B) * make LayerProto.partition_dim stand for the layer's own partition dimension only * change the InnerProduct layer connection to OneToAll.

Commit:2d38cb3
Author:WANG Sheng
Committer:Wei Wang

SINGA-113 Model/Hybrid Partition Support. NeuralNet now automatically adds connection layers for distributed training; the partition is transparent to users. Users just need to configure the partition_dim field in NetProto: partition_dim = 0 for data partition, partition_dim = 1 for model partition. Users can also overwrite partition_dim for a specific layer in LayerProto, which results in hybrid partition.

Commit:f8be9af
Author:Wei Wang
Committer:Wei Wang

SINGA-100 Implement layers using CUDNN for GPU training. 1. Fix bugs from the rebase onto the latest master. The LOG(FATAL) calls inside mutable_grad and mutable_data for input/output/loss layers are commented out, because other layers may check whether mutable_xxx() == nullptr. 2. Update Makefile.am; run make test.

Commit:8174760
Author:Wei Wang
Committer:Wei Wang

SINGA-100 Implement layers using CUDNN for GPU training. Test with multiple GPUs on cifar10; with batchsize=500, one iteration takes 3s on a single gpu and 2s on 2 gpus. Fix bugs in cudnnsoftmax and cudnnsoftmaxloss. TODO: debug the accuracy problem; the accuracy improves more slowly than caffe's and cannot reach 0.8 in the end.

Commit:8cacd83
Author:seaokcs
Committer:Wei Wang

SINGA-100 Implement layers using CUDNN for GPU training. Tmp commit: set up the device in driver.cc and register cudnn layers. 1. Add kernel_sum_by_row() cuda kernel functions. 2. Fixed a neuron_layer.h bug: cudnn.h should be placed outside the namespace. 3. Register the CudnnSoftmaxLoss class (job.proto).

Commit:6e56334
Author:seaokcs
Committer:Wei Wang

SINGA-100 Implement layers using CUDNN for GPU training. Pass the cublas handle from math_blob to math_addr. Test configure-make for the cpu code. Compiles successfully with Makefile.gpu. TODO: set up Context when creating worker threads.

Commit:af1bf50
Author:Wei Wang
Committer:Wei Wang

SINGA-100 Implement layers using CUDNN for GPU training. * Add cudnn layers, including convolution, pooling, lrn, and activation (relu, sigmoid, tanh). * Move the declarations of layers from a single file into <category>_layer.h under include/singa/neuralnet/. TODO: compile the code.

Commit:49293a6
Author:Wei Wang
Committer:Wei Wang

SINGA-100 Implement layers using CUDNN for GPU training Fix errors from compiling cpu code and cudnn code. tmp commit. Trying to run cudnn for gpu training and Mshadow for cpu training. 1. finish the CudnnSoftmaxLossLayer 2. finish the DropoutLayer 3. add ActivationLayer, which leads to link errors.

Commit:7cdb22f
Author:WANG Sheng
Committer:WANG Sheng

SINGA-111 Add slice, concate and split layers. Remove the dedicated conf fields (slice_conf, concate_conf, split_conf); instead, use partition_dim and num_partitions in LayerProto for configuration.

Commit:7e414e5
Author:WANG Sheng
Committer:WANG Sheng

SINGA-111 Add slice, concate and split layers Add split layer implementation and test case

Commit:28e48a6
Author:WANG Sheng
Committer:WANG Sheng

SINGA-111 Add slice, concate and split layers Add slice layer implementation and test case

Commit:4664b6b
Author:WANG Sheng
Committer:WANG Sheng

SINGA-106 Add dummy layer for test purpose. The Dummy layer can be used as an input, neuron, or output layer to construct a simple neuralnet when focusing on testing a specific layer.
* Use as a neuron layer (default): in ComputeFeature, it copies the srclayer's data to its own data blob; in ComputeGradient, it copies its own grad blob to the srclayer's grad.
* Use as an input layer: it will create data and config blobs as configured; in ComputeFeature, it randomly sets the values in data_ to [0,1]. To use: set dummy_conf.input to true and add dummy_conf.shape.
* Use as an output layer: in ComputeFeature, it randomly sets the values in grad_ to [0,1]. To use: set dummy_conf.output to true.

Commit:e1a6f14
Author:jixin
Committer:jixin

SINGA-87 Replace the exclude field with an include field for layer configuration (the exclude field is still supported). The exclude field is deprecated; please use the include field instead. Notice that the include field only includes the specified layers. neuralnet.cc: add a flag to indicate the state, e.g., flag=0 means no exclude and no include; add checks for the exclude and include fields, which cannot appear in the same .conf file. job.proto: add the include field definition in LayerProto.

Commit:059ae9a
Author:jixin
Committer:jixin

SINGA-87 Replace the exclude field with an include field for layer configuration (the exclude field is still supported). Add the include field definition in LayerProto.

Commit:5d076e5
Author:Wei Wang

SINGA-11 Start SINGA on Apache Mesos Merge branch 'feature-mesos'

Commit:f0e1629
Author:Anh Dinh
Committer:Anh Dinh

SINGA-11 Start SINGA on Apache Mesos. Apache Mesos is a fine-grained cluster management framework which enables resource sharing within the same cluster. Mesos abstracts away the physical configurations of cluster nodes and presents resources to users in the form of "offers". SINGA uses Mesos for two purposes: 1. to acquire the resources necessary for training the model; 2. to launch and monitor the progress of the training task. To this end, we implement a "SINGA Scheduler" which interacts with the Mesos master. The scheduler assumes that SINGA has been installed on the Mesos slave nodes. The scheduler is called when the user wants to start a new SINGA job, and it performs the following steps:
Step 1. Read the job configuration file to determine the necessary resources in terms of CPUs, memory and storage.
Step 2. Wait for resource offers from the Mesos master.
Step 3. Determine whether the offers meet the resource requirements.
Step 4. Prepare the task to launch at each slave: deliver the job configuration file to the slave node, and specify the command to run on the slave: "singa -conf ./job.conf".
Step 5. Launch and monitor the progress.
For step 3, we currently implement a simple scheme: the number of CPUs offered by each Mesos slave must exceed the total number of SINGA workers and SINGA servers per process. In other words, each Mesos slave must be able to run the entire worker group or server group. For step 4, we currently rely on HDFS to deliver the configuration file to each slave. In particular, we write the file to a known directory (different for each job) on HDFS and ask the slave to use its Fetcher utility to download the file before executing the task. The source code and README.md files are in the `tool/mesos` directory.

Commit:ef9a111
Author:Wei Wang
Committer:Wei Wang

SINGA-91 - Add SoftmaxLayer and ArgSortLayer SoftmaxLayer applies the Softmax function against its source layer to compute its probability distribution over all labels. ArgSortLayer sorts labels based on their scores (e.g., probability) in descending order. Configuration for ArgSortLayer is like argsort_conf{ topk: 1}. Topk results will be extracted. Connecting ArgSortLayer to a CSVOutputLayer, we can dump the topk labels of each instance into one line.

Commit:e928ebb
Author:Wei Wang
Committer:Wei Wang

SINGA-85 Add functions for extracting features and testing on new data.
* To test a pre-trained model: 1. add checkpoint_path entries in the job configuration pointing to the checkpoint files; 2. set up the input layer properly to read the test data; 3. run singa with the -test argument (e.g., ./bin/singa-run.sh -conf JOB_CONF -test).
* To extract features of new data: 1. add checkpoint_path entries in the job configuration pointing to the checkpoint files; 2. set up the input layer properly to read the raw data; 3. add output layers (e.g., RecordOutputLayer or CSVOutputLayer) after the layer whose data you want to extract; 4. run singa with the -test argument (e.g., ./bin/singa-run.sh -conf JOB_CONF -test).
A test net (i.e., the kTest phase is passed to create the net) will be created to test performance and extract features. The job.conf of the cifar10 example contains instructions for extracting features and testing performance with checkpoint files. Rename ProtoRecordLayer and CSVRecordLayer to RecordInputLayer and CSVInputLayer respectively. Add RecordOutputLayer and CSVOutputLayer for outputting data in RecordProto and CSV format.

Commit:5f010ca
Author:Wei Wang
Committer:wang sheng

SINGA-82 Refactor input layers using the data store abstraction. * Add StoreLayer to read data from a Store, e.g., KVFile or TextFile (support for HDFS will be added later). * Implement subclasses of StoreLayer to parse tuples of different formats, e.g., SingleLabelImageRecord or a CSV line. * Update the examples to use the new input layers. * Add unit tests. * Add a function to the Layer class which returns a vector<AuxType> for auxiliary data (e.g., labels). TODO 1. make AuxType a template argument of the Layer class, and extend data() to return a vector of Blobs for multiple dense features. 2. separate the layer classes into different files to make the structure of the source folder clearer.

Commit:321ef96
Author:Wei Wang
Committer:Wei Wang

SINGA-70 Refactor API of Layer, Worker, Server and Driver. For the Layer class: * Setup, ComputeFeature and ComputeGradient are updated to accept one argument which represents all source layers. * DebugString() is changed to ToString() for displaying debug info and other info; for example, performance values can be aggregated in the ComputeFeature function and then converted into a string in ToString(). * The srclayer and dstlayer fields are removed. For the Worker class: * the Report function is removed; performance is now collected via Layer::ToString() and reported by each worker. The Trainer class is renamed to Stub: * only the Run() function and the message handling/generation functions remain; functions for creating servers and workers are moved into Driver. The Driver class: * rename the Submit function to Train; * add functions for creating workers and servers. All files under the trainer folder are moved out to be under src/ or include/.

Commit:3dc1eee
Author:chonho
Committer:Wei Wang

SINGA-10 Add Support for Recurrent Neural Networks (RNN) Revise Makefile.example; Add job.conf; Update src/utils/tool.cc to properly parse job.conf;

Commit:ba3b1a5
Author:Wei Wang
Committer:Wei Wang

SINGA-10 Add Support for Recurrent Neural Networks (RNN). * Move the functions of WordLayer into the EmbeddingLayer. * Make DatLayer a subclass of both RNNLayer and DataLayer. * create_shard.cc wraps the WordRecord inside singa::Record and inserts singa::Record into the DataShard. * Make the inheritance of base layer classes like InputLayer, NeuronLayer, etc. virtual from Layer, to avoid compilation problems if a future layer is declared to inherit from two base layers. * Update the documentation on the website for the RNNLM example. Need to optimize the training speed. The PPL is similar to that from the RNNLM Toolkit.

Commit:c1c6a2e
Author:chonho
Committer:Wei Wang

SINGA-10 Add Support for Recurrent Neural Networks (RNN) Add README.md to show instructions of RNNLM example; Revise the following files to remove warnings by a code checker, cpplint.py; create_shard.cc, rnnlm.h, rnnlm.cc, rnnlm.proto Add license paragraph to the following files; create_shard.cc, rnnlm.h, rnnlm.cc, rnnlm.proto, Makefile.example

Commit:13b1c08
Author:Wei Wang
Committer:Wei Wang

SINGA-10 Add Support for Recurrent Neural Networks (RNN) Draft upper layers for rnnlm; Compile using Makefile.example;

Commit:ad86f72
Author:kaiping
Committer:Wei Wang

SINGA-10 Add Support for Recurrent Neural Networks (RNN) Add user-defined records for word (word string, word id, class id, startpos, endpos); Implement RnnDataLayer, WordLayer and RnnLabelLayer; Implement create_shard.cc for the sample dataset of rnnlmlib;

Commit:7486647
Author:jinyangturbo
Committer:wang sheng

SINGA-68 Preparation for Release -- License etc. Added the ASF License to all source files. Removed gflags from LICENSE. Prepared the NOTICE and DISCLAIMER files.

Commit:ed9e373
Author:Wei Wang
Committer:Wei Wang

SINGA-57 Improve Distributed Hogwild. The ClusterProto::sync_freq field controls the frequency of synchronization between server groups. After updating a Param (slice), the server checks the number of updates since the last sync. It also checks the number of pending syncs (i.e., requests that haven't received responses) to avoid sending too many msgs to stopped servers (those msgs would occupy the memory of the sending buffer). The server responds to every sync request with the latest Param values. Note: the distributed hogwild framework currently does not support (there is a bug with) multiple worker groups in one process. We recommend replacing this cluster topology with in-memory hogwild, i.e., launching one worker group with multiple workers and one server group.

Commit:6d59eec
Author:Wei Wang

SINGA-51 Improve the convolution and pooling operations Caffe's im2col is adopted to speed up the convolution operation. The max pooling operation is accelerated by book-keeping the max neuron position like Caffe.

Commit:50deedd
Author:wangwei

SINGA-66 Fix bugs in the Worker::RunOneBatch function and ClusterProto. Add checks that test_net_ and validation_net_ are not nullptr before conducting testing and validation. Set default values (1) for nworker_groups and nserver_groups.

Commit:ae20303
Author:zhaojing
Committer:wangwei

SINGA-21 Code review 4. Update the layers for RBM. The CD algorithm follows Hinton's Science paper to do sampling (only the hidden layer is sampled). May add configuration fields to control the sampling of each layer. Note: the first gibbs iteration samples the positive data of the hidden layer (not the negative data, which is uninitialized).
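
The sampling scheme described (only the hidden layer is sampled, and only in the positive phase) can be written as a single CD-1 step in a few lines of numpy; this is a generic sketch of the algorithm, not SINGA's RBM layer code:

```
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    # one contrastive-divergence step; only the positive hidden units are sampled
    h0_prob = sigmoid(v0 @ W + c)
    h0 = (np.random.rand(*h0_prob.shape) < h0_prob).astype(v0.dtype)  # sample hidden
    v1 = sigmoid(h0 @ W.T + b)                                        # reconstruction, not sampled
    h1_prob = sigmoid(v1 @ W + c)                                     # negative phase, probabilities only
    W += lr * (v0.T @ h0_prob - v1.T @ h1_prob) / v0.shape[0]
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b, c
```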

Commit:53de92b
Author:Wei Wang
Committer:wangwei

SINGA-21 Code review 4. Categorize all layers into 5 types: input, loss, output, neuron and connection layers. Remove all is_xxlayer functions; Worker::TrainOneBatch() and Worker::TestOneBatch() have to check the layer type using other methods (e.g., typeinfo from typeid()). A function like type_id() may be added to return the type ID of a specific layer (which can be overridden by users). Tested with the example cnn, mlp, and rbm models.