These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):
Commit: 0b1d73c
Author: zhaojing
Update the proto folder for the hfl example
The documentation is generated from this commit.
Commit: 3d68345
Author: Zrealshadow
Add the interface proto file for the bank HFL example
Commit: 52a4a45
Author: jixin
Committer: jixin
SINGA-434 Support tensor broadcasting. Support broadcasting following https://github.com/onnx/onnx/blob/master/docs/Broadcasting.md. CUDA supports only unidirectional broadcasting due to cudnn (https://docs.nvidia.com/deeplearning/sdk/cudnn-archived/cudnn_750/cudnn-developer-guide/index.html#cudnnOpTensor); the cpp backend supports multidirectional broadcasting. Passed all tests on cudnn 7.4.2.
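Multidirectional broadcasting here means the NumPy/ONNX rule: shapes are aligned from the trailing dimensions and each dimension pair must match or contain a 1. The following plain-NumPy sketch illustrates the shape rule only; it is not SINGA's implementation.

```python
import numpy as np

def broadcast_shape(a, b):
    """Multidirectional (NumPy/ONNX-style) broadcasting: left-pad the shorter
    shape with 1s, then each dimension pair must be equal or contain a 1."""
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError("shapes are not broadcastable")
        out.append(max(x, y))
    return tuple(out)

print(broadcast_shape((3, 1, 5), (4, 5)))             # (3, 4, 5)
print((np.ones((3, 1, 5)) + np.ones((4, 5))).shape)   # (3, 4, 5), NumPy agrees
```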
Commit: a88efa0
Author: Vaan Ng
Singa-341 Added stride functionality to tensors for CPP
Commit: 2d1dd42
Author: wangwei
Committer: wangwei
Fix a typo for cudnn_prefer and change the default workspace limit size to 1024MB
Commit: e527db9
Author: wangwei
Enable configuration of cudnn workspace size limit in PySINGA
Commit: b2e21c6
Author: wang wei
Committer: Wei Wang
preparing for v1.1-rc1: remove outdated files; fix links
Commit: 32b4368
Author: Xiangrui
SINGA-253 Net converter for caffe model. Modification of the pooling layer: add a 'ceil' field in the proto to be compatible with caffe. Train alexnet converted from caffe, achieving 82% accuracy.
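The 'ceil' field matters because Caffe computes pooled output sizes with ceil, while plain flooring gives a smaller output for some input/stride combinations. A minimal sketch of the two formulas follows; the `ceil_mode` flag is just an illustrative stand-in for the new proto field.

```python
import math

def pooled_size(in_size, kernel, stride, pad, ceil_mode):
    """Output length of a pooling layer along one dimension.
    Caffe rounds up (ceil); flooring drops the last partial window."""
    extent = in_size + 2 * pad - kernel
    return (math.ceil(extent / stride) if ceil_mode else extent // stride) + 1

# 6-wide input, 3-wide kernel, stride 2, no padding:
print(pooled_size(6, 3, 2, 0, ceil_mode=True))   # 3  (Caffe-compatible)
print(pooled_size(6, 3, 2, 0, ceil_mode=False))  # 2
```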
Commit: 77405ec
Author: XiangruiCAI
Committer: Xiangrui
SINGA-253 Net converter for caffe model. Convert a caffe model into a singa model. It is a very basic implementation; currently it can convert feed-forward nets and supports pysinga. Implementation method: 1. read the proto file of the caffe model using caffe.proto and serialize it to a string; 2. parse the string using singa's model.proto to get the layer config; 3. set up each layer and add it to a feed-forward net. Updates and example: 1. put conversion functions into layer.py and converter.py; 2. add a 'caffe' option in the cifar example which converts alexnet from caffe to singa.
Commit: c967169
Author: Wei Wang
Committer: Wei Wang
SINGA-259 Add maven pom file for building java classes This ticket would refactor some API in *.i files and create the pom.xml to use maven to compile the java classes. Java unittest is also enabled with a simple example. To test this ticket, 1. compile singa with java (cuda and cudnn are optional) in `build/` cmake -DUSE_JAVA=ON -DUSE_PYTHON=OFF .. make 2. goto build/java and run the unit tests export LD_LIBRARY_PATH=<path to build>/java/lib:$LD_LIBRARY_PATH mvn test The folder structure for the java project is java/ pom.xml lib/libsinga_wrap.so src/ main/java/org/apache/singa/ swig/ proto/ test/org/apache/singa/ Fixed a bug in tensor.py::from_numpy() which handled int array incorrectly.
Commit: 7c12f40
Author: WANG Ji
SINGA-252 Use the snapshot methods to dump and load models for pysinga. Turn on the _packed_ option of the tensor protobuf message to make it more compact in size. Now, the vgg-16 model reported in SINGA-248 is 528 MB.
Commit: 32ba40d
Author: aaronwwf
SINGA-244 Separating swig interface and python binding files - add java binding cmake files - todo: add test code
Commit: a54c889
Author: Wei Wang
SINGA-227 Add Split and Merge Layer and add ResNet Implementation Update the resnet implementation by adding Merge and Split layers in layer.py, and enable net.py to process merge/split layers. Update the transpose setting in Dense.cc TODO(wangwei) update test_dense.cc
Commit: 7ebea53
Author: jixin
Committer: Wei Wang
SINGA-227 Add Split and Merge Layer and add ResNet Implementation. Add the python resnet implementation and add Split and Merge layers. Discard the split and merge layers in the resnet implementation. Add add_split and add_merge functions for cases where multiple inputs or outputs are involved when creating a net. Change comments in resnet.py.
Commit: dfc422e
Author: Wei Wang
Committer: Wei Wang
SINGA-218 Implementation for RNN CUDNN version. Add an example using the char-rnn model. The trained model (with 2 stacks of lstm) over the linux kernel source code can generate source code with some meaningful patterns, e.g., indentation, comments, variable definitions, assignments.
Commit: c51f944
Author: zhaojing
Committer: Wei Wang
SINGA-218 Implementation for RNN CUDNN version - cudnn rnn implementation (cudnn_rnn.h, cudnn_rnn.cc, rnn.cc, rnn.h, test_cudnn_rnn.cc). - The weight shapes are now calculated manually instead of using the API provided by CUDNN. - Test for RNN_cudnn_Tanh (unidirectional, 1 hidden layer).
Commit: 8e0b108
Author: Wei Wang
Committer: Wei Wang
SINGA-218 Implementation for RNN CUDNN version. Finish the CudnnRNN layer. Pass the test for the tanh rnn. RNN forward accepts a vector of input tensors: <x0, x1, ... x(n-1), hx, cx>. x(i) is the i-th input tensor; hx is the initial hidden tensor, which could be a dummy tensor. A dummy tensor is a tensor created without shape/device/data_type; during computation, cudnnRNN would use 0s for this tensor. cx is not necessary for relu/tanh/gru rnn. For lstm, it could also be a dummy tensor like hx. The output is: <y0, y1, ... y(n-1), hy, cy>. relu/tanh/gru rnns do not have cy; lstm has both hy and cy. RNN backward accepts a vector of input gradient tensors: <dy0, dy1, ... dy(n-1), dhy, dcy>. dhy is necessary for all rnns, but could be a dummy tensor, in which case a tensor with 0s would be used for dhy during computation. dcy is used only for lstm, and could also be a dummy tensor. The output is: <dw, <dx0, dx1, ... dx(n-1), dhx, dcx>>, where dhx is a tensor for the gradient of hx. dcx is only used for lstm. The CudnnRNN layer must be moved onto cuda, otherwise a memory error would happen (the weight is on cpu).
Commit: 7e050c1
Author: jixin
Committer: jixin
SINGA-215 Implement Image Transformation for Image Pre-processing. Implement some of the image transformations (i.e., crop, resize, horizontal_mirror) for the pre-processing stage. Image resize needs the opencv library. Currently image transformation only supports one sample input at a time. Add static functions resize, crop and mirror for image transformation. The training phase uses random crop and random mirror; the evaluation phase uses central crop and no mirror. Cannot test phases for mirror because of a random factor. Fix bugs in test files involving memory allocation and access.
Commit: 7444f0a
Author: jixin
Committer: jixin
SINGA-213 Implement Encoder and Decoder for CSV. TextEncoder encodes a Tensor into a 1D string where the data are separated by commas. TextDecoder decodes a string of data into Tensors (data and [optional] label).
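As a rough illustration of the idea (not the actual TextEncoder/TextDecoder API), one sample per comma-separated line with an optional leading label could be handled like this:

```python
import numpy as np

def encode_csv(data, label=None):
    """Encode a 1-D sample (and optional label) into one comma-separated line."""
    fields = ([str(label)] if label is not None else []) + [repr(float(v)) for v in data]
    return ",".join(fields)

def decode_csv(line, has_label=True):
    """Decode one CSV line back into (data, label)."""
    fields = line.split(",")
    label = int(float(fields[0])) if has_label else None
    values = fields[1:] if has_label else fields
    return np.array([float(v) for v in values], dtype=np.float32), label

line = encode_csv(np.array([0.5, 1.0, 2.5]), label=3)
print(line)              # "3,0.5,1.0,2.5"
print(decode_csv(line))  # (array([0.5, 1. , 2.5], dtype=float32), 3)
```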
Commit: dc013f3
Author: Wei Wang
Committer: Wei Wang
SINGA-194 Add a Platform singleton. Add the Platform class whose methods are all static. It includes methods for querying hardware GPUs, e.g., the number of gpus and the memory of each gpu. It also creates a vector of singa::Device, e.g., CudaGPU. NOTE: If multiple CudaGPU devices are created and they all use CnMemPool, then they must share the same CnMemPool instance (otherwise there would be problems in destructing the CnMemPool). It is preferred to use Platform to create CudaGPU devices, which avoids the above problem. Updated the APIs for Device and DeviceMemPool. GetAllocateMem is not accurate (it is larger than the actually requested memory size). NOTE: The swig would report errors if USE_CUDA is OFF as the CudaGPU is not available. TODO(wangwei) resolve this problem in a later PR for python setup.
Commit: 62c6603
Author: WANG Ji
Committer: WANG Ji
SINGA-210 Enable checkpoint and resume for v1.0. This ticket is going to add code for dumping the model parameters as checkpoint files, which could be used for fine-tuning and deployment. Serialize a Tensor into TensorProto and save it in a BinFile, which is stored as <prefix>.model, and generate a description of the parameters in <prefix>.desc. Unit test cases passed for the kFloat, kInt and kDouble data types.
Commit: 97648a6
Author: Wei Wang
Committer: Wei Wang
SINGA-204 Support the training of feed-forward neural nets. Implement the Alexnet model for Cifar10 https://code.google.com/p/cuda-convnet/. The accuracy on the test data is 0.82 (the same as reported in the above link). NOTE: 1. do not convert from uint_8 data to float directly; instead, cast the data to uint8_t and then to float. 2. it is necessary to subtract the max value for softmax; numeric error (nan) is likely to happen otherwise.
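The max-subtraction trick in note 2 is the standard way to keep softmax numerically stable; a minimal NumPy sketch (illustrative only, not SINGA's kernel) is shown below.

```python
import numpy as np

def stable_softmax(x):
    """Softmax with max subtraction: exp() never sees large positive inputs,
    so it cannot overflow to inf and produce nan after normalization."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]])
print(stable_softmax(logits))  # well-defined probabilities
# np.exp(logits) without the subtraction overflows to inf and yields nan.
```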
Commit: 71eb059
Author: Wei Wang
Committer: Wei Wang
SINGA-204 Support the training of feed-forward neural nets. Implement the Alexnet model for Cifar10 https://code.google.com/p/cuda-convnet/. But the test accuracy is low, 0.72 (it should be 0.82).
Commit: dd08f41
Author: Wei Wang
Committer: Wei Wang
Merge PR #165 for CnMeM. Fix bugs from the device type change (Device* -> std::shared_ptr<Device>).
Commit: 14d31a4
Author: Wei Wang
SINGA-200 - Implement Encoder and Decoder for data pre-processing Add src/proto/io.proto, which was missed in the previous commit.
Commit: 833f461
Author: Wei Wang
SINGA-200 - Implement Encoder and Decoder for data pre-processing. Improve the JPG2ProtoDecoder and Proto2JPGEncoder to consider the image dimension order "HWC" or "CHW". Change the input image Tensor's data type to kUChar, i.e., unsigned char. Change the Tensor::data() API to const SType* data<DType>() const; TODO add other encoding functions to accept other inputs, e.g., cv::Mat.
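For reference, "HWC" and "CHW" differ only in axis order, so converting between them is a transpose; a small NumPy sketch (not the SINGA encoder/decoder itself):

```python
import numpy as np

# A fake 8x6 RGB image stored as HWC (height, width, channel), unsigned char.
hwc = np.random.randint(0, 256, size=(8, 6, 3), dtype=np.uint8)

# Reorder to CHW (channel, height, width) and back; the data is unchanged, only the layout.
chw = np.transpose(hwc, (2, 0, 1))
print(chw.shape)                                           # (3, 8, 6)
print(np.array_equal(hwc, np.transpose(chw, (1, 2, 0))))   # True
```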
Commit: 13d60b0
Author: jixin
Committer: jixin
SINGA-200 - Implement Encoder and Decoder for data pre-processing. Add base classes Encoder and Decoder. Add derived classes Image2JPG_Encoder and Image2JPG_Decoder, which use opencv to encode and decode image source files. Add a test which contains partial pre-processing procedures (read_raw_image, encode, decode) for images. Reader and Writer, which use user-specified offline store mechanisms (i.e., kvfile, lmdb, ...), are not implemented in this ticket. Use a randomly initialized image for the test. Add the derived encoder/decoder class definitions in the base class header files (decoder.h/encoder.h).
Commit: bef1db0
Author: jixin
Committer: jixin
SINGA-200 - Implement Encoder and Decoder for data pre-processing Add CMake scripts for singa io libs.
Commit: 077d13e
Author: liyuchenmike@gmail.com
SINGA-175 Add memory management APIs and implement a subclass using CNMeM. Add a base memory pool class. Implement two subclasses, CnMemPool and CudaMemPool. Add tests for the memory pools. TODO replace Device* with std::shared_ptr<Device> to avoid memory errors, because the order of destructing devices and tensors is dynamic (a device may be freed before its tensors).
Commit: b91002b
Author: jixin
Committer: Wei Wang
SINGA-180 Add Activation layer and Softmax layer Fix a bug in cudnn softmax and let softmax support 1D or 2D tensor as input.
Commit: 74f0214
Author: Wei Wang
Committer: Wei Wang
SINGA-198 - Change Layer::Setup API to include input Tensor shapes Update the setup function from ``` void Setup(const LayerConf& conf); ``` to ``` void Setup(const Shape& in_sample_shape, const LayerConf& conf); // for single input void Setup(const vector<Shape>& in_sample_shapes, const LayerConf& conf); // for multiple outputs ``` functions for getting output sample shape are added ``` const Shape GetOutputSampleShape() const; // used for single output const Shape GetOutputSampleShape(int k) const; // used for multiple outputs ``` pass tests.
Commit: 0cd9663
Author: WANG Ji
Committer: Wei Wang
SINGA-192 Implement optimization algorithms for v1. Change the interface of Apply() in the Optimizer class: Apply() now receives a ``const& tensor grad'' argument rather than ``tensor* grad''; that is to say, this method will not cause side effects on the grad tensor.
Commit: 58be3f8
Author: Wei Wang
SINGA-190 - Add prelu layer and flatten layer Format code. Fix warning info from compilation.
Commit: 5784bff
Author: Wei Wang
Committer: Wei Wang
SINGA-192 Implement optimization algorithms for v1. Merge branch PR#164 into dev. Fix the bugs in the adagrad and rmsprop tests. Note: expect-near (with diff 1e-5) is used to avoid numeric bugs. Need to run the tests on more machines.
Commit: 5afd81b
Author: jixin
Committer: jixin
SINGA-190 - Add prelu layer and flatten layer Implement prelu layer and flatten layer for cpu version. Write gtest for prelu and flatten layer. Pass all tests.
Commit: 178db01
Author: WANG Ji
Committer: WANG Ji
SINGA-192 Implement optimization algorithms for v1. Implement optimization algorithms for Singa v1, including nesterov, adagrad and rmsprop. Add unit test cases for these algorithms. However, only nesterov passed its test case; adagrad and rmsprop need the Sqrt() operation for tensors, which has not been implemented yet.
Commit: eadd3f9
Author: WANG Ji
Committer: Wei Wang
SINGA-174 Add Batch Normalization layer and Local Response Normalization layer. Implemented Batch Normalization Layer and Local Response Normalization Layer using CuDNN. Passed test.
Commit: 64ea206
Author: Wei Wang
SINGA-188 Add Dense layer Minor change to format code and update IDs of DenseConf fields.
Commit: 73d4a34
Author: zhaojing
Committer: zhaojing
SINGA-188 Add Dense layer. Add the implementation for the dense layer. Add test files for the dense layer, both cpp and cuda versions. Pass all tests.
Commit: 2dac380
Author: Wei Wang
Committer: Wei Wang
SINGA-183 Add the base classes for optimizer, constraint and regularizer. Draft the base optimizer, constraint and regularizer classes. The API for local all-reduce is also added (in comments). Test sgd with/without momentum using cpp and cuda.
Commit: 7d149ec
Author: Wei Wang
Committer: Wei Wang
SINGA-178 Add Convolution layer and Pooling layer Minor update on variable names and InitCudnn arguments. Fix compiling warnings about signed and unsigned number comparison. Format code. Pass all tests.
Commit: 152056d
Author: XiangruiCAI
Committer: XiangruiCAI
SINGA-178 Add Convolution layer and Pooling layer Add CudnnConvolution layer and CudnnPooling layer. Add tests for these two layers. Passed tests and cpplint.py. (Add on May 30th) Process the parameters and specs in this way, 1. Each layer is responsible for pushing its own parameters into the param_value vector, and pushing its own param specs into the param_specs vector. 2. Each layer is responsible for deleting the tensors it created including params and buffer tensors.
Commit: 3171459
Author: Wei Wang
Committer: Wei Wang
SINGA-176 - Add loss and metric base classes Rename layer.proto to model.proto, which includes proto messages for classes under model/
Commit: d680079
Author: wangwei
Committer: Wei Wang
SINGA-176 - Add loss and metric base classes Add loss and metric base classes, and implement the MSE as a sub-class of Loss and the Accuracy as a subclass of Metric. Add math functions to support the metric/loss classes. Draft test files for MSE and Accuracy.
Commit: 9d1bcb4
Author: Wei Wang
Committer: Wei Wang
SINGA-171 - Create CppDevice and CudaDevice. Rename the Device subclasses based on the programming language and hardware, e.g., CppCPU indicates the device is a CPU which runs cpp code, CudaGPU indicates the device is an Nvidia GPU which runs cuda code, and CudaCPU indicates the device is a CPU which uses cuda to malloc and free pinned memory for the CudaGPU. Correspondingly, we rename the lib namespace to lang, and Device::type() to lang().
Commit: 282712c
Author: Wei Wang
Committer: Wei Wang
SINGA-171 - Create CppDevice and CudaDevice. Add the CppDevice and CudaDevice APIs. Implement CppDevice and add a test for it. There is a link error for cudnn.
Commit: 99e0d24
Author: Wei Wang
Committer: wangwei
SINGA-170 Add Dropout layer and CudnnDropout layer. Pass compilation; there is a link error for cudnn.
Commit: 02851fa
Author: Wei Wang
Committer: wangwei
SINGA-167 - Add Tensor Math function APIs. Add basic linalg functions for Tensor. Add blas functions for Tensor. Unify gemm and gemv in Tensor::Mult. This commit also contains code for the Param class, which would be removed in the next commit.
Commit: e36bc92
Author: Wei Wang
Committer: wangwei
SINGA-169 Add base Layer class for V1.0. Adapt caffe.proto to create layer.proto. As long as we support a similar set of layers as other systems, e.g., caffe/torch, we would have similar hyper-parameters. If we defined our own layer.proto, we would only create new names for the same hyper-parameters. Considering caffe has implemented a wide range of layers, it is convenient to adapt caffe.proto. Moreover, we can then run caffe's model configurations or pre-trained models in SINGA with little change. We change 'parameter' to 'conf' to distinguish parameters from hyper-parameters (configuration). Some fields are not used, but their tags are reserved. We will later write a converter program/script to run caffe models. The base Layer class API is drafted.
Commit: dc5aa6e
Author: Wei Wang
Committer: wangwei
SINGA-164 - Add the base Tensor class Implement Tensor member functions. Test the constructors and member classes. A simple Device implementation is provided to pass the compilation. TODO implement Tensor math functions.
Commit: dd1e4af
Author: Wei Wang
Committer: Wei Wang
SINGA-163 - Reorganize the project folder layout Remove files in V0.x that will not be used or need major updates. Reorganize the layout to have the following modules in src core, utils, proto, layer, model, python. The test module is separated into the root folder.
Commit: fde87ed
Author: Wei Wang
Committer: Wei Wang
SINGA-159 Rewrite safe_queue.h to avoid the license issue Rewrite the safe_queue.h. Add a PriorityQueue which is a thread safe priority queue. It is implemented using the base SafeQueue. Fixed a bug from the protobuf extension field ID in rnnlm example. Remove "Copyright 2015 The Apache Software Foundation" from Apache headers in all files.
Commit: 991c6ab
Author: Wei Wang
SINGA-130 Data Prefetching. Fix a bug due to thread_ not being joined and deleted cleanly. Always allocate buf_key and buf_val, as they are used whether or not prefetching is enabled.
Commit: a0bdd0b
Author: Anh Dinh
Committer: Anh Dinh
SINGA-130 Data prefetching layer. Extended StoreInputLayer to support prefetching of data. It maintains a buffer for (key, value) pairs read from the storage layer. In Setup(), it launches a new thread for reading data into the buffer. The ComputeFeature() method waits for the thread to finish (join) before parsing the buffer into the data_ and aux_ fields. Finally, it launches another thread. In terms of memory consumption, this prefetching uses extra (batchsize*recordsize) bytes for the buffer. However, we observe no visible runtime improvement, as I/O time is very small (on the order of milliseconds without prefetching, and tens of microseconds with prefetching) compared to CPU time.
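The double-buffering pattern described above (fill a buffer in a background thread, join before consuming, then immediately start the next read) can be sketched in a few lines; this is an illustrative Python analogue, not the StoreInputLayer code.

```python
import threading, time

def read_batch(store, batch, out):
    """Worker thread: read one batch of (key, value) pairs into the shared buffer."""
    out.extend(store() for _ in range(batch))

def train(store, batch, steps):
    buf = []
    reader = threading.Thread(target=read_batch, args=(store, batch, buf))
    reader.start()                              # prefetch the first batch
    for step in range(steps):
        reader.join()                           # wait until the buffer is full
        data, buf = buf, []                     # take the batch, start a fresh buffer
        reader = threading.Thread(target=read_batch, args=(store, batch, buf))
        reader.start()                          # prefetch the next batch in parallel
        time.sleep(0.01)                        # stand-in for ComputeFeature()/training
        print("step", step, "consumed", len(data), "records")

train(store=lambda: ("key", "value"), batch=4, steps=3)
```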
Commit: efdf6d7
Author: Wei Wang
SINGA-136 Support cuDNN v4 Merge for SINGA-136. Conflicts: Makefile.am src/proto/job.proto
Commit: a91e82f
Author: Wei Wang
Committer: Wei Wang
SINGA-131 Implement and optimize hybrid training using both CPU and GPU. Allow users to set the batchsize of StoreInputLayer instances manually. We can then assign different workloads to GPU and CPU workers, which have different input layers, e.g., ``` batchsize: 128 batchsize: 16 ``` If the first worker is a GPU and the second worker is a CPU, then the above setting would assign 128 images per mini-batch to the GPU worker and 16 images to the CPU worker. Internally, StoreInputLayer gets its batchsize based on its partition ID. Currently, it works for the MNIST example which has no Conv layers. For Conv layers, since the GPU and CPU implementations have different layer names, we cannot use a single net config.
Commit: 4b4ad05
Author: seaok
SINGA-136 Support cuDNN v4 Add cuDNN BatchNorm Layer
Commit: 1c8e0dc
Author: Wei Wang
SINGA-126 Python Binding for Interactive Training 1. Replace 'x != None' with 'x is not None' 2. Fixed the bug from type mismatch: debug should be set to False before passing it to SINGA's loss layer. 3. Set default value for SingaProto's zookeeper endpoint. Then we can ignore `-singa_conf xxx'. SINGA would assume that the zookeeper endpoint is the default one ('localhost:2181'), and glog would use its default logging dir.
Commit: 8130b7e
Author: chonho
Committer: chonho
SINGA-126 Python Binding for Interactive Training - add 2 example python codes for interactive training . train_mnist.py . train_cifar10.py - add methods/class in singa/layer.py . ComputeFeature, ComputeGradient, Feed, Setup . GetParams, SetParams, GetData . Dummy() - add methods in singa/model.py . save_model_parameter . load_model_parameter - add Feed function in src/neuralnet/neuron_layer/dummy.cc . corresponds to class Dummy() in layer.py note: DummyInputLayer and kDummyInput are removed - add functions in src/worker.cc . Checkpoint . InitNetParams - add CreateXXX functions to set up singa::XXX from string proto . XXX are Layer, Updater, Worker - update tool/python/singa/driver.i for wrapper - include cifar10_mean_image in examples/datasets/
Commit: e32e70c
Author: ijingo
Committer: jinyangturbo
SINGA-145 New SGD based optimization Updaters: AdaDelta, Adam, AdamMax New Updaters: AdaDelta, Adam, AdamMax. To implement AdamMax, add two new operators for Tensor in cxxnet_op.h i.e. op::abs and op::max.
Commit: c72ef0f
Author: Wei Wang
Committer: Wei Wang
SINGA-120 - Implemented GRU and BPTT Add configuration fields (vocab_size) for OneHotLayer. Configure gpu to 0 for all examples.
Commit: 959ef70
Author: Wei Wang
Committer: Wei Wang
SINGA-120 - Implemented GRU and BPTT. Add input layers for the char rnn example. Fix the bug in worker.cc for flag setting in ComputeGradient. Run with GPU; the loss decreases slowly to 3 per unit. Todo: add RNNDummyLayer and train with RMSProp.
Commit: 6a4c996
Author: Wei Wang
Committer: Wei Wang
SINGA-120 - Implemented GRU and BPTT Change new memory computation formula following char-rnn (i.e., element-wise multiplication before matrix multiplication)
Commit: 473c985
Author: Ju Fan
Committer: Wei Wang
SINGA-120 - Implemented GRU and BPTT: 1) Updated Driver.cc to register GRU; 2) Updated job.proto to include the configuration of GRU; 3) Updated configure.ac to fix some compilation errors
Commit: bb75a0b
Author: Wei Wang
Committer: Wei Wang
SINGA-98 Add Support for AlexNet ImageNet Classification Model. Update the CudnnActivationLayer to share the data and grad blobs with the conv layer for memory space reduction. It is controlled by the share_src_blobs field in the job config file. The loss reduces after 3000 iterations using 256 mini-batch, like Caffe. cpplint check; update job conf for cpu training.
Commit: 6e815db
Author: Wei Wang
Committer: Wei Wang
SINGA-98 Add Support for AlexNet ImageNet Classification Model. Update job.conf for alexnet: learning rate, layer order, lr_scale/wd_scale; add cudnn.conf. Fix a bug in image_preprocess.cc which sets the dst pointer incorrectly. It led to the observation that the loss and accuracy do not improve after a few iterations (the loss is about 6.90x, tested for about 10k iterations); Caffe's performance starts improving after 3000 iterations (it is around 6.90x during 200-3500 iterations). After fixing the bug, training using mini-batch 128 works, but the loss starts reducing after around 10k steps.
Commit: 7d43e27
Author: chonho
Committer: chonho
SINGA-81 Add Python Helper, which enables users to construct a model (JobProto) and run Singa in Python - Add wrapper API "Driver::Train(const std::string Job_conf)" for python. - Add wrapper API "Driver::Test(const std::string Job_conf)" for python. - Python code (1) constructs a model (JobProto), and (2) runs singa - Users are supposed to generate 'usermodel.py' . examples are provided, e.g., cifar10_cnn.py, mnist_mlp.py, mnist_rbm.py . 'cluster.conf' is required to maintain cluster information - Users are supposed to run it as follows, e.g., {code} cd SINGA_ROOT bin/singa-run.sh -conf tool/python/examples/cluster.conf -exe tool/python/examples/mnist_mlp.py {code} - Note: in job.proto, the 'required' rule of the following fields should be changed to 'optional' . JobProto: name, neuralnet, train_one_batch, updater, train_steps . ClusterProto: workspace . the workspace field can be set in either (i) cluster.conf or (ii) python code - __init__.py is required in the following directories . singa . singa/utils . singa/datasets . examples - Add StoreResult() that takes care of training results . in SingaRun() called by fit() or evaluate() . read the logfile . store accuracy, loss, ppl, se, etc. in dictionary format - Parameter initialization . the Parameter class is used internally . Weight follows a gaussian distribution by default . Bias is constant by default . As an option, users can explicitly specify parameters (e.g., *_parameter.py) - Removed dataset/ae.py and dataset/rbm.py . RBM and Autoencoder examples use the Mnist dataset
Commit: 1bc5007
Author: Wei Wang
SINGA-118 Make protobuf LayerType field id easy to assign Add comments for layer specific configurations.
Commit: b8ac2b8
Author: WANG Sheng
SINGA-118 Make protobuf LayerType field id easy to assign clean ids in job.conf: - Input Layer: 100 - 199 - Neuron Layer: 200 - 299 - Loss Layer: 300 - 399 - Output Layer: 400 - 499 - Connection Layer: 500 - 599
Commit: 8af565c
Author: WANG Sheng
Committer: Wei Wang
SINGA-113 Model/Hybrid Partition Support minor change after rebase to master branch * pass compilation * change Blob A.CopyFrom(B) to Copy(A, B) * make LayerProto.partition_dim stand for own partition dimension only * change InnerProduct layer connection to OneToAll
Commit: 2d38cb3
Author: WANG Sheng
Committer: Wei Wang
SINGA-113 Model/Hybrid Partition Support. NeuralNet now automatically adds connection layers for distributed training. The partition is transparent to users. Users just need to configure the partition_dim field in NetProto: partition_dim = 0 : data partition; partition_dim = 1 : model partition. Users can also overwrite partition_dim for a specific layer in LayerProto. This results in hybrid partition. See the sketch below for what the two modes mean.
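As a rough illustration of the two modes (not SINGA's partitioning code): splitting along dimension 0 divides the mini-batch across workers (data partition), while splitting along dimension 1 divides the features/neurons (model partition).

```python
import numpy as np

x = np.random.rand(128, 512)   # mini-batch of 128 samples, 512 features
workers = 4

# partition_dim = 0: data partition -- each worker gets a slice of the batch.
data_parts = np.array_split(x, workers, axis=0)
print([p.shape for p in data_parts])    # 4 x (32, 512)

# partition_dim = 1: model partition -- each worker gets a slice of the features.
model_parts = np.array_split(x, workers, axis=1)
print([p.shape for p in model_parts])   # 4 x (128, 128)
```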
Commit: f8be9af
Author: Wei Wang
Committer: Wei Wang
SINGA-100 Implement layers using CUDNN for GPU training. 1. Fix bugs from the rebase onto the latest master. The LOG(FATAL) inside mutable_grad and mutable_data for input/output/loss layers is commented out, because other layers may check mutable_xxx() == nullptr. 2. Update Makefile.am; run make test.
Commit: 8174760
Author: Wei Wang
Committer: Wei Wang
SINGA-100 Implement layers using CUDNN for GPU training. Test with multiple gpus on cifar10; setting batchsize=500, one iteration takes 3s on a single gpu and 2s on 2 gpus. Fix bugs in cudnnsoftmax and cudnnsoftmaxloss. Todo: debug the accuracy problem; the accuracy improves slower than that from caffe and cannot reach 0.8 finally.
Commit: 8cacd83
Author: seaokcs
Committer: Wei Wang
SINGA-100 Implement layers using CUDNN for GPU training tmp commit; setup device in driver.cc, register cudnn layers; tmp commit; 1.Add kernel_sum_by_row() cuda kernel functions 2.Fixed neuron_layer.h bug: cudnn.h should be placed outside the namespace 3.Register CudnnSoftmaxLoss class (job.proto)
Commit: 6e56334
Author: seaokcs
Committer: Wei Wang
SINGA-100 Implement layers using CUDNN for GPU training Pass cublas handle from math_blob to math_addr. Test configure-make for cpu code. Compile success for Makefile.gpu. Todo set up Context when creating worker threads.
Commit: af1bf50
Author: Wei Wang
Committer: Wei Wang
SINGA-100 Implement layers using CUDNN for GPU training * Add cudnn layers, including convolution, pooling, lrn, and activation (relu, sigmoid, tanh) * Move declarations of layers from a single file into <category>_layer.h under include/singa/neuralnet/ TODO compile the code.
Commit: 49293a6
Author: Wei Wang
Committer: Wei Wang
SINGA-100 Implement layers using CUDNN for GPU training Fix errors from compiling cpu code and cudnn code. tmp commit. Trying to run cudnn for gpu training and Mshadow for cpu training. 1. finish the CudnnSoftmaxLossLayer 2. finish the DropoutLayer 3. add ActivationLayer, which leads to link errors.
Commit: 7cdb22f
Author: WANG Sheng
Committer: WANG Sheng
SINGA-111 Add slice, concate and split layers. Remove the dedicated conf fields (slice_conf, concate_conf, split_conf); instead, use partition_dim and num_partitions in LayerProto for configuration.
Commit: 7e414e5
Author: WANG Sheng
Committer: WANG Sheng
SINGA-111 Add slice, concate and split layers Add split layer implementation and test case
Commit: 28e48a6
Author: WANG Sheng
Committer: WANG Sheng
SINGA-111 Add slice, concate and split layers Add slice layer implementation and test case
Commit: 4664b6b
Author: WANG Sheng
Committer: WANG Sheng
SINGA-106 Add dummy layer for test purpose. Dummy layer can be used as an input, neuron, or output layer to construct a simple neuralnet when only focusing on testing a specific layer. * Use as neuron layer (default): In compute feature, it copies srclayer's data to its own data blob; in compute gradient, it copies its own grad blob to srclayer's grad. * Use as input layer: It will create data and config blobs as configured. In compute feature, it randomly sets values in data_ to [0,1]. To use: - set dummy_conf.input as true - add dummy_conf.shape * Use as output layer: In compute feature, it randomly sets values in grad_ to [0,1]. To use: - set dummy_conf.output as true
Commit: e1a6f14
Author: jixin
Committer: jixin
SINGA-87 Replace the exclude field with an include field for layer configuration (the exclude field is still supported). The exclude field is deprecated; please use the include field instead. Notice that the include field only includes the specified layers. neuralnet.cc - add a flag to indicate the state, e.g., flag=0, no exclude and no include - add checks for the exclude and include fields; they cannot appear in the same .conf file job.proto - add the include field definition in LayerProto
Commit: 059ae9a
Author: jixin
Committer: jixin
SINGA-87 Replace the exclude field with an include field for layer configuration (the exclude field is still supported). Add the include field definition in LayerProto.
Commit: 5d076e5
Author: Wei Wang
SINGA-11 Start SINGA on Apache Mesos Merge branch 'feature-mesos'
Commit: f0e1629
Author: Anh Dinh
Committer: Anh Dinh
SINGA-11 Start SINGA on Apache Mesos. Apache Mesos is a fine-grained cluster management framework which enables resource sharing in the same cluster. Mesos abstracts out the physical configurations of cluster nodes and presents resources to the users in the form of "offers". SINGA uses Mesos for two purposes: 1. To acquire necessary resources for training the model. 2. To launch and monitor progress of the training task. To this end, we implement a "SINGA Scheduler" which interacts with the Mesos master. The scheduler assumes that SINGA has been installed at the Mesos slave nodes. The scheduler is called when the user wants to start a new SINGA job, and it performs the following steps: Step 1. Read the job configuration file to determine the necessary resources in terms of CPUs, memory and storage. Step 2. Wait for resource offers from the Mesos master. Step 3. Determine if the offers meet the resource requirements. Step 4. Prepare the task to launch at each slave: + Deliver the job configuration file to the slave node. + Specify the command to run on the slave: "singa -conf ./job.conf" Step 5. Launch and monitor the progress. For step 3, we currently implement a simple scheme: the number of CPUs offered by each Mesos slave must exceed the total number of SINGA workers and SINGA servers per process. In other words, each Mesos slave must be able to run the entire worker group or server group. For step 4, we currently rely on HDFS to deliver the configuration file to each slave. Particularly, we write the file to a known directory (different for each job) on HDFS and ask the slave to use its Fetcher utility to download the file before executing the task. The source code and README.md files are in the `tool/mesos` directory.
Commit: ef9a111
Author: Wei Wang
Committer: Wei Wang
SINGA-91 - Add SoftmaxLayer and ArgSortLayer SoftmaxLayer applies the Softmax function against its source layer to compute its probability distribution over all labels. ArgSortLayer sorts labels based on their scores (e.g., probability) in descending order. Configuration for ArgSortLayer is like argsort_conf{ topk: 1}. Topk results will be extracted. Connecting ArgSortLayer to a CSVOutputLayer, we can dump the topk labels of each instance into one line.
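The two layers together amount to "probabilities, then the indices of the k largest"; a small NumPy sketch of that pipeline (illustrative only, not the layer implementation):

```python
import numpy as np

def softmax(scores):
    z = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def topk_labels(scores, k=1):
    """Return the k label indices with the highest probability, best first."""
    probs = softmax(scores)
    return np.argsort(-probs, axis=-1)[..., :k]

scores = np.array([[0.1, 2.0, 0.5, 1.2]])
print(topk_labels(scores, k=1))   # [[1]] -> one line per instance, as with CSVOutputLayer
```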
Commit: e928ebb
Author: Wei Wang
Committer: Wei Wang
SINGA-85 Add functions for extracting features and test new data * to test a pre-trained model, 1. add checkpoint_path in the job configuration to point to the checkpoint files 2. set up the input layer properly to read the test data 3. run singa with the -test argument (e.g., ./bin/singa-run.sh -conf JOB_CONF -test) * to extract features of new data, 1. add checkpoint_path in the job configuration to point to the checkpoint files 2. set up the input layer properly to read the raw data 3. add output layers (e.g., RecordOutputLayer or CSVOutputLayer) after the layer whose data you want to extract 4. run singa with the -test argument (e.g., ./bin/singa-run.sh -conf JOB_CONF -test) A test net (i.e., the kTest phase is passed to create the net) will be created to test performance and extract features. The job.conf of the cifar10 example contains instructions for extracting features and testing performance with checkpoint files. Rename ProtoRecordLayer and CSVRecordLayer to RecordInputLayer and CSVInputLayer respectively. Add RecordOutputLayer and CSVOutputLayer for outputting data in RecordProto and CSV format.
Commit: 5f010ca
Author: Wei Wang
Committer: wang sheng
SINGA-82 Refactor input layers using data store abstraction * Add StoreLayer to read data from a Store, e.g., KVFile or TextFile (support for HDFS will be added later). * Implement subclasses of StoreLayer to parse different tuple formats, e.g., SingleLabelImageRecord or CSV line. * Update examples to use the new input layers. * Add unit tests. * Add a function to the Layer class which returns a vector<AuxType> for auxiliary data (e.g., label). TODO 1. make AuxType a template argument of the Layer class, and extend data() to return a vector of Blob for multiple dense features. 2. separate layer classes into different files to make the structure of the source folder clear.
Commit: 321ef96
Author: Wei Wang
Committer: Wei Wang
SINGA-70 Refactor API of Layer, Worker, Server and Driver. For the Layer class * Setup, ComputeFeature and ComputeGradient are updated to accept one argument which represents all source layers. * DebugString() is changed to ToString() for displaying debug info and other info. For example, the performance values can be aggregated in the ComputeFeature function and then converted into a string in ToString(). * the srclayer and dstlayer fields are removed. For the Worker class * the Report function is removed. The performance is now collected via Layer::ToString() and is reported by each worker. The Trainer class is renamed to Stub * Only the Run() function and the message handling/generation functions remain. Functions for creating servers and workers are moved into Driver. The Driver class * Rename the function Submit to Train. * Add functions for creating workers and servers. All files under the trainer folder are moved out to be under src/ or include/.
Commit: 3dc1eee
Author: chonho
Committer: Wei Wang
SINGA-10 Add Support for Recurrent Neural Networks (RNN) Revise Makefile.example; Add job.conf; Update src/utils/tool.cc to properly parse job.conf;
Commit: ba3b1a5
Author: Wei Wang
Committer: Wei Wang
SINGA-10 Add Support for Recurrent Neural Networks (RNN) * Move the functions of WordLayer into the EmbeddingLayer. * Make DatLayer a subclass of both RNNLayer and DataLayer. * create_shard.cc wraps the WordRecord inside singa::Record, and inserts singa::Record into the DataShard. * Make the inheritance of base layer classes like InputLayer, NeuronLayer, etc. virtual from Layer, to avoid compilation problems if a future layer is declared to inherit from two base layers. * Update the documentation on the website for the RNNLM example. Need to optimize the training speed. The PPL is similar to that from the RNNLM Toolkit.
Commit: c1c6a2e
Author: chonho
Committer: Wei Wang
SINGA-10 Add Support for Recurrent Neural Networks (RNN) Add README.md to show instructions of RNNLM example; Revise the following files to remove warnings by a code checker, cpplint.py; create_shard.cc, rnnlm.h, rnnlm.cc, rnnlm.proto Add license paragraph to the following files; create_shard.cc, rnnlm.h, rnnlm.cc, rnnlm.proto, Makefile.example
Commit: 13b1c08
Author: Wei Wang
Committer: Wei Wang
SINGA-10 Add Support for Recurrent Neural Networks (RNN) Draft upper layers for rnnlm; Compile using Makefile.example;
Commit: ad86f72
Author: kaiping
Committer: Wei Wang
SINGA-10 Add Support for Recurrent Neural Networks (RNN) Add user-defined records for word (word string, word id, class id, startpos, endpos); Implement RnnDataLayer, WordLayer and RnnLabelLayer; Implement create_shard.cc for the sample dataset of rnnlmlib;
Commit: 7486647
Author: jinyangturbo
Committer: wang sheng
SINGA-68 Preparation for Release -- License etc. Added ASF License to all source files. Removed gflags from LICENSE Prepared NOTICE and DISCLAIMER file.
Commit: ed9e373
Author: Wei Wang
Committer: Wei Wang
SINGA-57 Improve Distributed Hogwild. The ClusterProto::sync_freq field controls the frequency of sync between server groups. After updating a Param (slice), the server checks the number of updates since the last sync. It also checks the number of pending syncs (i.e., requests that haven't received responses) to avoid sending too many msgs to stopped servers (the msgs would occupy the memory of the sending buffer). The server responds to every sync request with the latest Param values. Note: this currently does not support (there is a bug with) multiple worker groups in one process for the distributed hogwild framework. We recommend replacing this cluster topology with in-memory hogwild, i.e., launching one worker group with multiple workers and one server group.
Commit: 6d59eec
Author: Wei Wang
SINGA-51 Improve the convolution and pooling operations Caffe's im2col is adopted to speed up the convolution operation. The max pooling operation is accelerated by book-keeping the max neuron position like Caffe.
Commit: 50deedd
Author: wangwei
SINGA-66 Fix bugs in Worker::RunOneBatch function and ClusterProto. Add checks for test_net_ and validation_net_ (!= nullptr) before conducting testing and validation. Set default values (1) for nworker_groups and nserver_groups.
Commit: ae20303
Author: zhaojing
Committer: wangwei
SINGA-21 Code review 4. Update layers for RBM. The CD algorithm follows Hinton's science paper to do sampling (only the hidden layer is sampled). May add configuration fields to control the sampling of each layer. Note: the first gibbs iteration samples the positive data of the hidden layer (not the negative data, which is uninitialized).
Commit: 53de92b
Author: Wei Wang
Committer: wangwei
SINGA-21 Code review 4 categorize all layers into 5 types: input, loss, output, neuron and connection layers. remove all is_xxlayer functions. Worker::TrainOneBatch() and Worker::TestOneBatch() have to check the layer type using other methods (e.g., typeinfo from typeid()). A function like type_id() may be added to return the type ID of a specific layer (which can be overridden by users). tested with example cnn, mlp, and rbm models.