LSTM Implementation in Caffe
Note that the master branch of Caffe supports LSTM now. (Jeff Donahue's implementation has been merged.)
This repo is no longer maintained.
Speed comparison (Titan X, 3-layer LSTM with 2048 units)
Jeff's code is more modularized, whereas this code is optimized specifically for LSTM.
This code computes the gradient with respect to the recurrent weights in a single matrix computation.
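To make the point concrete, here is a minimal NumPy sketch of the idea (an illustration of the technique only, not this repo's C++/CUDA code; the variable names and shapes are assumptions). Instead of accumulating one small product per time step, the time and batch axes are merged so the whole gradient comes out of a single large matrix multiplication:

```python
import numpy as np

# T time steps, N batch size, H hidden units (tiny sizes, purely for illustration).
# d_gates[t] : gradient w.r.t. the stacked gate pre-activations at step t, shape (N, 4H).
# h_prev[t]  : hidden state fed into step t, i.e. h_{t-1},                 shape (N, H).
T, N, H = 5, 4, 8
rng = np.random.default_rng(0)
d_gates = rng.standard_normal((T, N, 4 * H))
h_prev = rng.standard_normal((T, N, H))

# Naive accumulation: one small matrix product per time step.
dW_loop = np.zeros((4 * H, H))
for t in range(T):
    dW_loop += d_gates[t].T @ h_prev[t]

# Single large GEMM: fold the time axis into the batch axis first.
dW_single = d_gates.reshape(T * N, 4 * H).T @ h_prev.reshape(T * N, H)

assert np.allclose(dW_loop, dW_single)
```

One big GEMM keeps the GPU busy far better than many small ones, which is consistent with the backward-pass numbers in the tables below.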
- Batch size = 20, Length = 100

| Code | Forward (ms) | Backward (ms) | Total (ms) |
| --- | ---: | ---: | ---: |
| This code | 248 | 291 | 539 |
| Jeff's code | 264 | 462 | 726 |
- Batch size = 4, Length = 100

| Code | Forward (ms) | Backward (ms) | Total (ms) |
| --- | ---: | ---: | ---: |
| This code | 131 | 118 | 249 |
| Jeff's code | 140 | 290 | 430 |
- Batch size = 20, Length = 20

| Code | Forward (ms) | Backward (ms) | Total (ms) |
| --- | ---: | ---: | ---: |
| This code | 49 | 59 | 108 |
| Jeff's code | 52 | 92 | 144 |
- Batch size = 4, Length = 20

| Code | Forward (ms) | Backward (ms) | Total (ms) |
| --- | ---: | ---: | ---: |
| This code | 29 | 26 | 55 |
| Jeff's code | 30 | 61 | 91 |
Example
Example code is in /examples/lstm_sequence/.
In this example, an LSTM network is trained to generate a predefined sequence without any input (a minimal sketch of this setup appears at the end of this section).
This experiment was introduced in the Clockwork RNN paper.
Four different LSTM networks and shell scripts (.sh) for training them are provided.
Each script generates a log file containing the predicted sequence and the true sequence.
You can use plot_result.m to visualize the results.
The results of the four LSTM networks are as follows:
- 1-layer LSTM with 15 hidden units for short sequence

- 1-layer LSTM with 50 hidden units for long sequence

- 3-layer deep LSTM with 7 hidden units for short sequence

- 3-layer deep LSTM with 23 hidden units for long sequence

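For readers unfamiliar with the task, the sketch below shows the setup in PyTorch rather than Caffe (a hypothetical stand-in, not the prototxt/solver files in /examples/lstm_sequence/; the sine-wave target, network size, and hyperparameters are assumptions). The network receives only zeros as input, so the predefined sequence has to be produced entirely by its own recurrent dynamics:

```python
import math
import torch
import torch.nn as nn

# A short sine wave stands in for the predefined target sequence.
T = 100
target = torch.sin(torch.linspace(0, 4 * math.pi, T)).view(T, 1, 1)

# The network gets no meaningful input, only zeros of shape (T, batch=1, input=1).
zero_input = torch.zeros(T, 1, 1)

lstm = nn.LSTM(input_size=1, hidden_size=15)   # cf. the 1-layer, 15-unit network above
readout = nn.Linear(15, 1)
optimizer = torch.optim.Adam(
    list(lstm.parameters()) + list(readout.parameters()), lr=1e-2)

for step in range(2000):
    optimizer.zero_grad()
    hidden_seq, _ = lstm(zero_input)           # (T, 1, 15)
    prediction = readout(hidden_seq)           # (T, 1, 1)
    loss = nn.functional.mse_loss(prediction, target)
    loss.backward()
    optimizer.step()
```

In the actual example the same kind of training is done with the provided Caffe network definitions and .sh scripts, and plot_result.m visualizes the predicted and true sequences from the generated log files.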