LSTM
function defined in module Flux
LSTM(in => out)
Long Short Term Memory recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.
The arguments `in` and `out` describe the size of the feature vectors passed as input and as output. That is, the layer accepts a vector of length `in` or a batch of vectors represented as an `in x B` matrix, and outputs a vector of length `out` or a batch of vectors of size `out x B`.
This constructor is syntactic sugar for `Recur(LSTMCell(a...))`, and so LSTMs are stateful. Note that the state shape can change depending on the inputs, so it is good practice to call `reset!` on the model between inference calls if the batch size changes. See the examples below.
See this article for a good overview of the internals.
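Because the layer is stateful, two calls with the same input generally give different outputs until the state is reset. The sketch below illustrates this, assuming a Flux version in which `LSTM` returns a `Recur` wrapper as described above; the variable names are illustrative only.

```julia
using Flux

l = LSTM(3 => 5)
x = rand(Float32, 3)

y1 = l(x)        # first call starts from the initial state
y2 = l(x)        # same input, but the hidden state has advanced

Flux.reset!(l)   # restore the initial state
y3 = l(x)        # repeats the first call

println(y1 == y3)   # the reset call reproduces the first output
println(y1 == y2)   # the second call is expected to differ
```

The comparison between `y1` and `y2` depends on randomly initialised weights, so it is expected rather than guaranteed to be `false`.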
julia> l = LSTM(3 => 5)
Recur(
LSTMCell(3 => 5), # 190 parameters
) # Total: 5 trainable arrays, 190 parameters,
# plus 2 non-trainable, 10 parameters, summarysize 1.062 KiB.
julia> l(rand(Float32, 3)) |> size
(5,)
julia> Flux.reset!(l);
julia> l(rand(Float32, 3, 10)) |> size # batch size of 10
(5, 10)
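The parameter counts printed above can be checked by hand. The sketch below assumes the usual LSTM layout of four gates stacked into `Wi` (size `4*out x in`), `Wh` (size `4*out x out`) and `b` (length `4*out`), plus an initial hidden and cell state of length `out` each; `lstm_param_count` is a hypothetical helper, not a Flux function.

```julia
# Hypothetical helper: counts trainable parameters under the assumed layout.
function lstm_param_count(nin, nout)
    wi = 4 * nout * nin    # input-to-gates weights
    wh = 4 * nout * nout   # hidden-to-gates weights
    b = 4 * nout           # gate biases
    state0 = 2 * nout      # learnable initial hidden and cell state
    return wi + wh + b + state0
end

println(lstm_param_count(3, 5))   # 60 + 100 + 20 + 10 = 190
```

Under the same assumption, the 2 non-trainable arrays with 10 parameters correspond to the current hidden and cell state held by `Recur`.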
Failing to call `reset!` when the input batch size changes can lead to unexpected behavior. See the example in `RNN`.
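One way to avoid a stale state is to reset the model at the start of every sequence. The following is a minimal sketch under the same assumption of a `Recur`-based `LSTM`; `run_sequence` is a hypothetical helper, not part of Flux.

```julia
using Flux

# Hypothetical helper: reset the state, then feed the time steps in order.
function run_sequence(model, xs)
    Flux.reset!(model)
    return [model(x) for x in xs]
end

l = LSTM(3 => 5)

# Four time steps with a batch size of 10.
ys10 = run_sequence(l, [rand(Float32, 3, 10) for _ in 1:4])

# A new sequence with a batch size of 2; the reset inside the helper
# discards the state left over from the previous batch.
ys2 = run_sequence(l, [rand(Float32, 3, 2) for _ in 1:4])

println(size(ys10[end]))
println(size(ys2[end]))
```

Resetting inside the helper keeps the reset and the forward pass together, so a change in batch size between sequences cannot reuse a state of the wrong shape.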
LSTMCells can be constructed directly by specifying the non-linear function, the `Wi` and `Wh` internal matrices, a bias vector `b`, and a learnable initial state `state0`. The `Wi` and `Wh` matrices do not need to be the same type. See the example in `RNN`.