RNN
function defined in module Flux
RNN(in => out, σ = tanh)
The most basic recurrent layer; essentially acts as a `Dense` layer, but with the output fed back into the input each time step.
The arguments `in` and `out` describe the size of the feature vectors passed as input and as output. That is, it accepts a vector of length `in` or a batch of vectors represented as an `in x B` matrix, and outputs a vector of length `out` or a batch of vectors of size `out x B`.
This constructor is syntactic sugar for `Recur(RNNCell(a...))`, and so RNNs are stateful. Note that the state shape can change depending on the inputs, and so it is good to `reset!` the model between inference calls if the batch size changes. See the examples below.
julia> r = RNN(3 => 5)
Recur(
RNNCell(3 => 5, tanh), # 50 parameters
) # Total: 4 trainable arrays, 50 parameters,
# plus 1 non-trainable, 5 parameters, summarysize 432 bytes.
julia> r(rand(Float32, 3)) |> size
(5,)
julia> Flux.reset!(r);
julia> r(rand(Float32, 3, 10)) |> size # batch size of 10
(5, 10)
Failing to call `reset!` when the input batch size changes can lead to unexpected behavior. See the following example:
julia> r = RNN(3 => 5)
Recur(
RNNCell(3 => 5, tanh), # 50 parameters
) # Total: 4 trainable arrays, 50 parameters,
# plus 1 non-trainable, 5 parameters, summarysize 432 bytes.
julia> r.state |> size
(5, 1)
julia> r(rand(Float32, 3)) |> size
(5,)
julia> r.state |> size
(5, 1)
julia> r(rand(Float32, 3, 10)) |> size # batch size of 10
(5, 10)
julia> r.state |> size # state shape has changed
(5, 10)
julia> r(rand(Float32, 3)) |> size # erroneously outputs a length 5*10 = 50 vector.
(50,)
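One plausible way to see where the length-50 output comes from is sketched below in plain Julia, without Flux. It assumes that the contribution of the new length-3 input (a length-5 vector) is broadcast against the stale `5 x 10` state, and that the result is then flattened because the input was a vector. The names `input_term`, `stale_state_term` and `flattened` are illustrative only and do not appear in Flux.

```julia
# Plain Julia arrays standing in for the two terms of one recurrent step.
# Assumption: the step adds an input term and a state term, then applies tanh.
input_term = rand(Float32, 5)            # stands in for Wi*x with a single input vector
stale_state_term = rand(Float32, 5, 10)  # stands in for Wh*h with a leftover 5 x 10 state

# Broadcasting a length-5 vector against a 5 x 10 matrix raises no error;
# it silently produces a 5 x 10 result.
h = tanh.(input_term .+ stale_state_term)

# Flattening that result to match a vector input gives 5*10 = 50 entries.
flattened = vec(h)

size(h), length(flattened)
```

Because broadcasting succeeds rather than throwing a dimension error, the mismatch goes unnoticed unless the output size is checked, which is why calling `reset!` after a batch-size change matters.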
`RNNCell`s can be constructed directly by specifying the non-linear function, the `Wi` and `Wh` internal matrices, a bias vector `b`, and a learnable initial state `state0`. The `Wi` and `Wh` matrices do not need to be the same type, but if `Wh` is `dxd`, then `Wi` should be of shape `dxN`.
julia> using LinearAlgebra
julia> r = Flux.Recur(Flux.RNNCell(tanh, rand(5, 4), Tridiagonal(rand(5, 5)), rand(5), rand(5, 1)))
julia> r(rand(4, 10)) |> size # batch size of 10
(5, 10)
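To make the shape requirement on `Wi` and `Wh` concrete, here is a minimal sketch of a single recurrent step written by hand in plain Julia. It assumes the usual Elman-style update `σ.(Wi*x .+ Wh*h .+ b)`; the helper `rnn_step` is hypothetical and not part of Flux, and the variable names `d`, `N`, `B` are chosen here to mirror the `dxd` and `dxN` shapes in the text.

```julia
using LinearAlgebra

# Hypothetical helper, not part of Flux: one step of an Elman-style recurrence.
# Assumption: the new state is σ.(Wi*x .+ Wh*h .+ b).
rnn_step(σ, Wi, Wh, b, h, x) = σ.(Wi * x .+ Wh * h .+ b)

d, N, B = 5, 4, 10              # hidden size, input size, batch size
Wi = rand(d, N)                 # input-to-hidden weights, d x N
Wh = Tridiagonal(rand(d, d))    # hidden-to-hidden weights, d x d, of a different matrix type
b = rand(d)                     # bias vector of length d
h0 = rand(d, 1)                 # initial state, broadcast across the batch

x = rand(N, B)                          # a batch of B input vectors
h1 = rnn_step(tanh, Wi, Wh, b, h0, x)   # state after the first time step
h2 = rnn_step(tanh, Wi, Wh, b, h1, x)   # state after the second time step

size(h1), size(h2)
```

Both `Wi * x` and `Wh * h` must produce `d` rows for the sum to be well defined, which is why `Wi` needs `d` rows when `Wh` is `dxd`. The matrix types can differ because only the products are combined.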
There is 1 method for Flux.RNN: