GRUv3

function defined in module Flux


			GRUv3(in => out)

Gated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences. This implements the variant proposed in v3 of the referenced paper.

The arguments in and out describe the size of the feature vectors passed as input and as output. That is, it accepts a vector of length in or a batch of vectors represented as a in x B matrix and outputs a vector of length out or a batch of vectors of size out x B.

This constructor is syntactic sugar for Recur(GRUv3Cell(a...)), and so GRUv3s are stateful. Note that the state shape can change depending on the inputs, and so it is good to reset! the model between inference calls if the batch size changes. See the examples below.

See this article for a good overview of the internals.

Examples


			julia> g = GRUv3(3 => 5)
Recur(
  GRUv3Cell(3 => 5),                    # 140 parameters
)         # Total: 5 trainable arrays, 140 parameters,
          # plus 1 non-trainable, 5 parameters, summarysize 848 bytes.

julia> g(rand(Float32, 3)) |> size
(5,)

julia> Flux.reset!(g);

julia> g(rand(Float32, 3, 10)) |> size # batch size of 10
(5, 10)
Batch size changes

Failing to call reset! when the input batch size changes can lead to unexpected behavior. See the example in RNN.

Note:

GRUv3Cells can be constructed directly by specifying the non-linear function, the Wi, Wh, and Wh_h internal matrices, a bias vector b, and a learnable initial state state0. The Wi, Wh, and Wh_h matrices do not need to be the same type. See the example in RNN.

Methods

There is 1 method for Flux.GRUv3: