Built-in Layer Types

If you started at the beginning of the guide, then you have already met the basic Dense layer, and seen Chain for combining layers. These core layers form the foundation of almost all neural networks.

The Dense layer exemplifies several features:

It contains an activation function, which is broadcast over the output. Because this broadcast can be fused with other operations, doing so is more efficient than applying the activation function separately.
It takes an init keyword, which accepts a function acting like rand. That is, init(2,3,4) should create an array of size 2×3×4. Flux has many such functions built in. All of them create a CPU array, which can be moved to the GPU later with gpu if desired.
The bias vector is always initialised with Flux.zeros32. The keyword bias=false turns this off, i.e. keeps the bias permanently zero.
It is annotated with @layer, which means that Flux.setup will see the layer's contents, and gpu will move its arrays to the GPU.
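
The features above can be seen in a short sketch. The name my_init is hypothetical and only illustrates what a rand-like function looks like; the field names weight and bias are those of the Dense struct in recent Flux versions, which is an assumption about the version in use.

```julia
using Flux

# A hypothetical initialiser: like rand, it is called as my_init(dims...)
# and must return an array of that size.
my_init(dims...) = 0.01f0 .* randn(Float32, dims...)

# The activation (relu) is stored in the layer and broadcast over the output.
layer = Dense(3 => 4, relu; init = my_init)

size(layer.weight)   # (4, 3), because the weight is created as init(out, in)
layer.bias           # a length-4 vector of zeros

# With bias=false no bias array is stored at all.
nobias = Dense(3 => 4; bias = false)
nobias.bias          # false

# Because Dense is annotated with @layer, Flux.setup can walk its arrays.
opt_state = Flux.setup(Adam(), layer)
```

Everything here is created on the CPU; piping the layer through gpu afterwards would move its arrays if a GPU backend is loaded.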

By contrast, Chain itself contains no parameters, but connects other layers together. The section on dataflow layers introduces others like this.
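
As a minimal sketch of this, the following builds a Chain from two Dense layers and checks that calling it is the same as calling the layers one after another. The layer sizes are arbitrary choices for illustration.

```julia
using Flux

# Chain holds the two layers; all parameters live inside the Dense layers.
model = Chain(Dense(4 => 8, relu), Dense(8 => 2))

x = rand(Float32, 4)
y = model(x)

# Chain applies its layers in order, so indexing them and applying
# them by hand gives the same result.
y ≈ model[2](model[1](x))   # true
```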

Fully Connected

Flux.Dense — Type

Dense(in => out, σ=identity; bias=true, init=glorot_uniform)
Dense(W::AbstractMatrix, [bias, σ])

Create a traditional fully connected layer, whose forward pass is given by:

y = σ.(W * x .+ bias)

The input x should be a vector of length in, or a batch of vectors represented as an in × N matrix, or any array with size(x,1) == in. The output y will be a vector of length out, or a batch with size(y) == (out, size(x)[2:end]...).
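
To make the formula concrete, here is a small sketch that compares the layer's output against the forward pass written out by hand. It assumes the field names weight and bias. The comparison uses ≈ rather than == because the layer may evaluate the activation with a faster approximate variant.

```julia
using Flux

d = Dense(3 => 2, tanh)
x = rand(Float32, 3)

# The forward pass written out explicitly: y = σ.(W * x .+ bias)
manual = tanh.(d.weight * x .+ d.bias)
d(x) ≈ manual        # true

# Any trailing dimensions are treated as batch dimensions.
X = rand(Float32, 3, 6, 4)
size(d(X))           # (2, 6, 4)
```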

Keyword bias=false will switch off trainable bias for the layer. The initialisation of the weight matrix is W = init(out, in), calling the function given to keyword init, with default glorot_uniform. The weight matrix and/or the bias vector (of length out) may also be provided explicitly.
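
For instance, a layer can be built from an explicit weight matrix and bias vector. This is only a sketch with made-up numbers, chosen so the result is easy to check by hand.

```julia
using Flux

W = Float32[1 2 3; 4 5 6]    # size (out, in) = (2, 3)
b = Float32[10, 20]          # length out = 2

d = Dense(W, b)              # the activation defaults to identity
d(Float32[1, 1, 1])          # W * x .+ b gives [16, 35]

# Passing false in place of the bias gives a layer without one.
d_nobias = Dense(W, false)
d_nobias(Float32[1, 1, 1])   # W * x gives [6, 15]
```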

Examples

julia> d = Dense(5 => 2)
Dense(5 => 2)       # 12 parameters

julia> d(rand32(5, 64)) |> size
(2, 64)

julia> d(rand32(5, 6, 4, 64)) |> size  # treated as three batch dimensions
(2, 6, 4, 64)

julia> d1 = Dense(ones(2, 5), false, tanh)  # using provided weight matrix
Dense(5 => 2, tanh; bias=false)  # 10 parameters

julia> d1(ones(5))
2-element Vector{Float64}:
 0.9999092042625951
 0.9999092042625951

julia> Flux.params(d1)  # no trainable bias
Params([[1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0]])