Shape Inference

Flux has some tools to help generate models in an automated fashion, by inferring the size of arrays that layers will receive, without doing any computation. This is especially useful for convolutional models, where the same Conv layer accepts images of any size, but the next layer may not.

The higher-level tool is the macro @autosize, which acts on the code defining the layers and replaces each appearance of _ with the relevant size. This simple example returns a model whose last layer is Dense(845 => 10):

@autosize (28, 28, 1, 32) Chain(Conv((3, 3), _ => 5, relu, stride=2), Flux.flatten, Dense(_ => 10))

The input size may be provided at runtime, as in @autosize (sz..., 1, 32) Chain(Conv(..., but all the layer constructors containing _ must be written out explicitly, since the macro sees only the code as written.
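For instance, a minimal sketch where only the image size arrives at runtime (the layer widths here are arbitrary choices for illustration):

```julia
using Flux

sz = (28, 28)            # image size, known only at runtime
model = @autosize (sz..., 1, 32) Chain(
          Conv((3, 3), _ => 4, relu),  # this Conv call is written out, so the macro sees its `_`
          Flux.flatten,
          Dense(_ => 10))              # becomes Dense(2704 => 10), since 26*26*4 == 2704
```

Note that passing a pre-built vector of layers via splatting would hide any _ inside them from the macro; each constructor containing _ must appear literally in the expression.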

This macro relies on a lower-level function outputsize, which you can also use directly:

c = Conv((3, 3), 1 => 5, relu, stride=2)
Flux.outputsize(c, (28, 28, 1, 32))  # returns (13, 13, 5, 32)

The function outputsize works by passing a "dummy" array into the model, which propagates through very cheaply. It should work out of the box for all layers, including custom layers.
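Because nothing special is required of the layer, even a plain hand-written struct participates; here is a sketch with a made-up Rescale layer (not part of Flux), defined with no Flux-specific support code:

```julia
using Flux

# A hypothetical custom layer: multiplies its input by a fixed scalar.
struct Rescale
  s::Float32
end
(r::Rescale)(x) = r.s .* x

m = Chain(Dense(3 => 7, relu), Rescale(2f0))
Flux.outputsize(m, (3, 16))  # (7, 16), found by propagating the dummy array
```

The dummy array's elements carry no values, only a size, so this costs almost nothing even for large models.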

Here is an example of how to automate model building:

"""
    make_model(width, height, [inchannels, nclasses; layer_config])

Create a CNN for a given set of configuration parameters. Arguments:
- `width`, `height`: the input image size in pixels
- `inchannels`: the number of channels in the input image, default `1`
- `nclasses`: the number of output classes, default `10`
- Keyword `layer_config`: a vector of the number of channels per layer, default `[16, 16, 32, 64]`
"""
function make_model(width, height, inchannels = 1, nclasses = 10;
                    layer_config = [16, 16, 32, 64])
  # construct a vector of layers:
  conv_layers = []
  push!(conv_layers, Conv((5, 5), inchannels => layer_config[1], relu, pad=SamePad()))
  for (inch, outch) in zip(layer_config, layer_config[2:end])
    push!(conv_layers, Conv((3, 3), inch => outch, sigmoid, stride=2))
  end

  # compute the output dimensions after these conv layers:
  conv_outsize = Flux.outputsize(conv_layers, (width, height, inchannels); padbatch=true)

  # use this to define appropriate Dense layer:
  last_layer = Dense(prod(conv_outsize) => nclasses)
  return Chain(conv_layers..., Flux.flatten, last_layer)
end

m = make_model(28, 28, 3, layer_config = [9, 17, 33, 65])

Flux.outputsize(m, (28, 28, 3, 42)) == (10, 42) == size(m(randn(Float32, 28, 28, 3, 42)))

Alternatively, using the macro, the definition of make_model could end with:

  # compute the output dimensions & construct appropriate Dense layer:
  return @autosize (width, height, inchannels, 1) Chain(conv_layers..., Flux.flatten, Dense(_ => nclasses))
end

Flux.@autosize — Macro
@autosize (size...,) Chain(Layer(_ => 2), Layer(_), ...)

Returns the specified model, with each _ replaced by an inferred number, for input of the given size.

The unknown sizes are usually the second-last dimension of that layer's input, which Flux regards as the channel dimension. (A few layers, Dense & LayerNorm, instead always use the first dimension.) The underscore may appear as an argument of a layer, or inside a =>. It may be used in further calculations, such as Dense(_ => _÷4).

Examples

julia> @autosize (3, 1) Chain(Dense(_ => 2, sigmoid), BatchNorm(_, affine=false))
Chain(
  Dense(3 => 2, σ),                     # 8 parameters
  BatchNorm(2, affine=false),
) 

julia> img = [28, 28];

julia> @autosize (img..., 1, 32) Chain(              # size is only needed at runtime
          Chain(c = Conv((3,3), _ => 5; stride=2, pad=SamePad()),
                p = MeanPool((3,3)),
                b = BatchNorm(_),
                f = Flux.flatten),
          Dense(_ => _÷4, relu, init=Flux.rand32),   # can calculate output size _÷4
          SkipConnection(Dense(_ => _, relu), +),
          Dense(_ => 10),
       )
Chain(
  Chain(
    c = Conv((3, 3), 1 => 5, pad=1, stride=2),  # 50 parameters
    p = MeanPool((3, 3)),
    b = BatchNorm(5),                   # 10 parameters, plus 10
    f = Flux.flatten,
  ),
  Dense(80 => 20, relu),                # 1_620 parameters
  SkipConnection(
    Dense(20 => 20, relu),              # 420 parameters
    +,
  ),
  Dense(20 => 10),                      # 210 parameters
)         # Total: 10 trainable arrays, 2_310 parameters,
          # plus 2 non-trainable, 10 parameters, summarysize 10.469 KiB.

julia> Flux.outputsize(ans, (28, 28, 1, 32))
(10, 32)

Limitations:

  • While @autosize (5, 32) Flux.Bilinear(_ => 7) is OK, something like Bilinear((_, _) => 7) will fail.
  • While Scale(_) and LayerNorm(_) are fine (and use the first dimension), Scale(_,_) and LayerNorm(_,_) will fail if size(x,1) != size(x,2).
  • RNNs won't work: @autosize (7, 11) LSTM(_ => 5) fails, because outputsize(RNN(3=>7), (3,)) also fails, a known issue.
Flux.outputsize — Function
outputsize(m, x_size, y_size, ...; padbatch=false)

For a model or layer m accepting multiple arrays as input, this returns size(m((x, y, ...))) given x_size = size(x), etc.

Examples

julia> x, y = rand(Float32, 5, 64), rand(Float32, 7, 64);

julia> par = Parallel(vcat, Dense(5 => 9), Dense(7 => 11));

julia> Flux.outputsize(par, (5, 64), (7, 64))
(20, 64)

julia> m = Chain(par, Dense(20 => 13), softmax);

julia> Flux.outputsize(m, (5,), (7,); padbatch=true)
(13, 1)

julia> par(x, y) == par((x, y)) == Chain(par, identity)((x, y))
true

Notice that Chain only accepts multiple arrays as a tuple, while Parallel also accepts them as multiple arguments; outputsize always supplies the tuple.
