Shape Inference
Flux has some tools to help generate models in an automated fashion, by inferring the size of arrays that layers will receive, without doing any computation. This is especially useful for convolutional models, where the same Conv layer accepts any size of image, but the next layer may not.
The higher-level tool is a macro @autosize which acts on the code defining the layers, and replaces each appearance of _ with the relevant size. This simple example returns a model with Dense(845 => 10) as the last layer:
@autosize (28, 28, 1, 32) Chain(Conv((3, 3), _ => 5, relu, stride=2), Flux.flatten, Dense(_ => 10))
The input size may be provided at runtime, like @autosize (sz..., 1, 32) Chain(Conv(..., but all the layer constructors containing _ must be explicitly written out – the macro sees the code as written.
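For instance, a minimal sketch (the names sz and model are illustrative): the size tuple is an ordinary runtime value, but every layer containing _ is spelled out in full:

sz = (28, 28)  # e.g. read from the dataset at runtime
model = @autosize (sz..., 1, 32) Chain(
            Conv((3, 3), _ => 5, relu, stride=2),  # _ becomes 1, the channel dimension
            Flux.flatten,
            Dense(_ => 10))                        # _ becomes 13*13*5 = 845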
This macro relies on a lower-level function outputsize, which you can also use directly:
c = Conv((3, 3), 1 => 5, relu, stride=2)
Flux.outputsize(c, (28, 28, 1, 32)) # returns (13, 13, 5, 32)
The function outputsize works by passing a "dummy" array into the model, which propagates through very cheaply. It should work for all layers, including custom layers, out of the box.
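As an illustration, here is a sketch with a made-up custom layer, Scaler, which is not part of Flux: because outputsize just propagates the dummy array through the layer's forward pass, no extra method is required for shape inference to work:

struct Scaler{T}
    s::T
end
(a::Scaler)(x) = a.s .* x  # elementwise, so the shape is unchanged

Flux.outputsize(Scaler(2f0), (28, 28, 1, 32))  # returns (28, 28, 1, 32); no real multiplication is done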
An example of how to automate model building is this:
"""
make_model(width, height, [inchannels, nclasses; layer_config])
Create a CNN for a given set of configuration parameters. Arguments:
- `width`, `height`: the input image size in pixels
- `inchannels`: the number of channels in the input image, default `1`
- `nclasses`: the number of output classes, default `10`
- Keyword `layer_config`: a vector of the number of channels per layer, default `[16, 16, 32, 64]`
"""
function make_model(width, height, inchannels = 1, nclasses = 10;
layer_config = [16, 16, 32, 64])
# construct a vector of layers:
conv_layers = []
push!(conv_layers, Conv((5, 5), inchannels => layer_config[1], relu, pad=SamePad()))
for (inch, outch) in zip(layer_config, layer_config[2:end])
push!(conv_layers, Conv((3, 3), inch => outch, sigmoid, stride=2))
end
# compute the output dimensions after these conv layers:
conv_outsize = Flux.outputsize(conv_layers, (width, height, inchannels); padbatch=true)
# use this to define appropriate Dense layer:
last_layer = Dense(prod(conv_outsize) => nclasses)
return Chain(conv_layers..., Flux.flatten, last_layer)
end
m = make_model(28, 28, 3, layer_config = [9, 17, 33, 65])
Flux.outputsize(m, (28, 28, 3, 42)) == (10, 42) == size(m(randn(Float32, 28, 28, 3, 42)))
Alternatively, using the macro, the definition of make_model could end with:
    # compute the output dimensions & construct the appropriate Dense layer:
    return @autosize (width, height, inchannels, 1) Chain(conv_layers..., Flux.flatten, Dense(_ => nclasses))
end
Listing
Flux.@autosize — Macro

@autosize (size...,) Chain(Layer(_ => 2), Layer(_), ...)

Returns the specified model, with each _ replaced by an inferred number, for input of the given size.

The unknown sizes are usually the second-last dimension of that layer's input, which Flux regards as the channel dimension. (A few layers, Dense & LayerNorm, instead always use the first dimension.) The underscore may appear as an argument of a layer, or inside a =>. It may be used in further calculations, such as Dense(_ => _÷4).
Examples
julia> @autosize (3, 1) Chain(Dense(_ => 2, sigmoid), BatchNorm(_, affine=false))
Chain(
  Dense(3 => 2, σ),                     # 8 parameters
  BatchNorm(2, affine=false),
)
julia> img = [28, 28];
julia> @autosize (img..., 1, 32) Chain(          # size is only needed at runtime
           Chain(c = Conv((3,3), _ => 5; stride=2, pad=SamePad()),
                 p = MeanPool((3,3)),
                 b = BatchNorm(_),
                 f = Flux.flatten),
           Dense(_ => _÷4, relu, init=Flux.rand32),  # can calculate output size _÷4
           SkipConnection(Dense(_ => _, relu), +),
           Dense(_ => 10),
       )
Chain(
  Chain(
    c = Conv((3, 3), 1 => 5, pad=1, stride=2),  # 50 parameters
    p = MeanPool((3, 3)),
    b = BatchNorm(5),                   # 10 parameters, plus 10
    f = Flux.flatten,
  ),
  Dense(80 => 20, relu),                # 1_620 parameters
  SkipConnection(
    Dense(20 => 20, relu),              # 420 parameters
    +,
  ),
  Dense(20 => 10),                      # 210 parameters
)                   # Total: 10 trainable arrays, 2_310 parameters,
                    # plus 2 non-trainable, 10 parameters, summarysize 10.469 KiB.
julia> Flux.outputsize(ans, (28, 28, 1, 32))
(10, 32)
Limitations:
- While @autosize (5, 32) Flux.Bilinear(_ => 7) is OK, something like Bilinear((_, _) => 7) will fail.
- While Scale(_) and LayerNorm(_) are fine (and use the first dimension), Scale(_, _) and LayerNorm(_, _) will fail if size(x, 1) != size(x, 2).
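To make the first point concrete, a sketch (the behaviour of each line follows from the limitation stated above; Bilinear(5 => 7) is shorthand for two size-5 inputs):

@autosize (5, 32) Flux.Bilinear(_ => 7)          # OK: _ is replaced by 5
# @autosize (5, 32) Flux.Bilinear((_, _) => 7)   # fails: _ inside the input tuple is not supported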
Flux.outputsize — Function

outputsize(m, x_size, y_size, ...; padbatch=false)

For model or layer m accepting multiple arrays as input, this returns size(m((x, y, ...))) given x_size = size(x), etc.
Examples
julia> x, y = rand(Float32, 5, 64), rand(Float32, 7, 64);
julia> par = Parallel(vcat, Dense(5 => 9), Dense(7 => 11));
julia> Flux.outputsize(par, (5, 64), (7, 64))
(20, 64)
julia> m = Chain(par, Dense(20 => 13), softmax);
julia> Flux.outputsize(m, (5,), (7,); padbatch=true)
(13, 1)
julia> par(x, y) == par((x, y)) == Chain(par, identity)((x, y))
true
Notice that Chain only accepts multiple arrays as a tuple, while Parallel also accepts them as multiple arguments; outputsize always supplies the tuple.
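For the single-array method seen earlier, padbatch=true appends a trailing batch dimension of 1 before propagating; a minimal sketch reusing the Conv layer from the start of this section:

c = Conv((3, 3), 1 => 5, relu, stride=2)
Flux.outputsize(c, (28, 28, 1); padbatch=true)  # returns (13, 13, 5, 1)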