More advanced layers
This page contains the API reference for some of the more advanced layers present in the Layers module. These layers are used in Metalhead.jl to build more complex models, and can also be used by the user to build custom models. For a more basic introduction to the Layers module, please refer to its introduction guide.
Squeeze-and-excitation blocks
These are used in models like SE-ResNet and SE-ResNeXt, as well as in the design of inverted residual blocks used in the MobileNet and EfficientNet family of models.
Metalhead.Layers.squeeze_excite
— Function
squeeze_excite(inplanes::Integer; reduction::Real = 16, round_fn = _round_channels,
               norm_layer = identity, activation = relu, gate_activation = sigmoid)
Creates a squeeze-and-excitation layer used in MobileNets, EfficientNets and SE-ResNets.
Arguments
- inplanes: The number of input feature maps
- reduction: The reduction factor for the number of hidden feature maps in the squeeze and excite layer. The number of hidden feature maps is calculated as round_fn(inplanes / reduction).
- round_fn: The function to round the number of reduced feature maps.
- activation: The activation function for the first convolution layer
- gate_activation: The activation function for the gate layer
- norm_layer: The normalization layer to be used after the convolution layers
- rd_planes: The number of hidden feature maps in a squeeze and excite layer
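As a minimal usage sketch (assuming Flux and Metalhead are installed, and that the block preserves the input shape while rescaling channels by the learned gate):

```julia
using Flux
using Metalhead.Layers: squeeze_excite

# Build an SE block for 64 input feature maps with the default reduction of 16.
se = squeeze_excite(64)

x = rand(Float32, 32, 32, 64, 1)   # W x H x C x N feature map
size(se(x))                        # expected (32, 32, 64, 1)
```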
Metalhead.Layers.effective_squeeze_excite
— Function
effective_squeeze_excite(inplanes, gate_activation = sigmoid)

Effective squeeze-and-excitation layer. (reference: CenterMask: Real-Time Anchor-Free Instance Segmentation)
Arguments
- inplanes: The number of input feature maps
- gate_activation: The activation function for the gate layer
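A corresponding sketch for the effective variant, which skips the channel reduction (same assumptions as above):

```julia
using Metalhead.Layers: effective_squeeze_excite

# An effective SE block for 64 feature maps; output shape matches the input.
ese = effective_squeeze_excite(64)
size(ese(rand(Float32, 32, 32, 64, 1)))   # expected (32, 32, 64, 1)
```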
Inverted residual blocks
These blocks are designed to be used in the MobileNet and EfficientNet family of convolutional neural networks.
Metalhead.Layers.dwsep_conv_norm
— Function
dwsep_conv_norm(kernel_size::Dims{2}, inplanes::Integer, outplanes::Integer,
                activation = relu; norm_layer = BatchNorm, stride::Integer = 1,
                bias::Bool = !(norm_layer !== identity), pad::Integer = 0, [bias, weight, init])
Create a depthwise separable convolution chain as used in MobileNetv1. This is a sequence of layers:
- a kernel_size depthwise convolution from inplanes => inplanes
- a (batch) normalisation layer + activation (if norm_layer !== identity; otherwise activation is applied to the convolution output)
- a kernel_size convolution from inplanes => outplanes
- a (batch) normalisation layer + activation (if norm_layer !== identity; otherwise activation is applied to the convolution output)
See Fig. 3 in reference.
Arguments
- kernel_size: size of the convolution kernel (tuple)
- inplanes: number of input feature maps
- outplanes: number of output feature maps
- activation: the activation function for the final layer
- norm_layer: the normalisation layer used. Note that using identity as the normalisation layer will result in no normalisation being applied.
- bias: whether to use bias in the convolution layers.
- stride: stride of the first convolution kernel
- pad: padding of the first convolution kernel
- weight, init: initialization for the convolution kernel (see Flux.Conv)
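A minimal sketch, assuming (as with other conv_norm-style helpers in Metalhead) that the function returns a vector of layers meant to be splatted into a Chain:

```julia
using Flux
using Metalhead.Layers: dwsep_conv_norm

# A 3x3 depthwise separable convolution block taking 32 feature maps to 64.
# The call is assumed to return a vector of layers, so it is splatted into a Chain.
block = Chain(dwsep_conv_norm((3, 3), 32, 64; stride = 1, pad = 1)...)

x = rand(Float32, 56, 56, 32, 1)
y = block(x)   # 64 output feature maps; spatial size depends on stride and pad
```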
Metalhead.Layers.mbconv
— Function
mbconv(kernel_size::Dims{2}, inplanes::Integer, explanes::Integer,
       outplanes::Integer, activation = relu; stride::Integer,
       reduction::Union{Nothing, Real} = nothing,
       se_round_fn = x -> round(Int, x), norm_layer = BatchNorm, kwargs...)
Create a basic inverted residual block for MobileNet and EfficientNet variants. This is a sequence of layers:
- a 1x1 convolution from inplanes => explanes followed by a (batch) normalisation layer + activation if inplanes != explanes
- a kernel_size depthwise separable convolution from explanes => explanes
- a (batch) normalisation layer
- a squeeze-and-excitation block (if reduction != nothing) from explanes => se_round_fn(explanes / reduction) and back to explanes
- a 1x1 convolution from explanes => outplanes
- a (batch) normalisation layer + activation
This function does not handle the residual connection by default. The user must add this manually to use this block as a standalone. To construct a model, check out the builders, which handle the residual connection and other details.
First introduced in the MobileNetv2 paper. (See Fig. 3 in reference.)
Arguments
- kernel_size: kernel size of the convolutional layers
- inplanes: number of input feature maps
- explanes: The number of expanded feature maps. This is the number of feature maps after the first 1x1 convolution.
- outplanes: The number of output feature maps
- activation: The activation function for the first two convolution layers
- stride: The stride of the convolutional kernel, has to be either 1 or 2
- reduction: The reduction factor for the number of hidden feature maps in a squeeze and excite layer (see squeeze_excite)
- se_round_fn: The function to round the number of reduced feature maps in the squeeze and excite layer
- norm_layer: The normalization layer to use
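A sketch of standalone use. This assumes the returned block is a callable Chain and that the internal depthwise convolution preserves the spatial size ('same' padding); since the residual connection is not included, it is added here manually with a SkipConnection (valid because inplanes == outplanes and stride == 1):

```julia
using Flux
using Metalhead.Layers: mbconv

# Expand 32 feature maps to 192, apply a squeeze-and-excitation block
# (reduction factor 4), and project back down to 32.
block = mbconv((3, 3), 32, 192, 32, swish; stride = 1, reduction = 4)
res = SkipConnection(block, +)   # residual connection added manually

x = rand(Float32, 28, 28, 32, 1)
y = res(x)   # expected to have the same size as x
```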
Metalhead.Layers.fused_mbconv
— Function
fused_mbconv(kernel_size::Dims{2}, inplanes::Integer, explanes::Integer,
             outplanes::Integer, activation = relu;
             stride::Integer, norm_layer = BatchNorm)
Create a fused inverted residual block.
This is a sequence of layers:
- a kernel_size depthwise separable convolution from explanes => explanes
- a (batch) normalisation layer
- a 1x1 convolution from explanes => outplanes followed by a (batch) normalisation layer + activation if inplanes != explanes
This function does not handle the residual connection by default. The user must add this manually to use this block as a standalone. To construct a model, check out the builders, which handle the residual connection and other details.
Originally introduced by Google in EfficientNet-EdgeTPU: Creating Accelerator-Optimized Neural Networks with AutoML. Later used in the EfficientNetv2 paper.
Arguments
- kernel_size: kernel size of the convolutional layers
- inplanes: number of input feature maps
- explanes: The number of expanded feature maps
- outplanes: The number of output feature maps
- activation: The activation function for the first two convolution layers
- stride: The stride of the convolutional kernel, has to be either 1 or 2
- norm_layer: The normalization layer to use
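A corresponding sketch for the fused variant, under the same assumptions (callable Chain, shape-preserving padding, residual connection added manually):

```julia
using Flux
using Metalhead.Layers: fused_mbconv

# Expand 24 feature maps to 96 and project back to 24, as in the early
# stages of EfficientNetv2.
block = fused_mbconv((3, 3), 24, 96, 24; stride = 1)
res = SkipConnection(block, +)

x = rand(Float32, 56, 56, 24, 1)
y = res(x)   # expected to have the same size as x
```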
Vision transformer-related layers
The Layers module contains specific layers that are used to build vision transformer (ViT)-inspired models:
Metalhead.Layers.MultiHeadSelfAttention
— Type
MultiHeadSelfAttention(planes::Integer, nheads::Integer = 8; qkv_bias::Bool = false,
                       attn_dropout_prob = 0., proj_dropout_prob = 0.)
Multi-head self-attention layer.
Arguments
- planes: number of input channels
- nheads: number of heads
- qkv_bias: whether to use bias in the layer to get the query, key and value
- attn_dropout_prob: dropout probability after the self-attention layer
- proj_dropout_prob: dropout probability after the projection layer
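A minimal sketch, assuming the layer expects inputs laid out as (planes, sequence length, batch), as in the ViT models in Metalhead:

```julia
using Flux
using Metalhead.Layers: MultiHeadSelfAttention

# 8-head self-attention over a sequence of 196 patch embeddings with 64 channels.
attn = MultiHeadSelfAttention(64, 8; qkv_bias = true)

x = rand(Float32, 64, 196, 1)   # (planes, npatches, batch)
y = attn(x)                     # expected to have the same size as the input
```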
Metalhead.Layers.ClassTokens
— Type
ClassTokens(planes::Integer; init = Flux.zeros32)
Appends class tokens to an input with embedding dimension planes for use in many vision transformer models.
Metalhead.Layers.ViPosEmbedding
— Type
ViPosEmbedding(embedsize::Integer, npatches::Integer;
               init = (dims::Dims{2}) -> rand(Float32, dims))
Positional embedding layer used by many vision transformer-like models.
Metalhead.Layers.PatchEmbedding
— Function
PatchEmbedding(imsize::Dims{2} = (224, 224); inchannels::Integer = 3,
               patch_size::Dims{2} = (16, 16), embedplanes = 768,
               norm_layer = planes -> identity, flatten = true)
Patch embedding layer used by many vision transformer-like models to split the input image into patches.
Arguments
- imsize: the size of the input image
- inchannels: number of input channels
- patch_size: the size of the patches
- embedplanes: the number of channels in the embedding
- norm_layer: the normalization layer - by default the identity function, but otherwise takes a single-argument constructor for a normalization layer like LayerNorm or BatchNorm
- flatten: set true to flatten the input spatial dimensions after the embedding
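The three layers above are typically composed into a ViT-style stem. The following is an illustrative sketch (not taken from Metalhead's source) that assumes a 224x224 RGB input, 16x16 patches, and the (embedplanes, npatches, batch) layout produced by PatchEmbedding with flatten = true:

```julia
using Flux
using Metalhead.Layers: PatchEmbedding, ClassTokens, ViPosEmbedding

embedplanes = 768
npatches = (224 ÷ 16) * (224 ÷ 16)   # 196 patches

stem = Chain(PatchEmbedding((224, 224); patch_size = (16, 16),
                            embedplanes = embedplanes),
             ClassTokens(embedplanes),                   # adds one class token per sample
             ViPosEmbedding(embedplanes, npatches + 1))  # +1 for the class token

x = rand(Float32, 224, 224, 3, 1)   # W x H x C x N image batch
size(stem(x))                       # expected (768, 197, 1)
```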
MLPMixer-related blocks
In addition, the Layers module contains certain blocks used in MLPMixer-style models:
Metalhead.Layers.gated_mlp_block
— Function
gated_mlp_block(gate_layer, inplanes::Integer, hidden_planes::Integer,
                outplanes::Integer = inplanes; dropout_prob = 0.0, activation = gelu)
Feedforward block based on the implementation in the paper "Pay Attention to MLPs". (reference)
Arguments
- gate_layer: Layer to use for the gating.
- inplanes: Number of dimensions in the input.
- hidden_planes: Number of dimensions in the intermediate layer.
- outplanes: Number of dimensions in the output - by default it is the same as inplanes.
- dropout_prob: Dropout probability.
- activation: Activation function to use.
Metalhead.Layers.mlp_block
— Function
mlp_block(inplanes::Integer, hidden_planes::Integer, outplanes::Integer = inplanes;
          dropout_prob = 0., activation = gelu)
Feedforward block used in many MLPMixer-like and vision-transformer models.
Arguments
- inplanes: Number of dimensions in the input.
- hidden_planes: Number of dimensions in the intermediate layer.
- outplanes: Number of dimensions in the output - by default it is the same as inplanes.
- dropout_prob: Dropout probability.
- activation: Activation function to use.
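A minimal sketch of the plain feedforward block, assuming the (planes, npatches, batch) layout used elsewhere in the transformer-style models:

```julia
using Flux
using Metalhead.Layers: mlp_block

block = mlp_block(192, 768)     # 192 => 768 => 192 feedforward block

x = rand(Float32, 192, 196, 1)
size(block(x))                  # expected (192, 196, 1)
```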
Miscellaneous utilities for layers
These are some miscellaneous utilities present in the Layers module, and are used with other custom/inbuilt layers to make certain common operations in neural networks easier.
Metalhead.Layers.inputscale
— Function
inputscale(λ; activation = identity)
Scales the input by a scalar λ and applies an activation function to it. Equivalent to activation.(λ .* x).
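For example (a small sketch based on the equivalence stated above):

```julia
using Flux
using Metalhead.Layers: inputscale

scale = inputscale(0.5f0; activation = relu)
scale(Float32[1, -2])   # relu.(0.5f0 .* [1, -2]) == Float32[0.5, 0.0]
```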
Metalhead.Layers.actadd
— Function
actadd(activation = relu, xs...)
Convenience function for summing up the input arrays after applying an activation function to them. Useful as the connection argument for the block function in Metalhead.resnet.
Metalhead.Layers.addact
— Function
addact(activation = relu, xs...)
Convenience function for applying an activation function to the output after summing up the input arrays. Useful as the connection argument for the block function in Metalhead.resnet.
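The difference between the two is easiest to see side by side (illustrative sketch):

```julia
using Flux
using Metalhead.Layers: addact, actadd

x, y = Float32[1, -2], Float32[-3, 4]
addact(relu, x, y)   # relu.(x .+ y)        == Float32[0, 2]
actadd(relu, x, y)   # relu.(x) .+ relu.(y) == Float32[1, 4]
```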
Metalhead.Layers.cat_channels
— Function
cat_channels(x, y, zs...)
Concatenate x and y (and any zs) along the channel dimension (third dimension). Equivalent to cat(x, y, zs...; dims=3). Convenient reduction operator for use with Parallel.
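For example, concatenating two feature maps directly and as the reduction of a Parallel block (illustrative sketch):

```julia
using Flux
using Metalhead.Layers: cat_channels

x = rand(Float32, 8, 8, 16, 1)
y = rand(Float32, 8, 8, 32, 1)
size(cat_channels(x, y))    # (8, 8, 48, 1)

# As the reduction for a Parallel block with two convolutional branches:
m = Parallel(cat_channels,
             Conv((1, 1), 16 => 16),
             Conv((3, 3), 16 => 32; pad = 1))
size(m(x))                  # (8, 8, 48, 1)
```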
Metalhead.Layers.flatten_chains
— Function
flatten_chains(m::Chain)
flatten_chains(m)
Convenience function for traversing the nested layers of a Chain object and flattening them into a single iterator.
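A small sketch, assuming the iterator yields the non-Chain layers in order:

```julia
using Flux
using Metalhead.Layers: flatten_chains

m = Chain(Chain(Dense(2 => 3), relu), Chain(Dense(3 => 1)))
collect(flatten_chains(m))   # expected: [Dense(2 => 3), relu, Dense(3 => 1)]
```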
Metalhead.Layers.swapdims
— Function
swapdims(perm)
Convenience function that returns a closure which permutes the dimensions of an array. perm is a vector or tuple specifying a permutation of the input dimensions. Equivalent to permutedims(x, perm).
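For example, moving the channel dimension of a WHCN array to the front (illustrative sketch):

```julia
using Metalhead.Layers: swapdims

tochannelsfirst = swapdims((3, 1, 2, 4))

x = rand(Float32, 32, 32, 3, 1)   # W x H x C x N
size(tochannelsfirst(x))          # (3, 32, 32, 1)
```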