ResNet-like models

This is the API reference for the ResNet inspired model structures present in Metalhead.jl.

The higher-level model constructors

Metalhead.ResNet — Type

ResNet(depth::Integer; pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a ResNet model with the specified depth. (reference)

Arguments

depth: one of [18, 34, 50, 101, 152]. The depth of the ResNet model.
pretrain: set to true to load the model with pre-trained weights for ImageNet
inchannels: The number of input channels.
nclasses: the number of output classes

Advanced users who want more configuration options will be better served by using resnet.

source

Metalhead.WideResNet — Type

WideResNet(depth::Integer; pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a Wide ResNet model with the specified depth. The model is the same as ResNet except for the bottleneck number of channels which is twice larger in every block. The number of channels in outer 1x1 convolutions is the same. (reference)

Arguments

depth: one of [18, 34, 50, 101, 152]. The depth of the Wide ResNet model.
pretrain: set to true to load the model with pre-trained weights for ImageNet
inchannels: The number of input channels.
nclasses: The number of output classes

Advanced users who want more configuration options will be better served by using resnet.

source

Metalhead.ResNeXt — Type

ResNeXt(depth::Integer; pretrain::Bool = false, cardinality::Integer = 32,
        base_width::Integer = 4, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a ResNeXt model with the specified depth, cardinality, and base width. (reference)

Arguments

depth: one of [50, 101, 152]. The depth of the ResNeXt model.
pretrain: set to true to load the model with pre-trained weights for ImageNet. Supported configurations are:
- depth 50, cardinality of 32 and base width of 4.
- depth 101, cardinality of 32 and base width of 8.
- depth 101, cardinality of 64 and base width of 4.
cardinality: the number of groups to be used in the 3x3 convolution in each block.
base_width: the number of feature maps in each group.
inchannels: the number of input channels.
nclasses: the number of output classes

Advanced users who want more configuration options will be better served by using resnet.

source

Metalhead.SEResNet — Type

SEResNet(depth::Integer; pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a SEResNet model with the specified depth. (reference)

Arguments

depth: one of [18, 34, 50, 101, 152]. The depth of the SEResNet model.
pretrain: set to true to load the model with pre-trained weights for ImageNet
inchannels: the number of input channels.
nclasses: the number of output classes

Warning

SEResNet does not currently support pretrained weights.

Advanced users who want more configuration options will be better served by using resnet.

source

Metalhead.SEResNeXt — Type

SEResNeXt(depth::Integer; pretrain::Bool = false, cardinality::Integer = 32,
          base_width::Integer = 4, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a SEResNeXt model with the specified depth, cardinality, and base width. (reference)

Arguments

depth: one of [50, 101, 152]. The depth of the SEResNeXt model.
pretrain: set to true to load the model with pre-trained weights for ImageNet
cardinality: the number of groups to be used in the 3x3 convolution in each block.
base_width: the number of feature maps in each group.
inchannels: the number of input channels
nclasses: the number of output classes

Warning

SEResNeXt does not currently support pretrained weights.

Advanced users who want more configuration options will be better served by using resnet.

source

Metalhead.Res2Net — Type

Res2Net(depth::Integer; pretrain::Bool = false, scale::Integer = 4,
        base_width::Integer = 26, inchannels::Integer = 3,
        nclasses::Integer = 1000)

Creates a Res2Net model with the specified depth, scale, and base width. (reference)

Arguments

depth: one of [50, 101, 152]. The depth of the Res2Net model.
pretrain: set to true to load the model with pre-trained weights for ImageNet
scale: the number of feature groups in the block. See the paper for more details.
base_width: the number of feature maps in each group.
inchannels: the number of input channels.
nclasses: the number of output classes

Warning

Res2Net does not currently support pretrained weights.

Advanced users who want more configuration options will be better served by using resnet.

source

Metalhead.Res2NeXt — Type

Res2NeXt(depth::Integer; pretrain::Bool = false, scale::Integer = 4,
         base_width::Integer = 4, cardinality::Integer = 8,
         inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a Res2NeXt model with the specified depth, scale, base width and cardinality. (reference)

Arguments

depth: one of [50, 101, 152]. The depth of the Res2Net model.
pretrain: set to true to load the model with pre-trained weights for ImageNet
scale: the number of feature groups in the block. See the paper for more details.
base_width: the number of feature maps in each group.
cardinality: the number of groups in the 3x3 convolutions.
inchannels: the number of input channels.
nclasses: the number of output classes

Warning

Res2NeXt does not currently support pretrained weights.

Advanced users who want more configuration options will be better served by using resnet.

source

The mid-level function

Metalhead.resnet — Function

resnet(block_type, block_repeats::AbstractVector{<:Integer},
       downsample_opt::NTuple{2, Any} = (downsample_conv, downsample_identity);
       cardinality::Integer = 1, base_width::Integer = 64,
       inplanes::Integer = 64, reduction_factor::Integer = 1,
       connection = addact, activation = relu,
       norm_layer = BatchNorm, revnorm::Bool = false,
       attn_fn = planes -> identity, pool_layer = AdaptiveMeanPool((1, 1)),
       use_conv::Bool = false, dropblock_prob = nothing,
       stochastic_depth_prob = nothing, dropout_prob = nothing,
       imsize::Dims{2} = (256, 256), inchannels::Integer = 3,
       nclasses::Integer = 1000, kwargs...)

Creates a generic ResNet-like model that is used to create The higher-level model constructors like ResNet, Wide ResNet, ResNeXt and Res2Net. For an even more generic model API, see Metalhead.build_resnet.

Arguments

block_type: The type of block to be used in the model. This can be one of Metalhead.basicblock, Metalhead.bottleneck and Metalhead.bottle2neck. basicblock is used in the original ResNet paper for ResNet-18 and ResNet-34, and bottleneck is used in the original ResNet-50 and ResNet-101 models, as well as for the Wide ResNet and ResNeXt models. bottle2neck is introduced in the Res2Net paper.
block_repeats: A Vector of integers specifying the number of times each block is repeated in each stage of the ResNet model. For example, [3, 4, 6, 3] is the configuration used in ResNet-50, which has 3 blocks in the first stage, 4 blocks in the second stage, 6 blocks in the third stage and 3 blocks in the fourth stage.
downsample_opt: A NTuple of two callbacks that are used to determine the downsampling operation to be used in the model. The first callback is used to determine the convolutional operation to be used in the downsampling operation and the second callback is used to determine the identity operation to be used in the downsampling operation.
cardinality: The number of groups to be used in the 3x3 convolutional layer in the bottleneck block. This is usually modified from the default value of 1 in the ResNet models to 32 or 64 in the ResNeXt models.
base_width: The base width of the convolutional layer in the blocks of the model.
inplanes: The number of input channels in the first convolutional layer.
reduction_factor: The reduction factor used in the model.
connection: This is a function that determines the residual connection in the model. For resnets, either of Metalhead.Layers.addact or Metalhead.Layers.actadd is recommended. These decide whether the residual connection is added before or after the activation function.
norm_layer: The normalisation layer to be used in the model.
revnorm: set to true to place the normalisation layers before the convolutions
attn_fn: A callback that is used to determine the attention function to be used in the model. See Metalhead.Layers.squeeze_excite for an example.
pool_layer: A fully-instantiated pooling layer passed in to be used by the classifier head. For example, AdaptiveMeanPool((1, 1)) is used in the ResNet family by default, but something like MeanPool((3, 3)) should also work provided the dimensions after applying the pooling layer are compatible with the rest of the classifier head.
use_conv: Set to true to use convolutions instead of identity operations in the model.
dropblock_prob: DropBlock probability to be used in the model. Set to nothing to disable DropBlock. See Metalhead.DropBlock for more details.
stochastic_depth_prob: StochasticDepth probability to be used in the model. Set to nothing to disable StochasticDepth. See Metalhead.StochasticDepth for more details.
dropout_prob: Dropout probability to be used in the classifier head. Set to nothing to disable Dropout.
imsize: The size of the input (height, width).
inchannels: The number of input channels.
nclasses: The number of output classes.
kwargs: Additional keyword arguments to be passed to the block builder (note: ignore this argument if you are not sure what it does. To know more about how this works, check out the section of the documentation that talks about builders in Metalhead and specifically for the ResNet block functions).

source

Lower-level functions and builders

Block functions

Metalhead.basicblock — Function

basicblock(inplanes::Integer, planes::Integer; stride::Integer = 1,
           reduction_factor::Integer = 1, activation = relu,
           norm_layer = BatchNorm, revnorm::Bool = false,
           drop_block = identity, drop_path = identity,
           attn_fn = planes -> identity)

Creates a basic residual block (see reference). This function creates the layers. For more configuration options and to see the function used to build the block for the model, see Metalhead.basicblock_builder.

Arguments

inplanes: number of input feature maps
planes: number of feature maps for the block
stride: the stride of the block
reduction_factor: the factor by which the input feature maps are reduced before

the first convolution.

activation: the activation function to use.
norm_layer: the normalization layer to use.
revnorm: set to true to place the normalisation layer before the convolution
drop_block: the drop block layer
drop_path: the drop path layer
attn_fn: the attention function to use. See squeeze_excite for an example.

source

Metalhead.bottleneck — Function

bottleneck(inplanes::Integer, planes::Integer; stride::Integer,
           cardinality::Integer = 1, base_width::Integer = 64,
           reduction_factor::Integer = 1, activation = relu,
           norm_layer = BatchNorm, revnorm::Bool = false,
           drop_block = identity, drop_path = identity,
           attn_fn = planes -> identity)

Creates a bottleneck residual block (see reference). This function creates the layers. For more configuration options and to see the function used to build the block for the model, see Metalhead.bottleneck_builder.

Arguments

inplanes: number of input feature maps
planes: number of feature maps for the block
stride: the stride of the block
cardinality: the number of groups in the convolution.
base_width: the number of output feature maps for each convolutional group.
reduction_factor: the factor by which the input feature maps are reduced before the first convolution.
activation: the activation function to use.
norm_layer: the normalization layer to use.
revnorm: set to true to place the normalisation layer before the convolution
drop_block: the drop block layer
drop_path: the drop path layer
attn_fn: the attention function to use. See squeeze_excite for an example.

source

Metalhead.bottle2neck — Function

bottle2neck(inplanes::Integer, planes::Integer; stride::Integer = 1,
            cardinality::Integer = 1, base_width::Integer = 26,
            scale::Integer = 4, activation = relu, norm_layer = BatchNorm,
            revnorm::Bool = false, attn_fn = planes -> identity)

Creates a bottleneck block as described in the Res2Net paper. (reference) This function creates the layers. For more configuration options and to see the function used to build the block for the model, see Metalhead.bottle2neck_builder.

Arguments

inplanes: number of input feature maps
planes: number of feature maps for the block
stride: the stride of the block
cardinality: the number of groups in the 3x3 convolutions.
base_width: the number of output feature maps for each convolutional group.
scale: the number of feature groups in the block. See the paper for more details.
activation: the activation function to use.
norm_layer: the normalization layer to use.
revnorm: set to true to place the batch norm before the convolution
attn_fn: the attention function to use. See squeeze_excite for an example.

source

Downsampling functions

Metalhead.downsample_identity — Function

downsample_identity(inplanes::Integer, outplanes::Integer; kwargs...)

Creates an identity downsample layer. This returns identity if inplanes == outplanes. If outplanes > inplanes, it maps the input to outplanes channels using a 1x1 max pooling layer and zero padding.

Warning

This does not currently support the scenario where inplanes > outplanes.

Arguments

inplanes: number of input feature maps
outplanes: number of output feature maps

Note that kwargs are ignored and only included for compatibility with other downsample layers.

source

Metalhead.downsample_conv — Function

downsample_conv(inplanes::Integer, outplanes::Integer; stride::Integer = 1,
                norm_layer = BatchNorm, revnorm::Bool = false)

Creates a 1x1 convolutional downsample layer as used in ResNet.

Arguments

inplanes: number of input feature maps
outplanes: number of output feature maps
stride: the stride of the convolution
norm_layer: the normalization layer to use.
revnorm: set to true to place the normalisation layer before the convolution

source

Metalhead.downsample_pool — Function

downsample_pool(inplanes::Integer, outplanes::Integer; stride::Integer = 1,
                norm_layer = BatchNorm, revnorm::Bool = false)

Creates a pooling-based downsample layer as described in the Bag of Tricks paper. This adds an average pooling layer of size (2, 2) with stride followed by a 1x1 convolution.

Arguments

inplanes: number of input feature maps
outplanes: number of output feature maps
stride: the stride of the convolution
norm_layer: the normalization layer to use.
revnorm: set to true to place the normalisation layer before the convolution

source

Block builders

Metalhead.basicblock_builder — Function

basicblock_builder(block_repeats::AbstractVector{<:Integer};
                   inplanes::Integer = 64, reduction_factor::Integer = 1,
                   expansion::Integer = 1, norm_layer = BatchNorm,
                   revnorm::Bool = false, activation = relu,
                   attn_fn = planes -> identity,
                   dropblock_prob = nothing, stochastic_depth_prob = nothing,
                   stride_fn = resnet_stride, planes_fn = resnet_planes,
                   downsample_tuple = (downsample_conv, downsample_identity))

Builder for creating a basic block for a ResNet model. (reference)

Arguments

block_repeats: number of repeats of a block in each stage
inplanes: number of input channels
reduction_factor: reduction factor for the number of channels in each stage
expansion: expansion factor for the number of channels for the block
norm_layer: normalization layer to use
revnorm: set to true to place normalization layer before the convolution
activation: activation function to use
attn_fn: attention function to use
dropblock_prob: dropblock probability. Set to nothing to disable DropBlock
stochastic_depth_prob: stochastic depth probability. Set to nothing to disable StochasticDepth
stride_fn: callback for computing the stride of the block
planes_fn: callback for computing the number of channels in each block
downsample_tuple: two-element tuple of downsample functions to use. The first one is used when the number of channels changes in the block, the second one is used when the number of channels stays the same.

source

Metalhead.bottleneck_builder — Function

bottleneck_builder(block_repeats::AbstractVector{<:Integer};
                   inplanes::Integer = 64, cardinality::Integer = 1,
                   base_width::Integer = 64, reduction_factor::Integer = 1,
                   expansion::Integer = 4, norm_layer = BatchNorm,
                   revnorm::Bool = false, activation = relu,
                   attn_fn = planes -> identity, dropblock_prob = nothing,
                   stochastic_depth_prob = nothing, stride_fn = resnet_stride,
                   planes_fn = resnet_planes,
                   downsample_tuple = (downsample_conv, downsample_identity))

Builder for creating a bottleneck block for a ResNet/ResNeXt model. (reference)

Arguments

block_repeats: number of repeats of a block in each stage
inplanes: number of input channels
cardinality: number of groups for the convolutional layer
base_width: base width for the convolutional layer
reduction_factor: reduction factor for the number of channels in each stage
expansion: expansion factor for the number of channels for the block
norm_layer: normalization layer to use
revnorm: set to true to place normalization layer before the convolution
activation: activation function to use
attn_fn: attention function to use
dropblock_prob: dropblock probability. Set to nothing to disable DropBlock
stochastic_depth_prob: stochastic depth probability. Set to nothing to disable StochasticDepth
stride_fn: callback for computing the stride of the block
planes_fn: callback for computing the number of channels in each block
downsample_tuple: two-element tuple of downsample functions to use. The first one is used when the number of channels changes in the block, the second one is used when the number of channels stays the same.

source

Metalhead.bottle2neck_builder — Function

bottle2neck_builder(block_repeats::AbstractVector{<:Integer};
                    inplanes::Integer = 64, cardinality::Integer = 1,
                    base_width::Integer = 26, scale::Integer = 4,
                    expansion::Integer = 4, norm_layer = BatchNorm,
                    revnorm::Bool = false, activation = relu,
                    attn_fn = planes -> identity, stride_fn = resnet_stride,
                    planes_fn = resnet_planes,
                    downsample_tuple = (downsample_conv, downsample_identity))

Builder for creating a bottle2neck block for a Res2Net model. (reference)

Arguments

block_repeats: number of repeats of a block in each stage
inplanes: number of input channels
cardinality: number of groups for the convolutional layer
base_width: base width for the convolutional layer
scale: scale for the number of channels in each block
expansion: expansion factor for the number of channels for the block
norm_layer: normalization layer to use
revnorm: set to true to place normalization layer before the convolution
activation: activation function to use
attn_fn: attention function to use
stride_fn: callback for computing the stride of the block
planes_fn: callback for computing the number of channels in each block
downsample_tuple: two-element tuple of downsample functions to use. The first one is used when the number of channels changes in the block, the second one is used when the number of channels stays the same.

source

Generic ResNet model builder

Metalhead.build_resnet — Function

build_resnet(img_dims, stem, get_layers, block_repeats::AbstractVector{<:Integer},
             connection, classifier_fn)

Creates a generic ResNet-like model.

Info

This is a very generic, flexible but low level function that can be used to create any of the ResNet variants. For a more user friendly function, see Metalhead.resnet.

Arguments

img_dims: The dimensions of the input image. This is used to determine the number of feature maps to be passed to the classifier. This should be a tuple of the form (height, width, channels).
stem: The stem of the ResNet model. The stem should be created outside of this function and passed in as an argument. This is done to allow for more flexibility in creating the stem. resnet_stem is a helper function that Metalhead provides which is recommended for creating the stem.
get_layers is a function that takes in two inputs - the stage_idx, or the index of the stage, and the block_idx, or the index of the block within the stage. It returns a tuple of layers. If the tuple returned by get_layers has more than one element, then connection is used to splat this tuple into Parallel - if not, then the only element of the tuple is directly inserted into the network. get_layers is a very specific function and should not be created on its own. Instead, use one of the builders provided by Metalhead to create it.
block_repeats: This is a Vector of integers that specifies the number of repeats of each block in each stage.
connection: This is a function that determines the residual connection in the model. For resnets, either of Metalhead.Layers.addact or Metalhead.Layers.actadd is recommended.
classifier_fn: This is a function that takes in the number of feature maps and returns a classifier. This is usually built as a closure using a function like Metalhead.create_classifier. For example, if the number of output classes is nclasses, then the function can be defined as channels -> create_classifier(channels, nclasses).

source

Utility callbacks

Metalhead.resnet_planes — Function

resnet_planes(block_repeats::AbstractVector{<:Integer})

Default callback for determining the number of channels in each block in a ResNet model.

Arguments

block_repeats: A Vector of integers specifying the number of times each block is repeated in each stage of the ResNet model. For example, [3, 4, 6, 3] is the configuration used in ResNet-50, which has 3 blocks in the first stage, 4 blocks in the second stage, 6 blocks in the third stage and 3 blocks in the fourth stage.

source

Metalhead.resnet_stride — Function

resnet_stride(stage_idx::Integer, block_idx::Integer)

Default callback for determining the stride of a block in a ResNet model. Returns 2 for the first block in every stage except the first stage and 1 for all other blocks.

Arguments

stage_idx: The index of the stage in the ResNet model.
block_idx: The index of the block in the stage.

source

Metalhead.resnet_stem — Function

resnet_stem(; stem_type = :default, inchannels::Integer = 3, replace_stem_pool = false,
              norm_layer = BatchNorm, activation = relu)

Builds a stem to be used in a ResNet model. See the stem argument of resnet for details on how to use this function.

Arguments

stem_type: The type of stem to be built. One of [:default, :deep, :deep_tiered].
- :default: Builds a stem based on the default ResNet stem, which consists of a single 7x7 convolution with stride 2 and a normalisation layer followed by a 3x3 max pooling layer with stride 2.
- :deep: This borrows ideas from other papers (InceptionResNetv2, for example) in using a deeper stem with 3 successive 3x3 convolutions having normalisation layers after each one. This is followed by a 3x3 max pooling layer with stride 2.
- :deep_tiered: A variant of the :deep stem that has a larger width in the second convolution. This is an experimental variant from the timm library in Python that shows peformance improvements over the :deep stem in some cases.
inchannels: number of input channels
replace_pool: Set to true to replace the max pooling layers with a 3x3 convolution + normalization with a stride of two.
norm_layer: The normalisation layer used in the stem.
activation: The activation function used in the stem.

source