ViT (public struct)
ViT(mode::Symbol = :base; imsize::Dims{2} = (256, 256), inchannels = 3,
    patch_size::Dims{2} = (16, 16), pool = :class, nclasses = 1000)
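Creates a Vision Transformer (ViT) model, following the original ViT paper (Dosovitskiy et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale").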
Arguments
- mode: the model configuration, one of [:tiny, :small, :base, :large, :huge, :giant, :gigantic]
- imsize: size of the input images, as a 2-tuple
- inchannels: number of input channels
- patch_size: size of the image patches, as a 2-tuple
- pool: pooling type, either :class (use the class token) or :mean (average over the tokens)
- nclasses: number of classes in the output
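How imsize, patch_size and pool fit together can be sketched in plain Julia. The helpers num_patches and pool_tokens below are hypothetical illustrations, not Metalhead functions. The sketch assumes the image is cut into non-overlapping patches, that a single class token is prepended at position 1, and that the encoder output is laid out as (embedding dim, tokens, batch).

```julia
using Statistics  # standard library, for mean

# Hypothetical helper (not part of Metalhead): number of non-overlapping patches
# produced for a given image size and patch size.
function num_patches(imsize::Dims{2}, patch_size::Dims{2})
    all(imsize .% patch_size .== 0) ||
        throw(ArgumentError("imsize $imsize is not divisible by patch_size $patch_size"))
    return prod(imsize .÷ patch_size)
end

# Hypothetical helper: the two pooling modes applied to encoder output of shape
# (embedding dim, tokens, batch), assuming token 1 is the prepended class token.
function pool_tokens(x::AbstractArray{<:Real,3}, pool::Symbol)
    pool === :class && return x[:, 1, :]                            # keep only the class token
    pool === :mean && return dropdims(mean(x; dims = 2); dims = 2)  # average over all tokens
    throw(ArgumentError("pool must be :class or :mean, got :$pool"))
end

npatches = num_patches((256, 256), (16, 16))  # (256 ÷ 16) * (256 ÷ 16) = 256
ntokens = npatches + 1                        # plus one class token
x = rand(Float32, 192, ntokens, 2)            # illustrative embedding dim 192, batch of 2
size(pool_tokens(x, :class))                  # (192, 2)
size(pool_tokens(x, :mean))                   # (192, 2)
```

Either pooling mode reduces the token dimension, leaving one embedding vector per image for the classification head.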
See also Metalhead.vit.
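A minimal usage sketch follows. It assumes a Metalhead release matching the signature above, where the first positional argument is mode. It also assumes Flux's usual WHCN input layout (width, height, channels, batch). The configuration and keyword values are illustrative only.

```julia
using Metalhead  # assumes a Metalhead release matching the signature above

# Illustrative choices: the :tiny configuration with a 10-class output head.
model = ViT(:tiny; imsize = (256, 256), patch_size = (16, 16), nclasses = 10)

# Two random 256×256 RGB images in WHCN layout (width, height, channels, batch).
x = rand(Float32, 256, 256, 3, 2)
y = model(x)  # class scores, one column per image
```

With these settings y should have size (10, 2). The spatial size of x is assumed to match imsize.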