# Tutorial: A Simple ConvNet

In this tutorial, we build a simple Convolutional Neural Network (ConvNet) to classify the MNIST dataset. This model has a simple architecture with three feature detection layers (Conv -> ReLU -> MaxPool) followed by a final dense layer that classifies MNIST handwritten digits. Note that this model, while simple, should hit around 99% test accuracy after training for approximately 20 epochs.

This example writes out the saved model to the file `mnist_conv.bson`. It also demonstrates basic model construction, training, saving, conditional early-exit, and learning rate scheduling.

To run this example, we need the following packages:

```
using Flux, MLDatasets, Statistics
using Flux: onehotbatch, onecold, logitcrossentropy, params
using MLDatasets: MNIST
using Base.Iterators: partition
using Printf, BSON
using CUDA
CUDA.allowscalar(false)
```

We set default values for the learning rate, batch size, number of epochs, and the path for saving the file `mnist_conv.bson`:

```
Base.@kwdef mutable struct TrainArgs
    lr::Float64 = 3e-3        # learning rate
    epochs::Int = 20          # number of training epochs
    batch_size = 128          # mini-batch size
    savepath::String = "./"   # directory where mnist_conv.bson is saved
end
```

## Data

To train our model, we need to bundle images together with their labels and group them into mini-batches (this makes the training process faster). We define the function `make_minibatch` that takes as inputs the images (`X`) and their labels (`Y`) as well as the indices for the mini-batch (`idxs`):

```
function make_minibatch(X, Y, idxs)
    X_batch = Array{Float32}(undef, size(X)[1:end-1]..., 1, length(idxs))
    for i in 1:length(idxs)
        X_batch[:, :, :, i] = Float32.(X[:, :, idxs[i]])
    end
    Y_batch = onehotbatch(Y[idxs], 0:9)
    return (X_batch, Y_batch)
end
```

`make_minibatch` takes the following steps:

- Creates the `X_batch` array of size `28x28x1x128` to store the mini-batch.
- Stores the mini-batch images in `X_batch`.
- One-hot encodes the labels of the images.
- Stores the one-hot encoded labels in `Y_batch`.
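
As a quick check, and assuming the MNIST training data is already loaded (as `get_processed_data` does below), a single call might look like this; the sizes shown are what we expect for a mini-batch of 128 images:

```
train_imgs, train_labels = MNIST.traindata()
x, y = make_minibatch(train_imgs, train_labels, 1:128)
size(x)   # (28, 28, 1, 128) -- 128 grayscale 28x28 images
size(y)   # (10, 128)        -- one-hot labels for the digits 0:9
```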

`get_processed_data` loads the train and test data from `MLDatasets.MNIST`. First, it loads the images and labels of the train data set and creates an array that contains the indices of the train images that correspond to each mini-batch (of size `args.batch_size`). Then, it calls the `make_minibatch` function to create all of the train mini-batches. Finally, it loads the test images and creates one mini-batch that contains them all.

```
function get_processed_data(args)
    # Load labels and images
    train_imgs, train_labels = MNIST.traindata()
    mb_idxs = partition(1:length(train_labels), args.batch_size)
    train_set = [make_minibatch(train_imgs, train_labels, i) for i in mb_idxs]
    # Prepare test set as one giant minibatch:
    test_imgs, test_labels = MNIST.testdata()
    test_set = make_minibatch(test_imgs, test_labels, 1:length(test_labels))
    return train_set, test_set
end
```
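
As a sanity check (a sketch, assuming the default `batch_size` of 128), we can count the mini-batches and inspect the shapes:

```
args = TrainArgs()
train_set, test_set = get_processed_data(args)
length(train_set)           # 469 mini-batches (60_000 images / 128, the last one smaller)
size(first(train_set)[1])   # (28, 28, 1, 128)
size(test_set[1])           # (28, 28, 1, 10000) -- all test images in one batch
```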

## Model

Now, we define the `build_model` function that creates a ConvNet model composed of *three* convolution layers (feature detection) and *one* classification layer. The input size is `28x28`. The images are grayscale, which means there is only *one* channel (compared to 3 for RGB) in every data point. Each convolutional layer is declared as `Conv(kernel, input_channels => output_channels, ...)` and is followed by a ReLU activation and a MaxPool operation, which halves the spatial size of its input. The classification layer, on the other hand, is a dense layer that outputs a vector of 10 dimensions, that is, the number of classes that the model will be able to predict.

```
function build_model(args; imgsize = (28,28,1), nclasses = 10)
    cnn_output_size = Int.(floor.([imgsize[1]/8, imgsize[2]/8, 32]))
    return Chain(
        # First convolution, operating upon a 28x28 image
        Conv((3, 3), imgsize[3]=>16, pad=(1,1), relu),
        MaxPool((2,2)),
        # Second convolution, operating upon a 14x14 image
        Conv((3, 3), 16=>32, pad=(1,1), relu),
        MaxPool((2,2)),
        # Third convolution, operating upon a 7x7 image
        Conv((3, 3), 32=>32, pad=(1,1), relu),
        MaxPool((2,2)),
        # Reshape the 4d array into a 2d one using `Flux.flatten`; at this point it should be (3, 3, 32, N)
        flatten,
        Dense(prod(cnn_output_size), nclasses))
end
```

To chain the layers of a model we use the Flux function `Chain`. It enables us to call the layers in sequence on a given input. Also, we use the function `Flux.flatten` to reshape the output of the last convolution layer into a 2d array. Finally, we call the `Dense` function to create the classification layer.
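
To see how `Chain` passes data through the layers, we can push a dummy batch through the model and check the sizes along the way (a quick sketch; the batch size of 16 is arbitrary):

```
model = build_model(TrainArgs())
x = rand(Float32, 28, 28, 1, 16)   # a dummy batch of 16 "images"
size(model[1:2](x))   # (14, 14, 16, 16) -- after the first Conv + MaxPool
size(model[1:6](x))   # (3, 3, 32, 16)   -- after the third Conv + MaxPool
size(model(x))        # (10, 16)         -- logits for the 10 digit classes
```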

## Training

Before training our model, we need to define a few functions that will be helpful for the process:

- `augment` adds Gaussian random noise to our images, to make the model more robust.
- `anynan` checks whether any element of the params is NaN.
- `accuracy` computes the proportion of inputs `x` correctly classified by our ConvNet.

```
augment(x) = x .+ gpu(0.1f0*randn(eltype(x), size(x)))
anynan(x) = any(y -> any(isnan, y), x)
accuracy(x, y, model) = mean(onecold(cpu(model(x))) .== onecold(cpu(y)))
```
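
On an untrained model the accuracy should sit near chance level (about 0.1). A quick check might look like this, assuming `test_set` from the `get_processed_data` call above (the `gpu` calls leave data unchanged when CUDA is unavailable):

```
model = gpu(build_model(TrainArgs()))
x, y = gpu.(test_set)
accuracy(x, y, model)   # ≈ 0.1 before training
```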

Finally, we define the `train` function:

```
function train(; kws...)
    args = TrainArgs(; kws...)

    @info("Loading data set")
    train_set, test_set = get_processed_data(args)

    # Define our model. We will use a simple convolutional architecture with
    # three iterations of Conv -> ReLU -> MaxPool, followed by a final Dense layer.
    @info("Building model...")
    model = build_model(args)

    # Load model and datasets onto GPU, if enabled
    train_set = gpu.(train_set)
    test_set = gpu.(test_set)
    model = gpu(model)

    # Make sure our model is nicely precompiled before starting our training loop
    model(train_set[1][1])

    # `loss()` calculates the crossentropy loss between our prediction `y_hat`
    # (calculated from `model(x)`) and the ground truth `y`. We augment the data
    # a bit, adding gaussian random noise to our image to make it more robust.
    function loss(x, y)
        x̂ = augment(x)
        ŷ = model(x̂)
        return logitcrossentropy(ŷ, y)
    end

    # Train our model with the given training set using the Adam optimizer and
    # printing out performance against the test set as we go.
    opt = Adam(args.lr)

    @info("Beginning training loop...")
    best_acc = 0.0
    last_improvement = 0
    for epoch_idx in 1:args.epochs
        # Train for a single epoch
        Flux.train!(loss, params(model), train_set, opt)

        # Terminate on NaN
        if anynan(Flux.params(model))
            @error "NaN params"
            break
        end

        # Calculate accuracy:
        acc = accuracy(test_set..., model)
        @info(@sprintf("[%d]: Test accuracy: %.4f", epoch_idx, acc))

        # If our accuracy is good enough, quit out.
        if acc >= 0.999
            @info(" -> Early-exiting: We reached our target accuracy of 99.9%")
            break
        end

        # If this is the best accuracy we've seen so far, save the model out
        if acc >= best_acc
            @info(" -> New best accuracy! Saving model out to mnist_conv.bson")
            BSON.@save joinpath(args.savepath, "mnist_conv.bson") params=cpu.(params(model)) epoch_idx acc
            best_acc = acc
            last_improvement = epoch_idx
        end

        # If we haven't seen improvement in 5 epochs, drop our learning rate:
        if epoch_idx - last_improvement >= 5 && opt.eta > 1e-6
            opt.eta /= 10.0
            @warn(" -> Haven't improved in a while, dropping learning rate to $(opt.eta)!")

            # After dropping learning rate, give it a few epochs to improve
            last_improvement = epoch_idx
        end

        if epoch_idx - last_improvement >= 10
            @warn(" -> We're calling this converged.")
            break
        end
    end
end
```

`train` calls the functions we defined above and trains our model. It stops when the model achieves 99.9% test accuracy (early-exiting) or after training for `args.epochs` epochs (20 by default). More specifically, it performs the following steps:

- Loads the MNIST dataset.
- Builds our ConvNet model (as described above).
- Loads the train and test data sets as well as our model onto a GPU (if available).
- Defines a `loss` function that calculates the crossentropy between our prediction and the ground truth, on an augmented copy of the input.
- Sets up the Adam optimiser to train the model with learning rate `args.lr`.
- Runs the training loop. For each epoch, it executes the following:
  - Calls the `Flux.train!` function to execute one training epoch.
  - If any of the parameters of our model is `NaN`, then the training process is terminated.
  - Calculates the model accuracy on the test set.
  - If the accuracy is >= 0.999, then early-exiting is executed.
  - If the accuracy is the best seen so far, then the model parameters are saved to `mnist_conv.bson`, along with the current epoch and accuracy.
  - If there has not been any improvement for the last 5 epochs, then the learning rate is dropped and the process waits a little longer for the accuracy to improve.
  - If the last improvement was more than 10 epochs ago, then the process is terminated.

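Since `train` forwards its keyword arguments to `TrainArgs`, any of the defaults can be overridden at the call site. For example (the values below are arbitrary, just to illustrate the interface):

```
# Override any TrainArgs field by keyword; unspecified fields keep their defaults
train(epochs = 5, lr = 1e-3, savepath = "./")
```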

## Testing

Finally, to test our model we define the `test` function:

```
function test(; kws...)
    args = TrainArgs(; kws...)

    # Loading the test data
    _, test_set = get_processed_data(args)

    # Re-constructing the model with random initial weights
    model = build_model(args)

    # Loading the saved parameters
    BSON.@load joinpath(args.savepath, "mnist_conv.bson") params

    # Loading parameters onto the model
    Flux.loadparams!(model, params)

    test_set = gpu.(test_set)
    model = gpu(model)
    @show accuracy(test_set..., model)
end
```

`test` loads the MNIST test data set, reconstructs the model, and loads the saved parameters (from `mnist_conv.bson`) onto it. Finally, it computes our model's predictions for the test set and shows the test accuracy (around 99%).
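
Putting it all together, the whole example can be run from the REPL once the definitions above are loaded:

```
train()   # trains the model and writes mnist_conv.bson to args.savepath
test()    # rebuilds the model, loads the saved parameters, and prints the test accuracy
```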

To see the full version of this example, see Simple ConvNets - model-zoo.

## Resources

- Neural Networks in Flux.jl with Huda Nassar (working with the MNIST dataset)
- Convolutional Neural Networks (CNNs / ConvNets).
- Convolutional Neural Networks Tutorial in PyTorch.

Originally published at fluxml.ai on 7 February 2021. Written by Elliot Saba, Adarsh Kumar, Mike J Innes, Dhairya Gandhi, Sudhanshu Agrawal, Sambit Kumar Dash, fps.io, Carlo Lucibello, Andrew Dinhobl, Liliana Badillo