# Tutorial: Simple Multi-layer Perceptron

In this example, we create a simple multi-layer perceptron (MLP) that classifies handwritten digits using the MNIST dataset. An MLP consists of at least *three layers* of stacked perceptrons: input, hidden, and output. Each neuron of an MLP has parameters (weights and a bias) and uses an activation function to compute its output.
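To make the role of weights, bias, and activation concrete, here is a minimal plain-Julia sketch of a single layer of perceptrons. It does not use Flux; the names `W`, `b`, `x`, and `layer` and all the numbers are made up for illustration.

```julia
# Plain-Julia sketch of one layer of perceptrons (no Flux required).
relu(z) = max(zero(z), z)

# Hypothetical toy layer: 3 inputs -> 2 neurons.
W = Float32[ 1.0 -1.0 0.5;
            -2.0  0.0 1.0]    # one row of weights per neuron
b = Float32[0.5, -2.0]        # one bias per neuron
x = Float32[1.0, 2.0, 3.0]    # input vector

# Each neuron computes a weighted sum of the inputs, adds its bias,
# and applies the activation function.
layer(x) = relu.(W * x .+ b)

out = layer(x)
```

The second neuron's weighted sum is negative here, so relu clips its output to zero.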

To run this example, we need the following packages:

```
using Flux, Statistics
using Flux.Data: DataLoader
using Flux: onehotbatch, onecold, logitcrossentropy, throttle, @epochs, params
using Base.Iterators: repeated
using CUDA
using MLDatasets
if has_cuda() # Check if CUDA is available
    @info "CUDA is on"
    CUDA.allowscalar(false)
end
```

We set default values for learning rate, batch size, epochs, and the usage of a GPU (if available) for our model:

```
Base.@kwdef mutable struct Args
    rate::Float64 = 3e-4    # learning rate
    batchsize::Int = 1024   # batch size
    epochs::Int = 10        # number of epochs
    device::Function = gpu  # set as gpu, if gpu available
end
```

If a GPU is available on our local system, then Flux uses it for computing the loss and updating the weights and biases when training our model.
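Because the struct is declared with `Base.@kwdef`, any default can be overridden by keyword when the object is created. The sketch below uses a hypothetical stand-in, `DemoArgs`, that omits the `device` field so it runs without Flux.

```julia
# Hypothetical stand-in for Args (without the Flux-specific `device` field).
Base.@kwdef mutable struct DemoArgs
    rate::Float64 = 3e-4
    batchsize::Int = 1024
    epochs::Int = 10
end

defaults = DemoArgs()                   # every field takes its default value
custom = DemoArgs(rate=1e-3, epochs=5)  # override selected fields by keyword
custom.batchsize = 256                  # fields can be changed later (mutable struct)
```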

## Data

We create the function `getdata` to load the MNIST train and test data sets from MLDatasets and prepare them for the training process. In addition, we split the data sets into mini-batches by wrapping them in DataLoader objects.

```
function getdata(args)
    ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
    # Loading Dataset
    xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
    xtest, ytest = MLDatasets.MNIST.testdata(Float32)
    # Reshape Data in order to flatten each image into a linear array
    xtrain = Flux.flatten(xtrain)
    xtest = Flux.flatten(xtest)
    # One-hot-encode the labels
    ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9)
    # Batching
    train_data = DataLoader((xtrain, ytrain), batchsize=args.batchsize, shuffle=true)
    test_data = DataLoader((xtest, ytest), batchsize=args.batchsize)
    return train_data, test_data
end
```

`getdata` performs the following steps:

- **Loads MNIST data set:** Loads the train and test set tensors. The shape of the train data is `28x28x60000` and the shape of the test data is `28x28x10000`.
- **Reshapes the train and test data:** Uses the flatten function to reshape the train data set into a `784x60000` array and the test data set into a `784x10000` array. Notice that we reshape the data so that we can pass these as arguments for the input layer of our model (a simple MLP expects a vector as an input).
- **One-hot encodes the train and test labels:** Creates a batch of one-hot vectors so we can pass the labels of the data as arguments for the loss function. For this example, we use the logitcrossentropy function and it expects data to be one-hot encoded.
- **Creates batches of data:** Creates two DataLoader objects (train and test) that handle data mini-batches of size `1024` (as defined above). We create these two objects so that each pass over a loader feeds the entire data set through the loss function, one mini-batch at a time. Also, the train DataLoader shuffles the data points during each iteration (`shuffle=true`).
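The three data-preparation ideas (flattening, one-hot encoding, and mini-batching) can be sketched in plain Julia on a toy array. This is only an illustration of what the steps do to the shapes, not how Flux or DataLoader implement them; the toy data and the names `xflat`, `yhot`, and `batches` are assumptions.

```julia
# Toy stand-in for MNIST: 5 "images" of 2x2 pixels with labels in 0:2.
x = reshape(Float32.(1:20), 2, 2, 5)
y = [0, 2, 1, 2, 0]

# Flatten: keep the last (sample) dimension and merge all the others.
xflat = reshape(x, :, size(x, 3))   # 4x5, one column per image

# One-hot encode: column j has a single `true` in the row of label y[j].
classes = 0:2
yhot = classes .== permutedims(y)   # 3x5 BitMatrix

# Mini-batches of up to 2 samples each, split along the sample dimension.
batchsize = 2
batches = [(xflat[:, idx], yhot[:, idx])
           for idx in Iterators.partition(1:size(xflat, 2), batchsize)]
```

With 5 samples and a batch size of 2, the last batch holds the single leftover sample.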

## Model

As we mentioned above, an MLP consists of at least *three* layers that are fully connected. For this example, we define our model with the following layers and dimensions:

- **Input:** It has `784` perceptrons (the MNIST image size is `28x28`). We flatten the train and test data so that we can pass them as arguments to this layer.
- **Hidden:** It has `32` perceptrons that use the relu activation function.
- **Output:** It has `10` perceptrons that output the model's prediction: one score (logit) per digit from 0 to 9, which logitcrossentropy converts to probabilities internally.

We define our model with the `build_model` function:

```
function build_model(; imgsize=(28,28,1), nclasses=10)
    return Chain(
        Dense(prod(imgsize), 32, relu),
        Dense(32, nclasses))
end
```

Note that we use the function Dense so that our model is *densely* (or fully) connected and the function Chain to chain the computation of the three layers.
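As a rough check of the dimensions, the same computation can be written out in plain Julia. This sketch uses randomly drawn placeholder weights (not Flux's initialisation), and the names `W1`, `b1`, `W2`, `b2`, and `mlp` are hypothetical.

```julia
# Plain-Julia sketch of the shapes in the model above (no Flux required).
W1, b1 = randn(Float32, 32, 784), zeros(Float32, 32)  # hidden layer: 784 -> 32
W2, b2 = randn(Float32, 10, 32), zeros(Float32, 10)   # output layer: 32 -> 10

relu(z) = max(zero(z), z)

# Hidden layer with relu, then an output layer without an activation.
mlp(x) = W2 * relu.(W1 * x .+ b1) .+ b2

x = rand(Float32, 784, 5)   # a batch of 5 flattened images
scores = mlp(x)             # 10x5: one column of scores per image

# Trainable parameters: 784*32 + 32 + 32*10 + 10
nparams = sum(length, (W1, b1, W2, b2))
```

Under these dimensions the model has 25450 trainable parameters, most of them in the first layer.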

## Loss functions

Now, we define the loss function `loss_all`. It expects a DataLoader object and the `model` we defined above as arguments. Notice that this function iterates through the `dataloader` object in mini-batches, uses the function logitcrossentropy to compute the difference between the predicted and actual values, and returns the mean loss over the mini-batches.

```
function loss_all(dataloader, model)
    l = 0f0
    for (x,y) in dataloader
        l += logitcrossentropy(model(x), y)
    end
    l/length(dataloader)
end
```
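To show what a cross-entropy on logits measures, here is a hand-written plain-Julia version. It is a sketch that assumes the loss is averaged over the samples in the batch, which to my understanding matches the default behaviour of logitcrossentropy; the helper names `logsoftmax_cols` and `logit_ce` are made up.

```julia
# Numerically stable log-softmax of each column (one column per sample).
function logsoftmax_cols(z)
    shifted = z .- maximum(z, dims=1)
    return shifted .- log.(sum(exp.(shifted), dims=1))
end

# Cross-entropy between logits z and one-hot targets y, averaged over samples.
logit_ce(z, y) = -sum(y .* logsoftmax_cols(z)) / size(z, 2)

z = Float32[2 0; 0 0; 0 2]       # logits: 3 classes x 2 samples
yright = Bool[1 0; 0 0; 0 1]     # targets that agree with the largest logit
ywrong = Bool[0 1; 1 0; 0 0]     # targets that disagree with it

good = logit_ce(z, yright)
bad = logit_ce(z, ywrong)
```

The loss is small when the largest logit sits on the true class and grows when it does not, which is the quantity that training drives down.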

In addition, we define the function `accuracy` to report the accuracy of our model during the training process. To compute the accuracy, we need to decode the output of our model using the onecold function.

```
function accuracy(data_loader, model)
    acc = 0
    for (x,y) in data_loader
        acc += sum(onecold(cpu(model(x))) .== onecold(cpu(y)))*1 / size(x,2)
    end
    acc/length(data_loader)
end
```
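Decoding amounts to taking the index of the largest entry in each column. The plain-Julia sketch below mimics that with `argmax`; the helper `decode` and the toy scores are assumptions, and the indices are 1-based positions rather than digit labels, which is fine as long as predictions and targets are decoded the same way.

```julia
# Index of the largest entry in each column (one column per sample).
decode(m) = [argmax(col) for col in eachcol(m)]

scores = Float32[0.1 2.0 0.3 0.2;
                 1.5 0.2 0.1 0.7;
                 0.0 0.1 0.9 0.1]   # 3 classes x 4 samples
targets = Bool[0 1 0 1;
               1 0 0 0;
               0 0 1 0]             # one-hot targets

predicted = decode(scores)
actual = decode(targets)

# Fraction of samples whose predicted class matches the target class.
acc = sum(predicted .== actual) / size(scores, 2)
```

Three of the four toy samples are classified correctly, so the accuracy is 0.75.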

## Train our model

Finally, we create the `train` function that calls the functions we defined and trains the model.

```
function train(; kws...)
    # Initializing Model parameters
    args = Args(; kws...)
    # Load Data
    train_data,test_data = getdata(args)
    # Construct model
    m = build_model()
    train_data = args.device.(train_data)
    test_data = args.device.(test_data)
    m = args.device(m)
    loss(x,y) = logitcrossentropy(m(x), y)
    ## Training
    evalcb = () -> @show(loss_all(train_data, m))
    opt = Adam(args.rate)
    @epochs args.epochs Flux.train!(loss, params(m), train_data, opt, cb = evalcb)
    @show accuracy(train_data, m)
    @show accuracy(test_data, m)
end
```

`train` performs the following steps:

- **Initializes the model parameters:** Creates the `args` object that contains the default values for training our model.
- **Loads the train and test data:** Calls the function `getdata` we defined above.
- **Constructs the model:** Builds the model and loads the train and test data sets, and our model onto the GPU (if available).
- **Trains the model:** Defines the *callback* function `evalcb` to show the value of the `loss_all` function during the training process. Then, it sets Adam as the optimiser for training our model. Finally, it runs the training process with the macro `@epochs` for `10` epochs (as defined in the `args` object) and shows the `accuracy` value for the train and test data.
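To illustrate what one epoch of training does to the parameters, here is a deliberately simplified plain-Julia sketch. It departs from the tutorial in three labelled ways: it trains a single softmax layer rather than the MLP, it uses plain gradient descent instead of Adam, and it uses a hand-derived gradient instead of Flux's automatic differentiation. The toy data and the names `softmax_cols`, `loss`, and `fit` are hypothetical.

```julia
# Softmax of each column, shifted by the column maximum for stability.
function softmax_cols(z)
    e = exp.(z .- maximum(z, dims=1))
    return e ./ sum(e, dims=1)
end

# Mean cross-entropy of a single linear layer with one-hot targets y.
loss(W, b, x, y) = -sum(y .* log.(softmax_cols(W * x .+ b))) / size(x, 2)

# Full-batch gradient descent (swapped in for Adam to keep the sketch short).
function fit(x, y; rate=0.5, epochs=50)
    W = zeros(size(y, 1), size(x, 1))
    b = zeros(size(y, 1))
    history = Float64[]
    for _ in 1:epochs
        # Gradient of the mean cross-entropy with respect to the logits.
        g = (softmax_cols(W * x .+ b) .- y) ./ size(x, 2)
        W .-= rate .* (g * x')
        b .-= rate .* vec(sum(g, dims=2))
        push!(history, loss(W, b, x, y))   # loss after this epoch
    end
    return W, b, history
end

# Toy data: 4 samples, 2 features; the class depends only on the first feature.
x = Float64[0 0 1 1; 0 1 0 1]
y = Bool[1 1 0 0; 0 0 1 1]

W, b, history = fit(x, y)
predicted = [argmax(col) for col in eachcol(W * x .+ b)]
```

Recording the loss after every epoch plays the same role as the `evalcb` callback: it lets us confirm that the loss goes down as training proceeds.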

To see the full version of this example, see Simple multi-layer perceptron - model-zoo.

## Resources

Originally published at fluxml.ai on 26 January 2021. Written by Adarsh Kumar, Mike J Innes, Andrew Dinhobl, Jerry Ling, natema, Zhang Shitian, Liliana Badillo, Dhairya Gandhi