Tutorial: Simple Multi-layer Perceptron

In this example, we create a simple multi-layer perceptron (MLP) that classifies handwritten digits using the MNIST dataset. An MLP consists of at least three layers of stacked perceptrons: input, hidden, and output. Each neuron of an MLP has parameters (weights and a bias) and uses an activation function to compute its output.
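To make the last sentence concrete, here is a minimal plain-Julia sketch of a single neuron that uses relu as its activation. The helpers `relu` and `neuron` are defined here only for illustration; Flux provides its own `relu`.

```julia
# Activation function (Flux exports its own `relu`; this one is for illustration).
relu(z) = max(zero(z), z)

# One neuron: weighted sum of the inputs, plus a bias, passed through the activation.
neuron(x, w, b; σ = relu) = σ(sum(w .* x) + b)

x = [1.0, -2.0, 3.0]   # inputs
w = [0.5, 0.25, 1.0]   # one weight per input
neuron(x, w, -1.0)     # relu(0.5 - 0.5 + 3.0 - 1.0) = relu(2.0)
neuron(x, w, -4.0)     # relu(-1.0) is clipped to zero
```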

To run this example, we need the following packages:

```julia
using Flux, Statistics
using Flux.Data: DataLoader
using Flux: onehotbatch, onecold, logitcrossentropy, params
using CUDA
using MLDatasets

if has_cuda()  # Check if CUDA is available
    @info "CUDA is on"
end
```

We set default values for the learning rate, batch size, number of epochs, and the device (a GPU, if available) used for our model:

```julia
Base.@kwdef mutable struct Args
    rate::Float64 = 3e-4    # learning rate
    batchsize::Int = 1024   # batch size
    epochs::Int = 10        # number of epochs
    device::Function = gpu  # set as gpu, if gpu available
end
```

If a GPU is available on our local system, Flux uses it to compute the loss and update the weights and biases when training our model. If no GPU is available, the gpu function returns its argument unchanged, so training runs on the CPU.
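Because Args is declared with Base.@kwdef, Julia generates a keyword constructor that uses the declared default for any field you leave out. This is what lets train(; kws...) override individual settings. The sketch below mirrors Args under a hypothetical name, DemoArgs, and uses `identity` in place of `gpu` so that it runs without Flux or CUDA.

```julia
# Hypothetical mirror of the tutorial's Args. `identity` replaces `gpu`
# so that this sketch runs without Flux or CUDA installed.
Base.@kwdef mutable struct DemoArgs
    rate::Float64 = 3e-4
    batchsize::Int = 1024
    epochs::Int = 10
    device::Function = identity
end

defaults = DemoArgs()                          # every field takes its default
custom = DemoArgs(; epochs = 2, rate = 1e-3)   # like train(; epochs = 2, rate = 1e-3)
custom.batchsize                               # still the default, 1024
```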


We create the function getdata to load the MNIST train and test data sets from MLDatasets and prepare them for training. It also splits each data set into mini-batches by wrapping it in a DataLoader object.

```julia
function getdata(args)
    # Load the data set
    xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
    xtest, ytest = MLDatasets.MNIST.testdata(Float32)

    # Flatten each 28x28 image into a 784-element column
    xtrain = Flux.flatten(xtrain)
    xtest = Flux.flatten(xtest)

    # One-hot-encode the labels
    ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9)

    # Batching
    train_data = DataLoader((xtrain, ytrain), batchsize=args.batchsize, shuffle=true)
    test_data = DataLoader((xtest, ytest), batchsize=args.batchsize)

    return train_data, test_data
end
```

getdata performs the following steps:

  • Loads the MNIST data set: Loads the train and test set tensors. The train data has shape 28x28x60000 and the test data has shape 28x28x10000.
  • Reshapes the train and test data: Uses the flatten function to reshape the train data into a 784x60000 array and the test data into a 784x10000 array. We reshape the data so that we can pass it to the input layer of our model. A simple MLP expects each sample as a vector, so each column now holds one flattened image.
  • One-hot encodes the train and test labels: Creates a batch of one-hot vectors so that we can pass the labels to the loss function. For this example, we use the logitcrossentropy function, which expects the labels to be one-hot encoded.
  • Creates batches of data: Creates two DataLoader objects (train and test) that split the data into mini-batches of size 1024 (as defined in Args). The training loop and the evaluation functions then process the data one mini-batch at a time instead of passing the entire data set through the model at once. The train loader also shuffles the data points each time it is iterated over (shuffle=true).
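The flattening, one-hot encoding, and batching steps can be sketched with plain Julia arrays on a few fake images. The helpers flatten_images, onehot_columns, nbatches, and lastbatch are hypothetical stand-ins written for illustration. Flux's flatten and onehotbatch return their own, more efficient types. The last two helpers do the batch arithmetic and assume that DataLoader keeps the smaller final batch, which is its default (partial=true).

```julia
# Hypothetical plain-Julia stand-ins for Flux.flatten and onehotbatch.

# Keep the last (sample) dimension and merge all the others into one,
# so a 28x28xN array becomes a 784xN matrix.
flatten_images(x::AbstractArray) = reshape(x, :, size(x, ndims(x)))

# Build a Bool matrix with one column per label and a single `true`
# in the row that matches the label's position in `classes`.
function onehot_columns(labels, classes)
    Y = zeros(Bool, length(classes), length(labels))
    for (j, label) in enumerate(labels)
        Y[findfirst(==(label), classes), j] = true
    end
    return Y
end

images = rand(Float32, 28, 28, 5)          # 5 fake 28x28 "images"
X = flatten_images(images)                 # 784x5, one column per image
Y = onehot_columns([3, 0, 9, 3, 7], 0:9)   # 10x5, one column per label

# Number of mini-batches, and size of the final (possibly smaller) batch.
nbatches(n, batchsize) = cld(n, batchsize)
lastbatch(n, batchsize) = n - (nbatches(n, batchsize) - 1) * batchsize
```

With a batch size of 1024, the 60000 training images give 59 mini-batches, and the last one holds 608 images. The 10000 test images give 10 mini-batches, and the last one holds 784.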


As mentioned above, an MLP consists of at least three fully connected layers. For this example, we define our model with the following layers and dimensions:

  • Input: It has 784 nodes, one per pixel of the 28x28 MNIST image. We flatten the train and test data so that we can pass them to this layer.
  • Hidden: It has 32 perceptrons that use the relu activation function.
  • Output: It has 10 perceptrons, one per digit from 0 to 9. They output unnormalized scores (logits) rather than probabilities. The softmax function turns logits into probabilities, and logitcrossentropy applies a log-softmax internally, so the model needs no softmax layer.

We define our model with the build_model function:

```julia
function build_model(; imgsize=(28,28,1), nclasses=10)
    return Chain(
        Dense(prod(imgsize), 32, relu),  # 784 inputs -> 32 hidden units
        Dense(32, nclasses))             # 32 hidden units -> 10 class scores
end
```

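We use Dense layers so that our model is densely (or fully) connected, and Chain to compose them so that the output of one layer becomes the input of the next. The input layer has no parameters of its own. It is the flattened image that we pass to the first Dense layer, which maps its 784 values to the 32 hidden units.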
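The plain-Julia sketch below spells out what this Chain computes. PlainDense and chain are hypothetical names, not Flux types. The weights are small random placeholders, whereas Flux's Dense initializes its weights with Glorot uniform by default. The sketch also counts parameters: 784*32 + 32 = 25120 in the hidden layer and 32*10 + 10 = 330 in the output layer, for 25450 in total.

```julia
# Hypothetical plain-Julia version of Dense: y = σ.(W * x .+ b).
relu(z) = max(zero(z), z)

struct PlainDense
    W::Matrix{Float32}   # out x in weight matrix
    b::Vector{Float32}   # one bias per output
    σ::Function          # activation, applied elementwise
end
PlainDense(nin::Integer, nout::Integer, σ::Function = identity) =
    PlainDense(randn(Float32, nout, nin) .* 0.01f0, zeros(Float32, nout), σ)

(layer::PlainDense)(x) = layer.σ.(layer.W * x .+ layer.b)

nparams(layer::PlainDense) = length(layer.W) + length(layer.b)

# Like Chain: feed each layer's output into the next layer.
chain(layers...) = x -> foldl((h, layer) -> layer(h), layers; init = x)

hidden = PlainDense(prod((28, 28, 1)), 32, relu)   # 784 -> 32, as in build_model
output = PlainDense(32, 10)                         # 32 -> 10, no activation
model = chain(hidden, output)

x = rand(Float32, 784, 4)   # a batch of 4 flattened images
logits = model(x)           # 10x4: one score per digit per image
```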

Loss functions

Now, we define the loss function loss_all. It takes a DataLoader object and the model we defined above as arguments. It iterates through the DataLoader in mini-batches and uses the logitcrossentropy function to compute the cross-entropy between the model's predictions and the one-hot labels of each batch. It then returns the average loss over all batches.

```julia
function loss_all(dataloader, model)
    l = 0f0
    for (x,y) in dataloader
        l += logitcrossentropy(model(x), y)
    end
    return l / length(dataloader)  # average loss per mini-batch
end
```

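For each column, logitcrossentropy takes the softmax of the logits and computes minus the log-probability assigned to the true class. It then averages the result over the columns of the batch, because Flux's default aggregation is the mean. The sketch below writes this out with plain arrays. logsoftmax_columns and plain_logitcrossentropy are hypothetical helper names, not Flux functions. Subtracting the column maximum before exponentiating keeps exp from overflowing, which is one reason to work with logits directly.

```julia
# Numerically stable log-softmax of each column of a matrix of logits.
function logsoftmax_columns(z::AbstractMatrix)
    m = maximum(z; dims = 1)   # subtract the column max for stability
    return z .- m .- log.(sum(exp.(z .- m); dims = 1))
end

# Cross-entropy on logits: -sum(y .* log(softmax(ŷ))) per column, averaged over columns.
plain_logitcrossentropy(ŷ, y) = sum(-sum(y .* logsoftmax_columns(ŷ); dims = 1)) / size(ŷ, 2)

logits = [0.0 2.0;
          0.0 0.0]    # 2 classes x 2 samples
targets = [1 1;
           0 0]       # one-hot: both samples belong to the first class

plain_logitcrossentropy(logits, targets)   # mean of log(2) and log(1 + exp(-2))
```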
In addition, we define the function accuracy to report the accuracy of our model during the training process. To compute the accuracy, we need to decode the output of our model using the onecold function.

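```julia
function accuracy(data_loader, model)
    acc = 0
    for (x,y) in data_loader
        # Fraction of correct predictions in this mini-batch (onecold decodes to class labels)
        acc += sum(onecold(cpu(model(x))) .== onecold(cpu(y))) / size(x,2)
    end
    return acc / length(data_loader)  # average over mini-batches
end
```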

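onecold undoes one-hot encoding. For each column, it returns the label at the position of the largest value. Softmax preserves the ordering of the logits, so decoding raw model outputs gives the same predicted class as decoding probabilities. The plain-Julia stand-ins below (onecold_columns and batch_accuracy, both hypothetical names) mirror what accuracy computes for a single mini-batch. Like Flux's onecold, the labels default to 1:n.

```julia
# Decode each column to the label of its largest entry (labels default to 1:n).
onecold_columns(Y::AbstractMatrix, labels = 1:size(Y, 1)) =
    [labels[argmax(view(Y, :, j))] for j in 1:size(Y, 2)]

# Fraction of columns whose predicted label matches the true label.
batch_accuracy(scores, onehot_targets, labels) =
    sum(onecold_columns(scores, labels) .== onecold_columns(onehot_targets, labels)) / size(scores, 2)

scores = [2.0 0.1 0.3;
          0.5 3.0 0.2;
          0.1 0.2 0.9]    # 3 classes x 3 samples (e.g. logits)
targets = [1 0 0;
           0 1 1;
           0 0 0]         # true classes: 0, 1, 1

onecold_columns(scores, 0:2)           # predicted classes: 0, 1, 2
batch_accuracy(scores, targets, 0:2)   # 2 of 3 correct
```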
Train our model

Finally, we create the train function that calls the functions we defined and trains the model.

```julia
function train(; kws...)
    # Initialize hyperparameters
    args = Args(; kws...)

    # Load data
    train_data, test_data = getdata(args)

    # Construct the model and move it and the data to the device
    m = build_model()
    train_data = args.device.(train_data)
    test_data = args.device.(test_data)
    m = args.device(m)
    loss(x,y) = logitcrossentropy(m(x), y)

    ## Training
    evalcb = () -> @show(loss_all(train_data, m))
    opt = Adam(args.rate)
    for epoch in 1:args.epochs
        @info "Epoch $epoch"
        Flux.train!(loss, params(m), train_data, opt, cb = evalcb)
    end

    @show accuracy(train_data, m)

    @show accuracy(test_data, m)
end
```

train performs the following steps:

  • Initializes the hyperparameters: Creates the args object that contains the default values for training our model. Any of them can be overridden with keyword arguments, for example train(epochs = 5).
  • Loads the train and test data: Calls the getdata function we defined above.
  • Constructs the model: Builds the model and moves the train and test data sets and the model onto the GPU (if available). Broadcasting args.device over a DataLoader collects its mini-batches into a vector, so the training batches are shuffled only once, when they are collected.
  • Trains the model: Defines the callback function evalcb, which prints the value of loss_all during training. Flux.train! calls it after every mini-batch. Then, it sets Adam as the optimiser for training our model. Finally, it runs the training process for 10 epochs (as defined in the args object) and shows the accuracy on the train and test data.
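Flux.train! computes the gradients with automatic differentiation and passes them to the optimiser. As a rough illustration of what Adam does with each gradient, the sketch below writes out the Adam update rule for a plain parameter vector. It minimizes f(w) = sum(w.^2)/2, whose gradient is w. AdamState and adam_step! are hypothetical names. β1 = 0.9, β2 = 0.999, and ϵ = 1e-8 are the values commonly used as Adam defaults and are assumed here rather than read from Flux. η = 3e-4 matches Args.rate.

```julia
# Running moment estimates that Adam keeps for every parameter.
mutable struct AdamState
    m::Vector{Float64}   # moving average of gradients
    v::Vector{Float64}   # moving average of squared gradients
    t::Int               # step counter, used for bias correction
end
AdamState(n::Integer) = AdamState(zeros(n), zeros(n), 0)

# One in-place Adam update of the parameters `w` given their gradient `g`.
function adam_step!(w, g, s::AdamState; η = 3e-4, β1 = 0.9, β2 = 0.999, ϵ = 1e-8)
    s.t += 1
    s.m .= β1 .* s.m .+ (1 - β1) .* g
    s.v .= β2 .* s.v .+ (1 - β2) .* g .^ 2
    m̂ = s.m ./ (1 - β1^s.t)   # bias-corrected first moment
    v̂ = s.v ./ (1 - β2^s.t)   # bias-corrected second moment
    w .-= η .* m̂ ./ (sqrt.(v̂) .+ ϵ)
    return w
end

# Minimize f(w) = sum(w.^2) / 2; its gradient is w itself.
w = [1.0, -2.0]
state = AdamState(length(w))
for step in 1:1000
    adam_step!(w, copy(w), state)
end
w   # both entries have moved toward zero
```

On the first step, bias correction makes m̂ equal to g and v̂ equal to g², so each coordinate moves by almost exactly η, regardless of the gradient's scale.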

For the full version of this example, see Simple multi-layer perceptron in the Flux model-zoo.



Originally published at fluxml.ai on 26 January 2021. Written by Adarsh Kumar, Mike J Innes, Andrew Dinhobl, Jerry Ling, natema, Zhang Shitian, Liliana Badillo, Dhairya Gandhi