Simple multi-layer perceptron

In this example, we create a simple multi-layer perceptron (MLP) that classifies handwritten digits from the MNIST dataset. An MLP consists of at least three layers of stacked perceptrons: input, hidden, and output. Each neuron in the hidden and output layers has parameters (weights and a bias) and uses an activation function to compute its output.
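To make the neuron description concrete, here is a minimal plain-Julia sketch (no Flux) of what one layer computes: each neuron takes a weighted sum of its inputs, adds its bias, and applies an activation function, here relu. The helper names neuron and layer and the small numbers are hypothetical, chosen only for illustration.

```julia
# Rectified linear unit: passes positive values through, clips negatives to zero
relu(z) = max(zero(z), z)

# One neuron: weighted sum of the inputs plus the bias, passed through σ
neuron(w, b, x; σ = relu) = σ(sum(w .* x) + b)

# A layer of neurons: row i of W holds the weights of neuron i
layer(W, b, x; σ = relu) = σ.(W * x .+ b)

x = Float32[0.5, -1.0, 2.0]                  # 3 inputs
W = Float32[0.1 0.2 0.3; -0.4 0.5 -0.6]      # 2 neurons × 3 inputs
b = Float32[0.0, 1.0]                        # one bias per neuron
h = layer(W, b, x)                           # outputs of both neurons
```

The first neuron's weighted sum is 0.45, which relu passes through; the second neuron's sum plus bias is negative, so relu outputs zero.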

To run this example, we need the following packages:

using Flux, Statistics
using Flux.Data: DataLoader
using Flux: onehotbatch, onecold, logitcrossentropy, throttle, @epochs
using Base.Iterators: repeated
using Parameters: @with_kw
using CUDA
using MLDatasets
if has_cuda()    # Check if CUDA is available
    @info "CUDA is on"
end

We set default values for the learning rate, batch size, number of epochs, and the device (a GPU, if available) used by our model:

@with_kw mutable struct Args
    η::Float64 = 3e-4       # learning rate
    batchsize::Int = 1024   # batch size
    epochs::Int = 10        # number of epochs
    device::Function = gpu  # gpu moves data to the GPU if one is available; otherwise it is a no-op
end

If a GPU is available on our local system, then Flux uses it for computing the loss and updating the weights and biases when training our model.
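The @with_kw macro from Parameters.jl generates a keyword constructor with these defaults, so any field can be overridden when the struct is created. As a rough sketch of the same pattern without that dependency, Julia's own Base.@kwdef behaves similarly for this use. The name SketchArgs and the identity device (a stand-in for Flux's gpu on a machine without a GPU) are our own assumptions.

```julia
# Base.@kwdef adds a keyword-argument constructor that fills in the defaults
Base.@kwdef mutable struct SketchArgs
    η::Float64 = 3e-4             # learning rate
    batchsize::Int = 1024         # mini-batch size
    epochs::Int = 10              # number of passes over the training set
    device::Function = identity   # assumption: stand-in for gpu/cpu
end

args = SketchArgs(epochs = 3)     # override one field, keep the other defaults
```

Storing the device as a function means the rest of the code can call args.device(x) without knowing whether it runs on a CPU or a GPU.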


We create the function getdata to load the MNIST train and test sets from MLDatasets and prepare them for training. In addition, we split each set into mini-batches by wrapping it in a DataLoader object.

function getdata(args)
    # Load the dataset
    xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
    xtest, ytest = MLDatasets.MNIST.testdata(Float32)

    # Flatten each 28×28 image into a 784-element column
    xtrain = Flux.flatten(xtrain)
    xtest = Flux.flatten(xtest)

    # One-hot-encode the labels
    ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9)

    # Create mini-batches
    train_data = DataLoader(xtrain, ytrain, batchsize=args.batchsize, shuffle=true)
    test_data = DataLoader(xtest, ytest, batchsize=args.batchsize)

    return train_data, test_data
end

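getdata performs the following steps:

1. Loads the MNIST train and test sets as Float32 arrays. The training images form a 28×28×60000 array and the test images a 28×28×10000 array; the labels are digits in 0:9.
2. Flattens each image with Flux.flatten, so the training data becomes a 784×60000 matrix and the test data a 784×10000 matrix, with one column per image.
3. One-hot encodes the labels with onehotbatch, producing a 10×N matrix. The loss function logitcrossentropy expects its targets in this form.
4. Wraps each set in a DataLoader that yields mini-batches of args.batchsize samples (1024 by default). The training loader also shuffles the samples on every pass (shuffle=true).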
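The same preprocessing can be sketched in plain Julia on toy data, which makes the array shapes easy to check. This is an illustration under our own assumptions, not the Flux or MLDatasets implementation: the random images stand in for MNIST, and onehot_sketch and batches are hypothetical helpers.

```julia
using Random

# Toy stand-in for MNIST (assumption): 5 random 28×28 images and their labels
imgs = rand(Float32, 28, 28, 5)
labels = [3, 0, 9, 3, 7]

# Flatten: 28×28×N -> 784×N, one column per image
flat = reshape(imgs, :, size(imgs, 3))

# One-hot encode: entry (i, j) is true when labels[j] equals the i-th class
onehot_sketch(ys, classes) = [y == c for c in classes, y in ys]
Y = onehot_sketch(labels, 0:9)

# Mini-batching: optionally shuffle the column indices, then cut them into chunks
function batches(n, batchsize; shuffle = true)
    idx = shuffle ? randperm(n) : collect(1:n)
    return [idx[i:min(i + batchsize - 1, n)] for i in 1:batchsize:n]
end

for cols in batches(size(flat, 2), 2)
    xb, yb = flat[:, cols], Y[:, cols]   # one mini-batch of inputs and targets
    println(size(xb), " ", size(yb))
end
```

With 5 samples and a batch size of 2, the last mini-batch holds only one sample; DataLoader keeps such a partial batch by default.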


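As we mentioned above, an MLP consists of at least three fully connected layers. For this example, we define our model with the following layers and dimensions:

1. Input layer: 784 values, one per pixel of a flattened 28×28 image.
2. Hidden layer: a Dense layer with 32 neurons and the relu activation function.
3. Output layer: a Dense layer with 10 neurons, one per digit class (0 to 9), which outputs unnormalized scores (logits).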

We define our model with the build_model function:

function build_model(; imgsize=(28,28,1), nclasses=10)
    return Chain(
        Dense(prod(imgsize), 32, relu),   # hidden layer: 784 inputs -> 32 units
        Dense(32, nclasses))              # output layer: 32 units -> 10 class scores (logits)
end

Note that we use the function Dense to create densely (or fully) connected layers and Chain to compose them, so that the output of the hidden layer is passed as input to the output layer.
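The two Dense layers and the Chain that joins them can be sketched in plain Julia as closures composed left to right. This is only an illustration under our own assumptions: the helpers dense and chain are hypothetical, and the scaled randn initialization is not the initializer Flux's Dense uses. The sketch also checks the parameter count of this architecture: 784×32 + 32 = 25,120 for the hidden layer plus 32×10 + 10 = 330 for the output layer, 25,450 in total.

```julia
relu(z) = max(zero(z), z)

# A dense layer as a closure over its weights W (out × in) and bias b (out)
dense(W, b, σ = identity) = x -> σ.(W * x .+ b)

# Compose layers left to right, like Chain: each output feeds the next layer
chain(layers...) = x -> foldl((h, f) -> f(h), layers; init = x)

imgsize, nhidden, nclasses = (28, 28, 1), 32, 10
nin = prod(imgsize)                                   # 784 inputs

# Assumption: small random weights and zero biases, for illustration only
W1, b1 = randn(Float32, nhidden, nin) .* 0.01f0, zeros(Float32, nhidden)
W2, b2 = randn(Float32, nclasses, nhidden) .* 0.01f0, zeros(Float32, nclasses)
model = chain(dense(W1, b1, relu), dense(W2, b2))

x = rand(Float32, nin, 16)        # a batch of 16 flattened images
logits = model(x)                 # 10 × 16 matrix of unnormalized scores

nparams = length(W1) + length(b1) + length(W2) + length(b2)
```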

Loss functions

Now, we define the loss function loss_all. It takes a DataLoader object and the model we defined above as arguments. This function iterates through the data loader in mini-batches and uses logitcrossentropy to compute the difference between the predicted and actual values of each batch.

function loss_all(dataloader, model)
    l = 0f0
    for (x,y) in dataloader
        l += logitcrossentropy(model(x), y)
    end
    return l / length(dataloader)   # average loss per mini-batch
end
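As a rough sketch of what logitcrossentropy computes, the plain-Julia version below applies a numerically stable log-softmax to each column of raw model outputs and averages the per-sample cross-entropy over the batch. It follows the usual definition of this loss; the function names are hypothetical and this is not Flux's source.

```julia
using Statistics: mean

# Numerically stable log-softmax along each column (one column per sample)
function logsoftmax_cols(z)
    m = maximum(z; dims = 1)                     # subtract the max to avoid overflow in exp
    return z .- m .- log.(sum(exp.(z .- m); dims = 1))
end

# Cross-entropy on raw scores (logits) against one-hot targets, averaged over the batch
logit_ce(ŷ, y) = mean(-sum(y .* logsoftmax_cols(ŷ); dims = 1))

# Average mini-batch loss over an iterable of (x, y) batches, like loss_all
loss_all_sketch(batches, model) = mean(logit_ce(model(x), y) for (x, y) in batches)
```

Working on logits rather than on softmax probabilities lets the max-subtraction trick keep exp from overflowing, which is why the model's output layer has no softmax of its own. With all-zero logits over 10 classes, every class gets probability 1/10, so the loss is log(10).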

In addition, we define the function accuracy to report the accuracy of our trained model. To compute the accuracy, we need to decode the output of our model using the onecold function.

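function accuracy(data_loader, model)
    acc = 0
    for (x,y) in data_loader
        # Fraction of correct predictions in this mini-batch
        acc += sum(onecold(cpu(model(x))) .== onecold(cpu(y))) / size(x,2)
    end
    return acc / length(data_loader)   # average over mini-batches
end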
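onecold inverts the one-hot encoding: for each column it returns the label with the largest value, whether the column holds model scores or a one-hot target. Below is a minimal plain-Julia sketch of that decoding and of the per-batch accuracy it feeds; onecold_sketch, batch_accuracy, and the toy values are hypothetical.

```julia
using Statistics: mean

# Decode each column to the label whose entry is largest (ties: first one wins)
onecold_sketch(ŷ, labels = 0:9) = [labels[argmax(view(ŷ, :, j))] for j in axes(ŷ, 2)]

# Fraction of samples (columns) whose predicted label equals the true label
batch_accuracy(ŷ, y, labels = 0:9) = mean(onecold_sketch(ŷ, labels) .== onecold_sketch(y, labels))

scores  = [0.1 0.9; 0.8 0.05; 0.1 0.05]           # 3 classes × 2 samples (toy values)
targets = [false true; true false; false false]   # one-hot: sample 1 is class 1, sample 2 is class 0
```

Here both predictions match their targets, so the batch accuracy is 1.0.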

Train our model

Finally, we create the train function that calls the functions we defined and trains the model.

function train(; kws...)
    # Initialize the hyperparameters, overriding defaults with any keyword arguments
    args = Args(; kws...)

    # Load data
    train_data, test_data = getdata(args)

    # Construct the model and move it, along with the data, to the device
    m = build_model()
    train_data = args.device.(train_data)
    test_data = args.device.(test_data)
    m = args.device(m)
    loss(x,y) = logitcrossentropy(m(x), y)

    ## Training
    evalcb = () -> @show(loss_all(train_data, m))
    opt = ADAM(args.η)
    @epochs args.epochs Flux.train!(loss, params(m), train_data, opt, cb = evalcb)

    @show accuracy(train_data, m)

    @show accuracy(test_data, m)
end

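train performs the following steps:

1. Initializes the hyperparameters by creating an Args object; keyword arguments override the defaults (for example, train(epochs = 5)).
2. Loads the data by calling getdata.
3. Builds the model with build_model and moves the model and both data sets to the device (the GPU, if available).
4. Defines the loss as logitcrossentropy between the model's output and the one-hot labels.
5. Trains the model with the ADAM optimizer for args.epochs epochs. After every mini-batch update, the callback evalcb prints the average loss over the training set.
6. Reports the model's accuracy on the training and test sets.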
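train passes ADAM(args.η) to Flux.train!, and @epochs essentially repeats that call args.epochs times. The sketch below writes out the standard Adam update in plain Julia, assuming the commonly used defaults β1 = 0.9, β2 = 0.999 and ϵ = 1e-8. The type AdamSketch and the function step! are hypothetical, and a toy quadratic stands in for the MLP loss.

```julia
# Adam state for one parameter vector: running first and second moment estimates
mutable struct AdamSketch
    η::Float64
    β1::Float64
    β2::Float64
    ϵ::Float64
    m::Vector{Float64}
    v::Vector{Float64}
    t::Int
end
AdamSketch(n; η = 3e-4, β1 = 0.9, β2 = 0.999, ϵ = 1e-8) =
    AdamSketch(η, β1, β2, ϵ, zeros(n), zeros(n), 0)

# One in-place Adam step on parameters θ given their gradient g
function step!(o::AdamSketch, θ, g)
    o.t += 1
    @. o.m = o.β1 * o.m + (1 - o.β1) * g       # running mean of gradients
    @. o.v = o.β2 * o.v + (1 - o.β2) * g^2     # running mean of squared gradients
    mhat = o.m ./ (1 - o.β1^o.t)               # bias corrections for the zero init
    vhat = o.v ./ (1 - o.β2^o.t)
    @. θ -= o.η * mhat / (sqrt(vhat) + o.ϵ)
    return θ
end

# Toy use: minimize f(θ) = sum(θ.^2), whose gradient is 2θ
θ = [1.0, -2.0]
opt = AdamSketch(length(θ))
for epoch in 1:3              # like @epochs 3: repeat the same update
    step!(opt, θ, 2 .* θ)
end
```

Because Adam divides by the running root-mean-square of the gradient, each early step moves a parameter by roughly η regardless of the gradient's scale, which is why η = 3e-4 is a sensible default here.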

For the full version of this example, see Simple multi-layer perceptron - model-zoo.


– Adarsh Kumar, Mike J Innes, Andrew Dinhobl, Jerry Ling, natema, Zhang Shitian, Liliana Badillo, Dhairya Gandhi