# Simple multi-layer perceptron
In this example, we create a simple multi-layer perceptron (MLP) that classifies handwritten digits using the MNIST dataset. An MLP consists of at least three layers of stacked perceptrons: input, hidden, and output. Each neuron of an MLP has parameters (weights and a bias) and uses an activation function to compute its output.
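As a tiny illustration of that computation (a sketch with made-up sizes, not part of the example's code), a layer of neurons with weight matrix `W`, bias vector `b`, and activation `relu` maps an input `x` to `relu.(W*x .+ b)`, which is exactly what Flux's `Dense` layer computes:

```julia
using Flux

W = rand(Float32, 3, 4)   # weights: 3 neurons, 4 inputs
b = rand(Float32, 3)      # one bias per neuron
x = rand(Float32, 4)      # a single input vector

relu.(W*x .+ b)           # what a Dense(4, 3, relu) layer with these parameters computes
```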
To run this example, we need the following packages:
```julia
using Flux, Statistics
using Flux.Data: DataLoader
using Flux: onehotbatch, onecold, logitcrossentropy, throttle, @epochs
using Base.Iterators: repeated
using Parameters: @with_kw
using CUDA
using MLDatasets

if has_cuda()        # Check if CUDA is available
    @info "CUDA is on"
    CUDA.allowscalar(false)
end
```
We set default values for learning rate, batch size, epochs, and the usage of a GPU (if available) for our model:
```julia
@with_kw mutable struct Args
    η::Float64 = 3e-4       # learning rate
    batchsize::Int = 1024   # batch size
    epochs::Int = 10        # number of epochs
    device::Function = gpu  # set as gpu, if gpu available
end
```
If a GPU is available on our local system, then Flux uses it for computing the loss and updating the weights and biases when training our model.
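Because `Args` is declared with `@with_kw`, every field can be overridden by keyword at construction time; for example (the override values here are illustrative):

```julia
args = Args()                        # all defaults
args = Args(η=1e-3, batchsize=256)   # override learning rate and batch size
args.device                          # gpu; this function is a no-op when no GPU is available
```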
We create the function `getdata` to load the MNIST train and test data sets from MLDatasets and prepare them for the training process. In addition, we split the data sets into mini-batches by loading them onto `DataLoader` objects.
```julia
function getdata(args)
    ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

    # Loading Dataset
    xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
    xtest, ytest = MLDatasets.MNIST.testdata(Float32)

    # Reshape Data in order to flatten each image into a linear array
    xtrain = Flux.flatten(xtrain)
    xtest = Flux.flatten(xtest)

    # One-hot-encode the labels
    ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9)

    # Batching
    train_data = DataLoader((xtrain, ytrain), batchsize=args.batchsize, shuffle=true)
    test_data = DataLoader((xtest, ytest), batchsize=args.batchsize)

    return train_data, test_data
end
```
`getdata` performs the following steps:
- Loads the MNIST data set: Loads the train and test set tensors. The shape of the train data is `28x28x60000` and that of the test data is `28x28x10000`.
- Reshapes the train and test data: Uses the `flatten` function to reshape the train data set into a `784x60000` array and the test data set into a `784x10000` array. We reshape the data so that we can pass it as an argument to the input layer of our model (a simple MLP expects a vector as input).
- One-hot encodes the train and test labels: Creates a batch of one-hot vectors so we can pass the labels of the data as arguments for the loss function. For this example, we use the `logitcrossentropy` function, which expects the labels to be one-hot encoded.
- Creates batches of data: Creates two `DataLoader` objects (train and test) that serve mini-batches of size `1024` (as defined above), so that we can iterate over the data batch by batch when training our model. The train `DataLoader` also shuffles the data points during each iteration (`shuffle=true`). The resulting batch shapes are checked in the sketch after this list.
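To make those shapes concrete, here is a quick check one could run once `getdata` is defined (a sketch only; it assumes the default `Args` values, and the last mini-batch may be smaller than `1024`):

```julia
train_data, test_data = getdata(Args())

x, y = first(train_data)   # first mini-batch
size(x)                    # (784, 1024): flattened pixels × batch size
size(y)                    # (10, 1024): one-hot labels × batch size
```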
As we mentioned above, an MLP consists of three layers that are fully connected. For this example, we define our model with the following layers and dimensions:
- Input: It has `784` perceptrons (the MNIST image size is `28x28`). We flatten the train and test data so that we can pass them as arguments to this layer.
- Hidden: It has `32` perceptrons that use the `relu` activation function.
- Output: It has `10` perceptrons that output the model's raw score (logit) for each digit from 0 to 9.
We define our model with the `build_model` function:

```julia
function build_model(; imgsize=(28,28,1), nclasses=10)
    return Chain(
        Dense(prod(imgsize), 32, relu),
        Dense(32, nclasses))
end
```
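`Chain` composes the two `Dense` layers into a single callable model. As a quick sanity check (a sketch with made-up inputs, not part of the training code), calling the model on a batch of flattened images returns one logit per class per image:

```julia
m = build_model()

xbatch = rand(Float32, 784, 5)   # five random "flattened images"
size(m(xbatch))                  # (10, 5): one logit per digit per image
```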
Now, we define the loss function `loss_all`. It expects a `DataLoader` object and the `model` function we defined above as arguments. Notice that this function iterates through the `dataloader` object in mini-batches and uses the function `logitcrossentropy` to compute the difference between the predicted and actual values, averaging the result over all batches.
```julia
function loss_all(dataloader, model)
    l = 0f0
    for (x, y) in dataloader
        l += logitcrossentropy(model(x), y)
    end
    l / length(dataloader)
end
```
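Note that `logitcrossentropy` applies the softmax internally, which is why our output layer has no activation function; it is numerically more stable than, but equivalent to, applying `softmax` followed by `crossentropy`. A small check (with made-up inputs and labels):

```julia
m = build_model()
ŷ = m(rand(Float32, 784, 5))         # raw logits for five random inputs
y = onehotbatch(rand(0:9, 5), 0:9)   # random one-hot labels

logitcrossentropy(ŷ, y) ≈ Flux.crossentropy(softmax(ŷ), y)   # true
```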
In addition, we define the function `accuracy` to report the accuracy of our model during the training process. To compute the accuracy, we need to decode the output of our model using the `onecold` function.
```julia
function accuracy(data_loader, model)
    acc = 0
    for (x, y) in data_loader
        acc += sum(onecold(cpu(model(x))) .== onecold(cpu(y))) / size(x, 2)
    end
    acc / length(data_loader)
end
```
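`onecold` is the inverse of `onehotbatch`: for each column it returns the position (or label) with the highest value, which for the model's logits amounts to the predicted digit. A small round-trip example:

```julia
labels = onehotbatch([3, 7], 0:9)   # 10×2 one-hot matrix
onecold(labels, 0:9)                # [3, 7]: recovers the original labels
```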
## Train our model
Finally, we create the `train` function that calls the functions we defined above and trains the model.
```julia
function train(; kws...)
    # Initializing Model parameters
    args = Args(; kws...)

    # Load Data
    train_data, test_data = getdata(args)

    # Construct model
    m = build_model()
    train_data = args.device.(train_data)
    test_data = args.device.(test_data)
    m = args.device(m)
    loss(x, y) = logitcrossentropy(m(x), y)

    ## Training
    evalcb = () -> @show(loss_all(train_data, m))
    opt = ADAM(args.η)

    @epochs args.epochs Flux.train!(loss, params(m), train_data, opt, cb = evalcb)

    @show accuracy(train_data, m)
    @show accuracy(test_data, m)
end
```
`train` performs the following steps:
- Initializes the model parameters: Creates the `args` object that contains the default values for training our model.
- Loads the train and test data: Calls the function `getdata` we defined above.
- Constructs the model: Builds the model and loads the train and test data sets, as well as the model itself, onto the GPU (if available).
- Trains the model: Defines the callback function `evalcb` to show the value of the `loss_all` function during the training process. Then, it sets ADAM as the optimiser for training our model. Finally, it runs the training process with the `@epochs` macro for `10` epochs (as defined in the `args` object) and shows the `accuracy` value for the train and test data. A usage sketch follows this list.
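With all of the pieces in place, training is started by calling `train`. Thanks to the `Args` struct, any hyperparameter can be overridden by keyword (the override values below are illustrative):

```julia
train()                    # train with the default hyperparameters
train(η=1e-3, epochs=5)    # or override, e.g., learning rate and number of epochs
```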
To see the full version of this example, see Simple multi-layer perceptron - model-zoo.