Neural Architecture Search with MLJFlux

This demonstration is available as a Jupyter notebook or Julia script here.

Neural Architecture Search (NAS) is an instance of hyperparameter tuning concerned specifically with the hyperparameters that define the architecture of the network itself. Although it is typically performed with sophisticated search algorithms for efficiency, in this example we will use a simple random search.

The Julia version is assumed to be 1.10.*

Basic Imports

using MLJ               # Has MLJFlux models
using Flux              # For more flexibility
using RDatasets: RDatasets        # Dataset source
using DataFrames        # To view tuning results in a table
import Optimisers       # native Flux.jl optimisers no longer supported

Loading and Splitting the Data

iris = RDatasets.dataset("datasets", "iris");
y, X = unpack(iris, ==(:Species), rng = 123);
X = Float32.(X);      # To be compatible with the type of the network parameters
first(X, 5)
5×4 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth
     │ Float32      Float32     Float32      Float32
─────┼──────────────────────────────────────────────────
   1 │         6.7         3.3          5.7         2.1
   2 │         5.7         2.8          4.1         1.3
   3 │         7.2         3.0          5.8         1.6
   4 │         4.4         2.9          1.4         0.2
   5 │         5.6         2.5          3.9         1.1
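
Optionally, we can sanity-check how MLJ interprets the data by inspecting its scientific types. The following is only a sketch; the exact display depends on your package versions:

schema(X)     # the four features should have scitype Continuous
scitype(y)    # the target should be an AbstractVector{Multiclass{3}}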

Instantiating the model

Now let's construct our model. This follows a setup similar to the one used in the Quick Start.

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg = "MLJFlux"
clf = NeuralNetworkClassifier(
    builder = MLJFlux.MLP(; hidden = (1, 1, 1), σ = Flux.relu),
    optimiser = Optimisers.Adam(0.01),
    batch_size = 8,
    epochs = 10,
    rng = 42,
)
NeuralNetworkClassifier(
  builder = MLP(
        hidden = (1, 1, 1), 
        σ = NNlib.relu), 
  finaliser = NNlib.softmax, 
  optimiser = Adam(0.01, (0.9, 0.999), 1.0e-8), 
  loss = Flux.Losses.crossentropy, 
  epochs = 10, 
  batch_size = 8, 
  lambda = 0.0, 
  alpha = 0.0, 
  rng = 42, 
  optimiser_changes_trigger_retraining = false, 
  acceleration = CPU1{Nothing}(nothing), 
  embedding_dims = Dict{Symbol, Real}())
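
For intuition, an MLP builder with hidden = (16, 8), say, corresponds roughly to the plain Flux chain sketched below, where the input and output widths are inferred from the data (n_in and n_out here are placeholders, and the actual chain, including the softmax finaliser, is constructed internally by MLJFlux):

using Flux

n_in, n_out = 4, 3          # iris: 4 features, 3 classes
chain = Chain(
    Dense(n_in, 16, relu),  # first hidden layer
    Dense(16, 8, relu),     # second hidden layer
    Dense(8, n_out),        # output layer
    softmax,                # finaliser
)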

Generating Network Architectures

We know that the MLP builder takes a tuple of the form $(z_1, z_2, \ldots, z_k)$ to define a network with $k$ hidden layers, where the $i$th layer has $z_i$ neurons. We will proceed by defining a function that can generate all possible networks with a specific number of hidden layers, given a minimum and maximum number of neurons per layer and a step size for the number of neurons.

function generate_networks(
    ;min_neurons::Int,
    max_neurons::Int,
    neuron_step::Int,
    num_layers::Int,
    )
    # Define the range of neurons
    neuron_range = min_neurons:neuron_step:max_neurons

    # Empty list to store the network configurations
    networks = Vector{Tuple{Vararg{Int, num_layers}}}()

    # Recursive helper function to generate all combinations of tuples
    function generate_tuple(current_layers, remaining_layers)
        if remaining_layers > 0
            # recursive case: extend the partial configuration with each
            # candidate neuron count and recurse on the remaining layers
            for n in neuron_range
                generate_tuple(vcat(current_layers, [n]), remaining_layers - 1)
            end
        else
            # base case: no layers left to fill in, so store the completed
            # configuration as a tuple
            push!(networks, tuple(current_layers...))
        end
    end

    # Generate networks for the given number of layers
    generate_tuple([], num_layers)

    return networks
end
generate_networks (generic function with 1 method)
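
As a quick sanity check, a tiny configuration should enumerate every combination, in the order the recursion above visits them (the expected result is shown as a comment, not actual output):

generate_networks(min_neurons = 1, max_neurons = 2, neuron_step = 1, num_layers = 2)
# expected: [(1, 1), (1, 2), (2, 1), (2, 2)]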

Now let's generate an array of all possible neural networks with three hidden layers, where the number of neurons per layer lies in [1, 64] with a step of 4:

networks_space =
    generate_networks(
        min_neurons = 1,
        max_neurons = 64,
        neuron_step = 4,
        num_layers = 3,
    )

networks_space[1:5]
5-element Vector{Tuple{Int64, Int64, Int64}}:
 (1, 1, 1)
 (1, 1, 5)
 (1, 1, 9)
 (1, 1, 13)
 (1, 1, 17)
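
Since the range 1:4:64 contains 16 possible neuron counts, the full space holds 16^3 = 4096 candidate architectures, which we can verify:

length(networks_space) == length(1:4:64)^3   # true; 4096 architectures in total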

Wrapping the Model for Tuning

Let's use this array to define the range of hyperparameters and pass it along with the model to the TunedModel constructor.

r1 = range(clf, :(builder.hidden), values = networks_space)

tuned_clf = TunedModel(
    model = clf,
    tuning = RandomSearch(),
    resampling = CV(nfolds = 4, rng = 42),
    range = [r1],
    measure = cross_entropy,
    n = 100,             # search over 100 random samples
);

As in the previous workflow example, all we need to do now is fit the model, and the search takes place automatically:

mach = machine(tuned_clf, X, y);
fit!(mach, verbosity = 0);
fitted_params(mach).best_model
NeuralNetworkClassifier(
  builder = MLP(
        hidden = (9, 5, 37), 
        σ = NNlib.relu), 
  finaliser = NNlib.softmax, 
  optimiser = Adam(0.01, (0.9, 0.999), 1.0e-8), 
  loss = Flux.Losses.crossentropy, 
  epochs = 10, 
  batch_size = 8, 
  lambda = 0.0, 
  alpha = 0.0, 
  rng = 42, 
  optimiser_changes_trigger_retraining = false, 
  acceleration = CPU1{Nothing}(nothing), 
  embedding_dims = Dict{Symbol, Real}())

Analyzing the Search Results

Let's analyze the search results by converting the history array to a DataFrame and viewing it:

history = report(mach).history
history_df = DataFrame(
    mlp = [x[:model].builder for x in history],
    measurement = [x[:measurement][1] for x in history],
)
first(sort!(history_df, [order(:measurement)]), 10)
10×2 DataFrame
 Row │ mlp                            measurement
     │ MLP…                           Float64
─────┼────────────────────────────────────────────
   1 │ MLP(hidden = (9, 5, 37), …)      0.0702663
   2 │ MLP(hidden = (25, 9, 49), …)     0.0867743
   3 │ MLP(hidden = (33, 9, 49), …)     0.0892747
   4 │ MLP(hidden = (25, 45, 49), …)    0.0894714
   5 │ MLP(hidden = (21, 9, 45), …)     0.0905676
   6 │ MLP(hidden = (25, 17, 29), …)    0.0992405
   7 │ MLP(hidden = (29, 9, 9), …)      0.0995201
   8 │ MLP(hidden = (53, 9, 33), …)     0.101136
   9 │ MLP(hidden = (57, 45, 37), …)    0.101165
  10 │ MLP(hidden = (45, 49, 49), …)    0.103885
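
Since the fitted machine retrains the best architecture on all of the data, it can be used directly for prediction. The following is a minimal usage sketch (the accuracy computation is an illustrative addition, not part of the original workflow):

yhat = predict_mode(mach, X)   # class predictions from the best model
accuracy(yhat, y)              # training-set accuracy, for illustration only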

This page was generated using Literate.jl.