Neural Architecture Search with MLJFlux
This demonstration is available as a Jupyter notebook or Julia script here.
Neural Architecture Search (NAS) is a form of hyperparameter tuning in which the hyperparameters being tuned define the network architecture itself. Although it is typically performed with sophisticated search algorithms for efficiency, this example uses a simple random search.
The Julia version is assumed to be 1.10.*
Basic Imports
using MLJ # Has MLJFlux models
using Flux # For more flexibility
using RDatasets: RDatasets # Dataset source
using DataFrames # To view tuning results in a table
import Optimisers # native Flux.jl optimisers no longer supported
Loading and Splitting the Data
iris = RDatasets.dataset("datasets", "iris");
y, X = unpack(iris, ==(:Species), rng = 123);
X = Float32.(X); # To be compatible with the type of the network parameters
first(X, 5)
Row | SepalLength | SepalWidth | PetalLength | PetalWidth |
    | Float32 | Float32 | Float32 | Float32 |
1 | 6.7 | 3.3 | 5.7 | 2.1 |
2 | 5.7 | 2.8 | 4.1 | 1.3 |
3 | 7.2 | 3.0 | 5.8 | 1.6 |
4 | 4.4 | 2.9 | 1.4 | 0.2 |
5 | 5.6 | 2.5 | 3.9 | 1.1 |
Instantiating the model
Now let's construct our model. This follows a setup similar to the one in the Quick Start.
NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg = "MLJFlux"
clf = NeuralNetworkClassifier(
    builder = MLJFlux.MLP(; hidden = (1, 1, 1), σ = Flux.relu),
    optimiser = Optimisers.Adam(0.01),
    batch_size = 8,
    epochs = 10,
    rng = 42,
)
NeuralNetworkClassifier(
  builder = MLP(
        hidden = (1, 1, 1),
        σ = NNlib.relu),
  finaliser = NNlib.softmax,
  optimiser = Adam(0.01, (0.9, 0.999), 1.0e-8),
  loss = Flux.Losses.crossentropy,
  epochs = 10,
  batch_size = 8,
  lambda = 0.0,
  alpha = 0.0,
  rng = 42,
  optimiser_changes_trigger_retraining = false,
  acceleration = CPU1{Nothing}(nothing),
  embedding_dims = Dict{Symbol, Real}())
Generating Network Architectures
We know that the MLP builder takes a tuple of the form $(z_1, z_2, ..., z_k)$ to define a network with $k$ hidden layers, where the $i$th layer has $z_i$ neurons. We will proceed by defining a function that generates all possible networks for a given number of hidden layers, a minimum and maximum number of neurons per layer, and a step size for the number of neurons.
function generate_networks(;
    min_neurons::Int,
    max_neurons::Int,
    neuron_step::Int,
    num_layers::Int,
)
    # Define the range of neurons
    neuron_range = min_neurons:neuron_step:max_neurons
    # Empty list to store the network configurations
    networks = Vector{Tuple{Vararg{Int, num_layers}}}()
    # Recursive helper function to generate all combinations of tuples
    function generate_tuple(current_layers, remaining_layers)
        if remaining_layers > 0
            for n in neuron_range
                # Starting from current_layers = [], the first level yields
                # [min_neurons], [min_neurons + neuron_step], [min_neurons + 2*neuron_step], ...
                # For each of these we call generate_tuple again, which appends
                # every value in neuron_range for the next layer
                generate_tuple(vcat(current_layers, [n]), remaining_layers - 1)
            end
        else
            # In the base case there are no more layers to recurse on,
            # so we store current_layers as a tuple
            push!(networks, tuple(current_layers...))
        end
    end
    # Generate networks for the given number of layers
    generate_tuple(Int[], num_layers)
    return networks
end
generate_networks (generic function with 1 method)
Now let's generate an array of all possible neural networks with three hidden layers, where the number of neurons per layer is in [1, 64] with a step of 4:
networks_space =
    generate_networks(
        min_neurons = 1,
        max_neurons = 64,
        neuron_step = 4,
        num_layers = 3,
    )
networks_space[1:5]
5-element Vector{Tuple{Int64, Int64, Int64}}:
(1, 1, 1)
(1, 1, 5)
(1, 1, 9)
(1, 1, 13)
(1, 1, 17)
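To get a feel for the size of this search space, the same set of tuples can be sketched with `Iterators.product` from Base. This is only an illustrative alternative to the recursive function, not the tutorial's method; the names `neuron_range`, `grid`, and `networks_alt` are introduced here for the example.

```julia
# Same settings as above: 1, 5, 9, ..., 61 neurons in each of 3 hidden layers
neuron_range = 1:4:64
num_layers = 3

# Cartesian product of the range with itself, once per hidden layer
grid = Iterators.product(ntuple(_ -> neuron_range, num_layers)...)
networks_alt = vec(collect(grid))

length(neuron_range)   # 16 candidate widths per layer
length(networks_alt)   # 16^3 = 4096 architectures
```

Note that `Iterators.product` varies the first element fastest, so the ordering differs from the recursive version; `sort(networks_alt)` gives the lexicographic order shown in the output above.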
Wrapping the Model for Tuning
Let's use this array to define the range of the hyperparameter and pass it, along with the model, to TunedModel:
r1 = range(clf, :(builder.hidden), values = networks_space)
tuned_clf = TunedModel(
    model = clf,
    tuning = RandomSearch(),
    resampling = CV(nfolds = 4, rng = 42),
    range = [r1],
    measure = cross_entropy,
    n = 100, # searching over 100 random samples is enough
)
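With 16 widths per layer and 3 layers there are 4096 candidate architectures, so n = 100 evaluates roughly 2.4% of the space. The idea behind random search can be sketched in plain Julia as follows. This is a conceptual sketch only, not how MLJ implements RandomSearch: `toy_loss` is a made-up stand-in for a cross-validated measurement, `random_search` is a hypothetical helper, and drawing candidates without replacement is an assumption.

```julia
using Random

# Hypothetical stand-in for a cross-validated loss; NOT a real model evaluation
toy_loss(hidden) = sum(abs.(hidden .- 32)) / 100

function random_search(candidates, loss; n, rng)
    # Draw n distinct candidates (an assumption; a real tuner may sample differently)
    sampled = shuffle(rng, candidates)[1:min(n, length(candidates))]
    losses = map(loss, sampled)
    best = argmin(losses)           # index of the smallest loss
    return sampled[best], losses[best]
end

# All (z1, z2, z3) with each width in 1:4:64
candidates = vec([(a, b, c) for a in 1:4:64, b in 1:4:64, c in 1:4:64])

best_hidden, best_loss =
    random_search(candidates, toy_loss; n = 100, rng = MersenneTwister(42))
```

The search only ever sees the sampled candidates, so the returned architecture is the best of those 100, not necessarily the best of all 4096.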
Performing the Search
Similar to the last workflow example, all we need now is to fit our model and the search will take place automatically:
mach = machine(tuned_clf, X, y);
fit!(mach, verbosity = 0);
fitted_params(mach).best_model
NeuralNetworkClassifier(
  builder = MLP(
        hidden = (57, 21, 53),
        σ = NNlib.relu),
  finaliser = NNlib.softmax,
  optimiser = Adam(0.01, (0.9, 0.999), 1.0e-8),
  loss = Flux.Losses.crossentropy,
  epochs = 10,
  batch_size = 8,
  lambda = 0.0,
  alpha = 0.0,
  rng = 42,
  optimiser_changes_trigger_retraining = false,
  acceleration = CPU1{Nothing}(nothing),
  embedding_dims = Dict{Symbol, Real}())
Analyzing the Search Results
Let's analyze the search results by converting the history array to a DataFrame and viewing the ten entries with the lowest cross-entropy:
history = report(mach).history
history_df = DataFrame(
    mlp = [x[:model].builder for x in history],
    measurement = [x[:measurement][1] for x in history],
)
first(sort!(history_df, [order(:measurement)]), 10)
Row | mlp | measurement |
    | MLP… | Float64 |
1 | MLP(hidden = (57, 21, 53), …) | 0.0812572 |
2 | MLP(hidden = (61, 33, 25), …) | 0.0945999 |
3 | MLP(hidden = (33, 33, 5), …) | 0.09471 |
4 | MLP(hidden = (49, 49, 33), …) | 0.0950018 |
5 | MLP(hidden = (25, 49, 25), …) | 0.0960489 |
6 | MLP(hidden = (45, 53, 13), …) | 0.0968639 |
7 | MLP(hidden = (45, 57, 37), …) | 0.0996649 |
8 | MLP(hidden = (53, 9, 33), …) | 0.101136 |
9 | MLP(hidden = (21, 25, 29), …) | 0.103415 |
10 | MLP(hidden = (37, 25, 17), …) | 0.109497 |
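When comparing architectures with similar measurements, it can help to look at model size as well. The sketch below counts trainable parameters for a given hidden tuple, under the assumption that the network is a stack of fully connected layers, each with a bias vector, followed by a dense output layer. `count_parameters` is a hypothetical helper written for this illustration and is not part of MLJFlux.

```julia
# Count trainable parameters of a fully connected network, ASSUMING every
# layer is dense with a bias vector (an assumption about what the builder creates)
function count_parameters(n_in::Int, hidden::Tuple{Vararg{Int}}, n_out::Int)
    sizes = (n_in, hidden..., n_out)
    # Layer i has sizes[i] * sizes[i + 1] weights plus sizes[i + 1] biases
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1] for i in 1:length(sizes)-1)
end

# Iris has 4 features and 3 classes
count_parameters(4, (57, 21, 53), 3)  # → 2831 (row 1 of the table)
count_parameters(4, (33, 33, 5), 3)   # → 1475 (row 3 of the table)
```

Under this assumption, the third-ranked architecture has roughly half the parameters of the best one while its measurement is only slightly higher, which may matter if a smaller model is preferred.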
