Models
MLJFlux provides the model types below, for use with input features `X` and targets `y` of the scientific type indicated in the table below. The parameters `n_in`, `n_out` and `n_channels` refer to information passed to the builder, as described under Defining Custom Builders.
Model Type | Prediction type | `scitype(X) <: _` | `scitype(y) <: _` |
---|---|---|---|
`NeuralNetworkRegressor` | `Deterministic` | `AbstractMatrix{Continuous}` or `Table(Continuous)` with `n_in` columns | `AbstractVector{<:Continuous}` (`n_out = 1`) |
`MultitargetNeuralNetworkRegressor` | `Deterministic` | `AbstractMatrix{Continuous}` or `Table(Continuous)` with `n_in` columns | `<: Table(Continuous)` with `n_out` columns |
`NeuralNetworkClassifier` | `Probabilistic` | `AbstractMatrix{Continuous}` or `Table(Continuous)` with `n_in` columns | `AbstractVector{<:Finite}` with `n_out` classes |
`NeuralNetworkBinaryClassifier` | `Probabilistic` | `AbstractMatrix{Continuous}` or `Table(Continuous)` with `n_in` columns | `AbstractVector{<:Finite{2}}` (but `n_out = 1`) |
`ImageClassifier` | `Probabilistic` | `AbstractVector{<:Image{W,H}}` with `n_in = (W, H)` | `AbstractVector{<:Finite}` with `n_out` classes |
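For example, here is a minimal sketch (using the iris dataset bundled with MLJ, and arbitrary hyperparameter values) that loads and fits one of the classifiers in the table:

```julia
using MLJ

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux

X, y = @load_iris                    # Table(Continuous) features, three-class Multiclass target
model = NeuralNetworkClassifier(epochs=20)
mach = machine(model, X, y) |> fit!
yhat = predict(mach, X)              # probabilistic predictions, as the model type is Probabilistic
```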
What exactly is a "model"?
In MLJ a model is a mutable struct storing hyper-parameters for some learning algorithm indicated by the model name, and that's all. In particular, an MLJ model does not store learned parameters.
In Flux the term "model" has another meaning. However, as all Flux "models" used in MLJFlux are `Flux.Chain` objects, we call them chains, and restrict use of "model" to models in the MLJ sense.
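To illustrate the distinction, here is a small sketch (the hyperparameter value is arbitrary): the model object holds only hyperparameters, while the learned parameters (the Flux chain) are exposed through the machine after training.

```julia
using MLJ

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux

model = NeuralNetworkClassifier(epochs=5)   # a mutable struct of hyperparameters
model.epochs                                # 5; no learned weights are stored here

# After fitting a machine wrapping `model`, the learned Flux chain is
# retrieved with `fitted_params(mach).chain`, not from `model` itself.
```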
Are observations rows or columns?
In MLJ the convention for two-dimensional data (tables and matrices) is rows=observations. For matrices, Flux has the opposite convention. If your data is a matrix whose columns index the observations, then the optimal solution is to present the adjoint or transpose of your matrix to MLJFlux models. Otherwise, you can use the matrix as is, or transform it once with `permutedims` and then present the adjoint or transpose, which is again the optimal solution for MLJFlux training.
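Here is a minimal sketch of the first case (synthetic data, arbitrary model choice): `Xmatrix` has observations as columns, so its adjoint is presented to the MLJFlux model, avoiding a copy.

```julia
using MLJ

NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux

Xmatrix = rand(Float32, 5, 100)   # 5 features × 100 observations (Flux convention)
X = Xmatrix'                      # lazy adjoint: 100 × 5, rows are now observations
y = rand(Float32, 100)

mach = machine(NeuralNetworkRegressor(), X, y) |> fit!
```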
Instructions for coercing common image formats into some `AbstractVector{<:Image}` are here.
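For instance, if the raw data is a vector of `Float32` matrices with entries in [0, 1] (one matrix per image), the coercion might look like this sketch (synthetic data; the exact recipe depends on your image format):

```julia
using MLJ

raw = [rand(Float32, 28, 28) for _ in 1:100]   # 100 synthetic grayscale "images"
images = coerce(raw, GrayImage)                # an AbstractVector{<:GrayImage{28,28}}
scitype(images)
```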
Fitting and warm restarts
MLJ machines cache state enabling the "warm restart" of model training, as demonstrated in the incremental training example. In the case of MLJFlux models, `fit!(mach)` will use a warm restart if:

- only `model.epochs` has changed since the last call; or
- only `model.epochs` or `model.optimiser` have changed since the last call and `model.optimiser_changes_trigger_retraining == false` (the default).

(The "state" part of the optimiser is ignored in this comparison. This allows one to dynamically modify learning rates, for example.)

Here `model = mach.model` is the associated MLJ model.
The warm restart feature makes it possible to externally control iteration. See, for example, Early Stopping with MLJFlux and Using MLJ to classify the MNIST image dataset.
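Here is a short sketch of a warm restart on synthetic data (the model and data are placeholders): increasing `epochs` and calling `fit!` again trains only for the additional epochs.

```julia
using MLJ

NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux

X, y = make_regression(100, 3)     # synthetic regression data, for illustration
model = NeuralNetworkRegressor(epochs=10)
mach = machine(model, X, y)
fit!(mach)                         # cold start: trains for 10 epochs

model.epochs = 15
fit!(mach)                         # warm restart: trains for 5 more epochs only
```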
Model Hyperparameters
All models share the following hyper-parameters. See individual model docstrings for a full list.
Hyper-parameter | Description | Default |
---|---|---|
`builder` | Default builder for models. | `MLJFlux.Linear(σ=Flux.relu)` (regressors) or `MLJFlux.Short(n_hidden=0, dropout=0.5, σ=Flux.σ)` (classifiers) |
`optimiser` | The optimiser to use for training. | `Optimisers.Adam()` |
`loss` | The loss function used for training. | `Flux.mse` (regressors) and `Flux.crossentropy` (classifiers) |
`epochs` | Number of epochs to train for. | `10` |
`batch_size` | The batch size for the data. | `1` |
`lambda` | The regularization strength. Range = [0, ∞). | `0` |
`alpha` | The L2/L1 mix of regularization. Range = [0, 1]. | `0` |
`rng` | The random number generator (RNG) passed to builders, for weight initialization, for example. Can be any `AbstractRNG` or the seed (integer) for a `Xoshiro` that is reset on every cold restart of model (machine) training. | `GLOBAL_RNG` |
`acceleration` | Use `CUDALibs()` for training on GPU; default is `CPU1()`. | `CPU1()` |
`optimiser_changes_trigger_retraining` | True if fitting an associated machine should trigger retraining from scratch whenever the optimiser changes. | `false` |
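As a sketch, a model with several of these hyperparameters set explicitly might be constructed as follows (the particular values are arbitrary, and the Optimisers.jl optimiser assumes a recent MLJFlux version):

```julia
using MLJ
import MLJFlux, Optimisers

NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux

model = NeuralNetworkRegressor(
    builder=MLJFlux.Short(n_hidden=32, dropout=0.2),
    optimiser=Optimisers.Adam(0.001),
    epochs=50,
    batch_size=32,
    lambda=0.01,     # regularization strength
    alpha=0.5,       # 50/50 mix of L1 and L2 penalties
    rng=123,         # integer seed, for reproducibility
)
```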
The classifiers have an additional hyperparameter `finaliser` (default is `Flux.softmax`, or `Flux.σ` in the binary case), which is the operation applied to the unnormalized output of the final layer to obtain probabilities (outputs summing to one). It should return a vector of the same length as its input.
Currently, the loss function specified by `loss=...` is applied internally by Flux and needs to conform to the Flux API. You cannot, for example, supply one of MLJ's probabilistic loss functions, such as `MLJ.cross_entropy`, to one of the classifier constructors.

That said, you can use MLJ loss functions or metrics in evaluation meta-algorithms (such as cross-validation), and they will work even if the underlying model comes from MLJFlux.
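For example, here is a sketch in which an MLJ measure is used for evaluation only, while the Flux loss continues to drive training internally:

```julia
using MLJ

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux

X, y = @load_iris
model = NeuralNetworkClassifier(epochs=20)   # trained internally with Flux.crossentropy

# MLJ measures, such as cross_entropy and accuracy, are fine here, in evaluation:
evaluate(model, X, y, resampling=CV(nfolds=3), measures=[cross_entropy, accuracy])
```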
More on accelerated training with GPUs
As in the table, when instantiating a model for training on a GPU, specify `acceleration=CUDALibs()`, as in
```julia
using MLJ
ImageClassifier = @load ImageClassifier
model = ImageClassifier(epochs=10, acceleration=CUDALibs())
mach = machine(model, X, y) |> fit!
```
In this example, the data `X, y` is copied onto the GPU under the hood on the call to `fit!` and cached for use in any warm restart (see above). The Flux chain used in training is always copied back to the CPU at the conclusion of `fit!`, and made available as `fitted_params(mach)`.
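For example, the trained chain (on the CPU) can then be inspected with:

```julia
fp = fitted_params(mach)
fp.chain        # the trained Flux.Chain, living on the CPU
```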
Builders
Builder | Description |
---|---|
`MLJFlux.MLP(hidden=(10,))` | General multi-layer perceptron |
`MLJFlux.Short(n_hidden=0, dropout=0.5, σ=sigmoid)` | Fully connected network with one hidden layer and dropout |
`MLJFlux.Linear(σ=relu)` | Vanilla linear network with no hidden layers and activation function `σ` |
`MLJFlux.@builder` | Macro for customized builders |
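As an illustration, here is a sketch of a custom builder defined with the `MLJFlux.@builder` macro (the layer widths are arbitrary); inside the macro body, `n_in`, `n_out` and `rng` are available, as described under Defining Custom Builders:

```julia
using MLJ
import MLJFlux, Flux

NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux

# n_in and n_out are supplied by MLJFlux when the chain is built:
builder = MLJFlux.@builder Flux.Chain(
    Flux.Dense(n_in, 64, Flux.relu),
    Flux.Dense(64, n_out),
)

model = NeuralNetworkRegressor(builder=builder, epochs=30)
```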