When training neural networks, you often have to tune hyperparameters.
In FluxTraining.jl the following definition is used:
> A hyperparameter is any state that influences the training and is not a parameter of the model.
Common hyperparameters to worry about are the learning rate, batch size and regularization strength.
In recent years, it has also become common practice to schedule some hyperparameters. The cyclical learning rate schedule introduced in L. Smith (2015), for example, changes the learning rate every step to speed up convergence.
FluxTraining.jl provides an extensible interface for hyperparameter scheduling that is not restricted to optimizer hyperparameters as in many other training frameworks. To use it, you have to create a `Scheduler`, a callback that can be passed to a `Learner`. `Scheduler`'s constructor takes pairs of hyperparameter types and associated schedules.
For example, `LearningRate` is a hyperparameter type representing the optimizer's step size, and `schedule = Exp(λ = 0.1, γ = 0.9)` is an exponential decay schedule. We can then create a callback that schedules the learning rate accordingly with `Scheduler(LearningRate => schedule)`.
`Schedule`s are built around ParameterSchedulers.jl. See that package's documentation for more details on how to construct them.
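Putting the pieces above together, a minimal sketch of the exponential-decay example could look like this (it reuses the keyword-based ParameterSchedulers.jl constructors shown on this page and assumes `model`, `data`, `opt`, and `lossfn` are already defined):

```julia
using FluxTraining, ParameterSchedulers

# Exponential decay: start at 0.1 and multiply by 0.9 at each schedule step.
schedule = Exp(λ = 0.1, γ = 0.9)

# A callback that can be passed to a Learner alongside the other arguments.
scheduler = Scheduler(LearningRate => schedule)
```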
Let's define a `Schedule` that follows the above-mentioned cyclical learning rate schedule. The idea is to start with a small learning rate, gradually increase it, and then slowly decrease it again. For example, we could start with a learning rate of 0.01, increase it to 0.1 over 3 epochs, and then decrease it to 0.001 over the following 7 epochs. Let's also use cosine annealing, a common practice that interpolates the values more smoothly.
In code, that looks like this:
```julia
using ParameterSchedulers: Sequence, Shifted, Sin  # Sin for cosine annealing

es = length(traindl)  # number of steps in an epoch

schedule = Sequence(
    # Phase 1: rise from 0.01 to 0.1 over the first 3 epochs.
    Sin(
        λ0 = 0.01,          # initial learning rate
        λ1 = 0.1,           # max learning rate
        period = 2 * 3es,
    ) => 3es,
    # Phase 2: decay from 0.1 to 0.001 over the next 7 epochs,
    # shifted so the sine wave starts at the maximum learning rate.
    Shifted(
        Sin(
            λ0 = 0.1,       # max learning rate
            λ1 = 0.001,     # end learning rate
            period = 2 * 7es,
        ),
        7es + 1,
    ) => 7es,
)

learner = Learner(model, data, opt, lossfn, Scheduler(LearningRate => schedule))
```
For convenience, you can also use the `onecycle` helper to create this `Schedule`. See the ParameterSchedulers.jl documentation for more details on warm-up schedules.
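For instance, a one-cycle schedule over 10 epochs with a maximum learning rate of 0.1 might look roughly like the sketch below. The argument order shown here (total number of steps, maximum value) is an assumption; check the `onecycle` docstring for the exact signature and keyword arguments in your FluxTraining.jl version.

```julia
# Sketch only: the exact signature of `onecycle` may differ between versions; see `?onecycle`.
es = length(traindl)            # steps per epoch, as above
schedule = onecycle(10es, 0.1)  # assumed arguments: total number of steps, maximum learning rate
learner = Learner(model, data, opt, lossfn, Scheduler(LearningRate => schedule))
```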
You can create and schedule your own hyperparameters. To do this, you will need to define

- a type for your hyperparameter, e.g. `abstract type MyParam <: HyperParameter end`,
- how to set the hyperparameter by implementing `sethyperparameter!(learner, ::Type{MyParam}, value)`, and
- what state needs to be accessed to set the hyperparameter by implementing `stateaccess(::Type{MyParam})`. See custom callbacks for more info on why this is needed; a sketch of all three definitions is shown below.
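As an illustration, here is a minimal sketch of these three definitions for a hypothetical `RegularizationStrength` hyperparameter. It is not part of FluxTraining.jl: it assumes the learner's `lossfn` is a callable struct with a mutable field `λ` that weights a penalty term, and that write access to `lossfn` is requested with the `Write()` permission described in the custom callbacks documentation.

```julia
import FluxTraining
using FluxTraining: HyperParameter, Write

# 1. A type for the hyperparameter (hypothetical name), following the pattern above.
abstract type RegularizationStrength <: HyperParameter end

# 2. How to set it: write the scheduled value into the loss function's `λ` field
#    (assumes `learner.lossfn` has such a field).
function FluxTraining.sethyperparameter!(learner, ::Type{RegularizationStrength}, value)
    learner.lossfn.λ = value
    return learner
end

# 3. What learner state the scheduler needs access to in order to set it.
FluxTraining.stateaccess(::Type{RegularizationStrength}) = (lossfn = Write(),)

# It can then be scheduled like any built-in hyperparameter:
# Scheduler(RegularizationStrength => Exp(λ = 0.1, γ = 0.9))
```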
Hyperparameters don't need to belong to the optimizer! For example, you could create a hyperparameter for batch size. That is not implemented here because this package is agnostic of the data iterators and the implementation would differ for every type of iterator.