Scheduling optimizers
A schedule on its own does not do anything; we need to use it to adjust hyperparameters during training. In this tutorial, we will look at three ways to do that: iterating the schedule directly, using a stateful iterator, and using a scheduled optimizer.
Iterating during training
Since every schedule is a standard iterator, we can insert it into a training loop by zipping it with another iterator. For example, the following code adjusts the learning rate of the optimizer before each batch of training.
using Flux, ParameterSchedulers

# a toy dataset: 3 batches of 10 samples with 4 features each
data = [(rand(4, 10), rand([-1, 1], 1, 10)) for _ in 1:3]
m = Chain(Dense(4, 4, tanh), Dense(4, 1, tanh))
p = params(m)
opt = Descent()
s = Exp(λ = 1e-1, γ = 0.2)

# zipping the schedule with the data yields one learning rate per batch
for (η, (x, y)) in zip(s, data)
    opt.eta = η
    g = Flux.gradient(() -> Flux.mse(m(x), y), p)
    Flux.update!(opt, p, g)
    println("η: $(opt.eta)")
end
η: 0.1
η: 0.020000000000000004
η: 0.004000000000000001
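Because s is an ordinary iterator, we can also preview the upcoming learning rates without touching the training loop at all. The snippet below is just a small sketch (the length 5 is arbitrary) that collects the first few values of the same Exp schedule.
# peek at the first five values the schedule will produce (≈ 0.1, 0.02, 0.004, ...)
s = Exp(λ = 1e-1, γ = 0.2)
collect(Iterators.take(s, 5))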
We can also adjust the learning rate on a per-epoch basis instead. All that is required is to change what we zip our schedule with.
nepochs = 6
s = Step(λ = 1e-1, γ = 0.2, step_sizes = [3, 2, 1])
# zipping with the epoch counter advances the schedule once per epoch
for (η, epoch) in zip(s, 1:nepochs)
    opt.eta = η
    for (i, (x, y)) in enumerate(data)
        g = Flux.gradient(() -> Flux.mse(m(x), y), p)
        Flux.update!(opt, p, g)
        println("epoch: $epoch, batch: $i, η: $(opt.eta)")
    end
end
epoch: 1, batch: 1, η: 0.1
epoch: 1, batch: 2, η: 0.1
epoch: 1, batch: 3, η: 0.1
epoch: 2, batch: 1, η: 0.1
epoch: 2, batch: 2, η: 0.1
epoch: 2, batch: 3, η: 0.1
epoch: 3, batch: 1, η: 0.1
epoch: 3, batch: 2, η: 0.1
epoch: 3, batch: 3, η: 0.1
epoch: 4, batch: 1, η: 0.020000000000000004
epoch: 4, batch: 2, η: 0.020000000000000004
epoch: 4, batch: 3, η: 0.020000000000000004
epoch: 5, batch: 1, η: 0.020000000000000004
epoch: 5, batch: 2, η: 0.020000000000000004
epoch: 5, batch: 3, η: 0.020000000000000004
epoch: 6, batch: 1, η: 0.004000000000000001
epoch: 6, batch: 2, η: 0.004000000000000001
epoch: 6, batch: 3, η: 0.004000000000000001
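When the inner loop is just standard training, the same per-epoch pattern can be written more compactly with Flux.train!. This is only a sketch; it reuses m, p, data, opt, and nepochs from above and assumes the Flux.train!(loss, params, data, opt) signature used by this version of Flux.
loss(x, y) = Flux.mse(m(x), y)       # per-batch loss passed to train!
s = Step(λ = 1e-1, γ = 0.2, step_sizes = [3, 2, 1])
for (η, epoch) in zip(s, 1:nepochs)
    opt.eta = η                      # set the learning rate once per epoch
    Flux.train!(loss, p, data, opt)  # one full pass over the data at this η
end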
Stateful iteration with training
Sometimes zipping the schedule with another iterator isn't sufficient. For example, we might want to advance the schedule on every batch without being forced to restart it at the start of each epoch. For nested loops like this, it is useful to use ScheduleIterator, which maintains its own iteration state.
nepochs = 3
s = ScheduleIterator(Inv(λ = 1e-1, γ = 0.2, p = 2))
for epoch in 1:nepochs
    for (i, (x, y)) in enumerate(data)
        # next! advances the iterator's internal state on every batch;
        # it is assumed here to be unexported, hence the qualified call
        opt.eta = ParameterSchedulers.next!(s)
        g = Flux.gradient(() -> Flux.mse(m(x), y), p)
        Flux.update!(opt, p, g)
        println("epoch: $epoch, batch: $i, η: $(opt.eta)")
    end
end
epoch: 1, batch: 1, η: 0.1
epoch: 1, batch: 2, η: 0.06944444444444445
epoch: 1, batch: 3, η: 0.05102040816326532
epoch: 2, batch: 1, η: 0.0390625
epoch: 2, batch: 2, η: 0.030864197530864196
epoch: 2, batch: 3, η: 0.025
epoch: 3, batch: 1, η: 0.02066115702479339
epoch: 3, batch: 2, η: 0.01736111111111111
epoch: 3, batch: 3, η: 0.014792899408284023
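Because the iterator owns its state, it simply continues from wherever it left off, whatever loop structure surrounds it. The sketch below (using the same ScheduleIterator and the qualified next! call as above) runs two separate passes; the second pass resumes at the next value of the schedule rather than restarting from λ.
s = ScheduleIterator(Inv(λ = 1e-1, γ = 0.2, p = 2))

# warm-up pass over the first two batches
for (x, y) in data[1:2]
    opt.eta = ParameterSchedulers.next!(s)
    Flux.update!(opt, p, Flux.gradient(() -> Flux.mse(m(x), y), p))
end

# a later pass picks up at the schedule's third value, not back at 0.1
for (x, y) in data
    opt.eta = ParameterSchedulers.next!(s)
    Flux.update!(opt, p, Flux.gradient(() -> Flux.mse(m(x), y), p))
end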
Working with Flux optimizers
Warning
Currently, we are porting ScheduledOptim to Flux.jl. It may be renamed once it is ported out of this package.
While the approaches above can be helpful when dealing with fine-grained training loops, it is usually simpler to just use a ScheduledOptim.
nepochs = 3
s = Inv(λ = 1e-1, p = 2, γ = 0.2)
# wrap the optimizer; the schedule advances automatically on every update
opt = ScheduledOptim(s, Descent())
for epoch in 1:nepochs
    for (i, (x, y)) in enumerate(data)
        g = Flux.gradient(() -> Flux.mse(m(x), y), p)
        Flux.update!(opt, p, g)
        println("epoch: $epoch, batch: $i, η: $(opt.optim.eta)")
    end
end
epoch: 1, batch: 1, η: 0.1
epoch: 1, batch: 2, η: 0.06944444444444445
epoch: 1, batch: 3, η: 0.05102040816326532
epoch: 2, batch: 1, η: 0.0390625
epoch: 2, batch: 2, η: 0.030864197530864196
epoch: 2, batch: 3, η: 0.025
epoch: 3, batch: 1, η: 0.02066115702479339
epoch: 3, batch: 2, η: 0.01736111111111111
epoch: 3, batch: 3, η: 0.014792899408284023
The scheduled optimizer, opt, can be used anywhere a Flux optimizer can. For example, it can be passed to Flux.train!.
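As a minimal sketch (again assuming the Flux.train!(loss, params, data, opt) signature), the whole epoch loop above collapses to:
loss(x, y) = Flux.mse(m(x), y)
opt = ScheduledOptim(Inv(λ = 1e-1, p = 2, γ = 0.2), Descent())
for epoch in 1:nepochs
    # each update! call inside train! also advances the wrapped schedule
    Flux.train!(loss, p, data, opt)
end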