ExpDecay — struct, defined in module Flux.Optimise

    ExpDecay(η = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4, start = 1)
Discount the learning rate η by the factor decay every decay_step steps, until a minimum of clip is reached.
Parameters:

- Learning rate (η): amount by which gradients are discounted before updating the weights.
- decay: factor by which the learning rate is discounted.
- decay_step: number of steps between two decay operations.
- clip: minimum value of the learning rate.
- start: step at which the decay starts.
See also the Scheduling Optimisers section of the docs for more general scheduling techniques.
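As a rough sketch of how these parameters interact (this is an illustrative closed form, not Flux's implementation, whose in-place step accounting may differ slightly), the effective learning rate at step t can be written as:

```python
def expdecay_lr(t, eta=0.001, decay=0.1, decay_step=1000, clip=1e-4, start=1):
    """Illustrative effective learning rate at step t, using ExpDecay's defaults.

    A sketch only: `eta` stands in for the η parameter, and Flux mutates the
    rate in place during training rather than computing this closed form.
    """
    n = max(0, (t - start) // decay_step)  # decay events that have occurred by step t
    return max(eta * decay ** n, clip)     # the rate is never allowed below `clip`
```

With the defaults, the rate stays at 0.001 for the first decay_step steps, drops by a factor of decay after each subsequent decay_step steps, and is clamped at clip from then on.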
ExpDecay is typically composed with other optimisers as the last transformation of the gradient:

    opt = Optimiser(Adam(), ExpDecay(1.0))

Note: you may want to start with η = 1 in ExpDecay when combining it with other optimisers (Adam in this case) that have their own learning rate.
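The reason for η = 1 can be sketched numerically: Optimiser applies each optimiser's transformation in sequence, so ExpDecay scales whatever step the preceding optimiser produced. A hypothetical sketch (composed_step and adam_step are illustrative names, not Flux API; the scaling mirrors the schedule described above):

```python
def composed_step(adam_step, t, eta=1.0, decay=0.1, decay_step=1000, clip=1e-4, start=1):
    """Sketch of Adam's step rescaled by an ExpDecay schedule started at η = 1.

    With eta = 1, the composed update equals Adam's own step until the first
    decay fires, then shrinks by `decay` every `decay_step` steps.
    """
    n = max(0, (t - start) // decay_step)   # decay events so far
    return adam_step * max(eta * decay ** n, clip)
```

Starting ExpDecay at η = 1 leaves Adam's own learning rate untouched early in training; a smaller η would double-discount every step from the outset.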