ExpDecay
struct defined in module Flux.Optimise
```julia
ExpDecay(η = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4, start = 1)
```
Discount the learning rate η by the factor decay every decay_step steps, until a minimum of clip is reached.
- Learning rate (η): Amount by which gradients are discounted before updating the weights.
- decay: Factor by which the learning rate is discounted.
- decay_step: Number of steps between two decay operations.
- clip: Minimum value of the learning rate.
- start: Step at which the decay starts.
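To make the schedule concrete, here is a minimal sketch of the effective learning rate as a function of the step count. The helper `effective_eta` is hypothetical, not part of Flux; it assumes the decay fires on steps that are multiples of decay_step once start has been passed:

```julia
# Hypothetical helper, not part of Flux: the learning rate is multiplied
# by `decay` on every step that is a multiple of `decay_step` after `start`,
# and never falls below `clip`.
function effective_eta(t; η = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4, start = 1)
    n = fld(t, decay_step) - fld(start, decay_step)  # decays applied by step t
    return max(η * decay^max(n, 0), clip)
end

effective_eta(999)     # 0.001  (no decay yet)
effective_eta(1000)    # 0.0001 (first decay applied)
effective_eta(10_000)  # 1.0e-4 (floored at clip)
```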
See also the Scheduling Optimisers section of the docs for more general scheduling techniques.
ExpDecay is typically composed with other optimisers as the last transformation of the gradient:
```julia
opt = Optimiser(Adam(), ExpDecay(1.0))
```
Note: you may want to start with η = 1 in ExpDecay when combined with other optimisers (Adam in this case) that have their own learning rate.
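For context, a sketch of how such a composed optimiser might be used with the classic implicit-parameter training API; the model, loss function, and data below are toy placeholders, not prescribed by the docstring:

```julia
using Flux
using Flux.Optimise: Optimiser, ExpDecay

# Toy model and data, purely illustrative.
model = Dense(10 => 1)
data  = [(rand(Float32, 10), rand(Float32, 1)) for _ in 1:100]
loss(x, y) = Flux.mse(model(x), y)

# Adam supplies the per-parameter step size; ExpDecay then scales the
# resulting update, shrinking it by `decay` every `decay_step` steps.
opt = Optimiser(Adam(), ExpDecay(1.0))

Flux.train!(loss, Flux.params(model), data, opt)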