AdamW
Function defined in module Flux.Optimise.
AdamW(η = 0.001, β::Tuple = (0.9, 0.999), decay = 0)
AdamW is a variant of Adam that fixes (as in repairs) Adam's weight decay regularization.
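The fix is that the decay is decoupled from the adaptive gradient step: rather than folding an L2 penalty into the gradient, where Adam's moment estimates would rescale it, the decay is applied directly to the weights. Schematically, following Loshchilov & Hutter (2017), with λ the decay factor and m̂_t, v̂_t Adam's bias-corrected moment estimates, one update looks roughly like

\theta_{t+1} = \theta_t - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} - \lambda \, \theta_t

(the exact scaling of the decay term varies between formulations and implementations).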
Parameters
- Learning rate (η): Amount by which gradients are discounted before updating the weights.
- Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.
- decay: Decay applied to weights during optimisation.
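As a hedged sketch of how these pieces relate (not the verbatim library source), an AdamW optimiser behaves like Adam chained with a separate weight-decay rule via the Optimiser composition from Flux.Optimise:

using Flux
using Flux.Optimise: Adam, AdamW, WeightDecay, Optimiser

# AdamW with explicit hyperparameters
opt = AdamW(0.001, (0.9, 0.999), 0.01)

# Roughly equivalent composition (a sketch): Adam produces the adaptive step,
# WeightDecay then adds the decoupled decay term for each weight array.
opt_equiv = Optimiser(Adam(0.001, (0.9, 0.999)), WeightDecay(0.01))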
Examples

opt = AdamW()

opt = AdamW(0.001, (0.89, 0.995), 0.1)
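For a fuller picture, here is a hedged end-to-end sketch of training with AdamW through the implicit-parameters style used by Flux.Optimise; the model, data, and hyperparameters below are made up for illustration:

using Flux
using Flux.Optimise: AdamW, update!

# Toy model and dummy data (hypothetical, for illustration only)
model = Dense(10, 1)
x = rand(Float32, 10, 16)
y = rand(Float32, 1, 16)

loss(x, y) = Flux.Losses.mse(model(x), y)

opt = AdamW(0.001, (0.9, 0.999), 0.01)   # η, (β1, β2), weight decay
ps = Flux.params(model)

for epoch in 1:100
    gs = gradient(() -> loss(x, y), ps)  # implicit-parameter gradients (Zygote)
    update!(opt, ps, gs)                 # one AdamW step over all tracked parameters
end

The same step can also be driven through Flux.train!(loss, ps, data, opt) if you prefer the built-in training loop.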