Nesterov
struct defined in module Flux.Optimise
Nesterov(η = 0.001, ρ = 0.9)
Gradient descent optimiser with learning rate η and Nesterov momentum ρ.
Learning rate (η): Amount by which gradients are discounted before updating the weights.
Nesterov momentum (ρ): Controls the acceleration of gradient descent in the prominent direction, in effect damping oscillations.
opt = Nesterov()
opt = Nesterov(0.003, 0.95)
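To make the roles of η and ρ concrete, the following is a minimal sketch of one common formulation of the Nesterov momentum update, written on plain arrays without Flux. The function name `nesterov_step!`, the velocity buffer `v`, and the quadratic test objective are assumptions introduced for illustration; this is not presented as Flux's own implementation.

```julia
# Hypothetical helper (not part of Flux): one Nesterov momentum step on plain arrays.
# x    : parameters, updated in place
# v    : velocity buffer with the same shape as x, updated in place
# grad : gradient of the loss with respect to x
function nesterov_step!(x, v, grad; η = 0.001, ρ = 0.9)
    # Displacement that combines the decayed velocity with the fresh gradient.
    d = @. ρ^2 * v - (1 + ρ) * η * grad
    # Decay the old velocity by ρ and add the gradient step scaled by η.
    @. v = ρ * v - η * grad
    # Move the parameters by the displacement.
    @. x = x + d
    return x
end

# Illustrative objective: f(x) = sum(x .^ 2) / 2, whose gradient is x itself.
x = [1.0, -2.0]
v = zero(x)
for _ in 1:2000
    nesterov_step!(x, v, copy(x))  # copy so the gradient does not alias x
end
```

With the default values, a larger ρ carries more of the previous velocity into each step, while η scales how strongly the current gradient contributes. On this simple objective the iterates shrink toward the minimiser at zero.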
There are 2 methods for Flux.Optimise.Nesterov.
The following pages link back here:
Flux.jl, deprecations.jl, optimise/Optimise.jl, optimise/optimisers.jl