AdamW
Function defined in module Flux.Optimise.
AdamW(η = 0.001, β::Tuple = (0.9, 0.999), decay = 0)
AdamW is a variant of Adam that fixes (as in repairs) Adam's weight decay regularization.
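The fix is that the decay is decoupled from the adaptive gradient step: rather than folding an L2 penalty into the gradient, where Adam's moment estimates would rescale it, the decay is applied directly to the weights. Schematically, following Loshchilov & Hutter (2017), with λ the decay factor and m̂_t, v̂_t Adam's bias-corrected moment estimates, one update looks roughly like

\theta_{t+1} = \theta_t - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} - \lambda \, \theta_t

(the exact scaling of the decay term varies between formulations and implementations).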
Parameters
- Learning rate (η): Amount by which gradients are discounted before updating the weights.
- Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.
- decay: Decay applied to weights during optimisation.
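As a hedged sketch of how these pieces relate (not the verbatim library source), an AdamW optimiser behaves like Adam chained with a separate weight-decay rule via the Optimiser composition from Flux.Optimise:

using Flux
using Flux.Optimise: Adam, AdamW, WeightDecay, Optimiser

# AdamW with explicit hyperparameters
opt = AdamW(0.001, (0.9, 0.999), 0.01)

# Roughly equivalent composition (a sketch): Adam produces the adaptive step,
# WeightDecay then adds the decoupled decay term for each weight array.
opt_equiv = Optimiser(Adam(0.001, (0.9, 0.999)), WeightDecay(0.01))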
Examples

opt = AdamW()

opt = AdamW(0.001, (0.89, 0.995), 0.1)
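For a fuller picture, here is a hedged end-to-end sketch of training with AdamW through the implicit-parameters style used by Flux.Optimise; the model, data, and hyperparameters below are made up for illustration:

using Flux
using Flux.Optimise: AdamW, update!

# Toy model and dummy data (hypothetical, for illustration only)
model = Dense(10, 1)
x = rand(Float32, 10, 16)
y = rand(Float32, 1, 16)

loss(x, y) = Flux.Losses.mse(model(x), y)

opt = AdamW(0.001, (0.9, 0.999), 0.01)   # η, (β1, β2), weight decay
ps = Flux.params(model)

for epoch in 1:100
    gs = gradient(() -> loss(x, y), ps)  # implicit-parameter gradients (Zygote)
    update!(opt, ps, gs)                 # one AdamW step over all tracked parameters
end

The same step can also be driven through Flux.train!(loss, ps, data, opt) if you prefer the built-in training loop.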