AdamW

function defined in module Flux.Optimise


			AdamW(η = 0.001, β::Tuple = (0.9, 0.999), decay = 0)

AdamW is a variant of Adam fixing (as in repairing) its weight decay regularization.
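Conceptually, the weight-decay term is applied directly to the parameters rather than being folded into the gradient before the Adam update. Below is a minimal sketch of one such decoupled step, with placeholder values for the bias-corrected moment estimates mhat and vhat; it is an illustration of the idea, not Flux's actual implementation.

			# Sketch only: decoupled weight decay applied alongside an Adam-style step.
			η, decay, ϵ = 0.001, 0.1, 1e-8
			θ = randn(3)                      # parameters
			mhat, vhat = zero(θ), zero(θ)     # bias-corrected moment estimates (placeholders)
			θ .-= η .* mhat ./ (sqrt.(vhat) .+ ϵ) .+ decay .* θ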

Parameters

  • Learning rate (η): Amount by which gradients are discounted before updating the weights.

  • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

  • decay: Decay applied to weights during optimisation.

Examples


			opt = AdamW()

			opt = AdamW(0.001, (0.89, 0.995), 0.1)
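
As a usage sketch (the model, data, and loss below are hypothetical), the constructed optimiser can be passed to Flux's implicit-parameters update! together with gradients:

			using Flux

			model = Dense(10 => 1)                       # hypothetical model
			loss(x, y) = Flux.Losses.mse(model(x), y)    # hypothetical loss
			x, y = rand(Float32, 10, 4), rand(Float32, 1, 4)

			opt = AdamW(0.001, (0.9, 0.999), 0.01)
			ps = Flux.params(model)
			gs = gradient(() -> loss(x, y), ps)
			Flux.Optimise.update!(opt, ps, gs)           # one optimisation step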
Methods

There is 1 method for Flux.Optimise.AdamW: