AdaDelta

struct defined in module Flux.Optimise


			AdaDelta(ρ = 0.9, ϵ = 1.0e-8)

AdaDelta is a variant of AdaGrad that adapts its learning rate based on a window of past gradient updates. Its parameters typically don't need tuning.

Parameters

  • Rho (ρ): Factor by which the gradient is decayed at each time step.
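
For orientation, the roles of ρ and ϵ can be sketched with the update rule from the original AdaDelta formulation. This is an illustrative derivation, not a transcription of the Flux source: the symbols g_t (gradient at step t), x_t (parameter value), and the two running averages E[g²] and E[Δx²] are notation assumed here, and the exact placement of ϵ in the library's implementation may differ.

```latex
\begin{aligned}
E[g^2]_t        &= \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2 \\
\Delta x_t      &= - \frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \; g_t \\
E[\Delta x^2]_t &= \rho \, E[\Delta x^2]_{t-1} + (1 - \rho) \, \Delta x_t^2 \\
x_{t+1}         &= x_t + \Delta x_t
\end{aligned}
```

Under this sketch, ρ controls how quickly the two running averages forget older values, and ϵ keeps both square roots away from zero. No explicit learning rate appears, which is consistent with the constructor taking only ρ and ϵ.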

Examples


			
			
			
			opt = AdaDelta()

			opt = AdaDelta(0.89)

Methods