# Utilities

Zygote's gradients can be used to construct a Jacobian (by repeated evaluation) or a Hessian (by taking a second derivative).

`Zygote.jacobian` — Function

`jacobian(f, args...) -> Tuple`

For each array `a ∈ args` this returns a matrix with `Ja[k,i] = ∂y[k]/∂a[i]`, where `y = f(args...)` is usually a vector. Arrays of higher dimension are treated like `vec(a)`, or `vec(y)` for output.

For scalar `x::Number ∈ args`, the result is a vector `Jx[k] = ∂y[k]/∂x`, while for scalar `y` all results have just one row.

With any other argument type, no result is produced, even if `gradient` would work.

This reverse-mode Jacobian needs to evaluate the pullback once for each element of `y`. Doing so is usually only efficient when `length(y)` is small compared to `length(a)`; otherwise forward mode is likely to be better.

See also `withjacobian`, `hessian`, `hessian_reverse`.

**Examples**

```
julia> jacobian(a -> 100*a[1:3].^2, 1:7)[1]  # first index (rows) is output
3×7 Matrix{Int64}:
 200    0    0  0  0  0  0
   0  400    0  0  0  0  0
   0    0  600  0  0  0  0

julia> jacobian((a,x) -> a.^2 .* x, [1,2,3], 1)  # scalar argument has vector jacobian
([2 0 0; 0 4 0; 0 0 6], [1, 4, 9])

julia> jacobian((a,d) -> prod(a, dims=d), [1 2; 3 4; 5 6], 2)
([2 0 … 0 0; 0 4 … 3 0; 0 0 … 0 5], [0, 0, 0])
```

For arguments of any type except `Number` & `AbstractArray`, the result is `nothing`.

```
julia> jacobian((a,s) -> a.^length(s), [1,2,3], "str")
([3 0 0; 0 12 0; 0 0 27], nothing)
julia> jacobian((a,t) -> sum(a .* t[1]) + t[2], [1,2,3], (4,5))
([4 4 4], nothing)
julia> gradient((a,t) -> sum(a .* t[1]) + t[2], [1,2,3], (4,5)) # gradient understands the tuple
([4 4 4], (6, 1))
```
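The cost note for `jacobian(f, args...)` (one pullback evaluation per element of `y`) can be illustrated with a minimal sketch that assembles a Jacobian row by row from `Zygote.pullback`. The helper name `jacobian_by_rows` is hypothetical, and this is not claimed to be how Zygote implements `jacobian` internally; it assumes a vector input and a vector output.

```julia
using Zygote

# Hypothetical helper: one pullback call per output element gives one row of J.
function jacobian_by_rows(f, x::AbstractVector)
    y, back = Zygote.pullback(f, x)
    J = zeros(length(y), length(x))
    for k in eachindex(y)
        seed = zeros(length(y))   # one-hot cotangent selecting y[k]
        seed[k] = 1
        J[k, :] .= back(seed)[1]  # row k holds ∂y[k]/∂x[i]
    end
    return J
end

two_outputs(x) = [x[1]^2 + x[2], 3x[2] * x[3]]
x_rows = [1.0, 2.0, 3.0]
J_rows = jacobian_by_rows(two_outputs, x_rows)
```

Here the loop runs `length(y) == 2` times, which is why reverse mode suits functions with few outputs and many inputs.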

`jacobian(loss, ::Params)`

Like `gradient` with implicit parameters, this method takes a zero-argument function and returns an `IdDict`-like object, now containing the Jacobian for each parameter.

**Examples**

```
julia> xs = [1 2; 3 4]; ys = [5,7,9];
julia> Jxy = jacobian(() -> ys[1:2] .+ sum(xs.^2), Params([xs, ys]))
Grads(...)
julia> Jxy[ys]
2×3 Matrix{Int64}:
1 0 0
0 1 0
julia> Jxy[xs]
2×4 Matrix{Int64}:
2 6 4 8
2 6 4 8
```

`Zygote.hessian` — Function

`hessian(f, x)`

Construct the Hessian `∂²f/∂x²`, where `x` is a real number or an array, and `f(x)` is a real number. When `x` is an array, the result is a matrix `H[i,j] = ∂²f/∂x[i]∂x[j]`, using linear indexing `x[i]` even if the argument is higher-dimensional.

This uses forward over reverse, ForwardDiff over Zygote, calling `hessian_dual(f, x)`. See `hessian_reverse` for an all-Zygote alternative.

See also `diaghessian` to compute only the diagonal part.

**Examples**

```
julia> hessian(x -> x[1]*x[2], randn(2))
2×2 Matrix{Float64}:
 0.0  1.0
 1.0  0.0

julia> hessian(x -> sum(x.^3), [1 2; 3 4])  # uses linear indexing of x
4×4 Matrix{Int64}:
 6   0   0   0
 0  18   0   0
 0   0  12   0
 0   0   0  24

julia> hessian(sin, pi/2)
-1.0
```
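The linear-indexing convention can be checked directly. The sketch below assumes a 2×2 `Float64` input (the variable names are illustrative) and verifies that the Hessian of `sum(x.^3)` is the diagonal matrix with entries `6 * x[i]`, ordered by linear index `i`.

```julia
using Zygote, LinearAlgebra

x_mat = [1.0 2.0; 3.0 4.0]
H_mat = Zygote.hessian(x -> sum(x .^ 3), x_mat)

# The matrix argument is flattened: H has one row and one column per element of x.
size_ok = size(H_mat) == (length(x_mat), length(x_mat))

# ∂²/∂x[i]² of x[i]^3 is 6x[i]; all mixed second derivatives vanish.
expected_H = Diagonal(6 .* vec(x_mat))
```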

`Zygote.diaghessian` — Function

`diaghessian(f, args...) -> Tuple`

Diagonal part of the Hessian. Returns a tuple containing, for each argument `x`, `h` of the same shape with `h[i] = Hᵢᵢ = ∂²y/∂x[i]∂x[i]`. The original evaluation `y = f(args...)` must give a real number `y`.

For one vector argument `x`, this is equivalent to `(diag(hessian(f,x)),)`. Like `hessian` it uses ForwardDiff over Zygote.

For arguments of any type except `Number` & `AbstractArray`, the result is `nothing`.

**Examples**

```
julia> diaghessian(x -> sum(x.^3), [1 2; 3 4])[1]
2×2 Matrix{Int64}:
  6  12
 18  24

julia> Diagonal(vec(ans)) == hessian(x -> sum(x.^3), [1 2; 3 4])  # full Hessian is diagonal
true

julia> diaghessian((x,y) -> sum(x .* y .* y'), [1 22; 333 4], [0.5, 0.666])  # two array arguments
([0.0 0.0; 0.0 0.0], [2.0, 8.0])

julia> diaghessian(atan, 1, 2)  # two scalar arguments
(-0.16, 0.16)

julia> hessian(xy -> atan(xy[1], xy[2]), [1, 2])  # full Hessian is not diagonal
2×2 Matrix{Float64}:
 -0.16  -0.12
 -0.12   0.16
```
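The stated equivalence with `(diag(hessian(f,x)),)` for a single vector argument can be checked on a function whose full Hessian is not diagonal. This is a small sketch; the test function `cubic_plus_cross` is made up for illustration.

```julia
using Zygote, LinearAlgebra

# Illustrative function with one off-diagonal coupling between x[1] and x[2].
cubic_plus_cross(x) = sum(x .^ 3) + x[1] * x[2]

x_vec = [1.0, 2.0, 3.0]
H_full = Zygote.hessian(cubic_plus_cross, x_vec)
h_diag = Zygote.diaghessian(cubic_plus_cross, x_vec)[1]

# Only the diagonal entries 6x[i] survive in h_diag; the cross term is discarded.
```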

Zygote also provides a set of helpful utilities. These are all "user-level" tools – in other words you could have written them easily yourself, but they live in Zygote for convenience.

`Zygote.withgradient` — Function

```
withgradient(f, args...)
withgradient(f, ::Params)
```

Returns both the value of the function and the `gradient`, as a named tuple.

```
julia> y, ∇ = withgradient(/, 1, 2)
(val = 0.5, grad = (0.5, -0.25))
julia> ∇ == gradient(/, 1, 2)
true
```
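Besides destructuring, the named tuple can be accessed by field, using the `val` and `grad` names shown in the output above. A brief sketch with a made-up loss function `sq_loss`:

```julia
using Zygote

# Illustrative loss; any function returning a real number would do.
sq_loss(w, x) = sum(abs2, w .* x)

w_demo, x_demo = [1.0, 2.0], [3.0, 4.0]
res = Zygote.withgradient(sq_loss, w_demo, x_demo)

loss_value = res.val      # same as sq_loss(w_demo, x_demo), without a second forward pass
grad_w     = res.grad[1]  # gradient with respect to the first argument
```

This avoids calling the function once for its value and again for its gradient.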

`Zygote.withjacobian` — Function

`withjacobian(f, args...)`

Returns both the value `f(args...)` and the `jacobian` as a named tuple.

```
julia> withjacobian(cumsum, [1,2,3])
(val = [1, 3, 6], grad = ([1 0 0; 1 1 0; 1 1 1],))
```

`Zygote.@showgrad` — Macro

`@showgrad(x) -> x`

Much like `@show`, but shows the gradient about to accumulate to `x`. Useful for debugging gradients.

```
julia> gradient(2, 3) do a, b
         @showgrad(a)*b
       end
∂(a) = 3
(3, 2)
```

Note that the gradient depends on how the output of `@showgrad` is *used*, and is not the *overall* gradient of the variable `a`. For example:

```
julia> gradient(2) do a
         @showgrad(a)*a
       end
∂(a) = 2
(4,)

julia> gradient(2, 3) do a, b
         @showgrad(a) # not used, so no gradient
         a*b
       end
∂(a) = nothing
(3, 2)
```

`Zygote.hook` — Function

`hook(x̄ -> ..., x) -> x`

Gradient hooks. Allows you to apply an arbitrary function to the gradient for `x`.

```
julia> gradient(2, 3) do a, b
         hook(ā -> @show(ā), a)*b
       end
ā = 3
(3, 2)

julia> gradient(2, 3) do a, b
         hook(-, a)*b
       end
(-3, 2)
```
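Because the hook's return value replaces the gradient flowing back to `x`, it can also transform gradients rather than just print them. The following sketch clips the incoming gradient elementwise; the clipping bounds are arbitrary and chosen only for illustration.

```julia
using Zygote

# Without the hook the gradient of sum(abs2, x) is 2x = [2, 4, 6].
clipped = gradient([1.0, 2.0, 3.0]) do x
    xh = Zygote.hook(x̄ -> clamp.(x̄, -3, 3), x)  # clip whatever flows back into x
    sum(abs2, xh)
end
```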

`Zygote.dropgrad` — Function

`dropgrad(x) -> x`

Drop the gradient of `x`.

```
julia> gradient(2, 3) do a, b
         dropgrad(a)*b
       end
(nothing, 2)
```

`Zygote.Buffer` — Type

`Buffer(xs, ...)`

`Buffer` is an array-like type which is mutable when taking gradients. You can construct a `Buffer` with the same syntax as `similar` (e.g. `Buffer(xs, 5)`) and then use normal indexing. Finally, use `copy` to get back a normal array.

For example:

```
julia> function vstack(xs)
           buf = Buffer(xs, length(xs), 5)
           for i = 1:5
             buf[:, i] = xs
           end
           return copy(buf)
       end
vstack (generic function with 1 method)

julia> vstack([1, 2, 3])
3×5 Array{Int64,2}:
 1  1  1  1  1
 2  2  2  2  2
 3  3  3  3  3

julia> gradient(x -> sum(vstack(x)), [1, 2, 3])
([5.0, 5.0, 5.0],)
```

`Buffer` is not an `AbstractArray` and can't be used for linear algebra operations like matrix multiplication. This prevents it from being captured by pullbacks.

`copy` is a semantic copy, but does not allocate memory. Instead the `Buffer` is made immutable after copying.
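As a further sketch of the construct, fill, `copy` pattern, the function below writes a running sum into a `Buffer` element by element. The name `running_sum` is made up for this example, and `Buffer(xs)` is assumed to behave like `similar(xs)`, as described above.

```julia
using Zygote

# Illustrative: element-wise writes that would not be differentiable on a plain array.
function running_sum(xs)
    buf = Zygote.Buffer(xs)        # same shape and element type as xs
    acc = zero(eltype(xs))
    for i in eachindex(xs)
        acc += xs[i]
        buf[i] = acc               # indexed write into the Buffer
    end
    return copy(buf)               # back to a normal array
end

rs_value = running_sum([1.0, 2.0, 3.0])
rs_grad  = gradient(x -> sum(running_sum(x)), [1.0, 2.0, 3.0])[1]
```

Since `sum(running_sum(x)) = 3x[1] + 2x[2] + x[3]`, the gradient is `[3, 2, 1]`.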

`Zygote.forwarddiff` — Function

`forwarddiff(f, x; chunk_threshold = ForwardDiff.DEFAULT_CHUNK_THRESHOLD) -> f(x)`

Runs `f(x)` as usual, but instructs Zygote to differentiate `f` using forward mode, rather than the usual reverse mode. The `chunk_threshold` argument controls the maximum chunk size (cf. the ForwardDiff documentation).

Forward mode takes time linear in `length(x)` but only has constant memory overhead, and is very efficient for scalars, so in some cases this can be a useful optimisation.

```
julia> function pow(x, n)
         r = one(x)
         for i = 1:n
           r *= x
         end
         return r
       end
pow (generic function with 1 method)

julia> gradient(5) do x
         forwarddiff(x) do x
           pow(x, 2)
         end
       end
(10,)
```

Note that the function `f` will *drop gradients* for any closed-over values.

```
julia> gradient(2, 3) do a, b
         forwarddiff(a) do a
           a*b
         end
       end
(3, nothing)
```

This can be rewritten by explicitly passing through `b`, i.e.

```
gradient(2, 3) do a, b
  forwarddiff([a, b]) do (a, b)
    a*b
  end
end
```
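The two variants can be compared side by side. This sketch uses `Float64` inputs (an assumption; the examples above use integers) and checks that passing both values through `forwarddiff` recovers the gradient for `b` that the closed-over version drops.

```julia
using Zygote

# b is closed over, so its gradient is dropped.
g_closed = gradient(2.0, 3.0) do a, b
    Zygote.forwarddiff(a) do a
        a * b
    end
end

# Both values are passed through explicitly, so both gradients are kept.
g_passed = gradient(2.0, 3.0) do a, b
    Zygote.forwarddiff([a, b]) do (a, b)
        a * b
    end
end
```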

`Zygote.ignore` — Function

```
ignore() do
  ...
end
```

Tell Zygote to ignore a block of code. Everything inside the `do` block will run on the forward pass as normal, but Zygote won't try to differentiate it at all. This can be useful for e.g. code that does logging of the forward pass.

Obviously, you run the risk of incorrect gradients if you use this incorrectly.
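The logging use case might look like the sketch below, which records an intermediate value of the forward pass in a plain vector. The container name `history` is made up for this example; the `push!` runs normally but is not differentiated.

```julia
using Zygote

history = Float64[]   # illustrative log of forward-pass values

g_logged = gradient(2.0) do x
    y = x^2
    Zygote.ignore() do
        push!(history, y)   # side effect only; Zygote does not differentiate this
    end
    3y
end
```

The gradient is that of `3x^2` alone, unaffected by the logging.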

`Zygote.checkpointed` — Function

`checkpointed(f, xs...)`

Use gradient checkpointing on the call `f(xs...)`. This means that `checkpointed(f, xs...) === f(xs...)`, but intermediate results from the forward pass of `f` are not stored for the derivative. Instead, the forward pass is repeated when the derivative is computed. This saves memory at the cost of increased execution time.

If `f` is not a pure function, `checkpointed` will likely give wrong results.
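For a pure function, the checkpointed call should give the same value and gradient as the direct call. A small sketch, with `smooth_loss` as a made-up pure function:

```julia
using Zygote

# Illustrative pure function: no mutation, no randomness, no global state.
smooth_loss(x) = sum(tanh.(x) .^ 2)

x_ck = [0.1, -0.4, 0.7]
g_direct = gradient(smooth_loss, x_ck)[1]
g_ckpt   = gradient(x -> Zygote.checkpointed(smooth_loss, x), x_ck)[1]
same_value = Zygote.checkpointed(smooth_loss, x_ck) == smooth_loss(x_ck)
```

The memory saving only matters for functions with large intermediates; on a toy input like this the two paths differ only in that `smooth_loss` runs a second time during the backward pass.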

`Params` and `Grads` can be copied to and from arrays using the `copy!` function.
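A sketch of the round trip through a flat vector follows, for example to hand parameters to an optimiser that expects a single array. The variable names are illustrative, and the flat layout is assumed to follow the order in which the arrays were given to `Params`.

```julia
using Zygote

w_p, b_p = [1.0, 2.0], [3.0]
ps = Zygote.Params([w_p, b_p])
gs = gradient(() -> sum(abs2, w_p) + sum(b_p), ps)

flat_p = zeros(3)
copy!(flat_p, ps)          # Params -> flat vector
flat_g = zeros(3)
copy!(flat_g, gs)          # Grads -> flat vector, same layout

copy!(ps, 2 .* flat_p)     # flat vector -> Params, writing into w_p and b_p in place
```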

## Working with Grads

Map, broadcast, and iteration are supported for the dictionary-like `Grads` objects. These operations are value-based and preserve the keys.

```
using Zygote, Test
w, x1, x2, b = rand(2), rand(2), rand(2), rand(2)
gs1 = gradient(() -> sum(tanh.(w .* x1 .+ b)), Params([w, b]))
gs2 = gradient(() -> sum(tanh.(w .* x2 .+ b)), Params([w, b]))
# accumulate gradients
gs = gs1 .+ gs2
@test gs[w] ≈ gs1[w] + gs2[w]
@test gs[b] ≈ gs1[b] + gs2[b]
# gradients and IdDict interact nicely
# note that an IdDict must be used for gradient algebra on the GPU
gs .+= IdDict(p => randn(size(p)) for p in keys(gs))
# clip gradients
map(x -> clamp.(x, -0.1, 0.1), gs)
# clip gradients in-place
foreach(x -> clamp!(x, -0.1, 0.1), gs)
for (p, g) in pairs(gs)
  # do something with parameter `p` and corresponding gradient `g`
end
# note that gradients must be w.r.t. the same parameter key set
gs3 = gradient(() -> sum(tanh.(w .* x2)), Params([w]))
# gs3 does not have the key b
@test_throws ArgumentError gs1 .+ gs3
```