# Loss Functions

Flux provides a large number of common loss functions used for training machine learning models. They are grouped together in the `Flux.Losses` module.

Loss functions for supervised learning typically expect as inputs a target `y` and a prediction `ŷ`. In Flux's convention, the order of the arguments is the following:

`loss(ŷ, y)`

Most loss functions in Flux have an optional argument `agg`, denoting the type of aggregation performed over the batch:

```
loss(ŷ, y) # defaults to `mean`
loss(ŷ, y, agg=sum) # use `sum` for reduction
loss(ŷ, y, agg=x->sum(x, dims=2)) # partial reduction
loss(ŷ, y, agg=x->mean(w .* x)) # weighted mean
loss(ŷ, y, agg=identity) # no aggregation.
```
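
For a concrete feel, here is a minimal sketch (array values invented for the example) showing how `agg` changes the result for `mae` and `mse`:

```
using Flux.Losses: mae, mse

ŷ = Float32[0.9 0.1; 0.4 0.6]        # predictions, 2 features × 2 samples
y = Float32[1.0 0.0; 0.0 1.0]        # targets

mse(ŷ, y)                            # scalar: mean over all elements
mae(ŷ, y, agg=sum)                   # scalar: sum instead of mean
mse(ŷ, y, agg=x -> sum(x, dims=2))   # 2×1 matrix: reduce over samples only
mse(ŷ, y, agg=identity)              # 2×2 matrix: per-element losses
```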

## Losses Reference

`Flux.Losses.mae` — Function

`mae(ŷ, y; agg=mean)`

Return the loss corresponding to mean absolute error:

`agg(abs.(ŷ .- y))`

`Flux.Losses.mse` — Function

`mse(ŷ, y; agg=mean)`

Return the loss corresponding to mean square error:

`agg((ŷ .- y).^2)`

`Flux.Losses.msle` — Function

`msle(ŷ, y; agg=mean, ϵ=eps(ŷ))`

The loss corresponding to mean squared logarithmic error, calculated as

`agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2)`

The `ϵ` term provides numerical stability. This loss penalizes an under-estimation more than an over-estimation.
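
A small sketch of this asymmetry (values invented for the example): under-predicting a target by a fixed margin costs more than over-predicting it by the same margin.

```
using Flux.Losses: msle

y = Float32[10.0]
msle(Float32[5.0], y)    # under-estimate by 5: ≈ 0.48
msle(Float32[15.0], y)   # over-estimate by 5:  ≈ 0.16
```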

`Flux.Losses.huber_loss` — Function

`huber_loss(ŷ, y; δ=1, agg=mean)`

Return the mean of the Huber loss given the prediction `ŷ` and true values `y`.

```
             | 0.5 * |ŷ - y|^2,          for |ŷ - y| <= δ
Huber loss = |
             | δ * (|ŷ - y| - 0.5 * δ),  otherwise
```
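
For intuition, a minimal sketch (residuals invented for the example) showing the quadratic regime for small residuals and the linear regime for large ones, with the default `δ = 1`:

```
using Flux.Losses: huber_loss

y = Float32[0.0]
huber_loss(Float32[0.5], y)   # |ŷ - y| <= δ: quadratic, 0.5 * 0.5^2 = 0.125
huber_loss(Float32[3.0], y)   # |ŷ - y| >  δ: linear, 1 * (3 - 0.5) = 2.5
```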

`Flux.Losses.crossentropy` — Function

`crossentropy(ŷ, y; dims=1, ϵ=eps(ŷ), agg=mean)`

Return the cross entropy between the given probability distributions; calculated as

`agg(-sum(y .* log.(ŷ .+ ϵ); dims=dims))`

Cross entropy is typically used as a loss in multi-class classification, in which case the labels `y` are given in a one-hot format. `dims` specifies the dimension (or the dimensions) containing the class probabilities. The prediction `ŷ` is supposed to sum to one across `dims`, as would be the case with the output of a `softmax` operation.
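
As a minimal sketch (labels invented for the example), with one-hot targets built via `Flux.onehotbatch` and predictions normalized by `softmax`:

```
using Flux

y = Flux.onehotbatch([2, 1, 3], 1:3)   # 3 classes × 3 samples, one-hot columns
ŷ = softmax(randn(Float32, 3, 3))      # each column sums to one across dims=1
Flux.Losses.crossentropy(ŷ, y)         # scalar mean loss over the batch
```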

Use of `logitcrossentropy` is recommended over `crossentropy` for numerical stability.

See also: `Flux.logitcrossentropy`, `Flux.binarycrossentropy`, `Flux.logitbinarycrossentropy`

`Flux.Losses.logitcrossentropy` — Function

`logitcrossentropy(ŷ, y; dims=1, ϵ=eps(ŷ), agg=mean)`

Return the cross entropy computed after a `Flux.logsoftmax` operation; calculated as

`agg(.-sum(y .* logsoftmax(ŷ; dims=dims); dims=dims))`

`logitcrossentropy(ŷ, y)` is mathematically equivalent to `Flux.Losses.crossentropy(softmax(ŷ), y)`, but it is more numerically stable.
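
The equivalence is easy to check on raw logits (a sketch with random values):

```
using Flux

y = Flux.onehotbatch([2, 1, 3], 1:3)
logits = randn(Float32, 3, 3)          # raw scores, no softmax applied
Flux.Losses.logitcrossentropy(logits, y) ≈
    Flux.Losses.crossentropy(softmax(logits), y)   # true, up to floating point
```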

See also: `Flux.Losses.crossentropy`, `Flux.Losses.binarycrossentropy`, `Flux.Losses.logitbinarycrossentropy`

`Flux.Losses.binarycrossentropy` — Function

`binarycrossentropy(ŷ, y; agg=mean, ϵ=eps(ŷ))`

Return the binary cross-entropy loss, computed as

`agg(@.(-y*log(ŷ + ϵ) - (1-y)*log(1-ŷ + ϵ)))`

The `ϵ` term provides numerical stability.

Typically, the prediction `ŷ` is given by the output of a `sigmoid` activation.

Use of `logitbinarycrossentropy` is recommended over `binarycrossentropy` for numerical stability.
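
A small sketch (logits and labels invented for the example) showing both forms on the same data:

```
using Flux

x = randn(Float32, 4)                       # raw logits
y = Float32[0, 1, 1, 0]                     # binary labels
Flux.Losses.binarycrossentropy(σ.(x), y)    # sigmoid first, then the loss
Flux.Losses.logitbinarycrossentropy(x, y)   # ≈ the same value, computed stably
```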

See also: `Flux.Losses.crossentropy`, `Flux.Losses.logitcrossentropy`, `Flux.Losses.logitbinarycrossentropy`

`Flux.Losses.logitbinarycrossentropy` — Function

`logitbinarycrossentropy(ŷ, y; agg=mean)`

Mathematically equivalent to `Flux.binarycrossentropy(σ(ŷ), y)`, but more numerically stable.

See also: `Flux.Losses.crossentropy`, `Flux.Losses.logitcrossentropy`, `Flux.Losses.binarycrossentropy`

`Flux.Losses.kldivergence` — Function

`kldivergence(ŷ, y; agg=mean)`

Return the Kullback-Leibler divergence between the given probability distributions.

KL divergence is a measure of how much one probability distribution differs from another. It is always non-negative, and it is zero only when both distributions are equal everywhere.
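
A minimal sketch (distributions invented for the example), with one probability distribution per column:

```
using Flux.Losses: kldivergence

p = Float32[0.1 0.5; 0.9 0.5]   # target distributions
q = Float32[0.2 0.5; 0.8 0.5]   # predicted distributions
kldivergence(q, p)              # small positive number
kldivergence(p, p)              # ≈ 0: the distributions coincide
```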

`Flux.Losses.poisson_loss` — Function

`poisson_loss(ŷ, y)`

Return how much the predicted distribution `ŷ` diverges from the expected Poisson distribution `y`; calculated as `sum(ŷ .- y .* log.(ŷ)) / size(y, 2)`.
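
A small sketch (counts invented for the example), with non-negative count targets and strictly positive rate predictions:

```
using Flux.Losses: poisson_loss

y = Float32[1 3; 0 2]            # observed counts, one sample per column
ŷ = Float32[1.2 2.5; 0.3 2.0]    # predicted rates (must be positive)
poisson_loss(ŷ, y)
```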

`Flux.Losses.hinge_loss` — Function

`hinge_loss(ŷ, y; agg=mean)`

Return the hinge loss given the prediction `ŷ` and true labels `y` (containing 1 or -1); calculated as `sum(max.(0, 1 .- ŷ .* y)) / size(y, 2)`.

See also: `squared_hinge_loss`
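
A sketch (scores invented for the example) with labels in {-1, 1}; only examples whose margin `ŷ * y` falls below 1 contribute:

```
using Flux.Losses: hinge_loss

y = Float32[1, -1, 1]           # true labels in {-1, 1}
ŷ = Float32[1.5, -0.4, -0.3]    # raw classifier scores
hinge_loss(ŷ, y)                # only the last two terms are nonzero
```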

`Flux.Losses.squared_hinge_loss` — Function

`squared_hinge_loss(ŷ, y)`

Return the squared hinge loss given the prediction `ŷ` and true labels `y` (containing 1 or -1); calculated as `sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2)`.

See also: `hinge_loss`
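
The squared variant on the same data, for comparison (values invented for the example):

```
using Flux.Losses: hinge_loss, squared_hinge_loss

y = Float32[1, -1, 1]
ŷ = Float32[1.5, -0.4, -0.3]
hinge_loss(ŷ, y)                # linear penalty on margin violations
squared_hinge_loss(ŷ, y)        # squares each term, so larger violations cost more
```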

`Flux.Losses.dice_coeff_loss` — Function

`dice_coeff_loss(ŷ, y; smooth=1)`

Return a loss based on the Dice coefficient. Used in the V-Net image segmentation architecture. Similar to the F1 score. Calculated as:

`1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)`
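
A minimal sketch (masks invented for the example) for a binary segmentation mask:

```
using Flux.Losses: dice_coeff_loss

y = Float32[1, 1, 0, 0]           # ground-truth mask
ŷ = Float32[0.9, 0.8, 0.2, 0.1]   # predicted mask probabilities
dice_coeff_loss(ŷ, y)             # close to 0 for a good overlap
```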

`Flux.Losses.tversky_loss` — Function

`tversky_loss(ŷ, y; β=0.7)`

Return the Tversky loss. Used with imbalanced data to give more weight to false negatives. A larger β weighs recall more than precision (by placing more emphasis on false negatives). Calculated as:

`1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)`
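
A sketch mirroring the dice example (values invented); raising `β` increases the penalty on false negatives:

```
using Flux.Losses: tversky_loss

y = Float32[1, 1, 0, 0]           # ground-truth mask
ŷ = Float32[0.9, 0.3, 0.2, 0.1]   # the 0.3 acts like a false negative
tversky_loss(ŷ, y)                # default β = 0.7 emphasizes false negatives
tversky_loss(ŷ, y, β=0.3)         # smaller β: false positives matter more
```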