Loss Functions
Flux provides a large number of common loss functions used for training machine learning models. They are grouped together in the Flux.Losses module.
Loss functions for supervised learning typically expect as inputs a target y and a prediction ŷ. In Flux's convention, the order of the arguments is the following:

loss(ŷ, y)
Most loss functions in Flux have an optional argument agg, denoting the type of aggregation performed over the batch:

loss(ŷ, y)                          # defaults to `mean`
loss(ŷ, y, agg=sum)                 # use `sum` for reduction
loss(ŷ, y, agg=x->sum(x, dims=2))   # partial reduction
loss(ŷ, y, agg=x->mean(w .* x))     # weighted mean
loss(ŷ, y, agg=identity)            # no aggregation
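For example, swapping the aggregation changes only how the elementwise losses are reduced. A minimal sketch using mse, documented below (the arrays are made-up illustrations):

```julia
using Flux
using Statistics: mean

ŷ = [0.9 0.2; 0.1 0.8]   # predictions, observations along dim 2
y = [1.0 0.0; 0.0 1.0]   # targets

Flux.Losses.mse(ŷ, y)                          # mean over all elements
Flux.Losses.mse(ŷ, y, agg=sum)                 # sum over all elements
Flux.Losses.mse(ŷ, y, agg=x->mean(x, dims=1))  # reduce along dim 1 only
```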
Losses Reference
Flux.Losses.mae — Function

mae(ŷ, y; agg=mean)

Return the loss corresponding to mean absolute error:

agg(abs.(ŷ .- y))
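A quick usage sketch (the values here are illustrative):

```julia
using Flux

ŷ = [1.1, 1.9, 3.1]
y = [1.0, 2.0, 3.0]

Flux.Losses.mae(ŷ, y)  # ≈ 0.1, the mean of the absolute residuals
```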
Flux.Losses.mse — Function

mse(ŷ, y; agg=mean)

Return the loss corresponding to mean square error:

agg((ŷ .- y).^2)
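Usage mirrors mae; squaring makes large residuals dominate the loss (illustrative values):

```julia
using Flux

ŷ = [1.1, 1.9, 3.1]
y = [1.0, 2.0, 3.0]

Flux.Losses.mse(ŷ, y)  # ≈ 0.01, the mean of the squared residuals
```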
Flux.Losses.msle — Function

msle(ŷ, y; agg=mean, ϵ=eps(ŷ))

Return the loss corresponding to mean squared logarithmic error, calculated as

agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2)

The ϵ term provides numerical stability. This loss penalizes an under-estimation more than an over-estimation.
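The asymmetry can be seen by comparing an under- and an over-estimate at the same absolute distance from the target (illustrative values):

```julia
using Flux

y = [10.0]

Flux.Losses.msle([5.0], y)   # under-estimate by 5: ≈ (log(5) - log(10))^2 ≈ 0.48
Flux.Losses.msle([15.0], y)  # over-estimate by 5:  ≈ (log(15) - log(10))^2 ≈ 0.16
```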
Flux.Losses.huber_loss — Function

huber_loss(ŷ, y; δ=1, agg=mean)

Return the mean of the Huber loss given the prediction ŷ and true values y.

             | 0.5 * |ŷ - y|^2,           for |ŷ - y| <= δ
Huber loss = |
             | δ * (|ŷ - y| - 0.5 * δ),   otherwise
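A small sketch showing both branches with made-up values: residuals at or below δ are penalized quadratically, larger ones only linearly:

```julia
using Flux

y = [0.0, 0.0]
ŷ = [0.5, 3.0]

# first residual: 0.5 * 0.5^2 = 0.125 (quadratic branch)
# second residual: 1 * (3.0 - 0.5) = 2.5 (linear branch)
Flux.Losses.huber_loss(ŷ, y; δ=1)  # ≈ mean([0.125, 2.5])
```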
Flux.Losses.crossentropy — Function

crossentropy(ŷ, y; dims=1, ϵ=eps(ŷ), agg=mean)

Return the cross entropy between the given probability distributions; calculated as

agg(-sum(y .* log.(ŷ .+ ϵ); dims=dims))

Cross entropy is typically used as a loss in multi-class classification, in which case the labels y are given in a one-hot format. dims specifies the dimension (or the dimensions) containing the class probabilities. The prediction ŷ is supposed to sum to one across dims, as would be the case with the output of a softmax operation.

Use of logitcrossentropy is recommended over crossentropy for numerical stability.

See also: Flux.logitcrossentropy, Flux.binarycrossentropy, Flux.logitbinarycrossentropy
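A minimal sketch with one-hot labels and softmax predictions (the class labels are made up):

```julia
using Flux

y = Flux.onehotbatch([1, 3], 1:3)  # 3 classes, 2 observations, one-hot columns
ŷ = softmax(randn(3, 2))           # columns sum to one across dims=1

Flux.Losses.crossentropy(ŷ, y)
```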
Flux.Losses.logitcrossentropy — Function

logitcrossentropy(ŷ, y; dims=1, ϵ=eps(ŷ), agg=mean)

Return the cross entropy computed after a Flux.logsoftmax operation; calculated as

agg(.-sum(y .* logsoftmax(ŷ; dims=dims); dims=dims))

logitcrossentropy(ŷ, y) is mathematically equivalent to Flux.Losses.crossentropy(softmax(ŷ), y), but it is more numerically stable.

See also: Flux.Losses.crossentropy, Flux.Losses.binarycrossentropy, Flux.Losses.logitbinarycrossentropy
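A sketch of the equivalence (random logits stand in for raw model outputs):

```julia
using Flux

y = Flux.onehotbatch([1, 3], 1:3)
logits = randn(3, 2)  # raw outputs, no softmax applied

Flux.Losses.logitcrossentropy(logits, y)      # preferred, numerically stable
Flux.Losses.crossentropy(softmax(logits), y)  # mathematically the same value
```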
Flux.Losses.binarycrossentropy — Function

binarycrossentropy(ŷ, y; agg=mean, ϵ=eps(ŷ))

Return the binary cross-entropy loss, computed as

agg(@.(-y*log(ŷ + ϵ) - (1-y)*log(1-ŷ + ϵ)))

The ϵ term provides numerical stability.

Typically, the prediction ŷ is given by the output of a sigmoid activation.

Use of logitbinarycrossentropy is recommended over binarycrossentropy for numerical stability.

See also: Flux.Losses.crossentropy, Flux.Losses.logitcrossentropy, Flux.Losses.logitbinarycrossentropy
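A usage sketch with probabilities obtained from a sigmoid (the inputs are illustrative):

```julia
using Flux

y = [1.0, 0.0, 1.0]             # binary targets
ŷ = sigmoid.([2.0, -1.0, 0.5])  # predictions squashed into (0, 1)

Flux.Losses.binarycrossentropy(ŷ, y)
```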
Flux.Losses.logitbinarycrossentropy — Function

logitbinarycrossentropy(ŷ, y; agg=mean)

Mathematically equivalent to Flux.binarycrossentropy(σ(ŷ), y), but is more numerically stable.

See also: Flux.Losses.crossentropy, Flux.Losses.logitcrossentropy, Flux.Losses.binarycrossentropy
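The same example as above, but feeding the raw logits directly:

```julia
using Flux

y = [1.0, 0.0, 1.0]
logits = [2.0, -1.0, 0.5]  # no sigmoid applied

# equivalent to binarycrossentropy(sigmoid.(logits), y), but more stable
Flux.Losses.logitbinarycrossentropy(logits, y)
```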
Flux.Losses.kldivergence — Function

kldivergence(ŷ, y; agg=mean)

Return the Kullback-Leibler divergence between the given probability distributions.

The KL divergence is a measure of how much one probability distribution differs from another. It is always non-negative, and zero only when both distributions are equal everywhere.
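A sketch contrasting identical and differing distributions (the probability vectors are made up):

```julia
using Flux

p = [0.3, 0.7]
q = [0.5, 0.5]

Flux.Losses.kldivergence(q, q)  # ≈ 0: identical distributions
Flux.Losses.kldivergence(p, q)  # > 0: the distributions differ
```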
Flux.Losses.poisson_loss — Function

poisson_loss(ŷ, y)

Return how much the predicted distribution ŷ diverges from the expected Poisson distribution y; calculated as sum(ŷ .- y .* log.(ŷ)) / size(y, 2).
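A usage sketch for count data (the rates and counts are illustrative; predictions must be positive so the log is defined):

```julia
using Flux

y = [1.0, 2.0, 3.0]  # observed counts
ŷ = [1.2, 1.8, 3.5]  # predicted Poisson rates

Flux.Losses.poisson_loss(ŷ, y)
```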
Flux.Losses.hinge_loss — Function

hinge_loss(ŷ, y; agg=mean)

Return the hinge loss given the prediction ŷ and true labels y (containing 1 or -1); calculated as sum(max.(0, 1 .- ŷ .* y)) / size(y, 2).

See also: squared_hinge_loss
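A sketch with ±1 labels and raw scores (values made up; the batch runs along dim 2):

```julia
using Flux

y = [1 -1 1]         # true labels in {-1, 1}
ŷ = [0.8 -0.6 -0.3]  # raw classifier scores

# margins 1 .- ŷ .* y are 0.2, 0.4 and 1.3; all positive, so all contribute
Flux.Losses.hinge_loss(ŷ, y)
```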
Flux.Losses.squared_hinge_loss — Function

squared_hinge_loss(ŷ, y)

Return the squared hinge loss given the prediction ŷ and true labels y (containing 1 or -1); calculated as sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2).

See also: hinge_loss
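The same setup as above, with the margins squared before reducing:

```julia
using Flux

y = [1 -1 1]
ŷ = [0.8 -0.6 -0.3]

# squares the same margins 0.2, 0.4 and 1.3
Flux.Losses.squared_hinge_loss(ŷ, y)
```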
Flux.Losses.dice_coeff_loss — Function

dice_coeff_loss(ŷ, y; smooth=1)

Return a loss based on the Dice coefficient. Used in the V-Net image segmentation architecture. Similar to the F1 score. Calculated as

1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)
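A sketch on a flattened binary segmentation mask (the mask and predictions are made up):

```julia
using Flux

y = [1.0, 1.0, 0.0, 0.0]  # ground-truth mask
ŷ = [0.9, 0.8, 0.2, 0.1]  # e.g. sigmoid outputs

Flux.Losses.dice_coeff_loss(ŷ, y)  # close to 0 for well-overlapping masks
```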
Flux.Losses.tversky_loss — Function

tversky_loss(ŷ, y; β=0.7)

Return the Tversky loss. Used with imbalanced data to give more weight to false negatives. A larger β weighs recall more than precision (by placing more emphasis on false negatives). Calculated as

1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)
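A sketch on an imbalanced mask, comparing the default β with a more recall-heavy choice (the data is illustrative):

```julia
using Flux

y = [1.0, 0.0, 0.0, 0.0, 0.0]  # few positives
ŷ = [0.6, 0.1, 0.1, 0.1, 0.1]

Flux.Losses.tversky_loss(ŷ, y)         # default β = 0.7
Flux.Losses.tversky_loss(ŷ, y, β=0.9)  # penalize false negatives even more
```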