Loss Functions
Flux provides a large number of common loss functions used for training machine learning models. They are grouped together in the Flux.Losses module.
Loss functions for supervised learning typically expect as inputs a target y and a prediction ŷ. In Flux's convention, the order of the arguments is the following:
loss(ŷ, y)

Most loss functions in Flux have an optional argument agg, denoting the type of aggregation performed over the batch:
loss(ŷ, y) # defaults to `mean`
loss(ŷ, y, agg=sum) # use `sum` for reduction
loss(ŷ, y, agg=x->sum(x, dims=2)) # partial reduction
loss(ŷ, y, agg=x->mean(w .* x)) # weighted mean
loss(ŷ, y, agg=identity) # no aggregation

Losses Reference
Flux.Losses.mae — Function

mae(ŷ, y; agg=mean)

Return the loss corresponding to mean absolute error:
agg(abs.(ŷ .- y))

Flux.Losses.mse — Function

mse(ŷ, y; agg=mean)

Return the loss corresponding to mean square error:
agg((ŷ .- y).^2)

Flux.Losses.msle — Function

msle(ŷ, y; agg=mean, ϵ=eps(ŷ))

The loss corresponding to mean squared logarithmic errors, calculated as
agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2)

The ϵ term provides numerical stability. This loss penalizes an under-estimation more heavily than an over-estimation.
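As a minimal sketch of how the regression losses above are called (values assume the default agg=mean; the numbers are only illustrative):

using Flux

ŷ = [1.1, 2.0, 3.4]      # predictions
y = [1.0, 2.0, 3.0]      # targets

Flux.Losses.mae(ŷ, y)    # mean(abs.(ŷ .- y)) ≈ 0.167
Flux.Losses.mse(ŷ, y)    # mean((ŷ .- y).^2)  ≈ 0.057
Flux.Losses.msle(ŷ, y)   # mean squared difference of the logs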
Flux.Losses.huber_loss — Function

huber_loss(ŷ, y; δ=1, agg=mean)

Return the mean of the Huber loss given the prediction ŷ and true values y.
             | 0.5 * |ŷ - y|^2,           for |ŷ - y| <= δ
Huber loss = |
             |  δ * (|ŷ - y| - 0.5 * δ),  otherwise
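A brief usage sketch (values assume the default agg=mean and the piecewise definition above; the numbers are only illustrative):

using Flux

ŷ = [1.2, 2.0, 5.0]
y = [1.0, 2.0, 3.0]

Flux.Losses.huber_loss(ŷ, y)          # quadratic for small errors, linear beyond δ=1
Flux.Losses.huber_loss(ŷ, y; δ=10)    # with a large δ every error stays in the quadratic branch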
Flux.Losses.crossentropy — Function

crossentropy(ŷ, y; dims=1, ϵ=eps(ŷ), agg=mean)

Return the cross entropy between the given probability distributions; calculated as

agg(-sum(y .* log.(ŷ .+ ϵ); dims=dims))

Cross entropy is typically used as a loss in multi-class classification, in which case the labels y are given in a one-hot format. dims specifies the dimension (or the dimensions) containing the class probabilities. The prediction ŷ is expected to sum to one across dims, as would be the case with the output of a softmax operation.
Use of logitcrossentropy is recommended over crossentropy for numerical stability.
See also: Flux.logitcrossentropy, Flux.binarycrossentropy, Flux.logitbinarycrossentropy
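For example, with one-hot targets and a softmax prediction (a minimal sketch using Flux.onehotbatch; the random prediction is only illustrative):

using Flux

y = Flux.onehotbatch([2, 1, 3], 1:3)    # one-hot targets: 3 classes × 3 samples
ŷ = softmax(randn(Float32, 3, 3))       # each column sums to one across dims=1

Flux.Losses.crossentropy(ŷ, y)          # scalar loss, averaged over the batch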
Flux.Losses.logitcrossentropy — Function

logitcrossentropy(ŷ, y; dims=1, ϵ=eps(ŷ), agg=mean)

Return the crossentropy computed after a Flux.logsoftmax operation; calculated as
agg(.-sum(y .* logsoftmax(ŷ; dims=dims); dims=dims))

logitcrossentropy(ŷ, y) is mathematically equivalent to Flux.Losses.crossentropy(softmax(ŷ), y), but it is more numerically stable.
See also: Flux.Losses.crossentropy, Flux.Losses.binarycrossentropy, Flux.Losses.logitbinarycrossentropy
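A minimal sketch of the equivalence stated above (up to floating-point error; the random data is only illustrative):

using Flux

logits = randn(Float32, 3, 5)              # raw, unnormalised scores
y = Flux.onehotbatch(rand(1:3, 5), 1:3)

Flux.Losses.logitcrossentropy(logits, y) ≈
    Flux.Losses.crossentropy(softmax(logits), y)    # true, but the left-hand side is more stable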
Flux.Losses.binarycrossentropy — Function

binarycrossentropy(ŷ, y; agg=mean, ϵ=eps(ŷ))

Return the binary cross-entropy loss, computed as
agg(@.(-y*log(ŷ + ϵ) - (1-y)*log(1-ŷ + ϵ)))

The ϵ term provides numerical stability.
Typically, the prediction ŷ is given by the output of a sigmoid activation.
Use of logitbinarycrossentropy is recommended over binarycrossentropy for numerical stability.
See also: Flux.Losses.crossentropy, Flux.Losses.logitcrossentropy, Flux.Losses.logitbinarycrossentropy
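A minimal sketch with binary labels and sigmoid outputs (the numbers are only illustrative):

using Flux

y = [0, 1, 0, 1]                        # binary labels
ŷ = sigmoid.([0.3, 1.5, -0.4, 2.0])     # probabilities in (0, 1)

Flux.Losses.binarycrossentropy(ŷ, y)    # scalar loss, averaged over the batch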
Flux.Losses.logitbinarycrossentropy — Function

logitbinarycrossentropy(ŷ, y; agg=mean)

Mathematically equivalent to Flux.binarycrossentropy(σ(ŷ), y), but more numerically stable.
See also: Flux.Losses.crossentropy, Flux.Losses.logitcrossentropy, Flux.Losses.binarycrossentropy
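A minimal sketch of the stated equivalence, working directly on the raw logits (the numbers are only illustrative):

using Flux

logits = [0.3, 1.5, -0.4, 2.0]
y = [0, 1, 0, 1]

Flux.Losses.logitbinarycrossentropy(logits, y) ≈
    Flux.Losses.binarycrossentropy(sigmoid.(logits), y)    # true, but more numerically stable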
Flux.Losses.kldivergence — Function

kldivergence(ŷ, y; agg=mean)

Return the Kullback-Leibler divergence between the given probability distributions.
KL divergence is a measure of how different one probability distribution is from another. It is always non-negative, and it is zero only when both distributions are equal everywhere.
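A minimal sketch with two distributions stored one per column (the numbers are only illustrative):

using Flux

p = [0.1 0.4; 0.9 0.6]            # target distributions
q = [0.3 0.4; 0.7 0.6]            # predicted distributions

Flux.Losses.kldivergence(q, p)    # positive, since the first columns differ
Flux.Losses.kldivergence(p, p)    # ≈ 0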
Flux.Losses.poisson_loss — Function

poisson_loss(ŷ, y)

Return how much the predicted distribution ŷ diverges from the expected Poisson distribution y; calculated as sum(ŷ .- y .* log.(ŷ)) / size(y, 2).
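A minimal sketch with observed counts and predicted rates (the numbers are only illustrative):

using Flux

y = [1.0, 2.0, 0.0, 3.0]          # observed counts
ŷ = [0.8, 2.2, 0.1, 2.9]          # predicted rates, must be positive since log(ŷ) is taken

Flux.Losses.poisson_loss(ŷ, y)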
Flux.Losses.hinge_loss — Function

hinge_loss(ŷ, y; agg=mean)

Return the hinge loss given the prediction ŷ and true labels y (containing 1 or -1); calculated as sum(max.(0, 1 .- ŷ .* y)) / size(y, 2).
See also: squared_hinge_loss
Flux.Losses.squared_hinge_loss — Function

squared_hinge_loss(ŷ, y)

Return the squared hinge loss given the prediction ŷ and true labels y (containing 1 or -1); calculated as sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2).
See also: hinge_loss
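A minimal sketch covering both hinge losses, with labels in {1, -1} and raw model scores (the numbers are only illustrative):

using Flux

y = [1, -1, 1, -1]                        # true labels
ŷ = [0.8, -0.6, -0.2, -1.5]               # raw scores; the third sample is mis-classified

Flux.Losses.hinge_loss(ŷ, y)              # every margin below 1 is penalized linearly
Flux.Losses.squared_hinge_loss(ŷ, y)      # same margins, squared penalties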
Flux.Losses.dice_coeff_loss — Function

dice_coeff_loss(ŷ, y; smooth=1)

Return a loss based on the Dice coefficient. Used in the V-Net image segmentation architecture. Similar to the F1 score. Calculated as:
1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)

Flux.Losses.tversky_loss — Function

tversky_loss(ŷ, y; β=0.7)

Return the Tversky loss. Used with imbalanced data to give more weight to false negatives. A larger β weighs recall more than precision (by placing more emphasis on false negatives). Calculated as:

1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)
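A minimal sketch for both segmentation losses, with a flat ground-truth mask and predicted probabilities (the numbers are only illustrative):

using Flux

y = [0, 1, 1, 0, 1]                        # ground-truth mask
ŷ = [0.1, 0.8, 0.9, 0.2, 0.6]              # predicted probabilities

Flux.Losses.dice_coeff_loss(ŷ, y)
Flux.Losses.tversky_loss(ŷ, y; β=0.7)      # β > 0.5 puts more weight on false negatives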