Loss Functions

Flux provides a large number of common loss functions used for training machine learning models. They are grouped together in the Flux.Losses module.

Loss functions for supervised learning typically take two inputs: a target y and a prediction ŷ. By Flux's convention, the prediction comes first:

loss(ŷ, y)

Most loss functions in Flux have an optional keyword argument agg, which sets the type of aggregation performed over the batch:

loss(ŷ, y)                         # defaults to `mean`
loss(ŷ, y, agg=sum)                # use `sum` for reduction
loss(ŷ, y, agg=x->sum(x, dims=2))  # partial reduction
loss(ŷ, y, agg=x->mean(w .* x))    # weighted mean
loss(ŷ, y, agg=identity)           # no aggregation.
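
The agg variants above can be tried without Flux. Below is a minimal sketch that uses a hypothetical hand-written loss, `my_mae` (not a Flux function), following the same convention: the prediction comes first and `agg` receives the array of per-element losses. The weights `w` are also hypothetical, and the batch is laid out along the columns.

```julia
using Statistics  # provides `mean`

# Hypothetical loss in the Flux style: prediction first, `agg` combines per-element losses
my_mae(ŷ, y; agg = mean) = agg(abs.(ŷ .- y))

ŷ = [0.5 1.0 2.0; 1.5 0.0 3.0]   # 2 outputs × 3 samples (batch along columns)
y = [1.0 1.0 1.0; 1.0 1.0 1.0]
w = [1.0 2.0 3.0]                 # hypothetical per-sample weights, broadcast over rows

my_mae(ŷ, y)                               # mean over all elements: 5/6
my_mae(ŷ, y; agg = sum)                    # total: 5.0
my_mae(ŷ, y; agg = x -> sum(x; dims = 2))  # 2×1 matrix: one sum per output row
my_mae(ŷ, y; agg = x -> mean(w .* x))      # weighted mean: 2.0
my_mae(ŷ, y; agg = identity)               # 2×3 matrix of per-element losses
```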

Losses Reference

Flux.Losses.mae — Function

mae(ŷ, y; agg=mean)

Return the loss corresponding to mean absolute error:

agg(abs.(ŷ .- y))
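
A minimal sketch of the formula above, written as a standalone function named `mae_sketch` (a hypothetical name, not Flux's source). Note that the single large error contributes linearly to the loss.

```julia
using Statistics

# Sketch of agg(abs.(ŷ .- y))
mae_sketch(ŷ, y; agg = mean) = agg(abs.(ŷ .- y))

ŷ = [1.0, 2.0, 3.0, 14.0]   # the last prediction is off by 10
y = [1.0, 2.0, 4.0, 4.0]
mae_sketch(ŷ, y)            # (0 + 0 + 1 + 10) / 4 = 2.75
```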

Flux.Losses.mse — Function

mse(ŷ, y; agg=mean)

Return the loss corresponding to mean square error:

agg((ŷ .- y).^2)
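
The same data run through a sketch of the squared-error formula (`mse_sketch` is a hypothetical name) shows how squaring lets the single outlier dominate the loss.

```julia
using Statistics

# Sketch of agg((ŷ .- y).^2)
mse_sketch(ŷ, y; agg = mean) = agg((ŷ .- y) .^ 2)

ŷ = [1.0, 2.0, 3.0, 14.0]   # the last prediction is off by 10
y = [1.0, 2.0, 4.0, 4.0]
mse_sketch(ŷ, y)            # (0 + 0 + 1 + 100) / 4 = 25.25
```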

Flux.Losses.msle — Function

msle(ŷ, y; agg=mean, ϵ=eps(ŷ))

Return the loss corresponding to mean squared logarithmic error, calculated as

agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2)

The ϵ term provides numerical stability. For the same absolute error, the loss penalizes under-estimation more than over-estimation.
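
A minimal sketch of the formula (`msle_sketch` is a hypothetical name) makes the asymmetry concrete: with a target of 10, predicting 5 costs (log 0.5)² ≈ 0.480, while predicting 15 costs only (log 1.5)² ≈ 0.164.

```julia
using Statistics

# Sketch of agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2)
msle_sketch(ŷ, y; agg = mean, ϵ = eps(eltype(ŷ))) =
    agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)) .^ 2)

y = [10.0]
under = msle_sketch([5.0], y)    # under-estimate by 5
over = msle_sketch([15.0], y)    # over-estimate by 5
under > over                     # true
```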

Flux.Losses.huber_loss — Function

huber_loss(ŷ, y; δ=1, agg=mean)

Return the Huber loss given the prediction ŷ and true values y, aggregated with agg (the mean by default):

$$
\text{Huber loss} =
\begin{cases}
0.5\,|\hat{y} - y|^2, & \text{for } |\hat{y} - y| \le \delta \\
\delta\,\left(|\hat{y} - y| - 0.5\,\delta\right), & \text{otherwise}
\end{cases}
$$
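
A minimal element-wise sketch of the piecewise definition (`huber_sketch` is a hypothetical name, not Flux's source). Small errors are penalized quadratically and large ones linearly, and the two branches agree at |ŷ - y| = δ.

```julia
using Statistics

# Sketch of the piecewise Huber formula, applied element-wise before aggregation
function huber_sketch(ŷ, y; δ = 1, agg = mean)
    abs_err = abs.(ŷ .- y)
    quad = 0.5 .* abs_err .^ 2          # branch for |ŷ - y| <= δ
    lin = δ .* (abs_err .- 0.5 * δ)     # branch otherwise
    return agg(ifelse.(abs_err .<= δ, quad, lin))
end

huber_sketch([0.5, 3.0], [0.0, 0.0])   # per-element 0.125 and 2.5, mean 1.3125
```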

Flux.Losses.label_smoothing — Function

label_smoothing(y::Union{Number, AbstractArray}, α; dims::Int=1)

Return smoothed labels, that is, labels whose confidence is relaxed.

When y is a one-hot vector or a batch of one-hot vectors, it is calculated as

y .* (1 - α) .+ α / size(y, dims)

When y is a number or a batch of numbers for binary classification, it is calculated as

y .* (1 - α) .+ α / 2

in which case the labels are squeezed towards 0.5.

α is a number in the interval (0, 1) called the smoothing factor. The higher the value of α, the stronger the smoothing of y.

dims denotes the one-hot dimension, unless dims=0, which denotes the application of label smoothing to binary distributions encoded in a single number.

Usage example:

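using Flux: onehotbatch
using Flux.Losses: label_smoothing, crossentropy

sf = 0.1
y = onehotbatch([1, 1, 1, 0, 0], 0:1)     # 2×5 one-hot targets
y_smoothed = label_smoothing(y, 2sf)      # 0.9 on the true class, 0.1 elsewhere
y_sim = y .* (1-2sf) .+ sf                # prediction close to the true labels
y_dis = copy(y_sim)
y_dis[1,:], y_dis[2,:] = y_dis[2,:], y_dis[1,:]  # swap rows: a confident wrong prediction
# Smoothed targets raise the loss of a confident correct prediction...
@assert crossentropy(y_sim, y) < crossentropy(y_sim, y_smoothed)
# ...and lower the loss of a confident wrong one
@assert crossentropy(y_dis, y) > crossentropy(y_dis, y_smoothed)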
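
The two smoothing formulas can also be checked without Flux. In this minimal sketch, `smooth_onehot` and `smooth_binary` are hypothetical helpers that transcribe the formulas above; they are not Flux's implementation. With three classes, each one-hot column keeps summing to one after smoothing, while the binary form squeezes labels towards 0.5.

```julia
# Sketch of y .* (1 - α) .+ α / size(y, dims) for one-hot labels
smooth_onehot(y, α; dims = 1) = y .* (1 - α) .+ α / size(y, dims)
# Sketch of y .* (1 - α) .+ α / 2 for binary labels encoded as numbers
smooth_binary(y, α) = y .* (1 - α) .+ α / 2

y = Float64[1 0; 0 0; 0 1]       # 3 classes × 2 samples, one-hot
smooth_onehot(y, 0.3)            # ones become 0.8, zeros become 0.1
smooth_binary([1.0, 0.0], 0.2)   # [0.9, 0.1]
```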

Flux.Losses.crossentropy — Function

crossentropy(ŷ, y; dims=1, ϵ=eps(ŷ), agg=mean)

Return the cross entropy between the given probability distributions; calculated as

agg(-sum(y .* log.(ŷ .+ ϵ); dims=dims))

Cross entropy is typically used as a loss in multi-class classification, in which case the labels y are given in a one-hot format. dims specifies the dimension (or dimensions) containing the class probabilities. The prediction ŷ is expected to sum to one across dims, as is the case for the output of a softmax operation.

Use label_smoothing to smooth the true labels as preprocessing before computing the loss.

Use of logitcrossentropy is recommended over crossentropy for numerical stability.
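
A minimal sketch of the formula on a batch of two samples with three classes (`crossentropy_sketch` is a hypothetical name). With one-hot targets, each column contributes minus the log of the probability assigned to its true class.

```julia
using Statistics

# Sketch of agg(-sum(y .* log.(ŷ .+ ϵ); dims=dims))
crossentropy_sketch(ŷ, y; dims = 1, ϵ = eps(eltype(ŷ)), agg = mean) =
    agg(-sum(y .* log.(ŷ .+ ϵ); dims = dims))

ŷ = [0.7 0.2; 0.2 0.5; 0.1 0.3]   # each column is a probability vector over 3 classes
y = [1.0 0.0; 0.0 0.0; 0.0 1.0]   # one-hot targets: class 1, then class 3
crossentropy_sketch(ŷ, y)         # mean of -log(0.7) and -log(0.3)
```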

Flux.Losses.logitcrossentropy — Function

logitcrossentropy(ŷ, y; dims=1, agg=mean)

Return the cross entropy computed after a logsoftmax operation, calculated as

agg(.-sum(y .* logsoftmax(ŷ; dims=dims); dims=dims))

Use label_smoothing to smooth the true labels as preprocessing before computing the loss.

logitcrossentropy(ŷ, y) is mathematically equivalent to crossentropy(softmax(ŷ), y) but it is more numerically stable.
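
The stability difference can be reproduced with a minimal sketch, assuming a log-softmax computed with the usual max-subtraction trick. `naive_softmax`, `stable_logsoftmax`, `naive_ce` and `logit_ce` are hypothetical helpers, not Flux functions, and the sketch omits ϵ. For moderate logits both routes agree; for a logit of 1000, `exp` overflows to `Inf` in the naive route and the loss becomes `NaN`, while the log-softmax route returns the correct value.

```julia
using Statistics

# Naive route: softmax first, then log; exp can overflow to Inf
naive_softmax(x; dims = 1) = exp.(x) ./ sum(exp.(x); dims = dims)

# Stable log-softmax: subtracting the maximum keeps every exponent <= 0
function stable_logsoftmax(x; dims = 1)
    shifted = x .- maximum(x; dims = dims)
    return shifted .- log.(sum(exp.(shifted); dims = dims))
end

naive_ce(ŷ, y; dims = 1) = mean(-sum(y .* log.(naive_softmax(ŷ; dims = dims)); dims = dims))
logit_ce(ŷ, y; dims = 1) = mean(-sum(y .* stable_logsoftmax(ŷ; dims = dims); dims = dims))

y = reshape([0.0, 1.0], 2, 1)            # one sample, true class 2
moderate = reshape([2.0, 0.5], 2, 1)
extreme = reshape([1000.0, 0.0], 2, 1)

logit_ce(moderate, y) ≈ naive_ce(moderate, y)   # true
naive_ce(extreme, y)                            # NaN
logit_ce(extreme, y)                            # 1000.0
```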

Flux.Losses.binarycrossentropy — Function

binarycrossentropy(ŷ, y; agg=mean, ϵ=eps(ŷ))

Return the binary cross-entropy loss, computed as

agg(@.(-y*log(ŷ + ϵ) - (1-y)*log(1-ŷ + ϵ)))

The ϵ term provides numerical stability.

Typically, the prediction ŷ is given by the output of a sigmoid activation.

Use label_smoothing to smooth the y value as preprocessing before computing the loss.

Use of logitbinarycrossentropy is recommended over binarycrossentropy for numerical stability.
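
A minimal sketch of the formula (`bce_sketch` and the local sigmoid `σ` are hypothetical definitions, not Flux's source). The last line shows why the logit version is preferred: in Float64, `σ(40.0)` rounds to exactly 1.0, so for a target of 0 the loss is capped near -log(ϵ) ≈ 36.04 instead of the true value of about 40.

```julia
using Statistics

# Sketch of agg(@.(-y*log(ŷ + ϵ) - (1-y)*log(1-ŷ + ϵ)))
bce_sketch(ŷ, y; agg = mean, ϵ = eps(eltype(ŷ))) =
    agg(@. -y * log(ŷ + ϵ) - (1 - y) * log(1 - ŷ + ϵ))

σ(x) = 1 / (1 + exp(-x))    # logistic sigmoid

ŷ = σ.([2.0, -1.0, 0.0])    # sigmoid outputs in (0, 1)
y = [1.0, 0.0, 1.0]
bce_sketch(ŷ, y)

bce_sketch([σ(40.0)], [0.0])   # saturated: about 36.04, limited by ϵ
```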

Flux.Losses.logitbinarycrossentropy — Function

logitbinarycrossentropy(ŷ, y; agg=mean)

Return the binary cross-entropy computed directly from the logits ŷ. This is mathematically equivalent to binarycrossentropy(σ(ŷ), y) but more numerically stable.

Use label_smoothing to smooth the y value as preprocessing before computing the loss.
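
One way to get that stability is a short derivation. Because log(1 - σ(x)) = log σ(x) - x, the binary cross-entropy -y·log σ(x) - (1-y)·log(1 - σ(x)) simplifies to (1 - y)·x - log σ(x). The following minimal sketch uses that identity with a stable log σ(x) = min(x, 0) - log1p(exp(-|x|)). The names `logσ_sketch` and `logitbce_sketch` are hypothetical, and this is not claimed to be Flux's exact implementation. Unlike the sigmoid-then-log route, it returns about 40 for a logit of 40 with target 0.

```julia
using Statistics

# Stable log(σ(x)) that never forms σ(x) itself
logσ_sketch(x) = min(x, zero(x)) - log1p(exp(-abs(x)))

# -y*log σ(x) - (1-y)*log(1 - σ(x)) rewritten as (1 - y)*x - log σ(x)
logitbce_sketch(x, y; agg = mean) = agg(@. (1 - y) * x - logσ_sketch(x))

logitbce_sketch([40.0], [0.0])                         # ≈ 40.0, no ϵ clamping
logitbce_sketch([2.0, -1.0, 0.0], [1.0, 0.0, 1.0])     # matches the sigmoid-based formula
```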

Flux.Losses.kldivergence — Function

kldivergence(ŷ, y; agg=mean)

Return the Kullback-Leibler divergence between the given probability distributions.

KL divergence measures how much one probability distribution differs from another. It is always non-negative, and it is zero only when both distributions are equal everywhere.
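
The text above does not spell out the formula, so the following is only a minimal sketch. It assumes the divergence is taken of ŷ from the target y, column by column, as sum(y .* log.(y ./ ŷ)), with 0·log 0 treated as 0. `xlogx_sketch` and `kl_sketch` are hypothetical names. The two checks illustrate the properties stated above: zero for identical distributions and positive otherwise.

```julia
using Statistics

# 0 * log(0) is taken to be 0
xlogx_sketch(x) = iszero(x) ? zero(x) : x * log(x)

# Assumed form: per column, sum(y log y) - sum(y log ŷ), then aggregated
kl_sketch(ŷ, y; dims = 1, agg = mean, ϵ = eps(eltype(ŷ))) =
    agg(sum(xlogx_sketch.(y) .- y .* log.(ŷ .+ ϵ); dims = dims))

p = [0.5 0.2; 0.5 0.8]               # two distributions over 2 classes
kl_sketch(p, p)                      # ≈ 0: identical distributions
kl_sketch([0.9 0.9; 0.1 0.1], p)     # positive: distributions differ
```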

Flux.Losses.poisson_loss — Function

poisson_loss(ŷ, y)

Return how much the predicted distribution ŷ diverges from the expected Poisson distribution y, calculated as sum(ŷ .- y .* log.(ŷ)) / size(y, 2).
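
A minimal sketch of the formula (`poisson_sketch` is a hypothetical name). Per element, ŷ - y·log ŷ is the Poisson negative log-likelihood with the constant log(y!) term dropped, and it is minimized at ŷ = y. Division by size(y, 2) averages over the samples in the columns. The predicted rates must be positive, because log.(ŷ) is undefined otherwise.

```julia
# Sketch of sum(ŷ .- y .* log.(ŷ)) / size(y, 2)
poisson_sketch(ŷ, y) = sum(ŷ .- y .* log.(ŷ)) / size(y, 2)

y = [2.0 1.0 3.0]                                    # observed counts, 1 × 3 samples
poisson_sketch(y, y) < poisson_sketch(1.5 .* y, y)   # true: over-predicting rates costs more
poisson_sketch(y, y) < poisson_sketch(0.5 .* y, y)   # true: so does under-predicting
```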

Flux.Losses.hinge_loss — Function

hinge_loss(ŷ, y; agg=mean)

Return the hinge loss given the prediction ŷ and true labels y (containing 1 or -1), calculated as sum(max.(0, 1 .- ŷ .* y)) / size(y, 2).
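
A minimal sketch of the formula (`hinge_sketch` is a hypothetical name). A sample contributes nothing once its score has the correct sign with a margin of at least 1. Scores with the correct sign but a margin below 1, and scores with the wrong sign, are penalized linearly.

```julia
# Sketch of sum(max.(0, 1 .- ŷ .* y)) / size(y, 2)
hinge_sketch(ŷ, y) = sum(max.(0, 1 .- ŷ .* y)) / size(y, 2)

y = [1.0 -1.0 1.0]      # labels in {1, -1}, 1 × 3 samples
ŷ = [2.0 -0.5 -1.0]     # raw scores: margin met, margin too small, wrong sign
hinge_sketch(ŷ, y)      # (0 + 0.5 + 2) / 3
```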