It's common to encode categorical variables (like true, false or cat, dog) in "one-of-k" or "one-hot" form. OneHotArrays.jl provides the onehot function to make this easy.
There is also a onecold function, which is an inverse of onehot. It can also be given an array of numbers instead of booleans, in which case it performs an argmax-like operation, returning the label with the highest corresponding weight.
Note that these operations returned OneHotVector and OneHotMatrix rather than Arrays. OneHotVectors behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant row of the matrix under the hood.
Returns a OneHotVector which is roughly a sparse representation of x .== labels.
Instead of storing say Vector{Bool}, it stores the index of the first occurrence of x in labels. If x is not found in labels, then it either returns onehot(default, labels), or gives an error if no default is given.
See also onehotbatch to apply this to many xs, and onecold to reverse either of these, as well as to generalise argmax.
Roughly the inverse operation of onehot or onehotbatch: This finds the index of the largest element of y, or each column of y, and looks them up in labels.
If labels are not specified, the default is integers 1:size(y,1) – the same operation as argmax(y, dims=1) but sometimes a different return type.
Returns a OneHotMatrix where kth column of the matrix is onehot(xs[k], labels). This is a sparse matrix, which stores just a Vector{UInt32} containing the indices of the nonzero elements.
If one of the inputs in xs is not found in labels, that column is onehot(default, labels) if default is given, else an error.
If xs has more dimensions, N = ndims(xs) > 1, then the result is an AbstractArray{Bool, N+1} which is one-hot along the first dimension, i.e. result[:, k...] == onehot(xs[k...], labels).
Note that xs can be any iterable, such as a string. And that using a tuple for labels will often speed up construction, certainly for less than 32 classes.
A one-hot vector with L labels (i.e. length(A) == L and count(A) == 1) typically constructed by onehot. Stored efficiently as a single index of type T, usually UInt32.