NNlib
Flux re-exports all of the functions exported by the NNlib package.
Activation Functions
Non-linearities that go between layers of your model. Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call σ.(xs), relu.(xs), and so on.
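As a quick illustration (the values here follow directly from the definitions below):
julia> relu.([-2.0, 0.0, 3.0])
3-element Array{Float64,1}:
0.0
0.0
3.0
julia> σ.([0.0])
1-element Array{Float64,1}:
0.5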
NNlib.celu — Function
celu(x, α=1) = (x ≥ 0 ? x : α * (exp(x/α) - 1))
Continuously Differentiable Exponential Linear Units. See Continuously Differentiable Exponential Linear Units.
NNlib.elu — Function
elu(x, α=1) = x > 0 ? x : α * (exp(x) - 1)
Exponential Linear Unit activation function. See Fast and Accurate Deep Network Learning by Exponential Linear Units. You can also specify the coefficient explicitly, e.g. elu(x, 1).
NNlib.gelu — Function
gelu(x) = 0.5x * (1 + tanh(√(2/π) * (x + 0.044715x^3)))
Gaussian Error Linear Unit activation function.
NNlib.hardsigmoid — Function
hardσ(x, a=0.2) = max(0, min(1.0, a * x + 0.5))
Segment-wise linear approximation of sigmoid. See BinaryConnect: Training Deep Neural Networks with binary weights during propagations.
NNlib.hardtanh — Function
hardtanh(x) = max(-1, min(1, x))
Segment-wise linear approximation of tanh. A cheaper and more computationally efficient version of tanh. See Large Scale Machine Learning.
NNlib.leakyrelu — Function
leakyrelu(x, a=0.01) = max(a*x, x)
Leaky Rectified Linear Unit activation function. You can also specify the coefficient explicitly, e.g. leakyrelu(x, 0.01).
NNlib.lisht — Function
lisht(x) = x * tanh(x)
Non-Parametric Linearly Scaled Hyperbolic Tangent activation function. See LiSHT.
NNlib.logcosh — Function
logcosh(x)
Return log(cosh(x)), which is computed in a numerically stable way.
NNlib.logsigmoid — Function
logσ(x)
Return log(σ(x)), which is computed in a numerically stable way.
julia> logσ(0)
-0.6931471805599453
julia> logσ.([-100, -10, 100])
3-element Array{Float64,1}:
-100.0
-10.000045398899218
-3.720075976020836e-44
NNlib.mish — Function
mish(x) = x * tanh(softplus(x))
Self Regularized Non-Monotonic Neural Activation Function. See Mish: A Self Regularized Non-Monotonic Neural Activation Function.
NNlib.relu — Function
relu(x) = max(0, x)
Rectified Linear Unit activation function.
NNlib.relu6 — Function
relu6(x) = min(max(0, x), 6)
Rectified Linear Unit activation function capped at 6. See Convolutional Deep Belief Networks on CIFAR-10.
NNlib.rrelu — Function
rrelu(x, l=1/8, u=1/3) = max(a*x, x)
a = randomly sampled from uniform distribution U(l, u)
Randomized Leaky Rectified Linear Unit activation function. You can also specify the bound explicitly, e.g. rrelu(x, 0.0, 1.0).
NNlib.selu — Function
selu(x) = λ * (x ≥ 0 ? x : α * (exp(x) - 1))
λ ≈ 1.0507
α ≈ 1.6733
Scaled exponential linear units. See Self-Normalizing Neural Networks.
NNlib.sigmoid — Function
σ(x) = 1 / (1 + exp(-x))
Classic sigmoid activation function.
NNlib.softplus — Function
softplus(x) = log(exp(x) + 1)
NNlib.softshrink — Function
softshrink(x, λ=0.5) = (x ≥ λ ? x - λ : (-λ ≥ x ? x + λ : 0))
NNlib.softsign — Function
softsign(x) = x / (1 + |x|)
NNlib.swish — Function
swish(x) = x * σ(x)
Self-gated activation function. See Swish: a Self-Gated Activation Function.
NNlib.tanhshrink — Function
tanhshrink(x) = x - tanh(x)
NNlib.trelu — Function
trelu(x, theta=1.0) = x > theta ? x : 0
Threshold Gated Rectified Linear. See ThresholdRelu.
Softmax
NNlib.softmax — Function
softmax(x; dims=1)
Softmax turns input array x into probability distributions that sum to 1 along the dimensions specified by dims. It is semantically equivalent to the following:
softmax(x; dims=1) = exp.(x) ./ sum(exp.(x), dims=dims)
with additional manipulations enhancing numerical stability.
For a matrix input x it will by default (dims=1) treat it as a batch of vectors, with each column independent. The keyword dims=2 will instead treat rows independently, and so on.
julia> softmax([1, 2, 3])
3-element Array{Float64,1}:
0.0900306
0.244728
0.665241
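And a sketch of the dims keyword on a matrix (values rounded for display):
julia> x = [1.0 2.0; 3.0 4.0];
julia> softmax(x; dims=1)  # each column sums to 1
2×2 Array{Float64,2}:
0.119203  0.119203
0.880797  0.880797
julia> sum(softmax(x; dims=2), dims=2)  # each row sums to 1
2×1 Array{Float64,2}:
1.0
1.0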
See also logsoftmax.
NNlib.logsoftmax — Function
logsoftmax(x; dims=1)
Computes the log of softmax in a more numerically stable way than directly taking log.(softmax(xs)). Commonly used in computing cross entropy loss.
It is semantically equivalent to the following:
logsoftmax(x; dims=1) = x .- log.(sum(exp.(x), dims=dims))
See also softmax.
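A brief illustration of the extra stability (the second result is what the formula above gives when exp underflows):
julia> logsoftmax([1.0, 2.0, 3.0]) ≈ log.(softmax([1.0, 2.0, 3.0]))
true
julia> logsoftmax([1000.0, 2000.0])  # finite, where log.(softmax(x)) would give -Inf
2-element Array{Float64,1}:
-1000.0
0.0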
Pooling
NNlib.maxpool — Function
maxpool(x, k::NTuple; pad=0, stride=k)
Perform max pool operation with window size k on input tensor x.
NNlib.meanpool — Function
meanpool(x, k::NTuple; pad=0, stride=k)
Perform mean pool operation with window size k on input tensor x.
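As a small worked example covering both pooling functions, on a 4×4 single-channel, single-sample input (the trailing channel and batch dimensions are required for 2d pooling):
julia> x = reshape(collect(Float64, 1:16), 4, 4, 1, 1);
julia> maxpool(x, (2, 2))[:, :, 1, 1]
2×2 Array{Float64,2}:
6.0  14.0
8.0  16.0
julia> meanpool(x, (2, 2))[:, :, 1, 1]
2×2 Array{Float64,2}:
3.5  11.5
5.5  13.5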
Convolution
NNlib.conv — Function
conv(x, w; stride=1, pad=0, dilation=1, flipped=false)
Apply convolution filter w to input x. x and w are 3d/4d/5d tensors in 1d/2d/3d convolutions respectively.
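For example, a sketch of the shapes in a 2d convolution with the (width, height, channels, batch) layout; only sizes are checked here:
julia> x, w = rand(Float32, 28, 28, 3, 16), rand(Float32, 5, 5, 3, 7);
julia> conv(x, w) |> size          # no padding, stride 1
(24, 24, 7, 16)
julia> conv(x, w; pad=2) |> size   # padding 2 keeps the spatial size for a 5×5 filter
(28, 28, 7, 16)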
NNlib.depthwiseconv — Function
depthwiseconv(x, w; stride=1, pad=0, dilation=1, flipped=false)
Depthwise convolution operation with filter w on input x. x and w are 3d/4d/5d tensors in 1d/2d/3d convolutions respectively.
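A shape sketch for the depthwise case, assuming the weight layout (spatial..., channel multiplier, input channels), so the output has multiplier × input channels:
julia> x = rand(Float32, 28, 28, 3, 16);
julia> w = rand(Float32, 5, 5, 2, 3);   # channel multiplier 2, 3 input channels (assumed layout)
julia> depthwiseconv(x, w) |> size
(24, 24, 6, 16)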
Batched Operations
NNlib.batched_mul — Function
batched_mul(A, B) -> C
A ⊠ B   # \boxtimes
Batched matrix multiplication. Result has C[:,:,k] == A[:,:,k] * B[:,:,k] for all k. If size(B,3) == 1 then instead C[:,:,k] == A[:,:,k] * B[:,:,1], and similarly for A.
To transpose each matrix, apply batched_transpose to the array, or batched_adjoint for conjugate-transpose:
julia> A, B = randn(2,5,17), randn(5,9,17);
julia> A ⊠ B |> size
(2, 9, 17)
julia> batched_adjoint(A) |> size
(5, 2, 17)
julia> batched_mul(A, batched_adjoint(randn(9,5,17))) |> size
(2, 9, 17)
julia> A ⊠ randn(5,9,1) |> size
(2, 9, 17)
julia> batched_transpose(A) == PermutedDimsArray(A, (2,1,3))
true
The equivalent PermutedDimsArray may be used in place of batched_transpose. Other permutations are also handled by BLAS, provided that the batch index k is not the first dimension of the underlying array. Thus PermutedDimsArray(::Array, (1,3,2)) and PermutedDimsArray(::Array, (3,1,2)) are fine.
However, A = PermutedDimsArray(::Array, (3,2,1)) is not acceptable to BLAS, since the batch dimension is the contiguous one: stride(A,3) == 1. This will be copied, as doing so is faster than batched_mul_generic!.
Both this copy and batched_mul_generic! produce @debug messages, and setting for instance ENV["JULIA_DEBUG"] = NNlib will display them.
batched_mul(A::Array{T,3}, B::Matrix)
batched_mul(A::Matrix, B::Array{T,3})
A ⊠ B
This is always matrix-matrix multiplication, but either A
or B
may lack a batch index.
When B is a matrix, result has C[:,:,k] == A[:,:,k] * B[:,:] for all k.
When A is a matrix, then C[:,:,k] == A[:,:] * B[:,:,k]. This can also be done by reshaping and calling *, for instance A ⊡ B using TensorCore.jl, but is implemented here using batched_gemm instead of gemm.
julia> randn(16,8,32) ⊠ randn(8,4) |> size
(16, 4, 32)
julia> randn(16,8,32) ⊠ randn(8,4,1) |> size # equivalent
(16, 4, 32)
julia> randn(16,8) ⊠ randn(8,4,32) |> size
(16, 4, 32)
See also batched_vec to regard B as a batch of vectors, A[:,:,k] * B[:,k].
NNlib.batched_mul! — Function
batched_mul!(C, A, B) -> C
batched_mul!(C, A, B, α=1, β=0)
In-place batched matrix multiplication, equivalent to mul!(C[:,:,k], A[:,:,k], B[:,:,k], α, β) for all k. If size(B,3) == 1 then every batch uses B[:,:,1] instead.
This will call batched_gemm! whenever possible. For real arrays this means that, for X ∈ [A,B,C], either strides(X,1) == 1 or strides(X,2) == 1 is required; the latter may be caused by batched_transpose or by, for instance, PermutedDimsArray(::Array, (3,1,2)). Unlike batched_mul, this will never make a copy.
For complex arrays, the wrapper made by batched_adjoint must be outermost to be seen. In this case the strides accepted by BLAS are more restricted: if stride(C,1) == 1 then only stride(AorB::BatchedAdjoint, 2) == 1 is accepted.
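A minimal in-place sketch:
julia> A, B = randn(2,5,17), randn(5,9,17);
julia> C = zeros(2,9,17);
julia> batched_mul!(C, A, B);
julia> C[:,:,1] ≈ A[:,:,1] * B[:,:,1]
true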
NNlib.batched_adjoint — Function
batched_transpose(A::AbstractArray{T,3})
batched_adjoint(A)
Equivalent to applying transpose or adjoint to each matrix A[:,:,k].
These exist to control how batched_mul behaves, as it operates on such matrix slices of an array with ndims(A)==3.
PermutedDimsArray(A, (2,1,3)) is equivalent to batched_transpose(A), and is also understood by batched_mul (and more widely supported elsewhere).
BatchedTranspose{T, S} <: AbstractBatchedMatrix{T, 3}
BatchedAdjoint{T, S}
Lazy wrappers analogous to Transpose and Adjoint, returned by batched_transpose etc.
NNlib.batched_transpose — Function
batched_transpose(A::AbstractArray{T,3})
batched_adjoint(A)
Equivalent to applying transpose or adjoint to each matrix A[:,:,k].
These exist to control how batched_mul behaves, as it operates on such matrix slices of an array with ndims(A)==3.
PermutedDimsArray(A, (2,1,3)) is equivalent to batched_transpose(A), and is also understood by batched_mul (and more widely supported elsewhere).
BatchedTranspose{T, S} <: AbstractBatchedMatrix{T, 3}
BatchedAdjoint{T, S}
Lazy wrappers analogous to Transpose and Adjoint, returned by batched_transpose etc.
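For instance (real-valued, so transpose and adjoint coincide):
julia> A = randn(2, 5, 17);
julia> batched_transpose(A) |> size
(5, 2, 17)
julia> batched_transpose(A)[:, :, 1] == transpose(A[:, :, 1])
true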