GPU Support

Starting with v0.14, Flux doesn't force a specific GPU backend and the corresponding package dependencies on the users. Thanks to the package extension mechanism introduced in julia v1.9, Flux conditionally loads GPU specific code once a GPU package is made available (e.g. through using CUDA).

NVIDIA GPU support requires the packages CUDA.jl and cuDNN.jl to be installed in the environment. In the julia REPL, type ] add CUDA, cuDNN to install them. For more details see the CUDA.jl readme.

AMD GPU support is available since Julia 1.9 on systems with ROCm and MIOpen installed. For more details refer to the AMDGPU.jl repository.

Metal GPU acceleration is available on Apple Silicon hardware. For more details refer to the Metal.jl repository. Metal support in Flux is experimental and many features are not yet available.

In order to trigger GPU support in Flux, you need to call using CUDA, using AMDGPU or using Metal in your code. Notice that for CUDA, explicitly loading also cuDNN is not required, but the package has to be installed in the environment.

Flux ≤ 0.13

Old versions of Flux automatically installed CUDA.jl to provide GPU support. Starting from Flux v0.14, CUDA.jl is not a dependency anymore and has to be installed manually.

Checking GPU Availability

By default, Flux will run the checks on your system to see if it can support GPU functionality. You can check if Flux identified a valid GPU setup by typing the following:

julia> using CUDA

julia> CUDA.functional()
true

For AMD GPU:

julia> using AMDGPU

julia> AMDGPU.functional()
true

julia> AMDGPU.functional(:MIOpen)
true

For Metal GPU:

julia> using Metal

julia> Metal.functional()
true

Selecting GPU backend

Available GPU backends are: CUDA, AMDGPU and Metal.

Flux relies on Preferences.jl for selecting default GPU backend to use.

There are two ways you can specify it:

  • From the REPL/code in your project, call Flux.gpu_backend!("AMDGPU") and restart (if needed) Julia session for the changes to take effect.
  • In LocalPreferences.toml file in you project directory specify:
[Flux]
gpu_backend = "AMDGPU"

Current GPU backend can be fetched from Flux.GPU_BACKEND variable:

julia> Flux.GPU_BACKEND
"CUDA"

The current backend will affect the behaviour of methods like the method gpu described below.

Basic GPU Usage

Support for array operations on other hardware backends, like GPUs, is provided by external packages like CUDA.jl, AMDGPU.jl, and Metal.jl. Flux is agnostic to array types, so we simply need to move model weights and data to the GPU and Flux will handle it.

For example, we can use CUDA.CuArray (with the cu converter) to run our basic example on an NVIDIA GPU.

(Note that you need to have CUDA available to use CUDA.CuArray – please see the CUDA.jl instructions for more details.)

using CUDA

W = cu(rand(2, 5)) # a 2×5 CuArray
b = cu(rand(2))

predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = cu(rand(5)), cu(rand(2)) # Dummy data
loss(x, y) # ~ 3

Note that we convert both the parameters (W, b) and the data set (x, y) to cuda arrays. Taking derivatives and training works exactly as before.

If you define a structured model, like a Dense layer or Chain, you just need to convert the internal parameters. Flux provides fmap, which allows you to alter all parameters of a model at once.

d = Dense(10 => 5, σ)
d = fmap(cu, d)
d.weight # CuArray
d(cu(rand(10))) # CuArray output

m = Chain(Dense(10 => 5, σ), Dense(5 => 2), softmax)
m = fmap(cu, m)
m(cu(rand(10)))

As a convenience, Flux provides the gpu function to convert models and data to the GPU if one is available. By default, it'll do nothing. So, you can safely call gpu on some data or model (as shown below), and the code will not error, regardless of whether the GPU is available or not. If a GPU library (e.g. CUDA) loads successfully, gpu will move data from the CPU to the GPU. As is shown below, this will change the type of something like a regular array to a CuArray.

julia> using Flux, CUDA

julia> m = Dense(10, 5) |> gpu
Dense(10 => 5)      # 55 parameters

julia> x = rand(10) |> gpu
10-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 0.066846445
 ⋮
 0.76706964

julia> m(x)
5-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 -0.99992573
 ⋮
 -0.547261

The analogue cpu is also available for moving models and data back off of the GPU.

julia> x = rand(10) |> gpu
10-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 0.8019236
 ⋮
 0.7766742

julia> x |> cpu
10-element Vector{Float32}:
 0.8019236
 ⋮
 0.7766742

Transferring Training Data

In order to train the model using the GPU both model and the training data have to be transferred to GPU memory. Moving the data can be done in two different ways:

  1. Iterating over the batches in a DataLoader object transferring each one of the training batches at a time to the GPU. This is recommended for large datasets. Done by hand, it might look like this:

    train_loader = Flux.DataLoader((X, Y), batchsize=64, shuffle=true)
    # ... model definition, optimiser setup
    for epoch in 1:epochs
        for (x_cpu, y_cpu) in train_loader
            x = gpu(x_cpu)
            y = gpu(y_cpu)
            grads = gradient(m -> loss(m, x, y), model)
            Flux.update!(opt_state, model, grads[1])
        end
    end

    Rather than write this out every time, you can just call gpu(::DataLoader):

    gpu_train_loader = Flux.DataLoader((X, Y), batchsize=64, shuffle=true) |> gpu
    # ... model definition, optimiser setup
    for epoch in 1:epochs
        for (x, y) in gpu_train_loader
            grads = gradient(m -> loss(m, x, y), model)
            Flux.update!(opt_state, model, grads[1])
        end
    end

    This is equivalent to DataLoader(MLUtils.mapobs(gpu, (X, Y)); keywords...). Something similar can also be done with CUDA.CuIterator, gpu_train_loader = CUDA.CuIterator(train_loader). However, this only works with a limited number of data types: first(train_loader) should be a tuple (or NamedTuple) of arrays.

  2. Transferring all training data to the GPU at once before creating the DataLoader. This is usually performed for smaller datasets which are sure to fit in the available GPU memory.

    gpu_train_loader = Flux.DataLoader((X, Y) |> gpu, batchsize = 32)
    # ...
    for epoch in 1:epochs
        for (x, y) in gpu_train_loader
            # ...

    Here (X, Y) |> gpu applies gpu to both arrays, as it recurses into structures.

Saving GPU-Trained Models

After the training process is done, one must always transfer the trained model back to the cpu memory scope before serializing or saving to disk. This can be done, as described in the previous section, with:

model = cpu(model) # or model = model |> cpu

and then

using BSON
# ...
BSON.@save "./path/to/trained_model.bson" model

# in this approach the cpu-transferred model (referenced by the variable `model`)
# only exists inside the `let` statement
let model = cpu(model)
   # ...
   BSON.@save "./path/to/trained_model.bson" model
end

# is equivalent to the above, but uses `key=value` storing directive from BSON.jl
BSON.@save "./path/to/trained_model.bson" model = cpu(model)

The reason behind this is that models trained in the GPU but not transferred to the CPU memory scope will expect CuArrays as input. In other words, Flux models expect input data coming from the same kind device in which they were trained on.

In controlled scenarios in which the data fed to the loaded models is garanteed to be in the GPU there's no need to transfer them back to CPU memory scope, however in production environments, where artifacts are shared among different processes, equipments or configurations, there is no garantee that the CUDA.jl package will be available for the process performing inference on the model loaded from the disk.

Disabling CUDA or choosing which GPUs are visible to Flux

Sometimes it is required to control which GPUs are visible to julia on a system with multiple GPUs or disable GPUs entirely. This can be achieved with an environment variable CUDA_VISIBLE_DEVICES.

To disable all devices:

$ export CUDA_VISIBLE_DEVICES='-1'

To select specific devices by device id:

$ export CUDA_VISIBLE_DEVICES='0,1'

More information for conditional use of GPUs in CUDA.jl can be found in its documentation, and information about the specific use of the variable is described in the Nvidia CUDA blog post.

Using device objects

As a more convenient syntax, Flux allows the usage of GPU device objects which can be used to easily transfer models to GPUs (and defaulting to using the CPU if no GPU backend is available). This syntax has a few advantages including automatic selection of the GPU backend and type stability of data movement. These features are provided by MLDataDevices.jl package, that Flux's uses internally and re-exports.

A device object can be created using the gpu_device function. gpu_device first checks for a GPU preference, and if possible returns a device for the preference backend. For instance, consider the following example, where we load the CUDA.jl package to use an NVIDIA GPU ("CUDA" is the default preference):

julia> using Flux, CUDA;

julia> device = gpu_device()   # returns handle to an NVIDIA GPU if available
(::CUDADevice{Nothing}) (generic function with 4 methods)

julia> model = Dense(2 => 3);

julia> model.weight     # the model initially lives in CPU memory
3×2 Matrix{Float32}:
 -0.984794  -0.904345
  0.720379  -0.486398
  0.851011  -0.586942

julia> model = model |> device      # transfer model to the GPU
Dense(2 => 3)       # 9 parameters

julia> model.weight
3×2 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 -0.984794  -0.904345
  0.720379  -0.486398
  0.851011  -0.586942

The device preference can also be set via the gpu_backend! function. For instance, below we first set our device preference to "AMDGPU":

julia> gpu_backend!("AMDGPU")
[ Info: GPU backend has been set to AMDGPU. Restart Julia to use the new backend.

If no functional GPU backend is available, the device will default to a CPU device. You can also explictly request a CPU device by calling the cpu_device function.

julia> using Flux, MLDataDevices

julia> cdev = cpu_device()
(::CPUDevice{Nothing}) (generic function with 4 methods)

julia> gdev = gpu_device(force=true)   # force GPU device, error if no GPU is available
(::CUDADevice{Nothing}) (generic function with 4 methods)

julia> model = Dense(2 => 3);     # model in CPU memory

julia> gmodel = model |> gdev;    # transfer model to GPU

julia> cmodel = gmodel |> cdev;   # transfer model back to CPU

Data movement across GPU devices

Flux also supports getting handles to specific GPU devices, and transferring models from one GPU device to another GPU device from the same backend. Let's try it out for NVIDIA GPUs. First, we list all the available devices:

julia> using Flux, CUDA;

julia> CUDA.devices()
CUDA.DeviceIterator() for 3 devices:
0. NVIDIA TITAN RTX
1. NVIDIA TITAN RTX
2. NVIDIA TITAN RTX

Then, let's select the device with id 0:

julia> device0 = gpu_device(1)
(::CUDADevice{CuDevice}) (generic function with 4 methods)

julia> device0.device
CuDevice(0): NVIDIA TITAN RTX

Notice that indexing starts from 0 in the CUDA.devices() output, but gpu_device! expects the device id starting from 1.

Then, let's move a simple dense layer to the GPU represented by device0:

julia> dense_model = Dense(2 => 3)
Dense(2 => 3)       # 9 parameters

julia> dense_model = dense_model |> device0;

julia> dense_model.weight
3×2 CuArray{Float32, 2, CUDA.DeviceMemory}:
 -0.142062  -0.131455
 -0.828134  -1.06552
  0.608595  -1.05375

julia> CUDA.device(dense_model.weight)      # check the GPU to which dense_model is attached
CuDevice(0): NVIDIA TITAN RTX

Next, we'll get a handle to the device with id 1, and move dense_model to that device:

julia> device1 = gpu_device(2)
(::CUDADevice{CuDevice}) (generic function with 4 methods)

julia> dense_model = dense_model |> device1;    # don't directly print the model; see warning below

julia> CUDA.device(dense_model.weight)
CuDevice(1): NVIDIA TITAN RTX

Due to a limitation in Metal.jl, currently this kind of data movement across devices is only supported for CUDA and AMDGPU backends.

Printing models after moving to a different device

Due to a limitation in how GPU packages currently work, printing models on the REPL after moving them to a GPU device which is different from the current device will lead to an error.

MLDataDevices.get_deviceFunction
get_device(x) -> dev::AbstractDevice | Exception | Nothing

If all arrays (on the leaves of the structure) are on the same device, we return that device. Otherwise, we throw an error. If the object is device agnostic, we return nothing.

Note

Trigger Packages must be loaded for this to return the correct device.

Warning

RNG types currently don't participate in device determination. We will remove this restriction in the future.

See also get_device_type for a faster alternative that can be used for dispatch based on device type.

source
MLDataDevices.gpu_deviceFunction
gpu_device(device_id::Union{Nothing, Integer}=nothing;
    force::Bool=false) -> AbstractDevice

Selects GPU device based on the following criteria:

  1. If gpu_backend preference is set and the backend is functional on the system, then that device is selected.
  2. Otherwise, an automatic selection algorithm is used. We go over possible device backends in the order specified by supported_gpu_backends() and select the first functional backend.
  3. If no GPU device is functional and force is false, then cpu_device() is invoked.
  4. If nothing works, an error is thrown.

Arguments

  • device_id::Union{Nothing, Integer}: The device id to select. If nothing, then we return the last selected device or if none was selected then we run the autoselection and choose the current device using CUDA.device() or AMDGPU.device() or similar. If Integer, then we select the device with the given id. Note that this is 1-indexed, in contrast to the 0-indexed CUDA.jl. For example, id = 4 corresponds to CUDA.device!(3).
Warning

device_id is only applicable for CUDA and AMDGPU backends. For Metal, oneAPI and CPU backends, device_id is ignored and a warning is printed.

Warning

gpu_device won't select a CUDA device unless both CUDA.jl and cuDNN.jl are loaded. This is to ensure that deep learning operations work correctly. Nonetheless, if cuDNN is not loaded you can still manually create a CUDADevice object and use it (e.g. dev = CUDADevice()).

Keyword Arguments

  • force::Bool: If true, then an error is thrown if no functional GPU device is found.
source
MLDataDevices.gpu_backend!Function
gpu_backend!() = gpu_backend!("")
gpu_backend!(backend) = gpu_backend!(string(backend))
gpu_backend!(backend::AbstractGPUDevice)
gpu_backend!(backend::String)

Creates a LocalPreferences.toml file with the desired GPU backend.

If backend == "", then the gpu_backend preference is deleted. Otherwise, backend is validated to be one of the possible backends and the preference is set to backend.

If a new backend is successfully set, then the Julia session must be restarted for the change to take effect.

source
MLDataDevices.get_device_typeFunction
get_device_type(x) -> Type{<:AbstractDevice} | Exception | Type{Nothing}

Similar to get_device but returns the type of the device instead of the device itself. This value is often a compile time constant and is recommended to be used instead of get_device where ever defining dispatches based on the device type.

Note

Trigger Packages must be loaded for this to return the correct device.

Warning

RNG types currently don't participate in device determination. We will remove this restriction in the future.

source
MLDataDevices.supported_gpu_backendsFunction
supported_gpu_backends() -> Tuple{String, ...}

Return a tuple of supported GPU backends.

Warning

This is not the list of functional backends on the system, but rather backends which MLDataDevices.jl supports.

source
MLDataDevices.DeviceIteratorType
DeviceIterator(dev::AbstractDevice, iterator)

Create a DeviceIterator that iterates through the provided iterator via iterate. Upon each iteration, the current batch is copied to the device dev, and the previous iteration is marked as freeable from GPU memory (via unsafe_free!) (no-op for a CPU device).

The conversion follows the same semantics as dev(<item from iterator>).

Similarity to `CUDA.CuIterator`

The design inspiration was taken from CUDA.CuIterator and was generalized to work with other backends and more complex iterators (using Functors).

`MLUtils.DataLoader`

Calling dev(::MLUtils.DataLoader) will automatically convert the dataloader to use the same semantics as DeviceIterator. This is generally preferred over looping over the dataloader directly and transferring the data to the device.

Examples

The following was run on a computer with an NVIDIA GPU.

julia> using MLDataDevices, MLUtils

julia> X = rand(Float64, 3, 33);

julia> dataloader = DataLoader(X; batchsize=13, shuffle=false);

julia> for (i, x) in enumerate(dataloader)
           @show i, summary(x)
       end
(i, summary(x)) = (1, "3×13 Matrix{Float64}")
(i, summary(x)) = (2, "3×13 Matrix{Float64}")
(i, summary(x)) = (3, "3×7 Matrix{Float64}")

julia> for (i, x) in enumerate(CUDADevice()(dataloader))
           @show i, summary(x)
       end
(i, summary(x)) = (1, "3×13 CuArray{Float32, 2, CUDA.DeviceMemory}")
(i, summary(x)) = (2, "3×13 CuArray{Float32, 2, CUDA.DeviceMemory}")
(i, summary(x)) = (3, "3×7 CuArray{Float32, 2, CUDA.DeviceMemory}")
source

Distributed data parallel training

Experimental

Distributed support is experimental and could change in the future.

Flux supports now distributed data parallel training with DistributedUtils module. If you want to run your code on multiple GPUs, you have to install MPI.jl (see docs for more info).

julia> using MPI

julia> MPI.install_mpiexecjl()

Now you can run your code with mpiexecjl --project=. -n <np> julia <filename>.jl from CLI.

You can use either the MPIBackend or NCCLBackend, the latter only if also NCCL.jl is loaded. First, initialize a backend with DistributedUtils.initialize, e.g.

julia> using Flux, MPI, NCCL, CUDA

julia> CUDA.allowscalar(false)

julia> DistributedUtils.initialize(NCCLBackend)

julia> backend = DistributedUtils.get_distributed_backend(NCCLBackend)
NCCLBackend{Communicator, MPIBackend{MPI.Comm}}(Communicator(Ptr{NCCL.LibNCCL.ncclComm} @0x000000000607a660), MPIBackend{MPI.Comm}(MPI.Comm(1140850688)))

Pass your model, as well as any data to GPU device.

julia> model = Chain(Dense(1 => 256, tanh), Dense(256 => 1)) |> gpu
Chain(
  Dense(1 => 256, tanh),                # 512 parameters
  Dense(256 => 1),                      # 257 parameters
)                   # Total: 4 arrays, 769 parameters, 744 bytes.

julia> x = rand(Float32, 1, 16) |> gpu
1×16 CUDA.CuArray{Float32, 2, CUDA.DeviceMemory}:
 0.239324  0.331029  0.924996  0.55593  0.853093  0.874513  0.810269  0.935858  0.477176  0.564591  0.678907  0.729682  0.96809  0.115833  0.66191  0.75822

julia> y = x .^ 3
1×16 CUDA.CuArray{Float32, 2, CUDA.DeviceMemory}:
 0.0137076  0.0362744  0.791443  0.171815  0.620854  0.668804  0.53197  0.819654  0.108651  0.179971  0.312918  0.388508  0.907292  0.00155418  0.29  0.435899

In this case, we are training on a total of 16 * number of processes samples. You can also use DistributedUtils.DistributedDataContainer to split the data uniformly across processes (or do it manually).

julia> data = DistributedUtils.DistributedDataContainer(backend, x)
Flux.DistributedUtils.DistributedDataContainer(Float32[0.23932439 0.33102947 … 0.66191036 0.75822026], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])

You have to wrap your model in DistributedUtils.FluxDistributedModel and synchronize it (broadcast accross all processes):

julia> model = DistributedUtils.synchronize!!(backend, DistributedUtils.FluxDistributedModel(model); root=0)
Chain(
  Dense(1 => 256, tanh),                # 512 parameters

  Dense(256 => 1),                      # 257 parameters
)                   # Total: 4 arrays, 769 parameters, 744 bytes.

Time to set up an optimizer by using DistributedUtils.DistributedOptimizer and synchronize it as well.

julia> using Optimisers

julia> opt = DistributedUtils.DistributedOptimizer(backend, Optimisers.Adam(0.001f0))
DistributedOptimizer{MPIBackend{Comm}}(MPIBackend{Comm}(Comm(1140850688)), Adam(0.001, (0.9, 0.999), 1.0e-8))

julia> st_opt = Optimisers.setup(opt, model)
(layers = ((weight = Leaf(DistributedOptimizer{MPIBackend{Comm}}(MPIBackend{Comm}(Comm(1140850688)), Adam(0.001, (0.9, 0.999), 1.0e-8)), (Float32[0.0; 0.0; … ; 0.0; 0.0;;], Float32[0.0; 0.0; … ; 0.0; 0.0;;], (0.9, 0.999))), bias = Leaf(DistributedOptimizer{MPIBackend{Comm}}(MPIBackend{Comm}(Comm(1140850688)), Adam(0.001, (0.9, 0.999), 1.0e-8)), (Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], (0.9, 0.999))), σ = ()), (weight = Leaf(DistributedOptimizer{MPIBackend{Comm}}(MPIBackend{Comm}(Comm(1140850688)), Adam(0.001, (0.9, 0.999), 1.0e-8)), (Float32[0.0 0.0 … 0.0 0.0], Float32[0.0 0.0 … 0.0 0.0], (0.9, 0.999))), bias = Leaf(DistributedOptimizer{MPIBackend{Comm}}(MPIBackend{Comm}(Comm(1140850688)), Adam(0.001, (0.9, 0.999), 1.0e-8)), (Float32[0.0], Float32[0.0], (0.9, 0.999))), σ = ())),)

julia> st_opt = DistributedUtils.synchronize!!(backend, st_opt; root=0) 
(layers = ((weight = Leaf(DistributedOptimizer{MPIBackend{Comm}}(MPIBackend{Comm}(Comm(1140850688)), Adam(0.001, (0.9, 0.999), 1.0e-8)), (Float32[0.0; 0.0; … ; 0.0; 0.0;;], Float32[0.0; 0.0; … ; 0.0; 0.0;;], (0.9, 0.999))), bias = Leaf(DistributedOptimizer{MPIBackend{Comm}}(MPIBackend{Comm}(Comm(1140850688)), Adam(0.001, (0.9, 0.999), 1.0e-8)), (Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], (0.9, 0.999))), σ = ()), (weight = Leaf(DistributedOptimizer{MPIBackend{Comm}}(MPIBackend{Comm}(Comm(1140850688)), Adam(0.001, (0.9, 0.999), 1.0e-8)), (Float32[0.0 0.0 … 0.0 0.0], Float32[0.0 0.0 … 0.0 0.0], (0.9, 0.999))), bias = Leaf(DistributedOptimizer{MPIBackend{Comm}}(MPIBackend{Comm}(Comm(1140850688)), Adam(0.001, (0.9, 0.999), 1.0e-8)), (Float32[0.0], Float32[0.0], (0.9, 0.999))), σ = ())),)

Now you can define loss and train the model.

julia> loss(model) = mean((model(x) .- y).^2)
loss (generic function with 1 method)

julia> for epoch in 1:100
           global model, st_opt
           l, grad = Zygote.withgradient(loss, model)
           println("Epoch $epoch: Loss $l")
           st_opt, model = Optimisers.update(st_opt, model, grad[1])
         end
Epoch 1: Loss 0.011638729
Epoch 2: Loss 0.0116432225
Epoch 3: Loss 0.012763695
...

Remember that in order to run it on multiple GPUs you have to run from CLI mpiexecjl --project=. -n <np> julia <filename>.jl, where <np> is the number of processes that you want to use. The number of processes usually corresponds to the number of gpus.

By default MPI.jl MPI installation is CUDA-unaware so if you want to run it in CUDA-aware mode, read more here on custom installation and rebuilding MPI.jl. Then test if your MPI is CUDA-aware by

julia> import Pkg
julia> Pkg.test("MPI"; test_args=["--backend=CUDA"])

If it is, set your local preference as below

julia> using Preferences
julia> set_preferences!("Flux", "FluxDistributedMPICUDAAware" => true)
Known shortcomings

We don't run CUDA-aware tests so you're running it at own risk.