Welcome! This section contains information on how to create your first machine learning model using Flux.
Flux is a 100% pure-Julia stack that provides lightweight abstractions on top of Julia's native GPU and AD support. It makes the easy things easy while remaining fully hackable. Flux also uses Zygote, a next-generation automatic differentiation (AD) system.
Before you begin using Flux, you need to install Julia version 1.3 or later. For more information on installing Julia, see Download Julia.
After installing Julia, you can install Flux by running the following command in the Julia REPL:
julia> ] add Flux
Alternatively, you can run the following:
julia> using Pkg; Pkg.add("Flux")
In this tutorial, you'll create your first machine learning model using Flux. This is a simple linear regression model that attempts to recover a linear function by looking at noisy examples.
To import Flux, add the following:
using Flux
First, we'll write a function that generates our "true" data. We'll use Flux to recover W_truth and b_truth by looking only at examples of the ground_truth function.
W_truth = [1 2 3 4 5; 5 4 3 2 1]
b_truth = [-1.0; -2.0]
ground_truth(x) = W_truth*x .+ b_truth
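As a quick check of what ground_truth computes, the sketch below evaluates it on a vector of ones. Each row of W_truth sums to 15, so the result should be 15 plus the corresponding entry of b_truth. The names x_ones and y_ones are introduced here for illustration only.

```julia
# Ground truth definitions, repeated from the tutorial so the block runs on its own.
W_truth = [1 2 3 4 5; 5 4 3 2 1]
b_truth = [-1.0; -2.0]
ground_truth(x) = W_truth*x .+ b_truth

# Illustrative input (not part of the tutorial): a vector of five ones.
x_ones = ones(5)

# Each row of W_truth sums to 15, so we expect [15 - 1, 15 - 2].
y_ones = ground_truth(x_ones)
@show y_ones   # y_ones = [14.0, 13.0]
```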
Next, we generate our training data by passing random vectors into the ground truth function. We'll also add Gaussian noise using randn() so that it's not too easy for Flux to figure out the model.
x_train = [ 5 .* rand(5) for _ in 1:10_000 ]
y_train = [ ground_truth(x) + 0.2 .* randn(2) for x in x_train ]
There are two important things to note in this example which differ from real machine learning problems:
Our variables are individual vectors, stored inside another vector. Usually, we would have a collection of N-dimensional arrays (N >= 2) as our data.
In a real learning scenario, we would not have access to our ground truth, only the training examples.
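To illustrate the first point, here is a sketch of how the vector-of-vectors data could be packed into matrices with one example per column. The names X and Y are illustrative, and the noise term is left out for brevity. The rest of this tutorial keeps the vector-of-vectors form.

```julia
# Definitions repeated from the tutorial so the block runs on its own.
W_truth = [1 2 3 4 5; 5 4 3 2 1]
b_truth = [-1.0; -2.0]
ground_truth(x) = W_truth*x .+ b_truth

x_train = [ 5 .* rand(5) for _ in 1:10_000 ]

# Stack the 10,000 length-5 vectors as the columns of a 5×10_000 matrix.
X = reduce(hcat, x_train)

# The same ground_truth works on the whole batch: W_truth*X is 2×10_000,
# and b_truth is broadcast across the columns.
Y = ground_truth(X)

@show size(X)   # size(X) = (5, 10000)
@show size(Y)   # size(Y) = (2, 10000)
```

Column i of Y then matches ground_truth(x_train[i]), so the two layouts carry the same information.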
Next, we define the model we want to use to learn the data. We'll use the same form that we used for our training data:
model(x) = W*x .+ b
We need to set the parameters of the model (W and b) to some initial values. It's fairly common to use random values, so we'll do that:
W = rand(2, 5)
b = rand(2)
A loss function evaluates a machine learning model's performance. In other words, it measures how far the model is from its target prediction. Flux lets you define your own custom loss function, or you can use one of the Loss Functions that Flux provides.
For this example, we'll define a loss function that measures the squared distance from the predicted output to the actual output:
function loss(x, y)
    ŷ = model(x)
    sum((y .- ŷ).^2)
end
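To get a feel for the scale of this loss, note that even a perfect model cannot reach zero on noisy data. Each target has two components with noise of standard deviation 0.2, so the expected loss at the true parameters is about 2 × 0.2² = 0.08. The sketch below checks this by plugging the ground truth values into the model, which is only possible because this is a synthetic example. The name avg_loss is illustrative.

```julia
using Statistics  # for mean (standard library)

W_truth = [1 2 3 4 5; 5 4 3 2 1]
b_truth = [-1.0; -2.0]
ground_truth(x) = W_truth*x .+ b_truth

x_train = [ 5 .* rand(5) for _ in 1:10_000 ]
y_train = [ ground_truth(x) + 0.2 .* randn(2) for x in x_train ]

# For illustration only: set the model parameters to the true values.
W = float.(W_truth)
b = copy(b_truth)
model(x) = W*x .+ b

function loss(x, y)
    ŷ = model(x)
    sum((y .- ŷ).^2)
end

# Average loss of the "perfect" model over the noisy data.
# This should land near 2 * 0.2^2 = 0.08, the noise floor.
avg_loss = mean(loss(x, y) for (x, y) in zip(x_train, y_train))
@show avg_loss
```

A trained model whose average loss is close to this value is doing about as well as the noise allows.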
You train a machine learning model by running an optimization algorithm (optimiser) that finds the best parameters (W and b). The best parameters for a model are the ones that minimize the loss function. Flux provides Optimisers that you can use to train a model.
For this tutorial, we'll use a classic gradient descent optimiser with learning rate η = 0.01:
opt = Descent(0.01)
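Conceptually, plain gradient descent moves each parameter a small step against its gradient: p becomes p - η * ∇p. The sketch below applies that rule by hand to a one-parameter toy problem, without using Flux. The functions f, df, and descend, and the starting point, are made up for illustration.

```julia
# Toy objective (illustrative): f(p) = (p - 3)^2, minimized at p = 3.
f(p)  = (p - 3)^2
df(p) = 2 * (p - 3)        # derivative of f, worked out by hand

# Hypothetical helper: repeat the gradient descent update a fixed number of times.
function descend(p, η, steps)
    for _ in 1:steps
        p -= η * df(p)     # step against the gradient
    end
    return p
end

# η = 0.01 matches the learning rate used in this tutorial.
p_final = descend(0.0, 0.01, 1_000)
@show p_final              # close to 3
@show f(p_final)           # close to 0
```

With this learning rate, each step shrinks the distance to the minimum by a factor of 0.98, so convergence is steady but takes several hundred steps.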
Training a model is the process of computing the gradients of the loss with respect to the parameters for each input in the data. At every step, the optimiser updates all of the parameters until it finds a good value for them. This process can be written as a loop: we iterate over the examples in x_train and y_train and update the model for each example.
To indicate that we want all derivatives of W and b, we write ps = Flux.params(W, b). This is a convenience function that Flux provides so that we don't have to explicitly list every gradient we want. Check out the section on Taking Gradients if you want to learn more about how this works.
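To make "derivatives of the loss with respect to the parameters" concrete, here is a sketch that does not use Flux. For this particular loss, the gradient with respect to b works out by hand to -2 * (y - ŷ), and the code checks that formula against a central finite-difference approximation. The helper fd_grad_b and the sample values of W, b, x, and y are illustrative and not part of Flux.

```julia
# Illustrative parameters and one made-up training example.
W = [0.1 0.2 0.3 0.4 0.5;
     0.5 0.4 0.3 0.2 0.1]
b = [0.0, 0.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [50.0, 30.0]

# Loss written with explicit parameters so that we can perturb them.
loss(W, b, x, y) = sum((y .- (W*x .+ b)).^2)

# Hand-derived gradient with respect to b: -2 * (y - ŷ).
analytic_grad_b = -2 .* (y .- (W*x .+ b))

# Hypothetical helper: central finite differences in each component of b.
function fd_grad_b(W, b, x, y; h=1e-6)
    g = zeros(length(b))
    for i in eachindex(b)
        e = zeros(length(b))
        e[i] = h
        g[i] = (loss(W, b .+ e, x, y) - loss(W, b .- e, x, y)) / (2h)
    end
    return g
end

numeric_grad_b = fd_grad_b(W, b, x, y)
@show analytic_grad_b
@show numeric_grad_b     # should agree with the analytic value to several digits
```

Automatic differentiation gives the same kind of result without hand derivation or numerical perturbation, which is why the tutorial relies on it.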
We can now execute the training procedure for our model:
train_data = zip(x_train, y_train)
ps = Flux.params(W, b)

for (x,y) in train_data
    gs = Flux.gradient(ps) do
        loss(x,y)
    end
    Flux.Optimise.update!(opt, ps, gs)
end
Note: With this pattern, it is easy to add more complex learning routines that make use of control flow, distributed compute, scheduling optimisations, etc. The pattern above is a plain Julia for loop, but a while loop would work just as well.
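To show what the gradient and update steps amount to for this particular model, the sketch below runs the same kind of loop in plain Julia, with hand-derived gradients standing in for the Flux calls. For the squared-error loss, with residual e = y - ŷ, the gradients are -2 * e * xᵀ for W and -2 * e for b. This is an illustration under those assumptions, not a description of Flux internals, and the helper manual_epoch! is hypothetical.

```julia
W_truth = [1 2 3 4 5; 5 4 3 2 1]
b_truth = [-1.0; -2.0]
ground_truth(x) = W_truth*x .+ b_truth

x_train = [ 5 .* rand(5) for _ in 1:10_000 ]
y_train = [ ground_truth(x) + 0.2 .* randn(2) for x in x_train ]

# Hypothetical helper: one pass of per-example gradient descent that updates
# W and b in place. η plays the role of the learning rate in Descent(η).
function manual_epoch!(W, b, xs, ys, η)
    for (x, y) in zip(xs, ys)
        e = y .- (W*x .+ b)          # residual y - ŷ
        W .-= η .* (-2 .* e * x')    # ∂loss/∂W = -2 * e * xᵀ
        b .-= η .* (-2 .* e)         # ∂loss/∂b = -2 * e
    end
    return W, b
end

# Random initial parameters, as in the tutorial.
W, b = manual_epoch!(rand(2, 5), rand(2), x_train, y_train, 0.01)

@show maximum(abs, W .- W_truth)
@show maximum(abs, b .- b_truth)
```

Writing the gradients by hand only works because this model is so simple. For anything larger, automatic differentiation saves you from deriving and maintaining these formulas.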
While writing your own loop is powerful, sometimes you just want to do the simple thing without writing too much code. Flux lets you do this with Flux.train!, which runs one training epoch over a dataset.
Flux.train! computes gradients and updates model parameters for every sample or batch of samples. In our case, we could have replaced the above loop with the following statement:
Flux.train!(loss, Flux.params(W, b), train_data, opt)
For more ways to train a model in Flux, see Training.
The training loop we ran modified W and b to be closer to the values used to generate the training data (W_truth and b_truth). We can see how well we did by printing out the difference between the learned and actual matrices.
@show W
@show maximum(abs, W .- W_truth)
Because the data and initialization are random, your results may vary slightly, but in most cases the largest difference between corresponding elements of the learned and actual W matrices is no more than 4%.
Finally, create a file with the extension .jl containing the code above in any IDE and run it as julia name-of-your-file.jl. You can use the Julia VSCode extension to edit and run Julia code. Alternatively, you can run Julia code in a Jupyter notebook (see IJulia). Here is the full version of the code:
using Flux

# Define the ground truth model. We aim to recover W_truth and b_truth using
# only examples of ground_truth()
W_truth = [1 2 3 4 5; 5 4 3 2 1]
b_truth = [-1.0; -2.0]
ground_truth(x) = W_truth*x .+ b_truth

# Generate the ground truth training data as vectors-of-vectors
x_train = [ 5 .* rand(5) for _ in 1:10_000 ]
y_train = [ ground_truth(x) + 0.2 .* randn(2) for x in x_train ]

# Define and initialize the model we want to train
model(x) = W*x .+ b
W = rand(2, 5)
b = rand(2)

# Define pieces we need to train: loss function, optimiser, examples, and params
function loss(x, y)
    ŷ = model(x)
    sum((y .- ŷ).^2)
end
opt = Descent(0.01)
train_data = zip(x_train, y_train)
ps = Flux.params(W, b)

# Execute a training epoch
for (x,y) in train_data
    gs = gradient(ps) do
        loss(x,y)
    end
    Flux.Optimise.update!(opt, ps, gs)
end

# An alternate way to execute a training epoch
# Flux.train!(loss, Flux.params(W, b), train_data, opt)

# Print out how well we did
@show W
@show maximum(abs, W .- W_truth)
Congratulations! You have created and trained a model using Flux. Now, you can continue exploring Flux's capabilities:
Flux Model Zoo contains various demonstrations of Flux.
JuliaAcademy offers introductory courses to Julia and Flux.
As you continue your Flux and Julia journey, please feel free to share it on Twitter and tag us. We would love to see what awesome things the #FluxML community is up to.