Model Templates

So how does the Affine template work? We don't want to duplicate the code above whenever we need more than one affine layer:

W₁, b₁ = randn(...)
affine₁(x) = W₁*x + b₁
W₂, b₂ = randn(...)
affine₂(x) = W₂*x + b₂
model = Chain(affine₁, affine₂)

Here's one way we could solve this: just keep the parameters in a Julia type, and define how that type acts as a function:

type MyAffine
  W
  b
end

# Use the `MyAffine` layer as a model
(l::MyAffine)(x) = l.W * x + l.b

# Convenience constructor
MyAffine(in::Integer, out::Integer) =
  MyAffine(randn(out, in), randn(out))

model = Chain(MyAffine(5, 5), MyAffine(5, 5))

x1 = rand(5)
model(x1) # [-1.54458,0.492025,0.88687,1.93834,-4.70062]

This is much better: we can now make as many affine layers as we want. Since this is a very common pattern, Flux provides the @net macro to make it more convenient:

@net type MyAffine
  W
  b
  x -> x * W + b
end

The function provided, x -> x * W + b, will be used when MyAffine is used as a model; it's just a shorter way of defining the (::MyAffine)(x) method above. (You may notice that W and x have swapped order in the model; this is due to the way batching works, which will be covered in more detail later on.)
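To get a feel for why the order matters, here's a minimal sketch of the row-major batching convention (the sizes are illustrative, not taken from the Flux source): each row of x holds one sample, so the data multiplies the weights from the left.

W = randn(10, 20)  # in × out
b = randn(1, 20)
x = rand(5, 10)    # a batch of 5 samples with 10 features each
y = x * W .+ b     # 5 × 20: one output row per sample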

However, @net does not simply save us some keystrokes; it's the secret sauce that makes everything else in Flux go. For example, it analyses the code for the forward function so that it can differentiate it or convert it to a TensorFlow graph.

The above code is almost exactly how Affine is defined in Flux itself! There's no difference between "library-level" and "user-level" models, so making your code reusable doesn't involve a lot of extra complexity. Moreover, much more complex models than Affine are equally simple to define.

Models in templates

@net models can contain sub-models as well as just array parameters:

@net type TLP
  first
  second
  function (x)
    l1 = σ(first(x))
    l2 = softmax(second(l1))
  end
end

Just as above, this is roughly equivalent to writing:

type TLP
  first
  second
end

function (self::TLP)(x)
  l1 = σ(self.first(x))
  l2 = softmax(self.second(l1))
end

Clearly, the first and second parameters are not arrays here; they should themselves be models that produce a result when called with an input array x. The Affine layer fits the bill, so we can instantiate TLP with two of them:

model = TLP(Affine(10, 20),
            Affine(20, 15))
x1 = rand(10)
model(x1) # [0.057852,0.0409741,0.0609625,0.0575354 ...

You may recognise this as being equivalent to

Chain(
  Affine(10, 20), σ,
  Affine(20, 15), softmax)

given that it's just a sequence of calls. For simple networks, Chain is completely fine, although the @net version is more powerful: we can, for example, reuse the output l1 more than once.
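For instance, here is a sketch (not taken from Flux itself) of a template that uses l1 twice, feeding it both through the second layer and around it; it assumes both layers preserve the input width so that the shapes line up:

@net type Residual
  first
  second
  function (x)
    l1 = σ(first(x))
    l2 = σ(second(l1) + l1)  # l1 is reused here
  end
end

model = Residual(Affine(20, 20), Affine(20, 20))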

Constructors

Affine has two array parameters, W and b. Just like any other Julia type, it's easy to instantiate an Affine layer with parameters of our choosing:

a = Affine(rand(10, 20), rand(1, 20))

However, for convenience and to avoid errors, we'd probably rather specify the input and output dimensions instead:

a = Affine(10, 20)

This is easy to implement using the usual Julia syntax for constructors:

Affine(in::Integer, out::Integer) =
  Affine(randn(in, out), randn(1, out))

In practice, these constructors tend to take the parameter initialisation function as an argument so that it's more easily customisable, and use Flux.initn by default (which is equivalent to randn(...)/100). So Affine's constructor really looks like this:

Affine(in::Integer, out::Integer; init = initn) =
  Affine(init(in, out), init(1, out))
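Given that signature, any function producing an array of the requested shape can be passed in; for example (a hypothetical usage, assuming the keyword constructor above):

a = Affine(10, 20)                # uses Flux.initn for W and b
a0 = Affine(10, 20, init = zeros) # all-zero parameters of the same shapes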

Supported syntax

The syntax used to define a forward pass, like x -> x*W + b, behaves exactly like Julia code for the most part. However, it's important to remember that it defines a dataflow graph, not a general Julia expression. In practice this means that anything side-effectful, as well as things like control flow and println calls, won't work as expected. In future we'll continue to expand support for Julia syntax and features.
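As a concrete (hypothetical) illustration, the println in the following definition is not part of the dataflow graph, so it won't run the way plain Julia would suggest:

@net type Chatty
  W
  b
  function (x)
    println("forward pass!")  # side effect: won't behave as expected
    x * W + b
  end
end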