BatchView
struct defined in module MLUtils
BatchView(data, batchsize; partial=true, collate=nothing)
BatchView(data; batchsize=1, partial=true, collate=nothing)
Create a view of the given data that represents it as a vector of batches. Each batch contains the same number of observations, set by the parameter batchsize. If the number of observations in the dataset is not divisible by batchsize, the remaining observations are ignored when partial=false. With partial=true, the remaining observations form a last batch that is smaller than batchsize.
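The effect of partial on the number of batches can be sketched in plain Julia. The helper batch_ranges below is hypothetical and is not part of MLUtils; it only illustrates the arithmetic described above, assuming observations are numbered 1 to n.

```julia
# Hypothetical helper, not part of MLUtils: list the observation index
# ranges that each batch would cover.
function batch_ranges(n::Int, batchsize::Int; partial::Bool=true)
    batchsize > 0 || throw(ArgumentError("batchsize must be positive"))
    # Full batches always count; a trailing short batch counts only if partial=true.
    count = partial ? cld(n, batchsize) : fld(n, batchsize)
    return [((i - 1) * batchsize + 1):min(i * batchsize, n) for i in 1:count]
end

full_only = batch_ranges(150, 20; partial=false)
with_partial = batch_ranges(150, 20; partial=true)

println(length(full_only))     # 7 batches, the last 10 observations are skipped
println(length(with_partial))  # 8 batches
println(last(with_partial))    # 141:150, a final batch of 10 observations
```

With 150 observations and a batch size of 30, both settings give the same five ranges, because nothing is left over.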
Note that any data access is delayed until getindex is called.
If used as an iterator, the object will iterate over the dataset once, effectively denoting an epoch.
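A minimal sketch of what delayed access and a single pass per iteration can look like is shown below. The type LazyBatches is a hypothetical stand-in written for illustration; it is not the MLUtils implementation, and it assumes matrix data with observations along the last dimension.

```julia
# Hypothetical stand-in for illustration only; not the MLUtils implementation.
struct LazyBatches{T<:AbstractMatrix}
    data::T          # observations are stored along the last dimension
    batchsize::Int
end

# Number of batches, counting a shorter final batch.
Base.length(b::LazyBatches) = cld(size(b.data, 2), b.batchsize)

# Data is only touched here, when a batch is actually requested.
function Base.getindex(b::LazyBatches, i::Int)
    1 <= i <= length(b) || throw(BoundsError(b, i))
    lo = (i - 1) * b.batchsize + 1
    hi = min(i * b.batchsize, size(b.data, 2))
    return view(b.data, :, lo:hi)
end

# One pass over all batches corresponds to one epoch.
Base.iterate(b::LazyBatches, i::Int=1) = i > length(b) ? nothing : (b[i], i + 1)

X = reshape(collect(1.0:20.0), 2, 10)
batches = LazyBatches(X, 4)

# Count the observations seen during one full pass.
seen = sum(size(x, 2) for x in batches)
println(seen)  # 10: every observation is visited exactly once per pass
```

Constructing the object copies nothing; each batch is built only when it is indexed or reached during iteration.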
For BatchView to work on some data structure, the type of the given variable data must implement the data container interface. See ObsView for more info.
data: The object describing the dataset. Can be of any type as long as it implements getobs and numobs (see Details for more information).
batchsize: The number of observations that each batch must contain (except possibly for the last one).
partial: If partial=false and the number of observations is not divisible by the batch-size, then the last mini-batch is dropped.
collate: Batching behavior. If nothing (default), a batch is getobs(data, indices). If false, each batch is [getobs(data, i) for i in indices]. When true, applies batch to the vector of observations in a batch, recursively collating arrays in the last dimensions. See batch for more information and examples.
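The three collate settings differ in the shape of the value returned for a batch. The sketch below imitates them with plain array operations on a small matrix; these are stand-ins chosen for illustration (direct indexing in place of getobs, and reduce(hcat, ...) in place of batch), not calls into MLUtils.

```julia
X = reshape(collect(1.0:12.0), 2, 6)   # 2 features, 6 observations
indices = 1:3

# Analogue of collate=nothing: index the container with all indices at once.
batch_default = X[:, indices]

# Analogue of collate=false: one entry per observation, left unstacked.
batch_uncollated = [X[:, i] for i in indices]

# Rough analogue of collate=true for vector observations:
# stack the observations along a new last dimension.
batch_collated = reduce(hcat, batch_uncollated)

println(size(batch_default))              # (2, 3)
println(length(batch_uncollated))         # 3
println(batch_collated == batch_default)  # true
```

For plain array data the first and third forms coincide; the distinction matters mainly when single observations are structured values, such as tuples, that have to be assembled into a batch.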
    using MLUtils
    X, Y = MLUtils.load_iris()

    A = BatchView(X, batchsize=30)
    @assert typeof(A) <: BatchView <: AbstractVector
    @assert eltype(A) <: SubArray{Float64,2}
    @assert length(A) == 5 # Iris has 150 observations
    @assert size(A[1]) == (4,30) # Iris has 4 features

    # 5 batches of size 30 observations
    for x in BatchView(X, batchsize=30)
        @assert typeof(x) <: SubArray{Float64,2}
        @assert numobs(x) === 30
    end

    # 7 batches of size 20 observations
    # Note that the iris dataset has 150 observations,
    # which means that with a batchsize of 20, the last
    # 10 observations will be ignored
    for (x, y) in BatchView((X, Y), batchsize=20, partial=false)
        @assert typeof(x) <: SubArray{Float64,2}
        @assert typeof(y) <: SubArray{String,1}
        @assert numobs(x) == numobs(y) == 20
    end

    # collate tuple observations
    for (x, y) in BatchView((rand(10, 3), ["a", "b", "c"]), batchsize=2, collate=true, partial=false)
        @assert size(x) == (10, 2)
        @assert size(y) == (2,)
    end

    # randomly assign observations to one and only one batch.
    for (x, y) in BatchView(shuffleobs((X, Y)), batchsize=20)
        @assert typeof(x) <: SubArray{Float64,2}
        @assert typeof(y) <: SubArray{String,1}
    end
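The last example relies on shuffling before batching, so that each observation lands in exactly one batch. That property can be checked with a plain-Julia sketch that uses only the Random standard library; the index bookkeeping here is an illustrative assumption and does not show how shuffleobs is implemented.

```julia
using Random

n, batchsize = 150, 20

# Shuffled observation indices; the fixed seed only makes the run repeatable.
perm = randperm(MersenneTwister(0), n)

# Cut the shuffled indices into consecutive chunks of at most batchsize entries.
batches = [perm[lo:min(lo + batchsize - 1, n)] for lo in 1:batchsize:n]

println(length(batches))                     # 8
println(sort(reduce(vcat, batches)) == 1:n)  # true: each index appears exactly once
```

Because the shuffle is a permutation and the chunks do not overlap, no observation is repeated or dropped within a pass.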
There is 1 method for MLUtils.BatchView: