kfolds
function defined in module MLUtils
kfolds(n::Integer, k = 5) -> Tuple
Compute the train/validation assignments for k repartitions of n observations, and return them in the form of two vectors. The first vector contains the index-vectors for the training subsets, and the second vector contains the index-vectors for the corresponding validation subsets. A general rule of thumb is to use either k = 5 or k = 10. The following code snippet generates the index assignments for k = 5:
julia> train_idx, val_idx = kfolds(10, 5);
Each observation is assigned to a validation subset exactly once. Thus, a union over all validation index-vectors reproduces the full range 1:n. Note that there is no random assignment of observations to subsets, which means that adjacent observations are likely to be part of the same validation subset.
julia> train_idx
5-element Array{Array{Int64,1},1}:
[3,4,5,6,7,8,9,10]
[1,2,5,6,7,8,9,10]
[1,2,3,4,7,8,9,10]
[1,2,3,4,5,6,9,10]
[1,2,3,4,5,6,7,8]
julia> val_idx
5-element Array{UnitRange{Int64},1}:
1:2
3:4
5:6
7:8
9:10
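The two properties stated above (the validation ranges jointly cover 1:n, and each training index-vector is the complement of its validation range) can be checked with plain Julia. This is a minimal sketch that hard-codes the validation ranges from the printed output rather than calling `kfolds`, so it illustrates the result and makes no claim about the library's implementation.

```julia
n = 10

# Validation ranges exactly as printed above for k = 5.
val_idx = [1:2, 3:4, 5:6, 7:8, 9:10]

# Each training index-vector is the complement of its validation range in 1:n.
train_idx = [setdiff(1:n, v) for v in val_idx]

# Concatenating all validation ranges should cover 1:n with no repeats.
covered = reduce(vcat, val_idx)
println(covered == 1:n)      # → true
println(train_idx[1])        # → [3, 4, 5, 6, 7, 8, 9, 10]
```

Because the ranges are contiguous, data that is sorted (for example by class or by time) would need to be shuffled before folding to avoid biased validation sets.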
kfolds(data, [k = 5])
Repartition a data container k times using a k-folds strategy and return the sequence of folds as a lazy iterator. Only data subsets are created, which means that no actual data is copied until getobs is invoked.
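To make the "lazy iterator of subsets" idea concrete, the sketch below imitates it with Base Julia only: a generator yields pairs of array views, and a copy is made only on request. It assumes observations are stored along the last array dimension and uses `view` and `copy` as stand-ins for the roles of the library's subsetting and `getobs`; it is an illustration, not how MLUtils is implemented.

```julia
# Toy data: 2 features × 10 observations (assumption: observations are columns).
X = reshape(collect(1.0:20.0), 2, 10)
n = size(X, 2)

# Static validation ranges for k = 5; training indices are the complements.
val_idx = [1:2, 3:4, 5:6, 7:8, 9:10]
train_idx = [setdiff(1:n, v) for v in val_idx]

# A generator is lazy: no fold is built until it is iterated.
folds = ((view(X, :, tr), view(X, :, va)) for (tr, va) in zip(train_idx, val_idx))

for (x_train, x_val) in folds
    # Both are SubArray views that share memory with X.
    @assert size(x_train, 2) + size(x_val, 2) == n
end

# Materialize a fold only when an independent copy is actually needed.
x_train, x_val = first(folds)
x_val_copy = copy(x_val)
```

Working with views keeps memory use flat across folds, which matters when the dataset is large relative to available memory.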
Conceptually, a k-folds repartitioning strategy divides the given data into k roughly equal-sized parts. Each part will serve as validation set once, while the remaining parts are used for training. This results in k different partitions of data.
If the size of the dataset is not divisible by the specified k, the remaining observations will be evenly distributed among the parts.
for (x_train, x_val) in kfolds(X, k = 10)
    # code called 10 times
    # numobs(x_val) may differ up to ±1 over iterations
end
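The ±1 variation in validation-set size follows from how a remainder is spread over the folds. The sketch below uses a hypothetical helper, `fold_sizes`, which is not part of MLUtils. It assumes the extra observations go to the earliest folds, a detail the text above does not specify.

```julia
# Hypothetical helper (not an MLUtils function): sizes of k folds over n observations.
function fold_sizes(n::Integer, k::Integer)
    base, extra = divrem(n, k)
    # Assumption: the first `extra` folds each take one additional observation.
    return [base + (i <= extra ? 1 : 0) for i in 1:k]
end

sizes = fold_sizes(10, 3)                 # 10 = 3 * 3 + 1
println(sizes)                            # → [4, 3, 3]

# Turn sizes into contiguous validation ranges.
offsets = cumsum(sizes) .- sizes .+ 1     # first index of each fold: 1, 5, 8
val_idx = [o:(o + s - 1) for (o, s) in zip(offsets, sizes)]   # ranges 1:4, 5:7, 8:10
```

Whatever the placement of the extras, the sizes always sum to n and differ from one another by at most one.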
Multiple variables are supported (e.g. for labeled data):
for ((x_train, y_train), val) in kfolds((X, Y), k = 10)
    # ...
end
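What matters for labeled data is that features and labels are subset with the same indices, so each observation keeps its label. The following Base-only sketch shows that alignment with k = 2 on ten observations. The arrays, the fold ranges, and the column-per-observation layout are illustrative assumptions, and the code does not call MLUtils.

```julia
X = reshape(collect(1.0:20.0), 2, 10)   # 2 features × 10 observations
Y = collect(1:10)                        # one label per observation
n = length(Y)

val_idx = [1:5, 6:10]                    # two static folds over 10 observations
train_idx = [setdiff(1:n, v) for v in val_idx]

# Each fold pairs (features, labels) for training with (features, labels) for validation.
folds = [((view(X, :, tr), view(Y, tr)), (view(X, :, va), view(Y, va)))
         for (tr, va) in zip(train_idx, val_idx)]

for ((x_train, y_train), (x_val, y_val)) in folds
    # The same index set selected columns of X and entries of Y.
    @assert size(x_train, 2) == length(y_train)
    @assert size(x_val, 2) == length(y_val)
end
```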
By default the folds are created using static splits. Use shuffleobs to randomly assign observations to the folds.
for (x_train, x_val) in kfolds(shuffleobs(X), k = 10)
    # ...
end
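The effect of shuffling before folding can be illustrated at the index level with the standard library's `randperm`. This is a sketch of the idea only: it permutes indices directly instead of calling `shuffleobs`, and the seed and variable names are arbitrary choices.

```julia
using Random

n = 10
rng = MersenneTwister(1234)     # fixed seed so the sketch is reproducible
perm = randperm(rng, n)         # random ordering of the observation indices

# Static validation ranges for k = 5, mapped through the permutation.
val_idx = [1:2, 3:4, 5:6, 7:8, 9:10]
shuffled_val = [perm[v] for v in val_idx]

# Training indices are everything in the permutation not used for validation.
shuffled_train = [setdiff(perm, v) for v in shuffled_val]
```

Each observation still lands in exactly one validation fold, but adjacent observations are no longer forced into the same fold.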
See leavepout for a related function.
There are 3 methods for MLUtils.kfolds.