kfolds
function defined in module MLUtils
kfolds(n::Integer, k = 5) -> Tuple
Compute the train/validation assignments for k repartitions of n observations, and return them in the form of two vectors. The first vector contains the index-vectors for the training subsets, and the second vector contains the index-vectors for the validation subsets. A general rule of thumb is to use either k = 5 or k = 10. The following code snippet generates the index assignments for k = 5:
julia> train_idx, val_idx = kfolds(10, 5);
Each observation is assigned to a validation subset exactly once. Thus, a union over all validation index-vectors reproduces the full range 1:n. Note that there is no random assignment of observations to subsets, which means that adjacent observations are likely to be part of the same validation subset.
julia> train_idx
5-element Array{Array{Int64,1},1}:
[3,4,5,6,7,8,9,10]
[1,2,5,6,7,8,9,10]
[1,2,3,4,7,8,9,10]
[1,2,3,4,5,6,9,10]
[1,2,3,4,5,6,7,8]
julia> val_idx
5-element Array{UnitRange{Int64},1}:
1:2
3:4
5:6
7:8
9:10
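The properties described above can be checked directly. The following is a minimal sketch, assuming MLUtils is installed; it only uses the `kfolds(n, k)` method shown in this section, and the variable names `n`, `k`, `tr`, and `va` are chosen for illustration.

```julia
using MLUtils  # provides kfolds

n, k = 10, 5
train_idx, val_idx = kfolds(n, k)

# Every observation appears in exactly one validation fold.
@assert sort(reduce(vcat, val_idx)) == collect(1:n)

# Within each fold, training and validation indices are disjoint
# and together cover all n observations.
for (tr, va) in zip(train_idx, val_idx)
    @assert isempty(intersect(tr, va))
    @assert sort(vcat(tr, va)) == collect(1:n)
end
```

Because the splits are static, rerunning this code always yields the same index-vectors.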
kfolds(data, [k = 5])
Repartition a data container k times using a k-folds strategy and return the sequence of folds as a lazy iterator. Only data subsets are created, which means that no actual data is copied until getobs is invoked.
Conceptually, a k-folds repartitioning strategy divides the given data into k roughly equal-sized parts. Each part serves as the validation set once, while the remaining parts are used for training. This results in k different partitions of data.
If the size of the dataset is not divisible by the specified k, the remaining observations are distributed evenly among the parts, so that part sizes differ by at most one.
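To make the uneven case concrete, here is a small sketch in plain Julia. The helper `fold_sizes` is hypothetical and not part of MLUtils, and the choice to give the extra observations to the first folds is an assumption made for illustration; the text above only states that the remainder is spread evenly.

```julia
# Hypothetical helper (not part of MLUtils): size of each of the k folds
# when n observations are split as evenly as possible.
function fold_sizes(n::Integer, k::Integer)
    base, extra = divrem(n, k)   # every fold gets at least `base` observations
    # Assumption: the first `extra` folds receive one additional observation.
    return [base + (i <= extra ? 1 : 0) for i in 1:k]
end

sizes = fold_sizes(10, 3)   # [4, 3, 3]
@assert sum(sizes) == 10
@assert maximum(sizes) - minimum(sizes) <= 1
```

With 10 observations and 3 folds, one fold holds 4 observations and the other two hold 3, so no fold differs from another by more than one observation.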
for (x_train, x_val) in kfolds(X, k = 10)
    # code called 10 times
    # numobs(x_val) may differ up to ±1 over iterations
end
Multiple variables are supported (e.g. for labeled data):
for ((x_train, y_train), val) in kfolds((X, Y), k = 10)
    # ...
end
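A runnable variant of the labeled-data loop might look like the sketch below. It assumes MLUtils is installed, that the last array dimension indexes observations, and that the validation element can be destructured into a tuple in the same way as the training element. The arrays `X` and `Y` are toy data invented for this example.

```julia
using MLUtils

X = rand(4, 20)   # 20 observations with 4 features each
Y = rand(20)      # one target per observation

nfolds = 0
for ((x_train, y_train), (x_val, y_val)) in kfolds((X, Y), k = 10)
    # Features and targets are subset with the same indices.
    @assert numobs(x_train) == numobs(y_train)
    @assert numobs(x_val) == numobs(y_val)
    @assert numobs(x_train) + numobs(x_val) == 20
    global nfolds += 1
end
```

Passing the features and targets together as a tuple keeps them aligned, which would be easy to get wrong if each array were split separately.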
By default the folds are created using static splits. Use shuffleobs to randomly assign observations to the folds.
for (x_train, x_val) in kfolds(shuffleobs(X), k = 10)
    # ...
end
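The sketch below illustrates that shuffling changes which observations share a fold without changing the guarantee that each observation is validated once. It assumes MLUtils is installed; the vector `X` is toy data in which each observation equals its own index, and `seen` is a name introduced only for this example.

```julia
using MLUtils

X = collect(1:20)   # toy data: each observation is its own index

seen = Int[]
for (x_train, x_val) in kfolds(shuffleobs(X), k = 5)
    append!(seen, getobs(x_val))   # materialize the validation subset
end

# Each observation is still used for validation exactly once.
@assert sort(seen) == X
```

The order of `seen` depends on the random permutation, so it generally differs between runs unless the random number generator is seeded.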
See leavepout for a related function.
There are 3 methods for MLUtils.kfolds.