leavepout
Function defined in module MLUtils.
leavepout(n::Integer, [size = 1]) -> Tuple
Compute the train/validation assignments for `k ≈ n/size` repartitions of `n` observations, and return them in the form of two vectors. The first vector contains the index-vectors for the training subsets, and the second vector contains the index-vectors for the validation subsets. Each validation subset will have either `size` or `size+1` observations assigned to it. The following code snippet generates the index-vectors for `size = 2`.
julia> train_idx, val_idx = leavepout(10, 2);
Each observation is assigned to the validation subset once (and only once). Thus, a union over all validation index-vectors reproduces the full range `1:n`. Note that there is no random assignment of observations to subsets, which means that adjacent observations are likely to be part of the same validation subset.
julia> train_idx
5-element Array{Array{Int64,1},1}:
[3,4,5,6,7,8,9,10]
[1,2,5,6,7,8,9,10]
[1,2,3,4,7,8,9,10]
[1,2,3,4,5,6,9,10]
[1,2,3,4,5,6,7,8]
julia> val_idx
5-element Array{UnitRange{Int64},1}:
1:2
3:4
5:6
7:8
9:10
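The coverage property described above can be checked directly. This is a small sketch, assuming MLUtils is available:

```julia
using MLUtils

# k ≈ 10/2 = 5 folds with validation subsets of size 2
train_idx, val_idx = leavepout(10, 2)

# Every observation lands in exactly one validation subset,
# so the union of the validation index-vectors is the full range 1:10.
@assert sort(union(val_idx...)) == collect(1:10)

# Each train/validation pair together covers all ten observations.
for (t, v) in zip(train_idx, val_idx)
    @assert sort(vcat(t, v)) == collect(1:10)
end
```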
leavepout(data, p = 1)
Repartition a `data` container using a k-fold strategy, where `k` is chosen such that each validation subset of the resulting folds contains roughly `p` observations. Defaults to `p = 1`, which is also known as "leave-one-out" partitioning.
The resulting sequence of folds is returned as a lazy iterator. Only data subsets are created. That means no actual data is copied until `getobs` is invoked.
for (train, val) in leavepout(X, p = 2)
    # if numobs(X) is divisible by 2,
    # then numobs(val) will be 2 for each iteration,
    # otherwise it may be 3 for the first few iterations.
end
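As a sketch of the default `p = 1` (leave-one-out) case, assuming a small feature matrix whose observations lie along the last dimension:

```julia
using MLUtils

X = rand(2, 5)  # 5 observations with 2 features each

# With p = 1, k ≈ 5/1 = 5 folds: each fold holds out exactly one observation.
for (train, val) in leavepout(X, p = 1)
    @assert numobs(val) == 1    # leave-one-out: single held-out observation
    @assert numobs(train) == 4  # the remaining observations form the training subset
end
```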
See `kfolds` for a related function.