`splitobs`

function defined in module MLUtils


			splitobs(n::Int; at) -> Tuple

Compute the indices for two or more disjoint subsets of the range 1:n with splits given by at.
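The index arithmetic described above can be illustrated in plain Julia. The following is only a sketch under stated assumptions, not the MLUtils implementation: `split_indices` is a hypothetical helper, and rounding each cumulative proportion to the nearest index is an assumed rule chosen for illustration.

```julia
# Hypothetical helper, NOT part of MLUtils: split 1:n at cumulative proportions.
function split_indices(n::Int, at::Tuple)
    all(p -> 0 < p < 1, at) || throw(ArgumentError("each proportion must lie in (0, 1)"))
    sum(at) < 1 || throw(ArgumentError("proportions must sum to less than 1"))
    # Assumption of this sketch: round each cumulative proportion to an index.
    stops = [round(Int, n * c) for c in cumsum(collect(at))]
    starts = [1; stops .+ 1]   # each subset starts right after the previous one
    stops = [stops; n]         # the last subset takes whatever remains
    return Tuple(a:b for (a, b) in zip(starts, stops))
end

# A single number is treated as a one-element tuple.
split_indices(n::Int, at::Real) = split_indices(n, (at,))

two_way = split_indices(100, 0.7)          # (1:70, 71:100)
three_way = split_indices(10, (0.5, 0.3))  # (1:5, 6:8, 9:10)
```

The returned ranges are disjoint and together cover 1:n, so a tuple of k proportions yields k+1 index ranges.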

Examples


			splitobs(data; at, shuffle=false) -> Tuple

Partition the data into two or more subsets. When at is a number (between 0 and 1), it specifies the proportion of observations in the first subset. When at is a tuple, each entry specifies the proportion of observations in one subset, and the last subset receives the remaining 1-sum(at). In total, length(at)+1 subsets are returned.

If shuffle=true, randomly permute the observations before splitting.

Supports any datatype implementing the numobs and getobs interfaces -- including arrays, tuples & NamedTuples of arrays.
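To make the shuffle=true behaviour concrete, here is a plain-Julia sketch of shuffling before splitting while keeping paired containers aligned. It is an illustration only, not how MLUtils implements it: `shuffled_split` is a hypothetical helper, it handles just a matrix (observations along the last dimension) paired with a vector, and the rounding rule is an assumption.

```julia
using Random

# Hypothetical helper, NOT part of MLUtils: permute observation indices once,
# then apply the same indices to both containers so pairs stay aligned.
function shuffled_split(X::AbstractMatrix, y::AbstractVector, at::Real;
                        rng = Random.default_rng())
    n = size(X, 2)  # observations are the columns of X
    length(y) == n || throw(DimensionMismatch("X and y must hold the same number of observations"))
    perm = randperm(rng, n)
    n1 = round(Int, n * at)  # assumed rounding rule for the first subset
    first_idx, rest_idx = perm[1:n1], perm[n1+1:end]
    return (X[:, first_idx], y[first_idx]), (X[:, rest_idx], y[rest_idx])
end

X = permutedims(1.0:100.0)  # 1×100 matrix
y = 101:200
train, test = shuffled_split(X, y, 0.7; rng = MersenneTwister(42))

# Each observation keeps its partner after shuffling.
aligned = vec(test[1]) .+ 100 == test[2]
```

Because a single permutation drives both containers, the check mirrors the alignment test in the examples below.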

Examples


julia> splitobs(permutedims(1:100); at=0.7)  # simple 70%-30% split, of a matrix
([1 2 … 69 70], [71 72 … 99 100])

julia> data = (x=ones(2,10), n=1:10)  # a NamedTuple, consistent last dimension
(x = [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0], n = 1:10)

julia> splitobs(data, at=(0.5, 0.3))  # a 50%-30%-20% split, e.g. train/test/validation
((x = [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0], n = 1:5), (x = [1.0 1.0 1.0; 1.0 1.0 1.0], n = 6:8), (x = [1.0 1.0; 1.0 1.0], n = 9:10))

julia> train, test = splitobs((permutedims(1.0:100.0), 101:200), at=0.7, shuffle=true);  # split a Tuple

julia> vec(test[1]) .+ 100 == test[2]
true

Methods

There are 2 methods for MLUtils.splitobs:

splitobs.jl:17

splitobs.jl:71
