splitobs

function defined in module MLUtils


			splitobs(n::Int; at) -> Tuple

Compute the indices for two or more disjoint subsets of the range 1:n with splits given by at.

Examples


			
			
			
			julia
			>
			 
			

			splitobs
			(
			100
			,
			 
			
			at
			=
			0.7
			)
			

			
			(
			
			1
			:
			70
			,
			 
			
			71
			:
			100
			)
			

			

			
			julia
			>
			 
			

			splitobs
			(
			100
			,
			 
			
			at
			=
			
			(
			0.1
			,
			 
			0.4
			)
			)
			

			
			(
			
			1
			:
			10
			,
			 
			
			11
			:
			50
			,
			 
			
			51
			:
			100
			)

			splitobs(data; at, shuffle=false) -> Tuple

Partition the data into two or more subsets. When at is a number (between 0 and 1) this specifies the proportion in the first subset. When at is a tuple, each entry specifies the proportion an a subset, with the last having 1-sum(at). In all there are length(at)+1 subsets returned.

If shuffle=true, randomly permute the observations before splitting.

Supports any datatype implementing the numobs and getobs interfaces -- including arrays, tuples & NamedTuples of arrays.

Examples


			julia> splitobs(permutedims(1:100); at=0.7)  # simple 70%-30% split, of a matrix
([1 2 … 69 70], [71 72 … 99 100])

julia> data = (x=ones(2,10), n=1:10)  # a NamedTuple, consistent last dimension
(x = [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0], n = 1:10)

julia> splitobs(data, at=(0.5, 0.3))  # a 50%-30%-20% split, e.g. train/test/validation
((x = [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0], n = 1:5), (x = [1.0 1.0 1.0; 1.0 1.0 1.0], n = 6:8), (x = [1.0 1.0; 1.0 1.0], n = 9:10))

julia> train, test = splitobs((permutedims(1.0:100.0), 101:200), at=0.7, shuffle=true);  # split a Tuple

julia> vec(test[1]) .+ 100 == test[2]
true
Methods

There are 2 methods for MLUtils.splitobs: