ObsView

struct defined in module MLUtils
ObsView(data, [indices])

Used to represent a subset of some data of arbitrary type by storing which observation indices the subset spans. Furthermore, subsequent subsetting operations are accumulated without needing to access the actual data.
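To make the idea of accumulated subsetting concrete, here is a minimal sketch in plain Julia that does not use MLUtils. The type `LazySubset` and the functions `lazysubset` and `collectobs` are hypothetical names chosen for illustration, and the actual `ObsView` internals may differ. Observations are assumed to be the columns of a matrix.

```julia
# Hypothetical minimal lazy subset (not MLUtils' implementation): it keeps a
# reference to the data plus the observation indices it spans.
struct LazySubset{T,I<:AbstractVector{Int}}
    data::T
    indices::I
end

# Subsetting raw data only records the requested indices.
lazysubset(data, idx::AbstractVector{Int}) = LazySubset(data, idx)

# Subsetting a subset composes the index vectors; `data` is never touched.
lazysubset(s::LazySubset, idx::AbstractVector{Int}) = LazySubset(s.data, s.indices[idx])

# The only place where observations (columns) are actually copied out.
collectobs(s::LazySubset) = s.data[:, s.indices]

X = reshape(collect(1:20), 2, 10)  # 10 observations with 2 features each
a = lazysubset(X, 3:8)             # observations 3..8 of X
b = lazysubset(a, 2:3)             # entries 2..3 of `a`, i.e. observations 4..5 of X
b.indices                          # 4:5
```

Composing the index vectors is what lets a chain of subsets stay cheap: however many times the subset is narrowed, it still wraps the original data and a single index vector.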
The main purpose of ObsView is to delay data access and movement until an actual batch of data (or a single observation) is needed for some computation. This is particularly useful when the data is not located in memory but on the hard drive or at some remote location. In such a scenario, one wants to load the required data only when it is needed.
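The benefit of deferring access can be sketched by counting loads. `CountingSource`, `load!`, `DeferredView` and `fetch_batch` are hypothetical stand-ins for an on-disk or remote dataset; they are not part of MLUtils.

```julia
# Hypothetical stand-in for data on disk or at a remote location.
# `loads` counts how many observations have actually been fetched.
mutable struct CountingSource
    loads::Int
end

# Pretend to fetch observation `i`; here it simply returns `i` as a Float64.
function load!(src::CountingSource, i::Int)
    src.loads += 1
    return Float64(i)
end

# A deferred view only remembers which observations it covers.
struct DeferredView
    src::CountingSource
    indices::Vector{Int}
end

# Data is fetched only when a batch is explicitly requested.
fetch_batch(v::DeferredView) = [load!(v.src, i) for i in v.indices]

src = CountingSource(0)
v = DeferredView(src, [5, 6, 7])
loads_before = src.loads   # 0: constructing the view fetched nothing
batch = fetch_batch(v)     # fetches exactly the three requested observations
```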
Any data access is delayed until getindex is called, and even getindex returns the result of obsview, which in general avoids data movement until getobs is called. If used as an iterator, the view will iterate over the dataset once, effectively denoting an epoch. Each iteration returns a lazy subset of the current observation.
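The iteration behaviour can be sketched with Julia's standard iteration protocol. `ColumnView` is a hypothetical type, not MLUtils' `ObsView`; it only assumes that observations are matrix columns and shows how one full pass (one epoch) can yield lazy `view`s instead of copies.

```julia
# Hypothetical view over the columns (observations) of a matrix.
struct ColumnView{T<:AbstractMatrix}
    X::T
    indices::Vector{Int}
end

Base.length(v::ColumnView) = length(v.indices)

# One complete pass over the view corresponds to one epoch; each step yields
# a lazy `view` of a single observation instead of a copy.
function Base.iterate(v::ColumnView, state::Int = 1)
    state > length(v) && return nothing
    return view(v.X, :, v.indices[state]), state + 1
end

X = rand(4, 5)
v = ColumnView(X, collect(1:5))
observations = [x for x in v]   # a single pass over all 5 observations
```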
Arguments

- data : The object describing the dataset. Can be of any type as long as it implements getobs and numobs (see Details for more information).

- indices : Optional. The index or indices of the observation(s) in data that the subset should represent. Can be of type Int or some subtype of AbstractVector.
Methods

- getindex : Returns the observation(s) of the given index/indices. No data is copied aside from the required indices.

- numobs : Returns the total number of observations in the subset.

- getobs : Returns the underlying data that the ObsView represents at the given relative indices. Note that these indices are in "subset space" and in general will not directly correspond to the same indices in the underlying dataset.
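The difference between "subset space" and the indices of the underlying data can be worked through with plain ranges. This is only an illustration of the index arithmetic, assuming a view that spans observations 21 to 100 of some dataset; it does not call MLUtils.

```julia
# A view spanning observations 21..100 of the underlying data.
parent_indices = 21:100

# The number of observations in such a view is the number of indices it spans.
n = length(parent_indices)              # 80

# Relative ("subset space") indices 1:10 refer to observations 21..30 of the data.
underlying = parent_indices[1:10]       # 21:30

# An Int index maps to a single observation.
single = parent_indices[3]              # 23
```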
Details

For ObsView to work on some data structure, the desired type MyType must implement the following interface:

- getobs(data::MyType, idx) : Should return the observation(s) indexed by idx. The form of the return value is up to the user. Note that idx can be of type Int or AbstractVector.

- numobs(data::MyType) : Should return the total number of observations in data.
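A minimal sketch of the required interface for a custom type, assuming MLUtils is installed and that observations are stored as matrix columns. `ColumnStore` is a hypothetical type used only for illustration; the two `getobs` methods cover the Int and AbstractVector cases mentioned above.

```julia
using MLUtils

# Hypothetical dataset type: each column of `X` is one observation.
struct ColumnStore
    X::Matrix{Float64}
end

# Required: total number of observations in the dataset.
MLUtils.numobs(d::ColumnStore) = size(d.X, 2)

# Required: the observation(s) at `idx`, which may be an Int or a vector of Ints.
MLUtils.getobs(d::ColumnStore, idx::Int) = d.X[:, idx]
MLUtils.getobs(d::ColumnStore, idx::AbstractVector{<:Integer}) = d.X[:, idx]

store = ColumnStore(reshape(collect(1.0:12.0), 3, 4))  # 4 observations, 3 features
v = ObsView(store, 2:4)     # only the indices 2:4 are stored here
first_obs = getobs(v, 1)    # relative index 1 is observation 2 of `store`
```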
The following methods can also be provided and are optional:

- getobs(data::MyType) : By default this function is the identity function. If that is not the behaviour you want for your type, you need to provide this method as well.

- obsview(data::MyType, idx) : If your custom type has its own kind of subset type, you can return it here. An example of such a case is SubArray, which represents a subset of some AbstractArray.

- getobs!(buffer, data::MyType, [idx]) : In-place version of getobs(data, idx). If this method is provided for MyType, then eachobs can preallocate a buffer that is then reused on every iteration. Note: buffer should be equivalent to the return value of getobs(::MyType, ...), since this is how buffer is preallocated by default.
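A sketch of the optional in-place method, assuming MLUtils is installed. `ColumnStore` is again a hypothetical type; the manual loop below only mirrors the buffer reuse described above and does not show how eachobs itself allocates its buffer.

```julia
using MLUtils

# Hypothetical dataset type: each column of `X` is one observation.
struct ColumnStore
    X::Matrix{Float64}
end

MLUtils.numobs(d::ColumnStore) = size(d.X, 2)
MLUtils.getobs(d::ColumnStore, idx::Int) = d.X[:, idx]

# Optional in-place variant: write observation `idx` into a preallocated buffer.
# The buffer has the same shape as `getobs(d, idx)`, i.e. a Vector{Float64}.
function MLUtils.getobs!(buffer::Vector{Float64}, d::ColumnStore, idx::Int)
    copyto!(buffer, view(d.X, :, idx))
    return buffer
end

store = ColumnStore(reshape(collect(1.0:12.0), 3, 4))
buffer = getobs(store, 1)              # allocate once, shaped like one observation
for i in 1:numobs(store)
    MLUtils.getobs!(buffer, store, i)  # reuse the same memory on every iteration
end
```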
Examples

```julia
using MLUtils

X, Y = MLUtils.load_iris()

# The iris set has 150 observations and 4 features
@assert size(X) == (4, 150)

# Represent observations 21 to 100 (80 in total) as an ObsView
v = ObsView(X, 21:100)
@assert numobs(v) == 80
@assert typeof(v) <: ObsView
# getobs indexes into v
@assert getobs(v, 1:10) == X[:, 21:30]

# Use `obsview` to avoid boxing into ObsView
# for types that provide a custom "subset", such as arrays.
# Here it instead creates a native SubArray.
v = obsview(X, 1:100)
@assert numobs(v) == 100
@assert typeof(v) <: SubArray

# Also works for tuples of arbitrary length
subset = obsview((X, Y), 1:100)
@assert numobs(subset) == 100
@assert typeof(subset) <: Tuple # tuple of SubArray

# Use as iterator
for x in ObsView(X)
    @assert typeof(x) <: SubArray{Float64,1}
end

# iterate over each individual labeled observation
for (x, y) in ObsView((X, Y))
    @assert typeof(x) <: SubArray{Float64,1}
    @assert typeof(y) <: String
end

# same but in random order
for (x, y) in ObsView(shuffleobs((X, Y)))
    @assert typeof(x) <: SubArray{Float64,1}
    @assert typeof(y) <: String
end

# Indexing: take first 10 observations
x, y = ObsView((X, Y))[1:10]
```