BatchView

struct defined in module MLUtils


    BatchView(data, batchsize; partial=true, collate=nothing)
    BatchView(data; batchsize=1, partial=true, collate=nothing)

Create a view of the given data that represents it as a vector of batches. Each batch contains the same number of observations, which is set with the batchsize parameter. If the number of observations in the dataset is not divisible by batchsize, the remaining observations are dropped when partial=false. When partial=true (the default), they form a last batch that is smaller than the others.
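
The batch count implied by this rule can be sketched in plain Julia. The helper batch_ranges below is hypothetical (it is not part of MLUtils); it only assumes that observations are numbered 1:n and split into consecutive blocks, and the index arithmetic inside BatchView itself may differ.

```julia
# Hypothetical helper, not part of MLUtils: the index range of every batch
# for `n` observations, following the `partial` rule described above.
function batch_ranges(n::Integer, batchsize::Integer; partial::Bool=true)
    batchsize > 0 || throw(ArgumentError("batchsize must be positive"))
    # Round up when the incomplete last batch is kept, round down when it is dropped.
    nbatches = partial ? cld(n, batchsize) : fld(n, batchsize)
    return [((i - 1) * batchsize + 1):min(i * batchsize, n) for i in 1:nbatches]
end

batch_ranges(150, 30)                  # 5 ranges of 30 indices each
batch_ranges(150, 20)                  # 8 ranges; the last one is 141:150
batch_ranges(150, 20; partial=false)   # 7 ranges; indices 141:150 are dropped
```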

Note that any data access is delayed until getindex is called.

If used as an iterator, the object will iterate over the dataset once, effectively denoting an epoch.
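
The lazy-access and one-epoch behavior can be illustrated with a small AbstractVector subtype. LazyBatches is a hypothetical stand-in, not MLUtils' implementation: it assumes a matrix whose columns are observations, always keeps a smaller last batch (like partial=true), and only builds a view when a batch is indexed.

```julia
# Hypothetical type, not MLUtils' BatchView: a lazy vector of batches over a
# matrix whose columns are observations. It always keeps a smaller last batch
# (like partial=true) and does no argument checking.
struct LazyBatches{M<:AbstractMatrix} <: AbstractVector{SubArray}
    data::M
    batchsize::Int
end

# Number of batches, counting the incomplete last batch.
Base.size(b::LazyBatches) = (cld(size(b.data, 2), b.batchsize),)
Base.IndexStyle(::Type{<:LazyBatches}) = IndexLinear()

function Base.getindex(b::LazyBatches, i::Int)
    @boundscheck checkbounds(b, i)
    n = size(b.data, 2)
    cols = ((i - 1) * b.batchsize + 1):min(i * b.batchsize, n)
    # Nothing is copied here: the batch is a view created on demand.
    return view(b.data, :, cols)
end

X = rand(4, 150)          # 4 features, 150 observations
B = LazyBatches(X, 30)    # no batch has been materialized yet
first_batch = B[1]        # a 4×30 view into X

# Iterating visits every batch exactly once, i.e. one epoch.
seen = sum(size(x, 2) for x in B)
```

Because AbstractArray already supplies iteration, defining size and getindex is enough for a `for` loop to walk through all batches once.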

For BatchView to work on some data structure, the type of the given variable data must implement the data container interface. See ObsView for more info.
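
As a sketch of what implementing that interface can look like, the block below defines a hypothetical ToyDataset type and adds methods for MLUtils.numobs and MLUtils.getobs, after which BatchView can batch it. The type and its fields are made up for illustration; see ObsView and the MLUtils documentation for the full interface.

```julia
using MLUtils

# Hypothetical container type used only for illustration.
struct ToyDataset
    features::Matrix{Float64}   # one column per observation
    labels::Vector{Int}         # one label per observation
end

# The data container interface: how many observations there are ...
MLUtils.numobs(d::ToyDataset) = length(d.labels)
# ... and how to fetch them. `i` can be a single index or a collection of
# indices; with collate=nothing, BatchView passes all indices of a batch at once.
MLUtils.getobs(d::ToyDataset, i) = (d.features[:, i], d.labels[i])

d = ToyDataset(rand(3, 10), collect(1:10))
bv = BatchView(d; batchsize=4)   # batches of 4, 4 and 2 observations
x, y = bv[3]                     # the last, smaller batch
```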

Arguments

  • data : The object describing the dataset. Can be of any type as long as it implements getobs and numobs (see Details for more information).

  • batchsize : The batch-size of each batch. It is the number of observations that each batch must contain (except possibly for the last one).

  • partial : Whether to keep a smaller last batch (default true). If partial=false and the number of observations is not divisible by the batch-size, the last mini-batch is dropped.

  • collate : Batching behavior. If nothing (default), a batch is getobs(data, indices). If false, each batch is [getobs(data, i) for i in indices]. If true, batch is applied to the vector of observations in a batch, recursively collating arrays along a new last dimension. See batch for more information and examples.
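
To make the three collate modes concrete, here is a plain-Julia approximation for a matrix whose columns are observations. It does not call MLUtils: X[:, idx] stands in for getobs(X, idx), and reduce(hcat, ...) stands in for batch on a vector of vectors. Both substitutions are assumptions of this sketch.

```julia
X = reshape(collect(1.0:12.0), 3, 4)   # 3 features × 4 observations (columns)
idx = 1:2                              # indices of the observations in one batch

# collate=nothing: one indexing call on the whole container (approximates getobs(X, idx))
b_nothing = X[:, idx]                  # 3×2 Matrix{Float64}

# collate=false: a plain vector holding one observation per element
b_false = [X[:, i] for i in idx]       # Vector of two length-3 vectors

# collate=true: the observations are stacked along a new last dimension
# (approximated with reduce(hcat, ...) instead of MLUtils.batch)
b_true = reduce(hcat, b_false)         # 3×2 Matrix{Float64}, equal to b_nothing
```

For plain arrays the nothing and true modes end up with the same values; they differ for containers such as tuples, where true collates each field separately.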

Examples

    using MLUtils
    X, Y = MLUtils.load_iris()

    A = BatchView(X, batchsize=30)
    @assert typeof(A) <: BatchView <: AbstractVector
    @assert eltype(A) <: SubArray{Float64,2}
    @assert length(A) == 5 # Iris has 150 observations
    @assert size(A[1]) == (4,30) # Iris has 4 features

    # 5 batches of size 30 observations
    for x in BatchView(X, batchsize=30)
        @assert typeof(x) <: SubArray{Float64,2}
        @assert numobs(x) === 30
    end

    # 7 batches of size 20 observations
    # Note that the iris dataset has 150 observations,
    # which means that with a batchsize of 20, the last
    # 10 observations will be ignored
    for (x, y) in BatchView((X, Y), batchsize=20, partial=false)
        @assert typeof(x) <: SubArray{Float64,2}
        @assert typeof(y) <: SubArray{String,1}
        @assert numobs(x) == numobs(y) == 20
    end

    # collate tuple observations
    for (x, y) in BatchView((rand(10, 3), ["a", "b", "c"]), batchsize=2, collate=true, partial=false)
        @assert size(x) == (10, 2)
        @assert size(y) == (2,)
    end

    # randomly assign observations to one and only one batch.
    for (x, y) in BatchView(shuffleobs((X, Y)), batchsize=20)
        @assert typeof(x) <: SubArray{Float64,2}
        @assert typeof(y) <: SubArray{String,1}
    end

Methods

There is 1 method for MLUtils.BatchView: