Presizing vision datasets for performance

In this tutorial, we'll look at how we can improve the performance of computer vision data pipelines by presizing. Presizing is a preprocessing step, run once before any training, in which every image in a dataset is loaded and saved back at a smaller size. This can improve performance if image loading is a bottleneck.

Presizing is useful when the original image sizes in a dataset are much larger than the size we use during training; if the images are downsized every time they are loaded anyway, a lot of work can be avoided by doing this once before training. For example, if we train an image classification model on resized image crops of size (160, 160), we can presize every image so that its shorter side is 160 pixels; a 1920x1080 image then shrinks to 284x160, cutting the pixel count by a factor of roughly 45.

Let's first look at the performance difference this makes. Conveniently, the ImageNette dataset comes in 3 sizes: original, 320px, and 160px. Let's download them and create data containers that load the images and do nothing else:


			
			
			
    using FastAI
    using FastAI.Datasets

    p_160 = load(datasets(["imagenette2-160"]))
    p_320 = load(datasets(["imagenette2-320"]))
    p_orig = load(datasets(["imagenette2"]))

    p"/home/lorenz/.julia/datadeps/fastai-imagenette2/imagenette2"

			
			
			
			
    imagedatacontainer(dir) = mapobs(loadfile, filterobs(isimagefile, loadfolderdata(dir)))

    data_160 = imagedatacontainer(p_160)
    data_320 = imagedatacontainer(p_320)
    data_orig = imagedatacontainer(p_orig)

    mapobs(loadfile, DataSubset(::loadfolderdata, ::Vector{Int64}, ObsDim.Undefined())
     13394 observations)

Now every observation is simply a single image, but the sizes vary:


			
			
			
    using MosaicViews

    mosaicview(
        [getobs(data_160, 2000), getobs(data_320, 2000), getobs(data_orig, 2000)];
        nrow=1)
Let's see how long it takes to load a random subset of 100 images:


			
			
			
    idxs = rand(1:numobs(data_160), 100)

    for (name, data) in zip(("160px", "320px", "Original"), (data_160, data_320, data_orig))
        println("Size: $name")
        @time for i in idxs
            getobs(data, i)
        end
    end

    Size: 160px
      0.339104 seconds (615.64 k allocations: 55.800 MiB, 61.72% compilation time)
    Size: 320px
      0.471360 seconds (23.98 k allocations: 79.808 MiB, 3.75% gc time)
    Size: Original
      0.982968 seconds (408.88 k allocations: 176.542 MiB, 2.34% gc time, 8.73% compilation time)

Quite a difference! Discounting the compilation overhead included in the 160px measurement, the 320px version takes roughly 4 times as long to load as the 160px version; after all, it has 4 times as many pixels.

Note that image loading is only a part of the data pipeline. Optimizing it with presizing only makes sense if it becomes a bottleneck. See Performant data pipelines for a more general discussion.
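
If you want to gauge this for your own pipeline, one rough way, sketched here using the containers and idxs from above, is to time raw image loading against loading plus a stand-in for the downstream processing:

    using Images  # for imresize

    @time for i in idxs       # loading only
        getobs(data_orig, i)
    end
    @time for i in idxs       # loading plus a stand-in resize step
        imresize(getobs(data_orig, i), (160, 160))
    end

If the two timings are close, loading dominates the total cost and presizing is likely to pay off.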

Implementing presizing

Next we'll look at how to do presizing ourselves. For an image classification dataset, this entails copying the folder structure but replacing every image with a downscaled version. Let's say, as above, we want to train a model on images of size (160, 160). Since we still want to use random crops during training, we don't want to do the cropping yet. Instead we downscale each image while preserving the aspect ratio, so that its shorter side ends up at 160 pixels.

The following function does just that for a single image:


			
			
			
    using Images

    function presizeimage(image, sz)
        # scale factor so that the shorter side of `image` ends up at `sz`
        ratio = maximum(sz ./ size(image))
        newsz = round.(Int, size(image) .* ratio)
        # blur before downsampling to avoid aliasing
        σ = 0.25 .* (1 ./ (ratio, ratio))
        k = KernelFactors.gaussian(σ)
        return imresize(imfilter(image, k, NA()), newsz)
    end

    presizeimage (generic function with 1 method)
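
The imfilter call blurs the image before it is downsampled: without this low-pass filtering, plain subsampling can introduce aliasing artifacts. Note that σ grows with 1/ratio, so the stronger the downscaling, the stronger the smoothing; the NA() boundary condition renormalizes the kernel at the image borders to avoid darkened edges.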

			
			
			
    SZ = (160, 160)
    image = getobs(data_orig, 2000)
    presizeimage(image, SZ)
Now we need to run this over every image in a folder. To speed things up, we run this in parallel using Threads.@threads.
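
Note that Threads.@threads can only parallelize the work if Julia was started with multiple threads, for example via the --threads command line flag or the JULIA_NUM_THREADS environment variable. You can check how many threads are available:

    # the loop below is split across this many threads
    Threads.nthreads()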


			
			
			
    using FilePathsBase

    DSTDIR = Path(mktempdir())

    function presizeimagedir(srcdir, dstdir, sz)
        pathdata = filterobs(isimagefile, loadfolderdata(srcdir))

        # create the destination directories beforehand so the threaded
        # loop below doesn't have to
        for i in 1:numobs(pathdata)
            p = relpath(getobs(pathdata, i), srcdir)
            mkpath(parent(joinpath(dstdir, p)))
        end

        Threads.@threads for i in 1:numobs(pathdata)
            srcp = getobs(pathdata, i)
            p = relpath(srcp, srcdir)
            dstp = joinpath(dstdir, p)

            # load, downscale and save the image to the destination folder
            img = loadfile(srcp)
            img_presized = presizeimage(img, sz)
            save(string(dstp) * ".jpg", img_presized)
        end
    end

    presizeimagedir (generic function with 1 method)

			
			
			
    @time presizeimagedir(p_orig, DSTDIR, SZ)

     54.180756 seconds (13.75 M allocations: 237.152 GiB, 8.43% gc time, 0.04% compilation time)

We can now load the created dataset as a regular image classification dataset:


			
			
			
			
    data = loadtaskdata(DSTDIR, ImageClassification);
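
As a quick sanity check, we can confirm that images in the new dataset now have a shorter side of 160 pixels. This snippet assumes that, like the containers above, observations are (image, class) tuples:

    image, class = getobs(data, 1)
    minimum(size(image))  # should be 160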

Remarks

  • Keypoint and segmentation data: Presizing can of course also be useful for other kinds of image datasets, like those for semantic segmentation or keypoint regression. You have to be more careful when presizing those, though, since the target variable is affected by the resizing: if an image is downsized, any segmentation masks and keypoints on it need to be downsized in the same way (see the sketch after this list).

  • Progressive resizing: Presizing can also be used in conjunction with progressive resizing, a technique pioneered by Jeremy Howard, where training starts on small image sizes for speed and switches to larger image sizes later for accuracy. This can improve convergence speed quite a bit.
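
To illustrate the keypoint case, here is a minimal sketch with a hypothetical presizekeypoints helper (not part of FastAI.jl) that applies the same scale factor to the annotations that presizeimage applies to the image:

    # Hypothetical helper: rescale keypoint coordinates by the ratio
    # presizeimage uses, so annotations stay aligned with the image.
    function presizekeypoints(keypoints, imagesize, sz)
        ratio = maximum(sz ./ imagesize)
        return [k .* ratio for k in keypoints]
    end

Segmentation masks additionally need to be resized with nearest-neighbor interpolation so that no interpolated, invalid class values are introduced.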