Due to the size of our training dataset and the limitations of our computing resources, the CNNs we have constructed may not reach high validation accuracy during the training and validation process. For example, the recognition of 20 bird species only reaches 66.28% validation accuracy.
In the diagram, the validation loss increases even as the training accuracy continues to improve, which is a sign of overfitting. Through data augmentation, we can dramatically increase the effective size of the dataset. For example, the following code uses the classificationAugmentationPipeline function (defined below) to perform data augmentation:
imdsTrainAugmented = transform(imdsTrain,@classificationAugmentationPipeline,'IncludeInfo',true);
imds_cat = imageDatastore(cat(1, imdsTrain.Files, imdsTrainAugmented.UnderlyingDatastore.Files));
imds_cat.Labels = cat(1, imdsTrain.Labels, imdsTrainAugmented.UnderlyingDatastore.Labels);
function [dataOut,info] = classificationAugmentationPipeline(dataIn,info)
dataOut = cell([size(dataIn,1),2]);
for idx = 1:size(dataIn,1)
    temp = dataIn{idx};   % image for the current observation
    % Randomized Gaussian blur
    temp = imgaussfilt(temp,1.5*rand);
    % Optionally add salt and pepper noise
    %temp = imnoise(temp,'salt & pepper');
    % Add randomized rotation, scale, translation, reflection, and shear
    tform = randomAffine2d('Scale',[0.95,1.05],'Rotation',[-15 15], ...
        'XTranslation',[-15 15],'YTranslation',[-15 15], ...
        'XReflection',true,'XShear',[-30 30]);
    outputView = affineOutputView(size(temp),tform);
    temp = imwarp(temp,tform,'OutputView',outputView);
    % Optionally jitter the hue
    %temp = jitterColorHSV(temp,'Hue',[0.05 0.15]);
    % Form the second column expected by trainNetwork, which is the
    % expected response: the categorical label in this case
    dataOut(idx,:) = {temp,info.Label(idx)};
end
end
However, it will also dramatically slow down the training process!
The more practical solution is transfer learning: utilize and modify pre-trained image recognition networks, and retrain them with our bird images. This boosts the validation accuracy dramatically!
There are two ways to utilize an existing pre-trained network: (1) using a Matlab script to replace the last learnable layer (a fully connected layer in both Alexnet and ResNet-18) and the classification output layer; (2) using the Deep Network Designer provided by Matlab to conduct all the network modifications, data input configuration, and training through a visual programming interface.
The reason to replace the last learnable layer is to adapt the pre-trained network to recognize a new image dataset without changing the underlying lower-level feature extraction. Most of the lower-level feature extraction for different kinds of images (e.g. birds, cars, faces, etc.) is very similar, so we can take advantage of the existing networks.
The reason to replace the classification and output layers is to match the number of classes we are classifying. For example, both Alexnet and ResNet-18 were trained on over one million images across 1000 categories. However, in our bird recognition project, we only have 20 categories.
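As a sketch of approach (1), the last learnable layer and the classification output layer of Alexnet can be replaced in a script. This assumes the Deep Learning Toolbox Model for AlexNet Network support package is installed; the learn-rate factors shown are illustrative:

```matlab
% Load the pre-trained network and keep everything except the final
% fully connected layer, softmax layer, and classification output layer.
net = alexnet;
layersTransfer = net.Layers(1:end-3);

numClasses = 20;   % 20 bird species in our dataset
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses, ...
        'WeightLearnRateFactor',20, ...   % learn faster in the new layer
        'BiasLearnRateFactor',20)
    softmaxLayer
    classificationLayer];
```

Setting larger learn-rate factors on the new fully connected layer lets it train quickly while the transferred feature-extraction layers change slowly.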
In addition, we have to resize the input images to match the pre-trained network. For example, Alexnet requires 227 x 227 input images, while our bird dataset images are 224 x 224.
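Rather than resizing the image files on disk, an augmentedImageDatastore can rescale each image to the network's input size on the fly. A minimal sketch, assuming imdsTrain and a hypothetical imdsValidation datastore hold our bird images:

```matlab
% AlexNet expects 227-by-227 input; in practice this can be read from
% net.Layers(1).InputSize instead of hard-coding it.
inputSize = [227 227];
augimdsTrain = augmentedImageDatastore(inputSize, imdsTrain);
augimdsValidation = augmentedImageDatastore(inputSize, imdsValidation);
```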
In the following diagram, by utilizing the pre-trained Alexnet, the validation accuracy reaches over 93%!
In the following diagram, by utilizing the pre-trained ResNet-18, the validation accuracy now reaches over 96%!
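Putting the pieces together, retraining the modified network might look like the following sketch. It assumes layers holds the modified layer array and that augimdsTrain and augimdsValidation are datastores resized to the network's input size; the option values are illustrative, not tuned:

```matlab
options = trainingOptions('sgdm', ...
    'MiniBatchSize',32, ...
    'MaxEpochs',6, ...
    'InitialLearnRate',1e-4, ...          % small rate so transferred weights change slowly
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',30, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress');

netTransfer = trainNetwork(augimdsTrain, layers, options);
```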
Unit 4 Project: Transfer Learning Using Alexnet & ResNet-18 for Bird Recognition
The goal of this project is to improve the bird recognition performance of the Unit 3 project to over 90% by retraining existing CNNs such as Alexnet and ResNet-18 provided in the Matlab Deep Learning Toolbox.
Project Deadline: 09/03/2020