Creating a convolutional neural network to detect dog breeds in pictures

Credit: Gabriel Crismariu

It’s taken long months of caffeine-fuelled hard work, but I’ve finally finished my Udacity Data Scientist Nanodegree.

Out of all the things I’ve been taught, none fascinated me quite like convolutional neural networks. Simply put, they feel magical to me. The idea that a computer can learn to detect patterns and features on its own in order to correctly classify pictures still amazes me. This is why I decided to develop an algorithm to detect dog breeds in pictures as my capstone project.

The aim of the project is to let users upload a picture of a dog or a human, identify the breed if it’s a dog, or return the most similar-looking breed if it’s a human.

First, the program needs to check whether a dog or a human is present in the picture; if neither is, the user should be shown an error message. Then the program should accurately identify the dog’s breed. Detecting a lookalike breed in a human picture is mostly done for comic effect and its accuracy can’t really be assessed, so it is not a core part of the problem.

In order to solve that problem we will apply the following steps:

  1. Detect humans in pictures
  2. Detect dogs in pictures
  3. Create a CNN from scratch to detect dog breeds
  4. Use transfer learning to create a CNN to detect dog breeds
  5. Write the final algorithm

To start, we import datasets of pre-classified dog and human pictures that we will use as our training and test sets.

Importing libraries and the classified dog images
Importing human images
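The loading code isn’t reproduced here, but a minimal sketch of what it might look like, assuming the standard Udacity folder layout (`dogImages/` and `lfw/`) and 133 breed categories:

```python
import numpy as np
from glob import glob
from sklearn.datasets import load_files
from keras.utils import to_categorical

def load_dataset(path):
    # load_files returns the file paths and an integer label per breed folder
    data = load_files(path)
    files = np.array(data['filenames'])
    targets = to_categorical(np.array(data['target']), 133)  # 133 breeds assumed
    return files, targets

# Folder names assumed from the Udacity project structure
train_files, train_targets = load_dataset('dogImages/train')
valid_files, valid_targets = load_dataset('dogImages/valid')
test_files, test_targets = load_dataset('dogImages/test')

# Human face pictures (LFW dataset, path assumed)
human_files = np.array(glob('lfw/*/*'))
```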

1. Detecting humans in pictures

In order to detect faces in pictures we will work with the expectation that users will only input pictures with faces in clear view. This is communicated to users to avoid frustration. It also allows us to detect humans by detecting faces.

To do so we simply use one of OpenCV’s face detectors, a Haar feature-based cascade classifier.

Using the face detector on a test picture

We then write a simple detector with a Boolean output.
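The detector isn’t shown in the text, but a sketch of what it might look like with OpenCV (the cascade file path is an assumption):

```python
import cv2

# Pre-trained Haar cascade for frontal faces (path assumed)
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

def face_detector(img_path):
    """Return True if at least one human face is detected in the image."""
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0
```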

When tested on a subset of the dog training dataset and a subset of the human images, I got the following results:

  • 100% of human faces detected
  • 11% of human faces incorrectly detected amongst dogs

This function does an OK job at detecting humans in pictures.

2. Detecting dogs in pictures

In order to detect dogs in pictures we used a pre-trained model, ResNet-50. This model has been trained on a very popular image dataset called ImageNet. This dataset contains over 10 million pictures and 1000 categories, out of which 118 are dog breeds. The logic used to detect dogs in images is then to check whether the model’s prediction for a picture falls among these 118 dog breeds.

Loading the model trained on ImageNet dataset
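With Keras this is essentially a one-liner:

```python
from keras.applications.resnet50 import ResNet50

# ResNet-50 with weights pre-trained on ImageNet, including its 1000-way classifier
ResNet50_model = ResNet50(weights='imagenet')
```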

In order to get predictions from the model, all images need to be turned into tensors.

Functions to turn one image or several into (1,224,224,3) tensors
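A sketch of these helper functions, using Keras’ image utilities:

```python
import numpy as np
from keras.preprocessing import image

def path_to_tensor(img_path):
    # Load the image, resize to 224x224 and add a batch dimension -> (1, 224, 224, 3)
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    # Stack several images into a single (n, 224, 224, 3) tensor
    return np.vstack([path_to_tensor(p) for p in img_paths])
```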

We can then predict the ImageNet category from the tensor.

Function to predict the ImageNet label
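Something along these lines, using Keras’ ResNet-50 preprocessing:

```python
import numpy as np
from keras.applications.resnet50 import preprocess_input

def ResNet50_predict_labels(img_path):
    # Preprocess the image and return the index of the most likely ImageNet class
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))
```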

And finally we can use the predicted label to check if a dog is detected in the picture.

Function to detect if dog is in a picture
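In the ImageNet class index, the dog breeds occupy a contiguous block of categories (151 to 268), so the check reduces to a range test:

```python
def dog_detector(img_path):
    # ImageNet categories 151 through 268 correspond to dog breeds
    prediction = ResNet50_predict_labels(img_path)
    return 151 <= prediction <= 268
```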

I evaluated the accuracy of this dog detector by running it on the same images used to test the human detector:

  • 100% of dogs were detected in dog pictures
  • No dogs were detected in human pictures

This indicates that the detector is doing a good job.

3. Create a CNN from scratch to detect dog breeds

The first step in creating a convolutional neural network from scratch was to think about the right architecture. When using CNNs to analyse pictures, the aim is generally to turn an object that is wide and tall but shallow (our 224x224-pixel, 3-channel tensor created above) into an object that is narrow and short but deep. This is a way of turning spatial information into feature maps that do not rely on the spatial positions of those features.

This is why I used convolutional layers with a growing number of filters (to create a growing number of feature maps) and followed each of them with a pooling layer of stride 2, which halves the size of the input. Finally, I decided to use global average pooling instead of dense fully connected layers as my last layer before the output layer. It has been shown to be more robust to spatial translations within the image, and it also removes a large number of parameters that would otherwise need to be trained.

Defining the architecture of the network
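The exact layer sizes aren’t given here; below is a sketch of an architecture matching this description (the filter counts are assumptions, with 133 output breeds as in the Udacity dataset):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense

model = Sequential([
    # Convolutional layers with a growing number of filters,
    # each followed by a pooling layer of stride 2 that halves the spatial size
    Conv2D(16, 2, padding='same', activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D(pool_size=2),
    Conv2D(32, 2, padding='same', activation='relu'),
    MaxPooling2D(pool_size=2),
    Conv2D(64, 2, padding='same', activation='relu'),
    MaxPooling2D(pool_size=2),
    # Global average pooling instead of dense fully connected layers
    GlobalAveragePooling2D(),
    # One output node per dog breed
    Dense(133, activation='softmax')
])
model.summary()
```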
Training the model

I then compiled the model, trained it over 50 epochs and saved the weights giving the lowest loss (defined as categorical cross-entropy, since this is a multi-class model). The model defined this way achieved an accuracy of 8.85% on the test set, which, while better than a random guess, is not great.
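The training step might look roughly like this (the optimizer, batch size and checkpoint path are assumptions):

```python
from keras.callbacks import ModelCheckpoint

# Rescale pixel values to [0, 1] (assumed preprocessing)
train_tensors = paths_to_tensor(train_files).astype('float32') / 255
valid_tensors = paths_to_tensor(valid_files).astype('float32') / 255

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Keep only the weights that achieve the lowest validation loss
checkpointer = ModelCheckpoint(filepath='weights.best.from_scratch.hdf5',
                               save_best_only=True, verbose=1)

model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=50, batch_size=20, callbacks=[checkpointer], verbose=1)
```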

4. Load an existing CNN to detect dog breeds

In order to try to improve accuracy, I then decided to use transfer learning. The basis of transfer learning is to take an existing powerful model and strip off its final layer. In general, the deeper into the model a layer is, the more specialised it is to the task the model was originally trained on. A new network is then created that takes the bottleneck features of the existing model as input, and only this final part is retrained for the specific task at hand. I used ResNet-50 for the transfer learning.

Loading the bottleneck features given for this exercise
Defining the new model taking the bottleneck features as input
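A sketch of what this might look like, assuming the pre-computed ResNet-50 bottleneck features are stored in an .npz file (file name assumed):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

# Pre-computed bottleneck features for the train/validation/test splits
bottleneck_features = np.load('bottleneck_features/DogResnet50Data.npz')
train_Resnet50 = bottleneck_features['train']
valid_Resnet50 = bottleneck_features['valid']
test_Resnet50 = bottleneck_features['test']

# The new model only has to map bottleneck features to one of the dog breeds
Resnet50_model = Sequential([
    GlobalAveragePooling2D(input_shape=train_Resnet50.shape[1:]),
    Dense(133, activation='softmax')
])
```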

I then used the same method to train the model (50 epochs, categorical cross-entropy as loss, and a checkpointer to save the best weights).

Training the new model with the transferred learning from ResNet50
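Training follows the same pattern as before, only on the bottleneck features (file path again an assumption):

```python
from keras.callbacks import ModelCheckpoint

Resnet50_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

checkpointer = ModelCheckpoint(filepath='weights.best.Resnet50.hdf5',
                               save_best_only=True, verbose=1)

Resnet50_model.fit(train_Resnet50, train_targets,
                   validation_data=(valid_Resnet50, valid_targets),
                   epochs=50, batch_size=20, callbacks=[checkpointer], verbose=1)

# Reload the best weights before evaluating on the test set
Resnet50_model.load_weights('weights.best.Resnet50.hdf5')
```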

This new model takes much less time to train and performs much better, achieving an accuracy of 79.54% on the test set!

The final step to finish the dog breed predictor is to create a function that takes a picture path as input, turns it into bottleneck features and then predicts the breed with the new model.

Function to predict the breed using the model with transferred learning
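A sketch of that function, assuming a `dog_names` list of breed labels built from the training folders and using ResNet-50 without its top layer as the feature extractor (the extractor must produce features of the same shape as the pre-computed bottleneck features used for training):

```python
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input

# Feature extractor: ResNet-50 without its final classification layer
resnet_base = ResNet50(weights='imagenet', include_top=False)

def Resnet50_predict_breed(img_path):
    # Turn the image into bottleneck features, then predict the breed with the new model
    bottleneck_feature = resnet_base.predict(preprocess_input(path_to_tensor(img_path)))
    predicted_vector = Resnet50_model.predict(bottleneck_feature)
    return dog_names[np.argmax(predicted_vector)]  # dog_names: list of breed names (assumed)
```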

5. Write the final algorithm

The last step in the process is pretty straightforward, as it consists of putting all the pieces together.

Creating a table with all dog breeds and a picture to return when a human lookalike is detected
Function taking a picture as input and returning a breed prediction, or an error if no face is detected
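Putting it all together might look something like this (the message wording is mine):

```python
def predict_breed(img_path):
    """Return a breed prediction for a dog or a human picture, or an error message."""
    if dog_detector(img_path):
        return "This dog looks like a {}.".format(Resnet50_predict_breed(img_path))
    if face_detector(img_path):
        return "This human looks like a {}.".format(Resnet50_predict_breed(img_path))
    return "Error: no dog or human face was detected in this picture."
```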

I tried the function on a couple of dog pictures and human pictures, and it works rather well, as shown below.

In order to improve the model, the first option is to increase the number of epochs. This will lengthen training time but should improve the model’s accuracy.

Hyperparameter tuning using cross-validation (for instance with GridSearchCV) could also help improve accuracy.

Finally, data augmentation could also help improve the model by feeding it training pictures with more varied spatial arrangements (rotations, shifts, flips).
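With Keras this could be done on the fly with an `ImageDataGenerator`; a minimal sketch applied to the from-scratch model (the transformation ranges are assumptions):

```python
from keras.preprocessing.image import ImageDataGenerator

# Randomly rotate, shift and flip training images as they are fed to the model
datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

model.fit_generator(datagen.flow(train_tensors, train_targets, batch_size=20),
                    steps_per_epoch=len(train_tensors) // 20,
                    validation_data=(valid_tensors, valid_targets),
                    epochs=50)
```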
