
How to recognize objects using image processing techniques such as Canny edge detection, SIFT, and SVM

Object recognition is one of the most active research areas because it can be applied in many important fields such as healthcare, the military, and transportation.

This tutorial gives you an introduction to the basic steps.

Object recognition is not a single-step process. It is the result of applying many theories and techniques across several steps.

Here, I will explain one method that can be applied to object recognition using image processing techniques. I assume that you are familiar with the key concepts of machine learning, such as vectors, matrices, scalars, supervised classification, and unsupervised classification.

The following figure shows the basic steps of the proposed method.


Here's an eleven-step process for object recognition. I'm not going to explain the mathematical details or the theory behind the techniques and algorithms used in each step.

Now, let's look at the steps in detail.

Step 01 – Collecting images

Firstly, we need a dataset (in this case, images) of the desired object. For example, collect images of different types of hand gestures if you want to recognize the signs of a sign language.

The dataset can be divided into two groups.

1. Training dataset – used to train the algorithm (building the model)

2. Test dataset – used to test and evaluate the trained algorithm

Images can be collected from various sources such as the internet, video streams, or a camera.

A good dataset improves the accuracy of both the training process and the recognition itself. So it is very helpful to have images taken from different angles, under different lighting levels, and at different distances between the camera and the object.

Step 02 – Checking the dataset (images) manually

In this step, all low-quality images are removed from the collected dataset. Some images may be blurred during capture, and some may have low resolution; those kinds of images are eliminated.

Step 03 – Converting to grayscale

After completing step 02, we have the set of images that forms our dataset. Next, the images are converted from RGB to grayscale. This is done to reduce the complexity of the model.
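As a minimal sketch of this conversion, the standard ITU-R BT.601 luma weights (the same weighting OpenCV uses) can be applied with a few lines of NumPy; the function name below is illustrative:

```python
import numpy as np

def rgb_to_grayscale(img):
    """Convert an H x W x 3 RGB uint8 image to a 2-D grayscale image.

    Uses the ITU-R BT.601 luma weights: 0.299 R + 0.587 G + 0.114 B.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return (img.astype(float) @ weights).astype(np.uint8)
```

A pure red pixel (255, 0, 0), for example, maps to a gray level of about 76, reflecting how little red contributes to perceived brightness compared with green.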

Step 04 – Removing noise from images

An image can be affected by noise for many reasons, such as bad environmental conditions, insufficient light, or dust particles. To reduce the noise in the images, a Gaussian filter can be applied.
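As a sketch of how Gaussian filtering works under the hood, here is a small NumPy version that builds a normalized Gaussian kernel and convolves it with a grayscale image (in practice you would use a library routine; the function names and defaults here are just for illustration):

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    # Build a normalized 2-D Gaussian kernel centered on the middle cell
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()

def gaussian_blur(gray, size=5, sigma=1.0):
    # Correlate the image with the kernel; edge padding avoids
    # darkening at the image borders.
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w))
    for i in range(size):
        for j in range(size):
            out += k[i, j] * padded[i:i + h, j:j + w]
    return out
```

Because the kernel sums to one, smoothing a region of constant intensity leaves it unchanged; only sharp variations (noise, edges) are softened.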

Step 05 – Segmentation

Segmentation is the process of dividing an image into different parts. For example, we can partition an image into background and foreground. Here, segmentation can be done with Otsu's thresholding method.
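Otsu's method picks the gray-level threshold that best separates the histogram into two classes by maximizing the between-class variance. A compact NumPy implementation, written here as a sketch, looks like this:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance.

    `gray` is a 2-D uint8 image; pixels <= threshold form one class,
    pixels above it form the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    w0 = sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]          # weight of the first class
        sum0 += t * hist[t]    # intensity sum of the first class
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On an image whose pixels cluster around two distinct intensities (say, a dark background and a bright object), the returned threshold falls between the two clusters, so a simple comparison against it yields the foreground mask.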

Step 06 – Edge detection

After segmentation, an edge detection algorithm can be applied to identify and detect sharp discontinuities in the image. One of the most widely used algorithms is the Canny edge detector.

Step 07 – Cropping

Now, crop the image to remove the background. Cropping reduces complexity and eliminates unwanted data.
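One simple way to crop, sketched below, is to take the bounding box of the foreground mask produced by the segmentation step; the function name is illustrative:

```python
import numpy as np

def crop_to_foreground(gray, mask):
    """Crop `gray` to the bounding box of the nonzero pixels in `mask`.

    `mask` would typically be the foreground mask from the
    segmentation step.
    """
    ys, xs = np.nonzero(mask)
    return gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```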

Step 08 – Feature extraction

Feature extraction means extracting the information from an image's raw data that is most relevant for discriminating between the classes, while discarding redundant information. One such technique is SIFT (Scale-Invariant Feature Transform). The result of this step is a set of multi-dimensional feature vectors, also known as descriptors.

Step 09 – Clustering

After creating the descriptors, they are clustered using the k-means algorithm. Each cluster corresponds to one "visual word", so k sets the size of the visual vocabulary; as a simple choice, if we have 10 classes of images, k can be set to 10.
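A sketch of this clustering step using scikit-learn; the descriptors here are random stand-ins for the SIFT descriptors stacked from all training images:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for SIFT descriptors pooled from the whole training set:
# two well-separated groups of 128-dimensional vectors.
rng = np.random.default_rng(0)
descriptors = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 128)),
    rng.normal(5.0, 0.1, size=(50, 128)),
])

k = 2  # vocabulary size (number of visual words)
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)
# kmeans.cluster_centers_ holds the k visual words
```

The fitted cluster centers are exactly the "vocabulary" that the BOVW step uses next.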

Step 10 – Creating a histogram

After the clustering, the center of each cluster is used by BOVW as one entry in the visual dictionary's vocabulary. BOVW (Bag of Visual Words) is a technique that builds a vocabulary from the keypoints and descriptors of the images and represents each image as a frequency histogram of its features. Here, the histogram is a vector with k elements, one per visual word.
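A sketch of building a BOVW histogram for one image: assign each of the image's descriptors to its nearest cluster center, then count how often each visual word occurs. The descriptor dimensionality and vocabulary size below are small stand-ins:

```python
import numpy as np
from sklearn.cluster import KMeans

def bovw_histogram(image_descriptors, kmeans):
    """Map one image's descriptors to visual words and count them."""
    k = kmeans.n_clusters
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    # Normalize so images with different numbers of keypoints
    # are comparable.
    return hist / hist.sum()

# Toy vocabulary built from stand-in descriptors (8-D for brevity)
rng = np.random.default_rng(1)
vocab_data = rng.normal(size=(100, 8))
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(vocab_data)

hist = bovw_histogram(rng.normal(size=(30, 8)), kmeans)
```

The resulting k-element vector is the fixed-length representation of the image that the SVM consumes in the next step.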

Step 11 – Building the model

To build the model, we use a Support Vector Machine (SVM, a supervised learning algorithm) as the training algorithm, with the histograms as the training data. The SVM is trained on the histograms and their class labels.
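A sketch of this final step with scikit-learn's `SVC`; the histograms below are randomly generated stand-ins for two classes whose visual-word distributions differ:

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in BOVW histograms (k = 4 visual words) for two classes:
# class 0 images mostly contain word 0, class 1 mostly word 3.
rng = np.random.default_rng(2)
class_a = rng.dirichlet([8, 1, 1, 1], size=20)
class_b = rng.dirichlet([1, 1, 1, 8], size=20)
X = np.vstack([class_a, class_b])
y = np.array([0] * 20 + [1] * 20)

# A linear kernel is a common default for histogram features
clf = SVC(kernel="linear").fit(X, y)

# Recognition: feed a new image's histogram to the trained model
pred = clf.predict(rng.dirichlet([8, 1, 1, 1], size=1))
```

`clf.predict` returns the most suitable class for each input histogram, which is exactly the recognition step described below.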

After completing all the steps, we have a model that can be used to recognize objects. For example, suppose we want to recognize an object in a new image, or in an image from the dataset, using the model. We apply steps 03 to 10 to that image and then feed its histogram to the trained model as input. The model then finds the most suitable class; this is the recognition result.

The following table shows each step and the corresponding technique used.

Step                  Algorithm
Removing noise        Gaussian filter
Segmentation          Otsu's thresholding
Edge detection        Canny edge detection
Feature extraction    SIFT
Clustering            k-means clustering
Creating a histogram  BOVW
Building the model    SVM

Feel free to comment and share. Thank you.
