As humans, we are remarkably good at processing visual data compared to textual data. For example, if I give you a detailed description of a person's facial features, you will find it hard to picture the face and remember it, but if I show you a photo of that face, you will be able to picture and remember it in seconds! Human vision is amazingly beautiful and complex. In many respects it is still more capable than any computer algorithm written so far.
Computers are the opposite: they cannot natively understand visual data, but they are very good at handling textual data, often better than we are! For example, if we tell a computer about the features of an apple (it is red in color, roughly spherical in shape, and so on), it is easier for the computer to store and work with this data than with a simple photo of an apple.
This difference between how humans see the world and how computers understand it is the biggest challenge in the field of computer vision!
How do computers see an image?
First Photographic Camera (Fig2)
Modern digital camera (Fig3)
With the help of digital cameras, we can capture an image and convert it to a digital format that computers can process. But this is just the first, and probably the easiest, part of computer vision. Understanding what is in the photo is much more difficult!
Consider this image. Our brains recognize in a split second that it is a flower. This is because our visual system has been shaped by millions of years of evolution, and much of that ability is inherited from our ancestors through our DNA. That inherited machinery helps us understand immediately what we are looking at.
But computers don't have this kind of advantage. To a computer, the same image looks like (Fig5): just a massive array of integer values that represent the intensities across the color spectrum.
The task of a computer vision algorithm is to interpret this matrix of integers the way a human brain does. To make this work, we use machine learning, an approach loosely inspired by how the human brain learns. We train the computer with thousands of images of flowers, and this eventually helps the algorithm learn what those numbers, arranged in a particular pattern, actually represent.
The more representative data we feed in to train the model, the better the algorithm becomes at identifying and differentiating new images.
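To make the "array of integers" idea concrete, here is a minimal sketch in plain Python. The pixel values and the helper names (image_shape, mean_intensity) are made up for illustration; in practice OpenCV hands you the same kind of grid as a NumPy array rather than nested lists.

```python
# A hypothetical 4x4 grayscale image: each value is a brightness
# from 0 (black) to 255 (white).
gray_image = [
    [0, 50, 100, 150],
    [50, 100, 150, 200],
    [100, 150, 200, 250],
    [150, 200, 250, 255],
]

# A hypothetical 2x2 color image: each pixel holds three intensities.
# OpenCV orders the channels as blue, green, red (BGR) by default.
color_image = [
    [(255, 0, 0), (0, 255, 0)],      # a blue pixel, a green pixel
    [(0, 0, 255), (255, 255, 255)],  # a red pixel, a white pixel
]


def image_shape(image):
    """Return (rows, columns) of a nested-list image."""
    return len(image), len(image[0])


def mean_intensity(image):
    """Return the average brightness of a grayscale nested-list image."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


print(image_shape(gray_image))     # number of rows and columns
print(mean_intensity(gray_image))  # average brightness of the image
```

Everything a vision algorithm does, from blurring to recognizing a flower, is ultimately arithmetic on numbers like these.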
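The training idea can be sketched with a toy example. This is not how real flower classifiers work (those use far larger images and neural networks); it is a simple nearest-centroid classifier, and the 3x3 "images", the labels, and the function names are all hypothetical.

```python
def flatten(image):
    """Turn a nested-list image into one flat list of pixel values."""
    return [value for row in image for value in row]


def train(examples):
    """Average the pixels of each label's images.

    examples is a list of (image, label) pairs. The returned model maps
    each label to its mean pixel vector (its centroid).
    """
    sums, counts = {}, {}
    for image, label in examples:
        pixels = flatten(image)
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in sums[label]] for label in sums
    }


def predict(model, image):
    """Return the label whose centroid is closest to the image."""
    pixels = flatten(image)

    def distance(centroid):
        return sum((p - c) ** 2 for p, c in zip(pixels, centroid))

    return min(model, key=lambda label: distance(model[label]))


# Made-up training data: "flowers" have a bright center and dark border.
training_data = [
    ([[10, 20, 10], [20, 250, 20], [10, 20, 10]], "flower"),
    ([[0, 30, 0], [30, 240, 30], [0, 30, 0]], "flower"),
    ([[120, 120, 120], [120, 120, 120], [120, 120, 120]], "not flower"),
    ([[200, 190, 200], [190, 200, 190], [200, 190, 200]], "not flower"),
]

model = train(training_data)
print(predict(model, [[5, 25, 5], [25, 245, 25], [5, 25, 5]]))
```

The model never sees a rule such as "bright center means flower". It only sees numbers and labels, and the pattern emerges from the averages.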
OpenCV, short for “Open Source Computer Vision”, is a library for computer vision and machine learning that was started at Intel in 1999.
OpenCV has C++, Python, Java, and MATLAB interfaces and supports Windows, Linux, Android, and macOS. It is written natively in C++.
You can install OpenCV using the pip package manager in Python:
pip install opencv-python
To check whether the installation was successful, run a short script that imports the library and prints its version.
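A minimal check might look like the sketch below. The opencv-python package is imported under the module name cv2; the function name opencv_status is just a label chosen for this example.

```python
def opencv_status():
    """Return a short message saying whether OpenCV can be imported."""
    try:
        import cv2  # the module name provided by opencv-python
    except ImportError:
        return "OpenCV is not installed or could not be imported"
    return "OpenCV version: " + cv2.__version__


print(opencv_status())
```

If the installation worked, the script prints the installed version number. Otherwise it prints the fallback message instead of crashing.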
Typical tasks in computer vision
1. Image classification:
This involves assigning a given image to one of a model's predefined categories. For example, consider a binary classification with two categories:
i. Tourist spot
ii. Not a tourist spot
If an image of the Eiffel Tower is given to the model, it should be classified into category i (Tourist spot), and if an image of a regular house is given to the model, it should be classified into category ii (Not a tourist spot).
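The last step of classification, turning a model's output into one of the predefined categories, can be sketched as follows. We assume here that a trained model produces one raw score per category; the scores below are invented, and softmax is one common (not the only) way to turn such scores into probabilities.

```python
import math

CATEGORIES = ("Tourist spot", "Not a tourist spot")


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return (category, probability) for the highest-scoring category."""
    probabilities = softmax(scores)
    best = max(range(len(CATEGORIES)), key=lambda i: probabilities[i])
    return CATEGORIES[best], probabilities[best]


# Hypothetical scores a model might output for a photo of the Eiffel Tower.
print(classify([2.0, -1.0]))
```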
2. Image localization:
Now consider the same example of an image of the Eiffel Tower. This time our goal is to identify exactly where the Eiffel Tower is located in the image, typically by drawing a bounding box around it. This is image localization.
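As a rough sketch of what "where" means in code, suppose some earlier step has already marked which pixels belong to the object with 1 and everything else with 0. That mask, and the function name bounding_box, are assumptions made for this example. Localization then reduces to finding the smallest rectangle that contains every marked pixel.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the marked pixels, inclusive.

    mask is a non-empty nested list of 0s and 1s. Returns None when no
    pixel is marked.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


# Hypothetical 5x5 mask with the object in the middle.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

print(bounding_box(mask))
```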
3. Object detection:
This is mainly used in self-driving cars and other autonomous robots. The goal here is to detect every object in an image frame and to categorize each one, for example as a car, a pedestrian, a billboard, a traffic signal, a road sign, etc.
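The shape of a detector's output, one category and one box per object, can be illustrated with a toy sketch. We assume a hypothetical earlier step has already labeled each pixel with a class id (0 for background); the class ids, the LABELS table, and the frame are invented. The code then groups touching pixels of the same class into objects using a flood fill.

```python
from collections import deque

LABELS = {1: "car", 2: "pedestrian"}  # hypothetical class ids


def detect_objects(class_map):
    """Return a list of (label, (top, left, bottom, right)) per object."""
    height, width = len(class_map), len(class_map[0])
    seen = [[False] * width for _ in range(height)]
    detections = []
    for r in range(height):
        for c in range(width):
            class_id = class_map[r][c]
            if class_id == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected same-class pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and class_map[ny][nx] == class_id):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            detections.append((LABELS[class_id], (top, left, bottom, right)))
    return detections


# Hypothetical frame: two cars and one pedestrian.
frame = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 0, 0],
    [0, 0, 0, 2, 0, 0],
    [0, 0, 0, 0, 1, 1],
]

for label, box in detect_objects(frame):
    print(label, box)
```

Unlike classification, which gives one answer per image, the result here is a list whose length depends on how many objects are in the frame.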
4. Object identification:
This is slightly different from object detection. The goal here is to determine whether a specific object appears in an image or not. Given an image of a car and a picture of traffic, the algorithm tells us whether that car is present in the picture of traffic.
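A very simplified sketch of this yes-or-no question is to slide the small image over the large one and look for an exact match. The function name contains_template and the pixel grids are made up for illustration. Real systems compare by similarity rather than exact equality (OpenCV's cv2.matchTemplate is one tool for that), because lighting, scale, and angle never match perfectly.

```python
def contains_template(image, template):
    """Return True if template appears exactly somewhere inside image.

    Both arguments are nested lists of pixel values.
    """
    image_h, image_w = len(image), len(image[0])
    temp_h, temp_w = len(template), len(template[0])
    for top in range(image_h - temp_h + 1):
        for left in range(image_w - temp_w + 1):
            # Compare the template row by row with the window at (top, left).
            if all(
                image[top + r][left:left + temp_w] == template[r]
                for r in range(temp_h)
            ):
                return True
    return False


# Hypothetical "picture of traffic" and "image of a car".
traffic = [
    [0, 0, 0, 0, 0],
    [0, 9, 7, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 5, 5],
]
car = [
    [9, 7],
    [7, 9],
]

print(contains_template(traffic, car))
```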
5. Object tracking:
The purpose of object tracking is to follow an object that is in motion over time, using consecutive video frames as the input. The algorithm runs on a series of frames or a video and tracks the movement of a specific object from frame to frame. It is mostly used in self-following drones, etc.
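One of the simplest possible trackers, sketched below under strong assumptions, finds the center of the bright pixels in every frame and reports how that center moves. The frames, the brightness threshold of 128, and the function names are all hypothetical, and the approach only works when the tracked object is the only bright thing in view.

```python
def centroid(frame, threshold=128):
    """Return the (row, column) center of pixels at or above threshold.

    Returns None if the frame has no such pixel.
    """
    points = [
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value >= threshold
    ]
    if not points:
        return None
    mean_row = sum(r for r, _ in points) / len(points)
    mean_col = sum(c for _, c in points) / len(points)
    return mean_row, mean_col


def track(frames):
    """Return the object's position in each consecutive frame."""
    return [centroid(frame) for frame in frames]


# Hypothetical video: one bright pixel moving left to right.
frames = [
    [[0, 0, 0, 0], [255, 0, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 255, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 255, 0], [0, 0, 0, 0]],
]

for position in track(frames):
    print(position)
```

The column coordinate grows by one in each frame while the row stays the same, which is exactly the left-to-right motion in the made-up video.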