Most of our perception of the world around us comes from our sight. No surprise, then, that emulating human vision is one of the most challenging tasks in trying to create Artificial Intelligence. Computer vision is the scientific discipline that deals with creating machines capable of emulating the human ability to "see". Different areas can be identified in computer vision, ranging from object detection to image restoration. In this article, we will deal with the specific task of image classification.
The human visual system
As modern physiology has explained, the cerebral cortex plays a crucial role in the human visual system. Light reflected from surrounding objects passes through the pupil and is collected by the retina, where the energy of the photons is converted into a signal for the brain. This system for collecting and transducing light has several limitations, including two-dimensionality, the poor quality of the image collected in the peripheral areas of the retina, and the presence of a blind spot. Despite these limitations, the brain reconstructs an almost always faithful image from the available information, which shows that it plays an active role in vision. That role is evident in famous optical illusions such as the Kanizsa triangle: when deliberately misleading elements are inserted in an image, the brain attempts to build a coherent picture, which does not necessarily correspond to what is actually there. These considerations highlight the sophistication of the human brain and give a measure of the difficulty of replicating its behavior in a machine.
Computer vision models
The first attempts at computer vision date back to the late 1960s, while the following decades were dedicated to theoretical studies and mathematical formulations. With the advent of modern Deep Learning techniques, made practical by the growing availability of computational power, computer vision has gained new momentum and has managed to tackle tasks that traditional techniques could not solve.
What image classification is
As a branch of computer vision, image classification consists in recognizing whether an image corresponds to one of a given set of objects. We have been classifying objects since early childhood, when we first started gathering images of different objects. In this way, our brain has been trained to identify the features that distinguish one kind of object from another. The models developed in our brain are independent of scale, rotation, and lighting conditions: we can recognize an object regardless of its size, its relative position in space, and changes in its apparent color.
For a machine, an image is a collection of pixels, each with a color defined by a set of values. Comparing raw pixels can recognize an object only if it always appears in exactly the same position. As soon as the object is slightly rotated or reduced in scale, an individual pixel no longer carries any useful information. To build a robust classifier it is necessary to extract a set of image features that allow the object to be recognized.
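The fragility of raw pixel comparison can be seen in a minimal Python sketch. The 6x6 binary image, the bar-shaped object, and the `shift_right` helper are all invented for illustration and are not taken from any real dataset.

```python
def shift_right(image, k):
    """Shift every row of a 2D grid k pixels to the right, padding with 0."""
    return [[0] * k + row[:len(row) - k] for row in image]


# A 6x6 binary image containing a vertical bar one pixel wide (column 1).
bar = [[1 if col == 1 else 0 for col in range(6)] for _ in range(6)]

# The same bar moved two pixels to the right (column 3).
shifted = shift_right(bar, 2)

# Number of positions where both images contain an object pixel.
overlap = sum(
    pa and pb
    for row_a, row_b in zip(bar, shifted)
    for pa, pb in zip(row_a, row_b)
)

print("object pixels in original:", sum(map(sum, bar)))
print("object pixels in shifted: ", sum(map(sum, shifted)))
print("object pixels in common:  ", overlap)
```

The object is intact in both images, yet no object pixel coincides, so a pixel-by-pixel comparison would treat the two images as unrelated.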
The preprocessing phase
Morphological features, for example, are parameters related to the shape and the distribution of colors in the image, such as the Aspect Ratio, the Roundness, and the Shape Factor. Other features can be based on identifying interesting regions of the image (also called landmarks), such as edges and corners.
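To make the idea of a morphological feature concrete, here is a small Python sketch that computes two of them from a binary mask of the object. The definitions are assumptions, since the article does not give formulas and tools differ: Aspect Ratio is taken as bounding-box width divided by height, and Roundness as 4·area / (π·d²), where d is the largest distance between two points of the object. The function name and the sample mask are hypothetical.

```python
import math
from itertools import combinations


def morphological_features(mask):
    """Compute two shape descriptors from a binary mask (list of rows of 0/1)."""
    rows, cols = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not pixels:
        raise ValueError("mask contains no object pixels")

    # Bounding box of the object, in pixels.
    height = max(r for r, _ in pixels) - min(r for r, _ in pixels) + 1
    width = max(c for _, c in pixels) - min(c for _, c in pixels) + 1

    # Area = number of object pixels.
    area = len(pixels)

    # Corner points of every object pixel: pixel (r, c) spans [r, r+1] x [c, c+1].
    corners = {(r + dr, c + dc) for r, c in pixels for dr in (0, 1) for dc in (0, 1)}

    # Largest squared distance between two corner points
    # (brute force, acceptable only for small masks).
    max_sq = max(
        (r1 - r2) ** 2 + (c1 - c2) ** 2
        for (r1, c1), (r2, c2) in combinations(corners, 2)
    )

    return {
        "aspect_ratio": width / height,              # assumed: width / height
        "roundness": 4 * area / (math.pi * max_sq),  # assumed: 4A / (pi * d^2)
    }


# Hypothetical mask containing a rectangle 6 pixels wide and 2 pixels high.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
features = morphological_features(mask)
print(features)
```

With this definition an elongated shape gets a low roundness, a square scores 2/π, and a disc approaches 1, so the pair of numbers already separates long objects from compact ones.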
Once these features have been extracted, the dataset of images can be converted into a plain table, containing the information for each image.
In this case, we considered the task of classifying fruits and vegetables starting from images taken from the web. Extracting geometric features from the set of images, we get a table with 3114 records for the training set and 359 records for the test set, belonging to 36 types of fruits and vegetables. The table obtained from the training images can be used to train a standard classification machine learning algorithm, such as Neural Networks or Decision Trees.
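As a rough illustration of training a classifier on such a table, the sketch below fits a one-level decision tree (a decision stump) in plain Python. The feature values, the two class names, and the helpers `fit_stump` and `predict_stump` are hypothetical. This is not the model built in Rulex Platform, and a real experiment would use a full algorithm on all 36 classes.

```python
def fit_stump(rows, labels):
    """Find the single feature/threshold split with the fewest training errors.

    Returns None if every feature has a single distinct value.
    """
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            # Each side predicts its majority class.
            left_label = max(set(left), key=left.count)
            right_label = max(set(right), key=right.count)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best["errors"]:
                best = {"feature": f, "threshold": threshold,
                        "left": left_label, "right": right_label,
                        "errors": errors}
    return best


def predict_stump(stump, row):
    """Classify one record with a fitted stump."""
    side = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


# Hypothetical feature table: [aspect_ratio, roundness] for each image.
train_rows = [[3.1, 0.35], [2.8, 0.40], [3.4, 0.30],
              [1.0, 0.78], [1.1, 0.75], [0.9, 0.80]]
train_labels = ["banana", "banana", "banana", "orange", "orange", "orange"]

stump = fit_stump(train_rows, train_labels)
print(stump)
print(predict_stump(stump, [2.9, 0.50]))
```

On this toy table the stump splits on the first feature and makes no training errors. The learned rule can be read directly, which is the property that makes tree-based models easy to inspect.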
Building image classifiers
We used Rulex Platform to build some classifiers and test them to evaluate their accuracy. To test how accurate a method is at inference, a training/test experiment can be performed. The data are divided into two sets: the training set, used to train the model, and the test set, used to evaluate its performance. This can be done, for example, using the Split Data task in Rulex Factory. In this specific case, the data had already been divided into training and test sets, so there was no need to split them.
In Rulex Factory, we built two flows: the first is based on the training set and creates a classifier model, while the second applies the generated model to the test set to see how images are recognized. The test set and the model can be merged in the same flow using a Select Flows task in Rulex Factory.
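For readers who want to see what a training/test split amounts to, here is a minimal Python sketch. It is not the Split Data task of Rulex Factory. The function name, the 10% test fraction, and the fixed seed are assumptions made for the example.

```python
import random


def split_data(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (training, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 100 record identifiers.
records = list(range(100))
training, test = split_data(records, test_fraction=0.1)
print(len(training), len(test))
```

Shuffling before cutting matters: if the records were sorted by class, a plain cut would leave some classes out of the training set entirely.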
We tested four different classification algorithms: Neural Networks, Support Vector Machines, Decision Trees, and the Logic Learning Machine. The accuracy of each method is computed as the percentage of images correctly classified in the test set. These are the results obtained through training/test evaluation.
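The accuracy measure described above reduces to a short computation. The sketch below is a plain-Python illustration with made-up labels, and the function name is hypothetical. It does not reproduce the results of the experiment.

```python
def accuracy_percent(true_labels, predicted_labels):
    """Percentage of records whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Hypothetical test labels and predictions: three of four are correct.
truth = ["apple", "banana", "carrot", "apple"]
guess = ["apple", "banana", "apple", "apple"]
result = accuracy_percent(truth, guess)
print(result)  # 75.0
```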
Notice that rule-based methods, such as Decision Trees and the Logic Learning Machine, help humans better understand how an image classifier works, which in turn improves the level of trust in it.
Black box methods
The recent development of Convolutional Neural Networks (CNNs) has allowed image classification methods to become more powerful. CNNs are networks with a complex architecture that makes it possible to automatically obtain shift-invariant features of an image. Several libraries, such as Keras, PyTorch, and YOLO, implement CNNs, but you can also implement them in Rulex Factory if you use the Python Bridge task, which allows you to embed Python code in your Rulex Factory flow.
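The shift invariance mentioned above can be illustrated without any deep learning library. The plain-Python sketch below applies one convolution filter followed by global max pooling, which is one simple way a network can respond identically to a pattern wherever it appears. The kernel is hand-picked for the example, whereas a real CNN learns its kernels from data. All names and the 8x8 images are hypothetical.

```python
def convolve_valid(image, kernel):
    """2D cross-correlation without padding, as computed by a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def global_max(feature_map):
    """Global max pooling: keep only the strongest response."""
    return max(max(row) for row in feature_map)


def place(patch, top, left, size=8):
    """Draw a small patch on an empty size x size image at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for i, row in enumerate(patch):
        for j, value in enumerate(row):
            image[top + i][left + j] = value
    return image


corner = [[1, 1], [1, 0]]   # pattern to detect
kernel = [[1, 1], [1, -1]]  # hand-picked filter that matches the pattern

image_a = place(corner, 1, 1)  # pattern near the top-left
image_b = place(corner, 4, 5)  # same pattern elsewhere

map_a = convolve_valid(image_a, kernel)
map_b = convolve_valid(image_b, kernel)
response_a = global_max(map_a)
response_b = global_max(map_b)
print(response_a, response_b)
```

The two feature maps differ, because the peak moves with the pattern, but the pooled response is the same for both images.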
The disadvantage of this class of techniques is that the way an image is classified is very complex and almost impossible to understand for a human being. For this reason, these methods are usually referred to as black box. Black box techniques may be risky when decisions are taken based on models which are impossible to decipher.
There are many documented examples of image classifiers that failed without any apparent explanation. There are also many examples of how images can be maliciously altered in order to deceive the Neural Network. Computer vision can be misled by inserting confusing elements, in the same way as a human brain can be tricked with optical illusions. Such attempts to deceive machines are usually called adversarial attacks and are a cause of concern for systems that use machine learning to automate security- and safety-critical tasks.
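A toy version of an adversarial attack can be written in a few lines of Python. The sketch uses a linear classifier rather than a Neural Network, and the perturbation follows the idea behind the fast gradient sign method: move every pixel by a small amount in the direction that lowers the score of the current class. The weights, pixel values, class names, and `epsilon` are all invented for the example.

```python
def score(weights, bias, pixels):
    """Linear score: positive means 'apple', otherwise 'banana'."""
    return sum(w * x for w, x in zip(weights, pixels)) + bias


def predict(weights, bias, pixels):
    return "apple" if score(weights, bias, pixels) > 0 else "banana"


def sign(value):
    return (value > 0) - (value < 0)


def perturb(weights, pixels, epsilon):
    """Shift each pixel by epsilon against the sign of its weight.

    For a linear model the gradient of the score with respect to the
    pixels is the weight vector, so this lowers the score as much as
    possible for a given maximum change per pixel.
    """
    return [x - epsilon * sign(w) for w, x in zip(weights, pixels)]


# Hypothetical 4-pixel "image" and classifier parameters.
weights = [0.5, -0.3, 0.8, -0.6]
bias = -0.1
pixels = [0.6, 0.4, 0.5, 0.5]

attacked = perturb(weights, pixels, epsilon=0.1)
print(predict(weights, bias, pixels))    # apple
print(predict(weights, bias, attacked))  # banana
```

No pixel changes by more than 0.1, yet the predicted class flips. Attacks on deep networks exploit the same effect across many more pixels, which is why the changes can be invisible to a human.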
Explainable AI methods
For this reason, much effort in computer vision has recently been focused on eXplainable AI, i.e. methods that provide explanations of how the classifier works. Having such information could help to understand the weak points of image classification algorithms and to improve them.
A famous example involves the classification of huskies and wolves. Some data scientists built a wolf/husky classifier that performed perfectly on their training data but failed dramatically when classifying new images. Using some eXplainable machine learning techniques, the data scientists understood that the machine was not using the animal itself but the snow surrounding it to perform its classification. In this case, the classifier was just recognizing whether there was some snow in the picture. Probably some more images of huskies without snow were needed to improve the quality of the classifier!
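One simple way to obtain this kind of explanation is occlusion sensitivity: hide one region of the image at a time and measure how much the classifier's score drops. The Python sketch below applies it to a deliberately flawed stand-in model. The article does not say which technique was used in the husky/wolf study, so this is an illustrative substitute, and the model, the 6x6 image, and all names are hypothetical.

```python
def snow_biased_model(image):
    """Stand-in black box: 'wolf' score = fraction of very bright pixels."""
    pixels = [p for row in image for p in row]
    return sum(p > 0.9 for p in pixels) / len(pixels)


def occlusion_map(model, image, block=2, fill=0.5):
    """Score drop caused by graying out each block x block region."""
    base = model(image)
    n_rows, n_cols = len(image), len(image[0])
    drops = []
    for top in range(0, n_rows, block):
        row_drops = []
        for left in range(0, n_cols, block):
            occluded = [row[:] for row in image]  # copy, keep the input intact
            for r in range(top, min(top + block, n_rows)):
                for c in range(left, min(left + block, n_cols)):
                    occluded[r][c] = fill
            row_drops.append(base - model(occluded))
        drops.append(row_drops)
    return drops


# Hypothetical image: bright snow (1.0) with a darker animal (0.3) in the center.
image = [[1.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 0.3

drops = occlusion_map(snow_biased_model, image)
for row in drops:
    print([round(d, 3) for d in row])
```

Hiding the animal in the central block leaves the score unchanged, while hiding any block of snow lowers it. That is the signature of a classifier that looks at the background instead of the subject.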
Research in the field of eXplainable AI for image classification is still immature, even if promising, but will see many improvements in the near future, since understanding the decisions of a machine is the only way to trust it.
Edited by Enrico Ferrari