The newest ideas of pattern recognition. Simple case, one-dimensional separation

In general, three approaches to pattern recognition can be distinguished. The first is the enumeration method: the input is compared against a database that stores, for each type of object, all possible variations of its appearance. For optical image recognition, this means enumerating views of the object at different angles, scales, displacements, deformations, and so on; for letters, enumerating fonts and font properties. For sound recognition, the input is compared with known patterns (for example, a word spoken by several people).

The second approach is a deeper analysis of the characteristics of the image. For optical recognition this may mean determining various geometric characteristics; a sound sample is subjected to frequency analysis, amplitude analysis, and so on.

The third approach is the use of artificial neural networks (ANNs). It requires either a large number of training examples of the recognition task or a special network structure that takes into account the specifics of the task, but in return it offers higher efficiency and productivity.

History of pattern recognition

Let us briefly consider the mathematical formalism of pattern recognition. In pattern recognition, an object is described by a set of basic characteristics (features, properties). These characteristics can be of different natures: they can be taken from an ordered set such as the real line, or from a discrete set (which may also be endowed with structure). This understanding of an object is consistent both with the needs of practical applications of pattern recognition and with our understanding of how humans perceive objects. Indeed, we believe that when a person observes (measures) an object, information about it reaches the brain through a finite number of sensors (analyzer channels), and each sensor can be associated with a corresponding characteristic of the object. In addition to the features that correspond to our measurements of the object, there is also a distinguished feature, or group of features, called the classifying features; determining their values for a given vector X is the task that natural and artificial recognition systems perform.

It is clear that in order to establish the values of these features, one needs information about how the known features are related to the classifying ones. This information is given in the form of precedents, that is, a set of descriptions of objects with known values of the classifying features. From this precedent information, it is required to build a decision rule that assigns to an arbitrary object description the values of its classifying features.

This understanding of the pattern recognition problem has been established in science since the 1950s. It was then noticed that such a formulation is not new at all: well-proven methods of statistical data analysis, actively used in many practical problems such as technical diagnostics, already dealt with exactly this kind of formulation. Therefore, the first steps of pattern recognition were taken under the banner of the statistical approach, which dictated the field's first central problem.

The statistical approach is based on the idea that the initial space of objects is a probabilistic space, and the features (characteristics) of objects are random variables defined on it. The task of the data analyst was then to put forward, from some considerations, a statistical hypothesis about the distribution of the features, or rather about the dependence of the classifying features on the rest. As a rule, the statistical hypothesis was a parametrically specified set of feature distribution functions; a typical and classical example is the hypothesis that this distribution is normal (statistics offers a great many varieties of such hypotheses). After formulating the hypothesis, it remained to test it on the precedent data. This test consisted in choosing some distribution from the initially given set of distributions (the parameter of the distribution hypothesis) and assessing the reliability (confidence interval) of this choice. This distribution function was, in fact, the answer to the problem, except that an object was classified not uniquely but with certain probabilities of belonging to each class. Statisticians also developed an asymptotic justification for such methods. Such justifications followed this scheme: a certain quality functional of the choice of distribution (confidence interval) was established, and it was shown that as the number of precedents grows, the choice becomes correct in the sense of this functional with probability tending to 1 (the confidence interval tending to 0). Looking ahead, we can say that the statistical view of the recognition problem turned out to be very fruitful, not only in terms of the algorithms developed (which include methods of cluster and discriminant analysis, nonparametric regression, and so on), but also because it subsequently led Vapnik to create a deep statistical theory of recognition.

Nevertheless, there is a strong argument that the problems of pattern recognition do not reduce to statistics. Any such problem can, in principle, be considered from a statistical point of view, and the results of its solution can be interpreted statistically; to do this, it is only necessary to assume that the space of objects of the problem is probabilistic. But from the point of view of instrumentalism, the criterion for the success of a statistical interpretation of a given recognition method can only be a justification of this method in the language of statistics as a branch of mathematics. Justification here means developing the basic requirements on the problem that ensure success in applying the method. However, at the moment, no satisfactory justifications of this kind have been found for most recognition methods, including those that arose directly within the statistical approach. In addition, the statistical algorithms most commonly used at present, such as Fisher's linear discriminant, the Parzen window, the EM algorithm, and nearest neighbors, not to mention Bayesian belief networks, have a strongly pronounced heuristic nature and may have interpretations other than statistical ones. Finally, it should be added that, beyond the asymptotic behavior of recognition methods, which is the main concern of statistics, the practice of recognition raises questions of the computational and structural complexity of methods that go far beyond probability theory alone.

As a result, contrary to the aspiration of statisticians to regard pattern recognition as a branch of statistics, quite different ideas entered the practice and ideology of recognition. One of them arose from research in the field of visual pattern recognition and is based on the following analogy.

As already noted, in everyday life people constantly solve (often unconsciously) the problems of recognizing various situations, auditory and visual images. Such a capability for computers is, at best, a matter of the future. From this, some pioneers of pattern recognition concluded that the solution of these problems on a computer should, in general terms, simulate the processes of human thinking. The most famous attempt to approach the problem from this side was the famous study of F. Rosenblatt on perceptrons.

By the mid-1950s it seemed that neurophysiologists had understood the physical principles of the brain's operation (in the book "The Emperor's New Mind", the well-known British theoretical physicist R. Penrose interestingly questions the neural network model of the brain, arguing for an essential role of quantum mechanical effects in its functioning; though, admittedly, this model was questioned from the very beginning). Based on these discoveries, F. Rosenblatt developed a model of learning to recognize visual patterns, which he called the perceptron. Rosenblatt's perceptron is the following function (Fig. 1):

Fig 1. Schematic of the Perceptron

At the input, the perceptron receives an object vector, which in Rosenblatt's work was a binary vector indicating which of the screen pixels are covered by the image and which are not. Each of the features is fed to the input of a neuron, whose action is a simple multiplication by a certain weight. The results are fed to the last neuron, which adds them up and compares the total with a certain threshold. Depending on the result of the comparison, the input object X is recognized as the target or not. The task of learning pattern recognition was then to select the weights of the neurons and the threshold value so that the perceptron gives correct answers on the precedent visual images. Rosenblatt believed that the resulting function would recognize the desired visual image well even when the input object was not among the precedents. From bionic considerations he also devised a method for selecting the weights and the threshold, on which we will not dwell. Let us just say that his approach was successful in a number of recognition problems and gave rise to a whole line of research on learning algorithms based on neural networks, of which the perceptron is a special case.
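As an illustration of the scheme just described, here is a minimal perceptron sketch in Python with NumPy. The class name, the toy 4-pixel binary "images", and the simple error-correction update are illustrative choices, not Rosenblatt's original training procedure.

```python
import numpy as np

class Perceptron:
    """Minimal perceptron: weighted sum of binary features compared with a threshold."""

    def __init__(self, n_features, lr=1.0):
        self.w = np.zeros(n_features)   # one weight per input "neuron"
        self.b = 0.0                    # bias plays the role of the threshold
        self.lr = lr

    def predict(self, x):
        # Object is "recognized" if the weighted sum exceeds the threshold.
        return 1 if np.dot(self.w, x) + self.b > 0 else 0

    def fit(self, X, y, epochs=20):
        # Simple error-correction rule: adjust weights only on mistakes.
        for _ in range(epochs):
            for xi, target in zip(X, y):
                error = target - self.predict(xi)
                self.w += self.lr * error * xi
                self.b += self.lr * error
        return self

# Toy precedents: 4-pixel "images" (binary vectors) and their labels.
X = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]])
y = np.array([1, 1, 0, 0])
p = Perceptron(n_features=4).fit(X, y)
print([p.predict(x) for x in X])
```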

Various generalizations of the perceptron were subsequently invented, and the neuron function was made more complex: neurons could now not only multiply input numbers, add them, and compare the result with a threshold, but also apply more complex functions to them. Figure 2 shows one such complication:

Fig. 2. Diagram of the neural network.

In addition, the topology of the neural network could be much more complicated than the one considered by Rosenblatt, for example, this:

Fig. 3. Diagram of Rosenblatt's neural network.

These complications increased the number of parameters adjusted during training, but at the same time increased the ability to capture very complex patterns. Research in this area now proceeds in two closely related directions: both various network topologies and various tuning methods are being studied.

Neural networks are currently not only a tool for solving pattern recognition problems; they have also been used in research on associative memory and image compression. Although this line of research overlaps strongly with the problems of pattern recognition, it is a separate branch of cybernetics. For the recognition practitioner, at the moment, neural networks are nothing more than a very specific, parametrically defined set of mappings, which in this sense has no significant advantages over the many other similar learning models briefly listed below.

In connection with this assessment of the role of neural networks for recognition proper (as opposed to bionics, for which they are now of paramount importance), I would like to note the following: neural networks, being an extremely complex object for mathematical analysis, when used properly allow us to find very non-trivial regularities in the data. Their difficulty for analysis is, in the general case, explained by their complex structure and, as a consequence, their practically inexhaustible capacity to generalize a wide variety of regularities. But these advantages, as often happens, are also a source of potential errors, namely the possibility of overfitting. As will be discussed later, such a twofold view of the prospects of any learning model is one of the principles of machine learning.

Another popular direction in recognition is logical rules and decision trees. Compared with the recognition methods mentioned above, these methods make the most active use of the idea of expressing our knowledge about the subject area in the form of what are probably the most natural structures (at the conscious level): logical rules. An elementary logical rule is a statement of the form "if the non-classifying features satisfy relation X, then the classifying features satisfy relation Y". An example of such a rule in medical diagnostics: if the patient is over 60 years old and has previously had a heart attack, then do not perform the operation, since the risk of a negative outcome is high.

To search for logical rules in data, two things are needed: a measure of the "informativeness" of a rule and the space of rules. The task of finding rules then becomes a task of complete or partial enumeration over the space of rules in order to find the most informative ones. Informativeness can be defined in a variety of ways, and we will not dwell on this, regarding it as another parameter of the model. The search space is defined in the standard way.

After sufficiently informative rules have been found, the phase of "assembling" the rules into the final classifier begins. Without discussing in depth the problems that arise here (and there are quite a few), we list two main methods of "assembly". The first type is a linear list. The second is weighted voting, where each rule is assigned a weight and the classifier assigns the object to the class for which the rules cast the greatest total vote.

In practice, the rule-building phase and the "assembly" phase are performed together: when building a weighted vote or a list, the search for rules on parts of the precedent data is invoked again and again to ensure a better fit between the data and the model.
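As a toy sketch of such weighted voting over rules in Python, assuming the rules and their weights have already been found; the predicates, weights, and the medical example below are invented purely for illustration:

```python
# Each rule is a predicate over a feature dictionary plus the class it votes for
# and a weight reflecting its (pre-computed) informativeness.
rules = [
    {"when": lambda p: p["age"] > 60 and p["prior_infarction"], "vote": "high_risk", "weight": 2.0},
    {"when": lambda p: p["age"] <= 40,                          "vote": "low_risk",  "weight": 1.0},
    {"when": lambda p: not p["prior_infarction"],               "vote": "low_risk",  "weight": 0.5},
]

def classify(patient, rules):
    # Weighted voting: sum the weights of the rules that fire for each class.
    scores = {}
    for rule in rules:
        if rule["when"](patient):
            scores[rule["vote"]] = scores.get(rule["vote"], 0.0) + rule["weight"]
    # The class with the largest total weight wins (None if no rule fired).
    return max(scores, key=scores.get) if scores else None

print(classify({"age": 72, "prior_infarction": True}, rules))   # high_risk
print(classify({"age": 35, "prior_infarction": False}, rules))  # low_risk
```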


Methods based on estimates of the distribution densities of feature values. These are borrowed from the classical theory of statistical decisions, in which the objects of study are considered as realizations of a multidimensional random variable distributed in the feature space according to some law. They are based on the Bayesian decision-making scheme, which appeals to the prior probabilities of objects belonging to particular classes and to the conditional feature distribution densities.

The group of methods based on estimating the distribution densities of feature values is directly related to the methods of discriminant analysis. The Bayesian approach to decision making is one of the most developed parametric methods in modern statistics, for which the analytical expression of the distribution law (the normal law) is considered known and only a small number of parameters (mean vectors and covariance matrices) need to be estimated. The main difficulties in applying these methods are the need to store the entire training sample in order to compute density estimates, and high sensitivity to the training sample.
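A minimal sketch of this Bayesian scheme under the normality assumption, using scikit-learn's quadratic discriminant analysis (assuming scikit-learn is available): it estimates a mean vector and covariance matrix per class and applies the Bayes rule. The synthetic data are purely illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Synthetic precedents: two classes drawn from 2-D normal distributions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([3, 3], 1.5, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# QDA estimates a mean vector and a covariance matrix per class and applies
# the Bayes rule with normal class-conditional densities.
clf = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)
print(clf.predict([[0.5, 0.2], [2.8, 3.1]]))   # predicted class labels
print(clf.predict_proba([[1.5, 1.5]]))         # posterior class probabilities
```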

Methods based on assumptions about the class of decision functions. In this group, the type of the decision function is considered to be known and its quality functional is given. Based on this functional, the optimal approximation to the decision function is found from the training sequence. The decision rule quality functional is usually associated with an error. The main advantage of the method is the clarity of the mathematical formulation of the recognition problem. The possibility of extracting new knowledge about the nature of an object, in particular, knowledge about the mechanisms of interaction of attributes, is fundamentally limited here by a given structure of interaction, fixed in the chosen form of decision functions.

Prototype comparison method. This is the simplest extensional recognition method in practice. It applies when the recognized classes appear as compact geometric groupings. In that case, the center of the grouping (or the object closest to the center) is chosen as the prototype point.

To classify an unknown object, the prototype closest to it is found, and the object is assigned to the same class as that prototype. Obviously, no generalized class images are formed in this method. Various types of distances can be used as the measure.

Method of k nearest neighbors. In this method, to classify an unknown object, a given number (k) of its geometrically nearest neighbors in feature space with already known class membership is found. The decision on the class of the unknown object is made by analyzing information about these nearest neighbors. A disadvantage of this method is that reducing the number of objects in the training sample (diagnostic precedents), which is often required for efficiency, reduces the representativeness of the training sample.

Since different recognition algorithms behave differently on the same sample, the question arises of a synthetic decision rule that would use the strengths of all algorithms. For this purpose there are synthetic methods, or collectives of decision rules, which combine the strongest aspects of each of the methods.

In conclusion of the review of recognition methods, we present the essence of the above in a summary table, adding some other methods used in practice.

Table 1. Classification of recognition methods, their areas of application, and their limitations (disadvantages)

Intensional recognition methods

  • Methods based on density estimates. Application area: problems with a known (normal) distribution, where large statistics need to be collected. Limitations: the entire training set must be enumerated during recognition; high sensitivity to non-representativeness of the training set and to artifacts.
  • Methods based on assumptions about the class of decision functions. Application area: classes should be well separable. Limitations: the form of the decision function must be known in advance; new knowledge about correlations between features cannot be taken into account.
  • Logical (Boolean) methods. Application area: problems of small dimension. Limitations: selecting logical decision rules requires exhaustive enumeration; high labor intensity.
  • Linguistic methods. Limitations: the task of determining a grammar for a given set of statements (object descriptions) is difficult to formalize; unresolved theoretical problems.

Extensional recognition methods

  • Prototype comparison method. Application area: problems with a feature space of small dimension. Limitations: high dependence of the classification results on the metric; the optimal metric is unknown.
  • k nearest neighbor method. Limitations: high dependence of the classification results on the metric; the entire training sample must be enumerated during recognition; high computational complexity.
  • Estimate calculation algorithms (AVO). Application area: problems of small dimension in terms of the number of classes and features. Limitations: dependence of the classification results on the metric; the entire training sample must be enumerated during recognition; high technical complexity of the method.
  • Collective decision rules (CDR), a synthetic method. Application area: problems of small dimension in terms of the number of classes and features. Limitations: very high technical complexity of the method; a number of unresolved theoretical problems, both in determining the areas of competence of the particular methods and in the particular methods themselves.


Currently, there are many tasks in which it is required to make some decision depending on the presence of an object in the image or to classify it. The ability to "recognize" is considered the main property of biological beings, while computer systems do not fully possess this property.

Consider the general elements of the classification model.

Class: a set of objects that have common properties. Objects of the same class are assumed to be "similar" to each other. An arbitrary number of classes, greater than one, can be defined for a recognition task; the number of classes is denoted by S. Each class has its own identifying class label.

Classification: the process of assigning class labels to objects according to some description of the properties of these objects. A classifier is a device that receives a set of object features as input and produces a class label as a result.

Verification: the process of matching an object instance against a single object model or class description.

By an image (pattern) we mean the name of a region in feature space onto which a set of objects or phenomena of the material world is mapped. A feature is a quantitative description of a particular property of the object or phenomenon under study.

The feature space is an N-dimensional space defined for a given recognition task, where N is the fixed number of features measured for any object. The vector x from the feature space corresponding to an object of the recognition problem is an N-dimensional vector with components (x_1, x_2, …, x_N), which are the values of the features for that object.

In other words, pattern recognition can be defined as the assignment of initial data to a certain class by extracting essential features or properties that characterize this data from the general mass of irrelevant details.

Examples of classification problems are:

  • character recognition;
  • speech recognition;
  • establishing a medical diagnosis;
  • weather forecast;
  • face recognition;
  • classification of documents, etc.

Most often, the source material is the image received from the camera. The task can be formulated as obtaining feature vectors for each class in the considered image. The process can be viewed as a coding process, which consists in assigning a value to each feature from the feature space for each class.

Consider two classes of objects: adults and children. As features, one can choose height and weight. As the figure shows, these two classes form two non-intersecting sets, which is explained by the chosen features. However, it is not always possible to choose correct measured parameters as class features: for example, the same parameters are not suitable for creating non-overlapping classes of football players and basketball players.

The second task of recognition is the extraction of characteristic features or properties from the original images; this task can be attributed to preprocessing. In speech recognition, for example, one can distinguish features corresponding to vowels and consonants. A feature should be a characteristic property of a particular class while also being common to that class. Features that characterize the differences between classes are interclass features. Features common to all classes carry no useful information and are not considered in the recognition problem. The choice of features is one of the important tasks associated with building a recognition system.

After the features have been determined, it is necessary to determine the optimal decision procedure for classification. Consider a pattern recognition system designed to recognize M different classes, denoted m_1, m_2, …, m_M. Then the image space can be considered to consist of M regions, each containing the points corresponding to images from one class. The recognition problem can then be viewed as constructing the boundaries separating the M classes based on the accepted measurement vectors.

The solution of the problem of image preprocessing, feature extraction and the problem of obtaining the optimal solution and classification is usually associated with the need to evaluate a number of parameters. This leads to the problem of parameter estimation. In addition, it is obvious that feature extraction can use additional information based on the nature of the classes.

Comparison of objects can be made on the basis of their representation as measurement vectors. It is convenient to represent the measurement data as real numbers. Then the similarity of the feature vectors x and y of two objects can be described by the Euclidean distance

D(x, y) = sqrt( (x_1 - y_1)^2 + (x_2 - y_2)^2 + … + (x_d - y_d)^2 ),

where d is the dimension of the feature vector.

There are 3 groups of pattern recognition methods:

  • Sample comparison. This group includes classification by nearest mean, classification by distance to nearest neighbor. Structural recognition methods can also be included in the sample comparison group.
  • Statistical Methods. As the name implies, statistical methods use some statistical information when solving a recognition problem. The method determines the belonging of an object to a specific class based on the probability. In some cases, this comes down to determining the a posteriori probability of an object belonging to a certain class, provided that the features of this object have taken the appropriate values. An example is the Bayesian decision rule method.
  • Neural networks. A separate class of recognition methods. A distinctive feature compared to the others is the ability to learn.

Classification by nearest mean

In the classical approach to pattern recognition, an unknown object to be classified is represented as a vector of elementary features. A feature-based recognition system can be built in various ways: the reference vectors may be known to the system in advance as a result of training, or predicted in real time based on some model.

A simple classification algorithm consists in grouping the reference data of each class using the class expectation (mean) vector

x_c(i) = (1 / n_i) * Σ_{j=1..n_i} x(i, j),

where x(i, j) is the j-th reference feature vector of class i and n_i is the number of reference vectors of class i.

Then an unknown object is assigned to class i if it is closer to the expectation vector of class i than to the expectation vectors of the other classes. This method is suitable for problems in which the points of each class are located compactly and far from the points of other classes.

Difficulties arise if the classes have a somewhat more complex structure, as in the figure. Here class 2 is split into two non-overlapping parts that are poorly described by a single mean value, and class 3 is too elongated: samples of class 3 with large values of the x_2 coordinate are closer to the mean of class 1 than to that of class 3.

The described problem in some cases can be solved by changing the calculation of the distance.

We take into account the "scatter" of the class values, σ_i, along each coordinate direction i; the standard deviation equals the square root of the variance. The scaled Euclidean distance between a vector x and the expectation vector x_c is then

D_s(x, x_c) = sqrt( Σ_i ((x_i - x_c,i) / σ_i)^2 ).

This distance formula reduces the number of classification errors, but in reality most problems cannot be represented by such a simple class structure.
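A minimal NumPy sketch of nearest-mean classification with the scaled (per-coordinate) distance described above; the function names and the toy height/weight data are illustrative:

```python
import numpy as np

def fit_class_stats(X, y):
    """Per class: mean vector and per-coordinate standard deviation ("scatter")."""
    stats = {}
    for label in np.unique(y):
        Xc = X[y == label]
        stats[label] = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-9)  # avoid division by zero
    return stats

def predict_nearest_mean(x, stats):
    # Scaled Euclidean distance to each class mean; the closest mean wins.
    def scaled_dist(mean, sigma):
        return np.sqrt(np.sum(((x - mean) / sigma) ** 2))
    return min(stats, key=lambda c: scaled_dist(*stats[c]))

# Toy data: heights/weights of "children" (class 0) and "adults" (class 1).
X = np.array([[120, 25], [130, 30], [125, 28], [175, 70], [180, 80], [170, 75]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
stats = fit_class_stats(X, y)
print(predict_nearest_mean(np.array([128.0, 27.0]), stats))  # -> 0
print(predict_nearest_mean(np.array([178.0, 77.0]), stats))  # -> 1
```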

Classification by distance to nearest neighbor

Another approach to classification is to assign an unknown feature vector x to the class of the individual sample to which this vector is closest. This rule is called the nearest neighbor rule. Nearest neighbor classification can be more effective even when the classes have a complex structure or when the classes overlap.

This approach does not require assumptions about the distribution models of the feature vectors in space. The algorithm uses only information about the known reference samples. The solution method is based on computing the distance from x to each sample in the database and finding the minimum distance. The advantages of this approach are obvious:

  • at any time you can add new samples to the database;
  • tree and grid data structures reduce the number of calculated distances.

In addition, the solution will be better if the database is searched not for one nearest neighbor but for k of them. For k > 1, this gives a better sampling of the distribution of vectors in the d-dimensional space. However, the effective use of values k > 1 depends on whether there are enough samples in each region of the space. If there are more than two classes, making the right decision becomes more difficult.
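A short k-nearest-neighbors sketch using scikit-learn (assuming it is available); the tree-based index it builds illustrates the point above about tree and grid data structures reducing the number of distance computations:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Two overlapping classes in a 2-D feature space (synthetic, illustrative data).
X = np.vstack([rng.normal([0, 0], 1.0, size=(200, 2)),
               rng.normal([2, 2], 1.0, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)

# algorithm="kd_tree" builds a spatial index so that not every distance is computed.
knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)
print(knn.predict([[1.0, 1.0], [2.5, 2.0]]))
print(knn.kneighbors([[1.0, 1.0]]))  # distances and indices of the 5 nearest neighbors
```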


For a long time I have wanted to write a general article containing the very basics of Image Recognition, a kind of guide to the basic methods: when to apply them, what tasks they solve, what can be hacked together in an evening, and what is better not even to think about without a team of twenty people.

I have been writing articles on Optical Recognition for a long time, so a couple of times a month various people write to me with questions on this topic. Sometimes you get the feeling that you live in different worlds. On the one hand, you understand that the person is most likely a professional in a related field, but knows very little about optical recognition methods. And the most annoying thing is that he tries to apply a method from a neighboring area of knowledge, which is logical but simply does not work in Image Recognition, yet he does not understand this and is very offended if you start explaining things from the very basics. And considering that explaining from the basics takes a lot of time, which is often not there, it all becomes even sadder.

This article is designed so that a person who has never dealt with image recognition methods can, within 10-15 minutes, create in his head a certain basic picture of the world corresponding to the topic, and understand in which direction he should dig. Many of the methods described here are applicable to radar and audio processing.
I'll start with a couple of principles that we always begin to tell a potential customer, or a person who wants to start doing Optical Recognition:

  • When solving a problem, always start from the simplest approach. It is much easier to hang an orange tag on a person than to track the person by detecting him with cascades. It is much easier to take a camera with a higher resolution than to develop a super-resolution algorithm.
  • A strict problem statement in optical recognition is orders of magnitude more important than in systems programming problems: one extra word in the requirements specification can add 50% of the work.
  • In recognition problems there are no universal solutions. You cannot make an algorithm that will simply "recognize any inscription". A sign on the street and a sheet of text are fundamentally different objects. A general algorithm is probably possible (a good example comes from Google), but it would require a huge amount of work from a large team and consist of dozens of different subroutines.
  • OpenCV is the bible: it has many methods and can cover 50% of the volume of almost any problem, but OpenCV is only a small part of what can actually be done. In one study the conclusions stated: "The problem is not solved by OpenCV methods, therefore it is unsolvable." Try to avoid this; do not be lazy, and soberly evaluate each task from scratch, without relying only on OpenCV templates.
It is very difficult to give some kind of universal advice, or tell how to create some kind of structure around which you can build a solution to arbitrary computer vision problems. The purpose of this article is to structure what can be used. I will try to break the existing methods into three groups. The first group is pre-filtering and image preparation. The second group is the logical processing of the filtering results. The third group is decision-making algorithms based on logical processing. The boundaries between groups are very arbitrary. To solve a problem, it is far from always necessary to apply methods from all groups; sometimes two are enough, and sometimes even one.

The list of methods presented here is not complete. I suggest adding, in the comments, important methods that I did not mention, with two or three words about each.

Part 1. Filtering

In this group, I placed methods that allow you to select areas of interest in images without analyzing them. Most of these methods apply some kind of uniform transformation to all points in the image. At the filtering level, the image is not analyzed, but the points that are filtered can be considered as areas with special characteristics.
Threshold binarization, histogram area selection
The simplest transformation is the binarization of the image by the threshold. For RGB and grayscale images, the threshold is the color value. There are ideal problems in which such a transformation is sufficient. Suppose you want to automatically select items on a white sheet of paper:




The choice of threshold largely determines the result of binarization. In this case, the image was binarized by its mean color. Usually binarization is performed with an algorithm that selects the threshold adaptively; such an algorithm may choose the expectation or the mode, or the largest peak of the histogram.

Binarization over histograms can give very interesting results, particularly if we consider the image not in RGB but in HSV: for example, segmenting the colors of interest. On this principle one can build both a tag detector and a human skin detector.
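A minimal OpenCV sketch of threshold binarization and HSV color segmentation; the file names and the HSV range for "skin-like" colors are illustrative assumptions, not calibrated values:

```python
import cv2
import numpy as np

img = cv2.imread("sheet.jpg")                      # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Fixed threshold at the mean intensity (as in the example above).
_, bin_mean = cv2.threshold(gray, int(gray.mean()), 255, cv2.THRESH_BINARY)

# Adaptive alternative: Otsu picks the threshold from the histogram automatically.
_, bin_otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Histogram-style segmentation in HSV: keep only pixels inside a color range.
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
skin_mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 180, 255]))  # rough range
cv2.imwrite("binarized.png", bin_otsu)
```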
Classical filtering: Fourier, LPF, HPF
Classical filtering methods from radar and signal processing can be successfully applied to a variety of Pattern Recognition tasks. The traditional method in radar, almost never used on images in its pure form, is the Fourier transform (more specifically, the FFT). One of the few exceptions where the one-dimensional Fourier transform is used is image compression. For image analysis, a one-dimensional transform is usually not enough; a much more resource-intensive two-dimensional transform is needed.

Few people actually compute it; usually it is much faster and simpler to convolve the region of interest with a ready-made filter tuned to high (HPF) or low (LPF) frequencies. Such a method, of course, does not allow spectrum analysis, but in a typical video processing task it is usually a result that is needed, not an analysis.


The simplest examples of filters that emphasize low frequencies (Gaussian filter) and high frequencies (Gabor filter).
For each image point, a window centered on it is taken and multiplied element-wise with a filter of the same size; the sum of the products becomes the new value of the point. Applying an LPF or an HPF produces images of this type.
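A sketch of such window convolutions with OpenCV: a Gaussian kernel as an LPF and a Laplacian-style 3x3 kernel as an HPF (the kernel values are a common textbook choice, used here only for illustration):

```python
import cv2
import numpy as np

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input

# Low-pass: convolution with a Gaussian window smooths the image.
lpf = cv2.GaussianBlur(gray, (9, 9), 2.0)

# High-pass: a Laplacian-like kernel emphasizes rapid intensity changes.
hpf_kernel = np.array([[ 0, -1,  0],
                       [-1,  4, -1],
                       [ 0, -1,  0]], dtype=np.float32)
hpf = cv2.filter2D(gray, cv2.CV_32F, hpf_kernel)

cv2.imwrite("lpf.png", lpf)
cv2.imwrite("hpf.png", cv2.convertScaleAbs(hpf))  # back to 8-bit for saving
```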



Wavelets
But what if we use some arbitrary characteristic function for convolution with the signal? Then it is called a "wavelet transform". This definition of wavelets is not strictly correct, but traditionally, in many teams, wavelet analysis means searching for an arbitrary pattern in an image by convolving it with a model of that pattern. There is a set of classical functions used in wavelet analysis, including the Haar wavelet, the Morlet wavelet, the Mexican hat wavelet, and so on. The Haar primitives, to which several of my previous articles were devoted, belong to this family of functions for two-dimensional space.


Above are four examples of classical wavelets: a 3D plot of the Haar wavelet, a 2D Meyer wavelet, a Mexican hat wavelet, and a Daubechies wavelet. A good example of using this extended interpretation of wavelets is the problem of finding a glint in the eye, where the glint itself is the wavelet:

Classical wavelets are usually used for image compression, or for image classification (to be described below).
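In the loose sense used above (convolution with a model of the pattern), a sketch might look like this: build a 2-D Mexican hat kernel by hand and convolve it with the image. The kernel size, sigma, and file name are arbitrary illustrative values.

```python
import cv2
import numpy as np

def mexican_hat_kernel(size=21, sigma=3.0):
    """2-D Mexican hat (negated Laplacian of Gaussian), a classical wavelet shape."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = (xx ** 2 + yy ** 2) / (2 * sigma ** 2)
    kernel = (1 - r2) * np.exp(-r2)
    return (kernel - kernel.mean()).astype(np.float32)   # enforce zero mean on the finite grid

gray = cv2.imread("eye.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # hypothetical input
response = cv2.filter2D(gray, cv2.CV_32F, mexican_hat_kernel())

# Strong positive responses mark bright blob-like spots of the kernel's scale (e.g. a glint).
y, x = np.unravel_index(np.argmax(response), response.shape)
print("strongest response at", (x, y))
```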
Correlation
After such a loose interpretation of wavelets on my part, it is worth mentioning the actual correlation underlying them. When filtering images, it is an indispensable tool. A classic application is correlating a video stream to find shifts or optical flow. The simplest shift detector is also, in a sense, a difference correlator: where the images do not correlate, there was movement.
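A sketch of shift estimation between two frames via phase correlation with OpenCV's cv2.phaseCorrelate; the frame file names and the difference threshold are illustrative:

```python
import cv2
import numpy as np

# Two consecutive frames of a video stream (hypothetical files).
f1 = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
f2 = cv2.imread("frame_0002.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# phaseCorrelate returns the (dx, dy) shift that maximizes the cross-correlation
# of the two images, plus a peak response value.
(dx, dy), response = cv2.phaseCorrelate(f1, f2)
print(f"estimated shift: dx={dx:.2f}, dy={dy:.2f}, peak={response:.3f}")

# The crudest "difference correlator": where the frames differ strongly, something moved.
motion_mask = (cv2.absdiff(f1, f2) > 25).astype(np.uint8) * 255
cv2.imwrite("motion.png", motion_mask)
```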

Function filtering
An interesting class of filters is filtering for mathematical functions. These are purely mathematical filters that detect a simple mathematical function in an image (a line, a parabola, a circle). An accumulator image is built, in which for each point of the original image the set of functions that could generate it is drawn. The most classical transformation is the Hough transform for lines: for each point (x; y), the set of points (a; b) of the lines y = ax + b for which the equality holds is drawn. The result is beautiful pictures:


(a plus to the first person who finds the catch in this picture and this definition and explains it, and another plus to the first who says what is actually shown here)
The Hough transform allows you to find any parameterizable functions, for example circles. There is a modified transformation that allows you to search for any shape. Mathematicians are terribly fond of this transformation, but when processing images it unfortunately does not always work: very slow speed and very high sensitivity to the quality of binarization. Even in ideal situations, I preferred to get by with other methods.
The counterpart of the Hough transform for lines is the Radon transform. It is calculated through the FFT, which gives a performance gain in a situation where there are a lot of points. In addition, it can be applied to a non-binarized image.
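A sketch of the line-finding Hough transform via OpenCV, using the probabilistic variant cv2.HoughLinesP for brevity; the thresholds and file names are illustrative:

```python
import cv2
import numpy as np

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
edges = cv2.Canny(gray, 50, 150)                        # Hough is usually run on a binarized edge map

# Each edge point votes in (rho, theta) parameter space; accumulator peaks are lines.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=40, maxLineGap=5)

vis = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
for x1, y1, x2, y2 in (lines[:, 0] if lines is not None else []):
    cv2.line(vis, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)
cv2.imwrite("hough_lines.png", vis)
```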
Contour filtering
A separate class of filters is edge and contour filtering. Contours are very useful when we want to move from working with an image to working with the objects in that image. When an object is quite complex but well distinguished, often the only way to work with it is to extract its contours. There are a number of algorithms that solve the problem of contour filtering:

The most commonly used is the Canny detector, which works well and whose implementation is available in OpenCV (Sobel is also there, but it finds contours worse).
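A minimal Canny example in OpenCV, with a plain Sobel gradient magnitude for comparison; the two hysteresis thresholds are illustrative and usually need tuning per task:

```python
import cv2

gray = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input

# Canny: gradients + non-maximum suppression + hysteresis on two thresholds.
edges_canny = cv2.Canny(gray, 60, 180)

# For comparison, a plain Sobel gradient magnitude (no thinning, no hysteresis).
sx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
sy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
edges_sobel = cv2.convertScaleAbs(cv2.magnitude(sx, sy))

cv2.imwrite("canny.png", edges_canny)
cv2.imwrite("sobel.png", edges_sobel)
```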



Other filters
Above are the filters whose modifications help solve 80-90% of tasks. Besides them, there are rarer filters used in local tasks; there are dozens of them, and I will not list them all. Of interest are iterative filters, as well as the ridgelet and curvelet transforms, which are a fusion of classical wavelet filtering and analysis in the Radon transform domain. The beamlet transform works beautifully on the border between the wavelet transform and logical analysis, allowing contours to be highlighted:

But these transformations are very specific and tailored for rare tasks.

Part 2. Logical processing of filtering results

Filtering gives a set of data suitable for processing. But often you can’t just take and use this data without processing it. In this section, there will be several classic methods that allow you to go from the image to the properties of objects, or to the objects themselves.
Morphology
The transition from filtering to logic is, in my opinion, marked by the methods of mathematical morphology. In essence, these are the simplest operations of dilating and eroding binary images. These methods allow noise to be removed from a binary image by growing or shrinking the existing elements. There are contour-extraction algorithms based on mathematical morphology, but usually some kind of hybrid algorithms or combinations of algorithms are used.
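A sketch of basic morphological cleanup of a binary mask with OpenCV; the 5x5 kernel size and the file names are illustrative choices:

```python
import cv2
import numpy as np

mask = cv2.imread("binary_mask.png", cv2.IMREAD_GRAYSCALE)   # hypothetical 0/255 image
kernel = np.ones((5, 5), np.uint8)

eroded  = cv2.erode(mask, kernel)                          # shrinks white regions, removes specks
dilated = cv2.dilate(mask, kernel)                         # grows white regions, fills small holes
opened  = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # erosion then dilation: kills isolated noise
closed  = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # dilation then erosion: closes small gaps

cv2.imwrite("cleaned.png", opened)
```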
Contour analysis
In the section on filtering, algorithms for obtaining boundaries were already mentioned. The resulting boundaries are quite simply converted into contours. For the Canny algorithm this happens automatically; for other algorithms additional binarization is required. A contour can be obtained for a binarized image, for example, with the beetle (border-following) algorithm.
The contour is a unique characteristic of an object, and often it allows the object to be identified by its contour. There is a powerful mathematical apparatus that allows you to do this, called contour analysis.

To be honest, I have never managed to apply contour analysis in real problems: too ideal conditions are required. Either there is no boundary, or there is too much noise. But if you need to recognize something under ideal conditions, contour analysis is a great option: it works very fast, with beautiful mathematics and understandable logic.
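As a sketch of elementary contour handling with OpenCV: extract contours from binarized images and compare two contours by their Hu moments with cv2.matchShapes. This is only one simple flavor of contour comparison, and the file names are hypothetical.

```python
import cv2

def largest_contour(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # OpenCV 4.x returns (contours, hierarchy).
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

template = largest_contour("template.png")   # ideal silhouette of the object
candidate = largest_contour("scene.png")     # object extracted from the scene

# matchShapes compares Hu moments of the two contours: smaller means more similar.
score = cv2.matchShapes(template, candidate, cv2.CONTOURS_MATCH_I1, 0.0)
print("dissimilarity:", score)
```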
Key points
Key points are unique characteristics of an object that allow it to be matched with itself or with similar object classes. There are dozens of ways to select such points. Some methods find points that are stable across neighboring frames, some across long periods of time and lighting changes, and some find points that remain stable even when the object rotates. Let us start with methods that find points that are not very stable but are quickly computed, and then move on in increasing order of complexity:
First class. Points that are stable for seconds. Such points are used to track an object between adjacent video frames or to register images from neighboring cameras. These include local maxima of the image, corners in the image (the best of the detectors is perhaps the Harris detector), points where variance maxima are reached, certain gradients, and so on.
Second class. Points that are stable under lighting changes and small movements of the object. Such points serve primarily for training and subsequent classification of object types: for example, a pedestrian classifier or a face classifier is the product of a system built on exactly such points. Some of the previously mentioned wavelets can be the basis for such points; for example, Haar primitives, glare search, and the search for other specific features. These also include points found by the method of histograms of oriented gradients (HOG).
Third class. Stable points. I know of only two methods that give complete stability, and of their modifications: SIFT and SURF. They allow key points to be found even when the image is rotated. The computation of such points takes longer than with the other methods, but within reasonable limits. Unfortunately, these methods are patented. (Although in Russia it is impossible to patent algorithms, so use them for the domestic market.)
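A sketch covering the first and, roughly, the third class of points with OpenCV: Shi-Tomasi/Harris-style corners for fast, short-lived points, and ORB as a freely usable rotation-tolerant detector and descriptor used here in place of the patented methods mentioned above; the parameters and file name are illustrative:

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input

# "First class": quickly computed corner points, good for frame-to-frame tracking.
corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01, minDistance=7)

# Rotation-tolerant key points with descriptors (ORB is patent-free).
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)

print(f"{0 if corners is None else len(corners)} corners, {len(keypoints)} ORB keypoints")
```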

Part 3. Training

The third part of the story will be devoted to methods that do not work directly with the image but allow decisions to be made. These are mostly various methods of machine learning and decision making. Recently Yandex posted a lecture course on this topic on Habr; it is a very good selection, and here it is in text version. For a serious study of the subject, I strongly recommend looking at those lectures. Here I will try to outline a few basic methods used specifically in pattern recognition.
In 80% of situations, the essence of learning in the recognition problem is as follows:
There is a labeled sample containing several classes of objects. Let it be the presence or absence of a person in a photo. For each image, there is a set of features extracted by some feature detector, be it Haar, HOG, SURF, or some wavelet. The learning algorithm must build a model from which it will be able to analyze a new image and decide which of the objects is present in it.
How is it done? Each of the sample images is a point in feature space, its coordinates being the weight of each feature in the image. Let our features be: "presence of eyes", "presence of a nose", "presence of two hands", "presence of ears", and so on. We extract all these features with the detectors we have, which are trained on body parts similar to human ones. For a person, the correct point in such a space would have all these features present; for a monkey or a horse, the combinations would be different. The classifier is trained on a sample of examples. But not all of the photographs show hands, in others there are no eyes, and in a third the monkey gets a human nose due to a detector error. The trainable human classifier automatically partitions the feature space so as to say: if the features lie in certain ranges, then this is a human. In essence, the purpose of the classifier is to draw, in the feature space, the regions that are characteristic of the objects being classified. This is how the successive approximation to the answer looks for one of the classifiers (AdaBoost) in a two-dimensional space:


There are many classifiers, and each of them works better on its own kind of task. Selecting a classifier for a specific problem is largely an art. Here are some nice pictures on the topic.
Simple case, one-dimensional separation
Let us take the simplest case of classification, when the feature space is one-dimensional and we need to separate two classes. The situation occurs more often than it might seem: for example, when you need to distinguish two signals or compare a pattern with a sample. Suppose we have a training sample. We then obtain a picture where the X-axis is a measure of similarity and the Y-axis is the number of events with that measure. When the desired object is similar to itself, the left Gaussian is obtained; when it is not, the right one. The value X = 0.4 separates the samples so that the probability of making a wrong decision is minimized. The search for such a separator is precisely the task of classification.
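A sketch of picking such a one-dimensional separating threshold by brute force: generate two samples of comparison scores (object compared with itself, and with other objects) and choose the threshold that minimizes the total number of errors on them. The Gaussian parameters below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
# Scores for comparing the object with itself (left Gaussian) and with
# other objects (right Gaussian); parameters are invented for illustration.
same_object  = rng.normal(0.25, 0.07, 5000)
other_object = rng.normal(0.60, 0.08, 5000)

def total_errors(threshold):
    # Decision rule: "same object" if the score is below the threshold.
    false_rejects = np.sum(same_object >= threshold)   # genuine samples turned away
    false_accepts = np.sum(other_object < threshold)   # outsiders let through
    return false_rejects + false_accepts

candidates = np.linspace(0.0, 1.0, 1001)
best = min(candidates, key=total_errors)
print(f"best threshold ~ {best:.3f}, errors: {total_errors(best)} of {len(same_object) + len(other_object)}")
```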


A small note: the criterion that minimizes the error is not always optimal. The following graph comes from an actual iris recognition system. For such a system, the criterion is chosen so as to minimize the probability of falsely admitting an outsider to the facility. Such a probability is called an "error of the first kind", "probability of false alarm", or "false positive"; in the English literature, "False Access Rate".
AdaBoost is one of the most common classifiers; the Haar cascade, for example, is built on it. It is usually used when binary classification is needed, but nothing prevents training it for more classes.
SVM is one of the most powerful classifiers, with many implementations. On the learning tasks I have encountered, it worked similarly to AdaBoost. It is considered quite fast, but its training is more difficult than AdaBoost's and requires choosing the right kernel.
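A side-by-side sketch of AdaBoost and an SVM on the same toy data using scikit-learn (assuming it is available); the dataset and parameters are illustrative defaults:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# A toy two-class problem with a non-linear boundary.
X, y = make_moons(n_samples=600, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)   # kernel choice matters for SVM

print("AdaBoost accuracy:", ada.score(X_test, y_test))
print("SVM (RBF) accuracy:", svm.score(X_test, y_test))
```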

There are also neural networks and regression. But to briefly classify them and show how they differ, an article much larger than this is needed.
________________________________________________
I hope I was able to give a quick overview of the methods used without diving deep into the mathematics and descriptions. Maybe this will help someone. Although, of course, the article is incomplete, and there is not a word about working with stereo images, about least squares with the Kalman filter, or about the adaptive Bayesian approach.
If you like the article, I will try to write a second part with a selection of examples of how existing Image Recognition problems are solved.

And finally

What to read?
1) At one time I really liked the book "Digital Image Processing" by B. Jähne, which is written simply and clearly, yet gives almost all the mathematics. It is good for getting familiar with existing methods.
2) The classic of the genre is R. Gonzalez and R. Woods, "Digital Image Processing". For some reason it was harder for me than the first one: much less mathematics, but more methods and pictures.
3) "Image Processing and Analysis in Machine Vision Problems", written on the basis of a course taught at one of the departments of PhysTech. There are a lot of methods and detailed descriptions of them. But in my opinion the book has two big minuses: it is strongly tied to the software package bundled with it, and too often the description of a simple method turns into a mathematical jungle from which it is hard to extract the structural outline of the method. On the other hand, the authors have made a convenient site where almost all the content is presented: wiki.technicalvision.ru