Face Recognition: Features versus Templates
For instance, the array of grey levels may be suitably preprocessed before matching (see also [2]). Several full templates per face may be used to account for recognition from different viewpoints. Still another important variation is to use, even for a single viewpoint, multiple templates: a face is then stored as a set of distinctive smaller templates [1]. A rather different and more complex approach is to use a single template together with a qualitative prior model of how a generic face transforms under a change of viewpoint.
The deformation model is then heuristically built into the metric used by the matching measure; this is the idea underlying the technique of elastic templates (see [9], [8], [36]). In order to investigate the two approaches described above, we have developed two new algorithms and tested them on the same database. Although we do not claim that our findings are relevant to how human recognition proceeds, they could well provide some hints as to how it could.
Experimental Setup

The database we used for the comparison of the different strategies is composed of 188 images: four for each of 47 people. Of the four pictures available, the first two were taken in the same session (within a time interval of a few minutes), whereas the other two were taken at intervals of some weeks (2 to 4).
The pictures were acquired with a CCD camera at a fixed resolution as frontal views. The subjects were asked to look into the camera, but no particular effort was made to ensure perfectly frontal images. The pictures were taken at random times during the day. In all of the recognition experiments, the learning set had an empty intersection with the testing set.

Geometric, Feature-based Matching

As we have mentioned previously, the very fact that face recognition is possible even at coarse resolution, when the single facial features are hardly resolved in detail, implies that the overall geometrical configuration of the face features is sufficient for discrimination.
The overall configuration can be described by a vector of numerical data representing the position and size of the main facial features: eyes and eyebrows, nose, and mouth.
This information can be supplemented by the shape of the face outline. The first three requirements are satisfied by the set of features we have adopted, whereas their information content is characterized by the experiments described later. One of the first attempts at automatic recognition of faces by using a vector of geometrical features was due to Kanade [18] in 1973. Using a robust feature detector (built from simple modules used within a backtracking strategy), a set of 16 features was computed.
Analysis of the inter- and intraclass variances revealed some of the parameters to be ineffective, yielding a vector of reduced dimensionality. Kanade's system achieved a peak performance of 75% correct identification. The computer procedures we implemented are loosely based on Kanade's work and will be detailed in the next section. The database used is, however, more meaningful in the sense of being larger in the number of classes (47 different people; visual inspection of the database revealed no significant deviation from a frontal view, but no quantitative analysis was done).
Normalization

One of the most critical issues in using a vector of geometrical features is that of proper normalization: the extracted features must be somehow normalized in order to be independent of position, scale, and rotation of the face in the image plane. Translation dependency can be eliminated once the origin of coordinates is set to a point that can be detected with good accuracy in each image.
The approach we have followed achieves scale and rotation invariance by fixing the interocular distance and the direction of the eye-to-eye axis. We will describe the steps of the normalization procedure in some detail, since they are themselves of some interest (an alternative, more recent strategy that is even faster and has comparable performance can be found in [32]).
This normalization rescales the template and image energy distributions so that their averages and variances match. The eyes of one of the authors (without eyebrows) were used as a template to locate eyes in the image to be normalized. To cope with scale variations, a set of five eye templates was used; these were obtained by rescaling the original one over a small range of scales. Eye position was then determined by looking for the maximum absolute value of the normalized correlation values (one for each of the templates).
To make correlation more robust against illumination gradients, each image was prenormalized by dividing each pixel by the average intensity over a suitably large neighborhood. It is well known that correlation is computationally expensive. Additionally, the eyes of different people can be markedly different.
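The prenormalization and the matching measure just described can be sketched as follows (the neighborhood radius and function names are illustrative, not taken from the original system):

```python
import numpy as np

def prenormalize(img, radius=8):
    """Divide each pixel by the mean intensity of its (2*radius+1)^2
    neighborhood to attenuate slow illumination gradients.
    The local sums are computed with a padded cumulative sum (box filter)."""
    padded = np.pad(img.astype(float), radius, mode="edge")
    c = padded.cumsum(0).cumsum(1)
    c = np.pad(c, ((1, 0), (1, 0)))      # prefix row/column of zeros
    k = 2 * radius + 1
    local_sum = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return img / (local_sum / k**2 + 1e-9)

def ncc(patch, template):
    """Normalized cross-correlation coefficient in [-1, 1]; invariant
    to affine changes of intensity in either argument."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p**2).sum() * (t**2).sum()) + 1e-9
    return float((p * t).sum() / denom)
```

The invariance of the coefficient to gain and offset is what makes the division-based prenormalization, rather than a more elaborate photometric model, sufficient in practice.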
These difficulties can be significantly reduced by using hierarchical correlation, as proposed by Burt in [10]: Gaussian pyramids of the prenormalized image and templates are built, and correlation is performed starting from the lowest resolution level, progressively restricting the area of computation at each finer level. (The influence of eye shape can be further reduced by introducing a modified correlation coefficient, described next.)
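The coarse-to-fine search can be sketched as follows; the crude 2x2-average pyramid stands in for the Gaussian pyramid of Burt, and the refinement radius is an assumed parameter:

```python
import numpy as np

def downsample(img):
    """One pyramid level: 2x2 average then decimate (a crude stand-in
    for the Gaussian low-pass filtering of a Burt pyramid)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0

def best_match(img, tpl, center=None, radius=None):
    """Exhaustive (or locally restricted) search for the position of
    maximum normalized cross-correlation."""
    H, W = img.shape
    h, w = tpl.shape
    t = tpl - tpl.mean()
    tn = np.sqrt((t ** 2).sum()) + 1e-9
    rows, cols = range(H - h + 1), range(W - w + 1)
    if center is not None:                    # restrict to a small window
        rows = range(max(0, center[0] - radius), min(H - h, center[0] + radius) + 1)
        cols = range(max(0, center[1] - radius), min(W - w, center[1] + radius) + 1)
    best, pos = -2.0, (0, 0)
    for i in rows:
        for j in cols:
            p = img[i:i + h, j:j + w]
            p = p - p.mean()
            s = (p * t).sum() / (np.sqrt((p ** 2).sum()) * tn + 1e-9)
            if s > best:
                best, pos = s, (i, j)
    return pos, best

def hierarchical_match(img, tpl, levels=3):
    """Full search at the coarsest level, then refinement in a small
    window at each finer level."""
    imgs, tpls = [img], [tpl]
    for _ in range(levels - 1):
        imgs.append(downsample(imgs[-1]))
        tpls.append(downsample(tpls[-1]))
    pos, _ = best_match(imgs[-1], tpls[-1])
    for lvl in range(levels - 2, -1, -1):
        pos = (pos[0] * 2, pos[1] * 2)        # project to the finer level
        pos, _ = best_match(imgs[lvl], tpls[lvl], center=pos, radius=2)
    return pos
```

The saving comes from running the exhaustive search only on the coarsest, smallest images, and touching the full-resolution data only in a few candidate positions.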
Let f and g denote the template and the image patch being compared. A modified cross-correlation is obtained by allowing each value of the template to be matched against the most similar value within a small neighborhood of its nominal position; its normalized form is analogous to that of the standard normalized correlation coefficient. The newly introduced coefficient thus admits local deformations in the computation of similarity. The interplay of the two techniques (hierarchical correlation and the modified correlation coefficient) proved very effective, yielding no errors on the available database.
Once the eyes have been detected, scale is preadjusted using the ratio of the scale of the best-responding template to the reference template. The positions of the left and right eye are then refined using the same technique with a left- and a right-eye template. The resulting normalization proved to be good. Once the eyes have been independently located, rotation can be fixed by imposing the direction of the eye-to-eye axis, which we assumed to be horizontal in the natural reference frame.
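The scale-and-rotation normalization described above amounts to a similarity transform computed from the two detected eye centers; in the sketch below, the canonical eye position and interocular distance are illustrative values, not those of the original system:

```python
import numpy as np

def eye_alignment(left_eye, right_eye, canon_left=(24.0, 16.0), canon_dist=32.0):
    """Similarity transform (scale s, rotation R, translation t) mapping
    the detected eye centers onto a canonical, horizontal eye-to-eye axis.
    A point p in image (x, y) coordinates maps to  s * R @ p + t."""
    l = np.asarray(left_eye, float)
    r = np.asarray(right_eye, float)
    d = r - l
    s = canon_dist / np.hypot(*d)          # fix the interocular distance
    theta = -np.arctan2(d[1], d[0])        # rotate the eye axis to horizontal
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = np.asarray(canon_left, float) - s * R @ l
    return s, R, t
```

After this transform the left eye lands on the canonical position and the right eye on the same row at the canonical interocular distance, so all faces in the database share position, scale, and in-plane rotation.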
Feature Extraction

Face recognition, although difficult, presents a set of interesting constraints that can be exploited in the recovery of facial features. The first important constraint is bilateral symmetry. Another set of constraints derives from the fact that almost every face has two eyes, one nose, and one mouth, with a very similar layout. Although this may make the task of face classification more difficult, it can ease the task of feature extraction. The following paragraphs briefly explore the implications of bilateral symmetry and expose some ideas on how anthropometric measures can be used to focus the search for a particular facial feature and to validate results obtained through simple image processing techniques [4], [5].
A very useful technique for the extraction of facial features is that of integral projections. Let I(x, y) be our image; its vertical integral projection V(x) is the sum of I(x, y) over the rows of the window of interest, and its horizontal integral projection H(y) is the corresponding sum over the columns. Projections can be extremely effective in determining the position of features, provided the window on which they act is suitably located to avoid misleading interferences. In the original work of Kanade, the projection analysis was performed on a binary picture obtained by applying a Laplacian operator.
Fig.: Horizontal and vertical edge dominance maps.

The use of a Laplacian operator, however, does not provide information on edge (that is, gradient) directions. We have chosen, therefore, to perform edge projection analysis by partitioning the edge map in terms of edge directions. There are two main directions in our constrained face pictures: horizontal and vertical (see the figure).
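The integral projections defined above reduce to two sums; a minimal sketch, with a synthetic dark horizontal line standing in for the line between the lips:

```python
import numpy as np

def integral_projections(img):
    """V(x): sum of the image over rows, one value per column x.
       H(y): sum of the image over columns, one value per row y."""
    V = img.sum(axis=0)
    H = img.sum(axis=1)
    return V, H

# A dark horizontal structure shows up as a valley of H(y).
window = np.ones((10, 12))
window[6, :] = 0.0                 # synthetic "line between the lips"
V, H = integral_projections(window)
mouth_row = int(H.argmin())        # valley of the horizontal projection
```

Exactly this valley-of-H(y) reading is what the system uses to place the mouth, and the peak-of-projection reading of the gradient maps to place the nose.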
Once the eyes have been located using template matching, the search for the other features can take advantage of the knowledge of their average layout (an initial estimate, to be updated when new faces are added to the database, can be derived by manually locating features on a single face).
The vertical positions of nose and mouth are guessed using anthropometric standards. A first, refined estimate of their real positions is obtained by looking for peaks of the horizontal projection of the vertical gradient for the nose, and for valleys of the horizontal projection of the intensity for the mouth (the line between the lips is the darkest structure in the area, due to its configuration).
The peaks and valleys are then rated using their prominence and their distance from the expected location (height and depth are weighted by a Gaussian factor). The ones with the highest rating are taken to give the vertical positions of nose and mouth. Having established the vertical positions, the search is limited to smaller windows.
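The Gaussian-weighted rating of candidate peaks can be sketched as follows (the width sigma of the Gaussian factor is an assumed parameter; in the system the expected location comes from the anthropometric layout):

```python
import numpy as np

def rate_peaks(signal, peaks, expected, sigma=5.0):
    """Rate candidate peak positions by their height weighted with a
    Gaussian factor of the distance from the expected location, and
    return the best-rated position."""
    peaks = np.asarray(peaks)
    heights = np.asarray([signal[p] for p in peaks], float)
    weights = np.exp(-0.5 * ((peaks - expected) / sigma) ** 2)
    return int(peaks[np.argmax(heights * weights)])
```

A taller but anthropometrically implausible peak is thus discarded in favor of a slightly weaker one near the expected location, which is what makes the projection analysis robust to clutter.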
The nose is delimited horizontally by searching for peaks in the vertical projection of the horizontal edge map whose height is above the average value in the searched window. The nose boundaries are estimated from the left- and right-most peaks. Mouth height is computed using the same technique, but applied to the vertical gradient component.
The use of directional information is quite effective at this stage, removing much of the noise that would otherwise impair the feature extraction process. Mouth width is finally computed by thresholding the corresponding projection. (A pixel is considered to be in the vertical edge map if the magnitude of the vertical component of the gradient at that pixel is greater than the horizontal one; the gradient is computed using a Gaussian regularization of the image, and only points where the gradient intensity is above an automatically selected threshold are considered [34], [4].)
Fig.: Typical edge projection data.

The search for the eyebrows is once again limited to a focused window, just above the eyes, and the eyebrows are found using the vertical gradient map. Our eyebrow detector looks for pairs of peaks of gradient intensity with opposite direction. Pairs from one eye are compared with those of the other one; the most similar pair, in terms of distance from the eye center and thickness, is selected as the correct one.
Given this information, the upper and lower boundaries of the left eyebrow are followed, and the set of features shown in the figure is extracted. No hairline information is considered, because it may change considerably over time. Again, we have attempted to exploit the natural constraints of faces: because the face outline is essentially elliptical, dynamic programming has been used to follow the outline on a gradient intensity map of an elliptical projection of the face image (see the figures).
The reason for using an elliptical coordinate system is that a typical face outline is then approximately represented by a straight line. The computation of the cost function to be minimized (deviation from the assumed shape, an ellipse represented as a line) is thereby simplified, resulting in a serial dynamic-programming problem that can be solved efficiently (see [5]).
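The serial dynamic-programming formulation can be illustrated on a small cost map; the values below are contrived, whereas in the system the cost comes from the gradient intensity of the elliptical projection (low cost on strong outline edges):

```python
import numpy as np

def follow_outline(cost):
    """Minimum-cost path through an 'unrolled' cost map (rows = angular
    samples, columns = radial deviation), moving at most one column per
    row: the discrete analogue of an outline that is nearly a straight
    line in elliptical coordinates."""
    n, m = cost.shape
    acc = cost.astype(float).copy()       # accumulated cost
    back = np.zeros((n, m), dtype=int)    # backpointers
    for i in range(1, n):
        for j in range(m):
            lo, hi = max(0, j - 1), min(m, j + 2)
            k = lo + int(np.argmin(acc[i - 1, lo:hi]))
            acc[i, j] += acc[i - 1, k]
            back[i, j] = k
    path = [int(np.argmin(acc[-1]))]      # best endpoint, then backtrack
    for i in range(n - 1, 0, -1):
        path.append(back[i, path[-1]])
    return path[::-1]
```

Because each row depends only on the previous one, the problem is solved in a single forward sweep plus a backtrace, which is what makes the serial formulation efficient.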
A pictorial presentation of the features is given in the figure.

Recognition Performance

Detection of the features listed above associates with each face a numerical feature vector. Recognition is then performed with a Bayes classifier. Our main experiment aims to characterize the performance of the feature-based technique as a function of the number of classes to be discriminated.
Other experiments try to assess performance when the possibility of rejection is introduced. We assume that the feature vectors for a single person are distributed according to a Gaussian distribution.
Different people are characterized only by the average value, while the covariance is common to all classes.

Fig.: Geometrical features (white) used in the face recognition experiments.

An unknown vector is then associated with the nearest class in the database, i.e., the one whose average vector is closest in the metric induced by the common covariance. The feature vectors are expressed as a linear combination of the eigenvectors of the covariance matrix, sorted by decreasing eigenvalue; the first components in the new expression of the vectors are the most effective in capturing the variance of the data.
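The eigenvector expansion and nearest-class rule can be sketched as follows; for simplicity the reduced-space metric is plain Euclidean, a simplification of the covariance-induced metric described above:

```python
import numpy as np

def pca_basis(X, n_components):
    """Eigenvectors of the covariance matrix of the training vectors,
    sorted by decreasing eigenvalue, plus the data mean."""
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:n_components]], X.mean(axis=0)

def nearest_mean(query, class_means, basis, mean):
    """Project onto the principal components and return the index of
    the nearest class mean in the reduced space."""
    q = (query - mean) @ basis
    M = (class_means - mean) @ basis
    return int(np.argmin(((M - q) ** 2).sum(axis=1)))
```

Each class is represented by the arithmetic average of its examples, matching the single-representative scheme used in the experiments.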
The fraction of the total variance captured by the first n eigenvectors, and the performance that can be achieved using the first n principal components, are reported in the corresponding figures. Useful data on the robustness of the classification are given by an estimate of the intraclass variability as opposed to the interclass variability. In our experiments, each class was represented by a single element (the arithmetic average of the available examples), so that the maximum distance reduces to the distance from the representing vector.
The HyperBF classifier used in previous experiments of 3-D object recognition [25], [6] allows the automatic choice of the appropriate metric, which is still, however, a weighted Euclidean metric. Feature vectors are expanded in the eigenvectors of the covariance matrix of the available data.

Fig.: Performance obtained with an increasing number of principal components.

The values reported for the different experiments are averages of the ratio computed for the different classes; the higher the value, the more discriminable the classes.
It is important to note that the vectors of geometrical features extracted by our system have low stability, i.e., they vary appreciably across different images of the same person. This suggests that performance could be improved with more accurate feature detectors (see also [19] and [18], where the use of manually extracted features is studied); it is not clear, however, how to design them. An important issue is how performance scales with the size of the database. To obtain these data, a number of recognition experiments were conducted on randomly chosen subsets of classes at the different required cardinalities (random subsets at each cardinality).
The plots in the corresponding figure report the results. As expected, both data sets exhibit a monotonically decreasing trend for increasing cardinality of the class set. A possible way to enhance the robustness of classification is the introduction of a rejection threshold: the classifier can then suspend classification if the input is not sufficiently similar to any of the available models.
Rejection could trigger the action of a different classifier or the use of a different recognition strategy such as voice identification.
Fig.: Performance as a function of the number of classes to be discriminated.

Fig.: Analysis of the classifier as a function of the rejection threshold.

If the distance of a given input vector from all of the stored models exceeds the rejection threshold, the vector is rejected. A possible figure of merit of a classifier with rejection is given by the recognition performance with no errors (vectors are either correctly recognized or rejected).
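The rejection rule just described is a one-line modification of the nearest-neighbor classifier; a minimal sketch, with the distance metric simplified to Euclidean:

```python
import numpy as np

def classify_with_rejection(query, models, labels, threshold):
    """Nearest-neighbor classification that suspends judgement: if the
    distance to every stored model exceeds the threshold, return None
    (rejected) instead of a label."""
    d = np.linalg.norm(np.asarray(models) - np.asarray(query, dtype=float), axis=1)
    i = int(np.argmin(d))
    return labels[i] if d[i] <= threshold else None
```

Sweeping the threshold trades correct recognitions against rejections, which is exactly the curve reported in the figure above.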
The average performance of our classifier as a function of the rejection threshold is given in the figure.

Template Matching Strategy

The other class of approaches to automated face recognition has, at its core, a very simple recognition technique based on the use of whole-image grey-level templates. The most direct of the matching procedures is correlation, which is the basis of the work of Baron [1].
The system we implemented is an extension of the little-known work of Baron, which is extensively described in [1]. First, the image is normalized using the same technique described in the previous section. The location of the four masks relative to the normalized eye position is the same for the whole database.

Fig.: Different regions used in the template matching strategy.

When attempting recognition, the unclassified image is compared, in turn, with all of the database images, returning a vector of matching scores (one per feature) computed through normalized cross-correlation.
The unknown person is then classified as the one giving the highest cumulative score. The main difference between our approach and that of Baron lies in the window selection procedure: ours makes it possible to automatically add a complete database entry for an unknown person, thereby easing the update of the available information. Another minor difference can be found in the eye location procedure used for normalization; the one we propose uses a single template at different scales, as opposed to Baron's 16.
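The cumulative-score classification can be sketched as follows; the region names and the dictionary representation of a database entry are illustrative:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation coefficient of two regions."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a**2).sum() * (b**2).sum()) + 1e-9))

def recognize(regions, database):
    """Compare the unknown's feature regions (e.g. eyes, nose, mouth,
    whole face) with every stored entry; each comparison yields one NCC
    score, and the entry with the highest cumulative score wins."""
    scores = [sum(ncc(regions[k], entry[k]) for k in regions)
              for entry in database]
    return int(np.argmax(scores))
```

Summing the per-region scores is the simple integration strategy used in the reported experiments; weighted sums are an obvious variant.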
As already pointed out, correlation is sensitive to illumination gradients, and the question arises as to whether there is a way to preprocess the compared images to remove this confounding effect.
To decide this point experimentally, we ran four different recognition experiments using correlation on images preprocessed in different ways.

Fig.: Recognition performance for different preprocessings as a function of the interocular distance (see text for preprocessing labels).

Fig.: Recognition performance with error information as a function of the rejection threshold.

The best results have been obtained using gradient information.
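A minimal sketch of a gradient-based preprocessing of the kind that performed best; the central-difference operator below stands in for whatever derivative filter the experiments actually used:

```python
import numpy as np

def gradient_magnitude(img):
    """Central-difference gradient magnitude: an illumination-tolerant
    representation, since an additive intensity offset cancels exactly
    in the derivatives and a slowly varying gain nearly so."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)
```

Matching gradient magnitudes instead of raw grey levels makes the correlation scores insensitive to global brightness changes between the stored and the unknown image.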
The performance, as a function of the rejection threshold and of the size of the database, is reported in the corresponding figures. An interesting question is the dependency of recognition on the resolution of the available image.
To investigate this dependency, recognition was attempted on a multiresolution representation of the available pictures (a Gaussian pyramid of the preprocessed image). The range of resolutions spanned a factor of 8, the Gaussian pyramids having four levels. As the performance plots reveal, recognition is stable over a substantial part of this range, implying that correlation-based recognition is possible at a good performance level using templates of markedly reduced resolution.
A pragmatic way to compare the discrimination power of the different features is to measure the recognition performance obtained using each single feature alone. The experimental analysis shows that the features we used can be sorted by decreasing performance as follows: (1) eyes, (2) nose, (3) mouth, (4) whole-face template. The recognition rate achieved with a single feature (eyes, nose, or mouth) is remarkable and consistent with the human ability to recognize familiar people from a single facial characteristic. The similarity scores of the different features can be integrated to obtain a global score.
Comparison of an unknown image with the whole database can take advantage of special-purpose hardware or distributed processing. An efficient strategy for template matching has recently been proposed in [31].
Fig.: Average performance for recognition based on each single feature.

The integration strategy adopted in the reported experiments is the simplest one: the features' scores are simply added. The integration of more features has a beneficial effect on recognition and on the robustness of the classification (see the corresponding plots).
Interestingly, the whole face is the least discriminating template. This is partly due to the difficulty of perfectly normalizing the scale of the pictures; the whole face is also expected to be the template most sensitive to the slight deformations caused by deviations from frontal views. In this approach, both geometrical and holistic information are used at the same time.
Geometrical information plays a role when the mask stored in the database is used to locate the corresponding zone on the unknown image, whereas holistic information is taken into account by the pixel-by-pixel comparison of the correlation procedure.
Performance can be increased by using templates from more than one image per person.

Fig.: Network diagram of the hyper basis functions technique.

Conclusion

We have investigated the performance of automatic techniques for face recognition from images of frontal views. Two different approaches have been compared in terms of two simple new algorithms that we have developed and implemented: identification through a vector of geometrical features and identification through a template matching strategy.
Our use of template matching is superior in recognition performance on our database.