
Identification of Human Skin Regions Using Color and Texture

Mark Smith
University of Central Arkansas, Conway, Arkansas 72035

Ray Hashemi
Armstrong Atlantic University, Savannah, Georgia 31419

Leslie Sears
Armstrong Atlantic University, Savannah, Georgia 31419

    Abstract

A novel approach for identifying and segmenting skin regions within images is presented in this paper. The identification and recognition of facial regions are a central focus of this work. A set of standard images containing facial/skin objects is first manually segmented into the regions of interest. These regions are utilized in training the system. Dominant color features (i.e., the most frequently occurring quantized colors) along with texture features generated from the co-occurrence matrix are extracted from the training regions. An example image is then presented to the system. The image undergoes a standard image segmentation algorithm that splits the image into consistent objects. The same color/texture features are extracted from the example regions. A similarity measurement is computed, and the regions of the example image are subsequently classified as skin/non-skin regions. Results are shown for several standard MPEG test sequences such as Foreman, Salesman, Miss America, and others.

    1. Introduction

The ease of capturing and encoding digital images has caused a massive amount of visual information to be produced and processed rapidly. Hence, efficient tools and systems for searching and retrieving visual information are needed. There is especially a need to identify people and human features as they appear in images and video sequences. The identification of humans within images allows for a more intelligent and efficient retrieval system. Other applications include classifying video scenes as active or background, which allows large regions of the video to be filtered. This system focuses on visual information pertaining to skin, especially facial regions, which increases the complexity of the matching and retrieval algorithm significantly. Skin identification and segmentation has received much interest in recent years as a research topic [1], [3]. There are currently several general-purpose systems available for skin and facial identification: QBIC [2], PhotoBook [5], Virage [4], and VisualSEEk [1]. Our system focuses only on the analysis and identification of human skin objects [6-7]. The steps of our system are summarized below; a sketch of the overall pipeline follows the list:

• A set of training images undergoes manual image segmentation. All skin regions are identified.

    • The Dominant Color feature is extracted from each skin region.

• Texture features derived from the co-occurrence matrix are extracted from each skin region.

    • Example images are automatically segmented using a standard region segmentation algorithm.

    • Dominant color and co-occurrence texture features are extracted from each region.

    • A similarity measurement is computed between the training objects and each example region.

• A sufficiently small similarity measurement results in the example region being classified as a skin region.
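Below is a minimal sketch of this pipeline. The names (extract_features, classify_regions), the simplified stand-in feature vector, and the threshold value are illustrative assumptions, not the paper's implementation; the actual dominant color and co-occurrence features are detailed in Sections 3 and 4.

```python
# Hedged sketch of the training/classification pipeline described above.
import numpy as np

def extract_features(region_pixels):
    """Stand-in feature vector: coarse quantized-color histogram plus basic stats."""
    quantized = region_pixels // 64                       # 4 levels per RGB channel
    idx = quantized.reshape(-1, 3) @ np.array([16, 4, 1]) # 64 color bins
    hist = np.bincount(idx, minlength=64).astype(float)
    hist /= hist.sum()
    return np.concatenate([hist, [region_pixels.mean(), region_pixels.std()]])

def classify_regions(train_features, example_regions, threshold):
    """Label an example region as skin if it is close to any training region."""
    labels = []
    for region in example_regions:
        f = extract_features(region)
        diff = min(np.linalg.norm(f - t) / np.linalg.norm(t)
                   for t in train_features)
        labels.append(diff < threshold)                   # small difference => skin
    return labels

# Usage with random stand-in data: 2 training skin regions, 3 example regions.
rng = np.random.default_rng(0)
train = [extract_features(rng.integers(0, 256, (32, 32, 3))) for _ in range(2)]
examples = [rng.integers(0, 256, (32, 32, 3)) for _ in range(3)]
print(classify_regions(train, examples, threshold=0.5))
```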

    2. Training Set

    The primary objective of this step is to provide a representative set of color and texture features corresponding to true skin data. The steps involved with this portion consist of the following:

    1. Manually segment skin textures within a set of images

2. Extract Dominant Color features

3. Extract texture features from the co-occurrence matrix

The training set utilized by this system was extracted from the Foreman video. Examples of the manual segmentation of this video sequence are shown in Fig. 1.

The next step extracts the Dominant Color and texture features from these manually segmented images.

3. Dominant Colors

The color feature utilized in this measurement consists of all quantized RGB colors having a concentration greater than 5% extracted from a given object. These colors are also referred to as the dominant colors of the object. The Dominant Colors are one of the MPEG-7 features. A color quantization step is performed that reduces the total number of colors to only 16. The processing used to detect which of these 16 colors have a concentration of 5% or more is illustrated in Fig. 2.
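As a concrete illustration, the following sketch quantizes an object's pixels to 16 colors and keeps those covering more than 5% of the object. Using k-means (scipy's kmeans2) as the quantizer is an assumption; the paper does not name a specific quantization algorithm.

```python
# Hedged sketch of dominant-color extraction: quantize to 16 colors,
# keep colors whose concentration exceeds 5% of the object's pixels.
import numpy as np
from scipy.cluster.vq import kmeans2

def dominant_colors(pixels, n_colors=16, min_fraction=0.05):
    """Return (color, fraction) pairs for quantized colors above min_fraction."""
    data = pixels.reshape(-1, 3).astype(float)
    centroids, labels = kmeans2(data, n_colors, minit='++', seed=0)
    counts = np.bincount(labels, minlength=n_colors)
    fractions = counts / counts.sum()
    keep = fractions > min_fraction
    return list(zip(centroids[keep].astype(int), fractions[keep]))

rng = np.random.default_rng(1)
region = rng.integers(0, 256, (64, 64, 3))       # stand-in skin region
for color, frac in dominant_colors(region):
    print(color, round(float(frac), 3))
```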

4. Texture Features

The Gray-Level Co-occurrence Matrix (GLCM) is one of the most popular statistical texture measurements [2] and has been used as the primary component in a wide range of image segmentation applications [4]. The GLCM is a second-order statistical measurement; second-order statistics take into account the relationship between groups of two (usually neighboring) pixels in the original image. In contrast, first-order statistics (e.g., mean and variance) do not consider any neighborhood associations.

4.1 Computing the Co-occurrence Matrix

The process by which the GLCM is computed is outlined as follows (a sketch follows this list):

1. The GLCM computation utilizes the relation between two pixels at a time; one is called the reference pixel and the other the neighbor pixel.

2. A displacement vector d is defined, consisting of a distance in the horizontal direction and a distance in the vertical direction.

3. A displacement vector is selected and determines the relationship between the pixels in the image. Utilizing only neighboring pixels (d = 1) is the most commonly used distance.

4. There are 8 possible relationships (i.e., displacement vectors) that can be formed between neighboring pixels. Directions are written as a pair: the first component refers to the horizontal displacement, whereas the second refers to the vertical displacement. Positive horizontal values represent right neighbors, while negative values correspond to left neighbors. Positive vertical values represent a neighboring pixel above the reference pixel, while negative values correspond to a neighboring pixel below the reference pixel.
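The sketch below counts reference/neighbor pairs for a single displacement vector over an image assumed to be pre-quantized to a small number of gray levels; the function name and level count are illustrative.

```python
# Minimal GLCM computation for one displacement vector d = (rows, cols),
# following the steps above: count how often a reference pixel with gray
# level i has a neighbor at displacement d with gray level j.
import numpy as np

def glcm(image, d=(0, 1), levels=8):
    """Co-occurrence counts for displacement d = (row offset, col offset)."""
    dr, dc = d
    rows, cols = image.shape
    matrix = np.zeros((levels, levels), dtype=np.int64)
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            i = image[r, c]                      # reference pixel
            j = image[r + dr, c + dc]            # neighbor pixel
            matrix[i, j] += 1
    return matrix

rng = np.random.default_rng(2)
img = rng.integers(0, 8, (16, 16))               # stand-in quantized image
V = glcm(img, d=(0, 1))                          # right neighbor, as in step 4
C = V / V.sum()                                  # normalization of Eq. (1)
print(C.sum())                                   # probabilities sum to 1
```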

The values extracted from the co-occurrence matrix represent an excellent texture measurement and are stored in a vector defined as follows. The co-occurrence matrix is best represented as a probability density, as shown in Equation (1) below:

C_{ij} = \frac{V_{ij}}{\sum_{i=0}^{N} \sum_{j=0}^{M} V_{ij}} \qquad (1)

where V_{ij} is the number of co-occurrences of gray levels i and j, and C_{ij} is the resulting probability.

The co-occurrence matrix is computed over each facial object and maintained for matching with example objects. A seven-dimensional vector, consisting of:

• entropy
• homogeneity
• energy
• mean
• standard deviation
• coefficient of correlation
• contrast

provides the texture features utilized by this system.
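The sketch below computes these seven features from a normalized GLCM. The paper does not spell out its exact formulas, so the standard Haralick-style definitions used here should be read as assumptions.

```python
# Hedged sketch of the seven texture features, computed from a normalized
# co-occurrence matrix C (Eq. 1).
import numpy as np

def texture_features(C):
    """Entropy, homogeneity, energy, mean, std, correlation, contrast."""
    n = C.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
    mu_i, mu_j = (i * C).sum(), (j * C).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * C).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * C).sum())
    return {
        'entropy': -(C[C > 0] * np.log2(C[C > 0])).sum(),
        'homogeneity': (C / (1.0 + np.abs(i - j))).sum(),
        'energy': (C ** 2).sum(),
        'mean': mu_i,
        'std': sd_i,
        'correlation': ((i - mu_i) * (j - mu_j) * C).sum() / (sd_i * sd_j),
        'contrast': ((i - j) ** 2 * C).sum(),
    }

C = np.full((8, 8), 1.0 / 64)                    # uniform stand-in GLCM
print(texture_features(C))
```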

Fig. 1. Foreman test sequence, frames 126-146.

    Fig. 2. Detection of Dominant Colors in Images.

Fig. 3. Examples of initial image segmentation of the HappyGranny sequence: (a) Frame 90, (b) Frame 91, (c) Frame 92.

5. Segmenting Example Images

There are several image segmentation algorithms available in the literature, thus providing many different choices for this step in our system. The image segmentation process used in this system is fully described in an earlier work of the authors [4] and is applied to each frame of the video sequence.

    The results of our image segmentation algorithm applied to 3 frames of the HappyGranny sequence are shown in Fig. 3. Note the inconsistencies between segmented regions.
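The authors' segmentation algorithm [4] is not reproduced here; as a stand-in, the sketch below applies a generic off-the-shelf region segmentation (Felzenszwalb's method from scikit-image) to a single frame.

```python
# Generic region segmentation as a stand-in for the algorithm of [4].
import numpy as np
from skimage.segmentation import felzenszwalb

rng = np.random.default_rng(3)
frame = rng.random((120, 160, 3))                # stand-in video frame
labels = felzenszwalb(frame, scale=100, sigma=0.8, min_size=50)
print(labels.max() + 1, "regions")               # region count for this frame
```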

    6. Similarity Measurement

After the example image has been segmented into meaningful objects, the dominant color and co-occurrence texture features are extracted from each object. Each object is then compared with the features extracted from the training objects. The similarity function is defined as the norm difference between the feature vectors, as shown below:

\mathrm{Diff} = \frac{\| T_t - T_{obj} \|}{\| T_t \|} + \frac{\| D_{ct} - D_{cobj} \|}{\| D_{ct} \|} \qquad (2)

where T_t and T_{obj} are the texture vectors of the training object and example object, and D_{ct} and D_{cobj} are the corresponding dominant color vectors.

The resulting difference is then compared with an empirically computed threshold T:

\mathrm{Diff} < T \qquad (3)

If Diff is less than T, as shown in Equation (3), the object is classified as a skin region.
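The following sketch implements Eq. (2) and the threshold test of Eq. (3) directly; the feature values and the threshold value are illustrative stand-ins, since the paper determines T empirically.

```python
# Similarity measurement (Eq. 2) and skin/non-skin decision (Eq. 3).
import numpy as np

def diff(T_t, T_obj, D_ct, D_cobj):
    """Eq. (2): relative texture difference plus relative color difference."""
    return (np.linalg.norm(T_t - T_obj) / np.linalg.norm(T_t)
            + np.linalg.norm(D_ct - D_cobj) / np.linalg.norm(D_ct))

T_train = np.array([3.1, 0.8, 0.2, 4.5, 1.1, 0.9, 2.0])   # stand-in texture vector
T_example = T_train + 0.05
D_train = np.array([190.0, 140.0, 120.0])                 # stand-in dominant color
D_example = np.array([185.0, 138.0, 118.0])

threshold = 0.1                                   # illustrative value for T
d = diff(T_train, T_example, D_train, D_example)
print(d, "-> skin" if d < threshold else "-> non-skin")
```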

7. Testing and Results

The system is tested on the following videos: Carphone, Tennis, Claire, and Susie. All regions identified as skin objects are outlined in purple. The facial objects are automatically segmented using the algorithm described in section 5, and the color/texture features are extracted as described in sections 3 and 4. The training objects/features have already been processed, and the similarity measurement is applied as in section 6 using the searching/matching algorithm given in section 3.

Fig. 4. Carphone test sequence, frames 168-182.

Fig. 5. Tennis test sequence, frames 0-20.

Fig. 6. Claire test sequence, frames 71-88.

Fig. 7. Susie test sequence, frames 53-73.

The following video clips were segmented and then compared using the proposed system. The results are shown in Table 1:

Table 1. Results

Video          #Objects   #Skin   Correct   Percent Correct
Let It Be          35        9       8           97.1
Say the Word       23       12      10           91.3
Taxman              9        6       5           88.9
Love me do         29       11      10           96.5

Overall, the results are very positive and the system does a good job classifying the skin regions based on the specified criteria.

8. Conclusion and Future Work

The algorithm described in this work accurately classifies human skin regions based on test set criteria provided by the user. The system is successfully tested on a variety of standard MPEG-4 videos consisting of many human skin regions, totaling nearly 100 skin objects in all. The algorithm performs well under a variety of different conditions and circumstances. The results and the technology derived from this work have proven to be very exciting, and we look forward to developing a series of additional applications from this work.

    9. References

[1] Y. Deng and B. S. Manjunath, "Unsupervised segmentation of color-texture regions in images and video," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 6, pp. 939-954, 2001.

[2] T. Aach, A. Kaup, and R. Mester, "Statistical model-based change detection in moving video," IEEE Trans. on Signal Processing, vol. 31, no. 2, pp. 165-180, March 1993.

[3] A. Nagasaka and Y. Tanaka, "Automatic video indexing and full video search for object appearances," in Visual Database Systems II, Elsevier, 1992, pp. 113-127.

[4] M. Smith and A. Khotanzad, "Unsupervised object-based video segmentation using color and texture features," IEEE Southwest Symposium on Image Analysis, March 2006.

[5] J. Goldberger and H. Greenspan, "Context-based segmentation of image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 3, pp. 463-468, March 2006.

[6] H. Tao, "Object tracking with Bayesian estimation of dynamic layer representations," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 1, pp. 75-89, January 2002.

[7] F. Porikli, "Real-time video object segmentation for MPEG encoded video sequences," TR-2004-011, pp. 178-189, March 2004.