Apparatus and method for detecting speaking person's eyes and face
First Claim
1. An apparatus for detecting a speaking person's eye and face, the apparatus comprising:
an eye position detecting means for detecting pixels having a strong gray characteristic to determine areas having locality and texture characteristics as eye candidate areas among areas formed by the detected pixels, in an input red, blue, and green (RGB) image;
a face position determining means for creating search templates by matching a model template to two areas extracted from the eye candidate areas, and determining an optimum search template among the created search templates by using the value normalizing the sum of a probability distance for the chromaticity of pixels within the area of a search template, and horizontal edge sizes calculated in the positions of the left and right eyes, a mouth and a nose estimated by the search template; and
an extraction position stabilizing means for forming a minimum boundary rectangle by the optimum search template, and increasing count values corresponding to the minimum boundary rectangle area and reducing count values corresponding to an area other than the minimum boundary rectangle area, among count values of individual pixels, stored in a shape memory, to output the area in which count values above a predetermined value are positioned, as eye and face areas.
Abstract
An apparatus for detecting the position of a human face in an input image or video image and a method thereof are provided. The apparatus includes an eye position detecting means for detecting pixels having a strong gray characteristic to determine areas having locality and texture characteristics as eye candidate areas among areas formed by the detected pixels, in an input red, blue, and green (RGB) image, a face position determining means for creating search templates by matching a model template to two areas extracted from the eye candidate areas, and determining an optimum search template among the created search templates by using the value normalizing the sum of a probability distance for the chromaticity of pixels within the area of a search template, and horizontal edge sizes calculated in the positions of the left and right eyes, a mouth and a nose estimated by the search template, and an extraction position stabilizing means for forming a minimum boundary rectangle by the optimum search template, and increasing count values corresponding to the minimum boundary rectangle area and reducing count values corresponding to an area other than the minimum boundary rectangle area, among count values of individual pixels, stored in a shape memory, to output the area in which count values above a predetermined value are positioned, as eye and face areas. The apparatus is capable of accurately and quickly detecting a speaking person's eyes and face in an image, and is tolerant of image noise.
32 Claims
1. An apparatus for detecting a speaking person's eye and face, the apparatus comprising:
an eye position detecting means for detecting pixels having a strong gray characteristic to determine areas having locality and texture characteristics as eye candidate areas among areas formed by the detected pixels, in an input red, blue, and green (RGB) image;
a face position determining means for creating search templates by matching a model template to two areas extracted from the eye candidate areas, and determining an optimum search template among the created search templates by using the value normalizing the sum of a probability distance for the chromaticity of pixels within the area of a search template, and horizontal edge sizes calculated in the positions of the left and right eyes, a mouth and a nose estimated by the search template; and
an extraction position stabilizing means for forming a minimum boundary rectangle by the optimum search template, and increasing count values corresponding to the minimum boundary rectangle area and reducing count values corresponding to an area other than the minimum boundary rectangle area, among count values of individual pixels, stored in a shape memory, to output the area in which count values above a predetermined value are positioned, as eye and face areas.
2. The apparatus of claim 1, wherein the eye position detecting means comprises:
a strong gray extraction unit for interpreting an input RGB image signal to extract pixels that represent a strong gray characteristic;
an area formation unit for forming areas by combining adjacent pixels with each other among the extracted pixels;
an area shape interpreting unit for detecting a locality characteristic for each formed area;
a texture extraction unit for detecting a texture characteristic for each formed area; and
an eye candidate determining unit for determining areas in which the locality and texture characteristics, respectively, are greater than predetermined values as eye candidate areas, among the formed areas.
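The area formation step of claim 2 amounts to connected-component labeling of the strong-gray pixel mask. A minimal sketch in Python, assuming 4-connectivity (the claim requires only that adjacent pixels be combined):

```python
from collections import deque

def label_areas(mask):
    """Group adjacent True pixels (4-connectivity) into areas.

    `mask` is a list of lists of booleans marking strong-gray pixels;
    returns a list of areas, each a list of (row, col) coordinates.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill from an unvisited pixel.
                queue, area = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    area.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                areas.append(area)
    return areas
```

Each returned area is then passed to the shape and texture interpreting units for the locality and texture tests.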
3. The apparatus of claim 1, wherein the face position determining means comprises:
a face template creation unit for creating search templates in an input RGB image by matching a previously provided model template to the positions of the two areas extracted from the eye candidate areas and performing similarity transformation on the matched model template;
a probability distance operation unit for calculating a normalized probability distance for normalizing the sum of the probability distances for chromaticity of pixels within a search template area in an RGB image, with respect to the size of the search template;
an edge feature interpreting unit for detecting horizontal edge feature values of an RGB image input from the positions of eyes, a nose, and a mouth estimated in the search template; and
an optimum search template determining unit for determining an optimum search template among a plurality of search templates created by the face template creation unit, according to the values obtained by setting predetermined weights on the normalized probability distance and the horizontal edge feature values.
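The weighted selection in claim 3 can be sketched as follows; the feature triple and the equal default weights are illustrative assumptions, since the patent leaves the weights as predetermined values:

```python
def select_optimum_template(candidates, weights=(1.0, 1.0, 1.0)):
    """Pick the search template with the smallest weighted score.

    `candidates` maps a template id to a feature triple:
    (normalized probability distance, edge component ratio,
     normalized eye-edge size).  The weights are illustrative
    placeholders, not values from the patent.
    """
    def score(features):
        return sum(w * f for w, f in zip(weights, features))
    return min(candidates, key=lambda tid: score(candidates[tid]))
```

The template with the smallest weighted sum is taken as the optimum search template, matching the "smallest sum" criterion of claim 14.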
4. The apparatus of claim 1, wherein the extraction position stabilizing means comprises:
a shape memory for storing the count values of the number of pixels corresponding to the size of the input RGB image;
a minimum boundary rectangle formation unit for forming a minimum boundary rectangle in which a face image is included within the optimum search template;
a shape memory renewal unit for increasing the count values corresponding to an area of the minimum boundary rectangle area and reducing the count values corresponding to an area outside the minimum boundary rectangle area, among count values of individual pixels stored in the shape memory; and
a tracking position extraction unit for outputting an area in which count values above a predetermined value are positioned in the shape memory as a speaking person's eye and face areas.
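The count-map update of claim 4 can be sketched as below; the increment, decrement, ceiling, and output threshold are illustrative values, as the patent calls them only predetermined:

```python
def update_shape_memory(counts, mbr, inc=2, dec=1, max_count=10):
    """One temporal-stabilization step over the per-pixel count map.

    counts: 2-D list of ints, same size as the input image.
    mbr: (top, left, bottom, right) of the minimum boundary rectangle,
         inclusive-exclusive.  inc/dec/max_count are illustrative.
    """
    top, left, bottom, right = mbr
    for r in range(len(counts)):
        for c in range(len(counts[0])):
            inside = top <= r < bottom and left <= c < right
            if inside:
                counts[r][c] = min(counts[r][c] + inc, max_count)
            else:
                counts[r][c] = max(counts[r][c] - dec, 0)
    return counts

def stable_area(counts, threshold=4):
    # Pixels whose count exceeds the threshold are output as the
    # stabilized eye and face area.
    return [(r, c) for r, row in enumerate(counts)
            for c, v in enumerate(row) if v > threshold]
```

Because counts rise only where the rectangle persists across frames, a spurious single-frame detection never reaches the output threshold, which is what makes the extraction position tolerant of image noise.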
5. The apparatus of claim 2, wherein the strong gray extraction unit extracts pixels of the RGB image, in each of which the difference between a maximum value and a minimum value of a color component representing a color is less than a predetermined value and the maximum value is less than another predetermined value, as pixels having a strong gray characteristic.
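The pixel test of claim 5 can be sketched as a per-pixel predicate; both thresholds are illustrative placeholders, not values taken from the patent:

```python
def is_strong_gray(r, g, b, diff_thresh=30, max_thresh=120):
    """Claim 5's strong-gray test for one RGB pixel.

    A pixel is strong gray when its color components are close
    together (the max-min difference is small, i.e. low saturation)
    and the maximum component is itself small (the pixel is dark).
    """
    hi, lo = max(r, g, b), min(r, g, b)
    return (hi - lo) < diff_thresh and hi < max_thresh
```

Dark, low-saturation pixels of this kind are characteristic of pupils and eyebrows, which is why they seed the eye candidate areas.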
6. The apparatus of claim 2, wherein the area shape interpreting unit comprises a circularity interpreting unit for computing a circularity value of each area, and
wherein the eye candidate determining unit removes an area, the circularity value of which is less than a predetermined value, from the eye candidate areas.
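Claim 6 does not fix the circularity formula; one common choice, 4πA/P², can serve as a sketch (this definition is an assumption, not the patent's):

```python
import math

def circularity(area, perimeter):
    """A common circularity measure: 4*pi*area / perimeter**2.

    Equals 1 for a perfect disc and falls toward 0 for elongated or
    ragged shapes, so near-circular eye blobs score high.
    """
    return 4.0 * math.pi * area / (perimeter ** 2)
```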
7. The apparatus of claim 2, wherein the area shape interpreting unit comprises a height-width ratio interpreting unit for computing the height-width ratio of each area, and
wherein the eye candidate determining unit removes an area, the height-width ratio of which is less than a predetermined value or is greater than another predetermined value, from the eye candidate areas.
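The height-width band-pass of claim 7 might look like this; the bounds are illustrative placeholders:

```python
def passes_aspect_filter(coords, lo=0.3, hi=1.5):
    """Claim 7's band-pass on an area's height/width ratio.

    coords is a list of (row, col) pixels of one area.  Eye regions
    are roughly as wide as or wider than they are tall, so areas that
    are far too tall or too flat are rejected.
    """
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return lo <= height / width <= hi
```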
8. The apparatus of claim 2, wherein the area shape interpreting unit comprises an area size interpreting unit for computing the size of each area relative to the size of the overall image, and
wherein the eye candidate determining unit removes an area, the relative size of which is greater than a predetermined value, from the eye candidate areas.
9. The apparatus of claim 2, wherein the texture extraction unit comprises a morphology interpreting unit with a minimum morphology filter for computing the texture response of each area, and
wherein the eye candidate determining unit removes an area, the texture characteristic value of which is less than a predetermined value, from the eye candidate areas.
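One simplified reading of claim 9's minimum morphology filter subtracts a local-minimum (erosion) image from the original, so finely textured areas such as eyes respond strongly while flat skin responds near zero; the window size is an assumption:

```python
def texture_response(gray, win=1):
    """Texture response via a minimum (erosion) morphology filter.

    For each pixel, the local minimum over a (2*win+1)^2 window is
    subtracted from the pixel value itself.
    """
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            lo = min(gray[y][x]
                     for y in range(max(0, r - win), min(rows, r + win + 1))
                     for x in range(max(0, c - win), min(cols, c + win + 1)))
            out[r][c] = gray[r][c] - lo
    return out
```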
10. The apparatus of claim 2, wherein the texture extraction unit comprises a horizontal edge interpreting unit with a differential filter for detecting the horizontal edge of each area, and
wherein the eye candidate determining unit removes an area, the horizontal edge characteristic value of which is less than a predetermined value, from the eye candidate areas.
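Claim 10 requires only some differential filter; a first difference along the vertical axis is the simplest sketch, since horizontal edges (eyelids, lips) appear as large differences between vertically adjacent rows:

```python
def horizontal_edge(gray):
    """Horizontal-edge map from a simple vertical differential filter.

    Returns, for each pair of vertically adjacent rows, the absolute
    difference of the pixel values; the filter choice is an assumption.
    """
    rows, cols = len(gray), len(gray[0])
    return [[abs(gray[r + 1][c] - gray[r][c]) for c in range(cols)]
            for r in range(rows - 1)]
```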
11. The apparatus of claim 3, wherein the model template is formed of a rectangle including two circles indicative of the left and right eyes, in which the base of the rectangle is located between nose and mouth portions.
12. The apparatus of claim 3, wherein the probability distance d is calculated by the following equation:
-
13. The apparatus of claim 3, wherein the edge feature interpreting unit detects a first horizontal edge size of the input RGB image corresponding to the mouth and nose positions estimated in the search template, and a second horizontal edge size of the input RGB image corresponding to an area matched to the search template, except the positions of eyes, nose and mouth, and calculates the edge component ratio that normalizes the ratio of the first horizontal edge size to the second horizontal edge size.
14. The apparatus of claim 13, wherein the edge feature interpreting unit detects the horizontal edge size of areas of the RGB image corresponding to the eyes, normalized by the size of the circles indicative of the eye positions, and
wherein the optimum search template determining unit determines a template, having the smallest sum of the normalized probability distance, the edge component ratio, and the normalized horizontal edge size of the areas of the RGB image corresponding to the eyes, which are each set with predetermined weights, as an optimum search template.
15. The apparatus of claim 3, wherein, if an area that is formed by superimposing a plurality of search templates is located independently of an area formed by superimposing other search templates, the optimum search template determining unit determines optimum search templates of independent areas.
16. The apparatus of claim 4, further comprising a speed and shape interpreting unit for computing the size and moving speed of the minimum boundary rectangle to control the range of values increased or reduced by the shape memory renewal unit.
17. A method of detecting a speaking person's eye and face areas, the method comprising the steps of:
(a) detecting pixels having a strong gray characteristic to determine areas having locality and texture characteristics as eye candidate areas among areas formed by the detected pixels, in an input red, blue, and green (RGB) image;
(b) creating search templates by matching a model template to two areas extracted from the eye candidate areas, and determining an optimum search template among the created search templates by using the value normalizing the sum of a probability distance for the chromaticity of pixels within the area of a search template, and horizontal edge sizes in the positions of the left and right eyes, a mouth and a nose, estimated by the search template, in the RGB image; and
(c) forming a minimum boundary rectangle by the optimum search template, and increasing count values corresponding to the minimum boundary rectangle area and reducing count values corresponding to an area other than the minimum boundary rectangle area, among count values of individual pixels, stored in a shape memory, to output the area, in which count values above a predetermined value are positioned, as eye and face areas.
18. The method of claim 17, wherein the step (a) comprises the steps of:
(a1) interpreting an input RGB image signal to extract pixels that represent a strong gray characteristic;
(a2) forming areas by combining adjacent pixels with each other among the extracted pixels;
(a3) detecting a locality characteristic in each formed area;
(a4) detecting a texture characteristic in each formed area; and
(a5) determining areas, in which the locality and texture characteristics, respectively, are greater than predetermined values, among the formed areas, as eye candidate areas.
19. The method of claim 17, wherein the step (b) comprises the steps of:
(b1) creating search templates in the RGB image by matching a previously provided model template to the positions of the two areas extracted from the eye candidate areas, to perform similarity transformation on the matched model template;
(b2) calculating a normalized probability distance for normalizing the sum of the probability distance for chromaticity of pixels within a search template area by the size of the search template, in the RGB image;
(b3) detecting horizontal edge feature values of the RGB image input from the positions of eyes, a nose, and a mouth estimated in the search template; and
(b4) determining an optimum search template among the plurality of search templates created in the step (b1), by using the values obtained by setting predetermined weights on the normalized probability distance and the horizontal edge feature values.
20. The method of claim 17, wherein the step (c) comprises the steps of:
(c1) forming the minimum boundary rectangle in which a face image is included within the optimum search template;
(c2) increasing the count values corresponding to an area of the minimum boundary rectangle and reducing the count values corresponding to an area outside the minimum boundary rectangle area, among count values of individual pixels stored in the shape memory; and
(c3) outputting an area in which count values above a predetermined value are positioned in the shape memory as a speaking person'"'"'s eye and face areas.
21. The method of claim 18, wherein, in the step (a1), pixels of the RGB image, for each of which the difference between a maximum value and a minimum value of a color component representing a color is less than a predetermined value, and the maximum value is less than another predetermined value, are extracted as pixels having a strong gray characteristic.
22. The method of claim 18, wherein, in the step (a3), the circularity value of each area is calculated, and
wherein, in the step (a5), an area, the circularity value of which is less than a predetermined value, is removed from the eye candidate areas. -
23. The method of claim 18, wherein, in the step (a3), the height-width ratio of each area is calculated, and
wherein, in the step (a5), an area, the height-width ratio of which is less than a predetermined value or is greater than another predetermined value, is removed from the eye candidate areas.
24. The method of claim 18, wherein, in the step (a3), the size of each area relative to the size of the overall image is calculated, and
wherein, in the step (a5), an area, the relative size of which is greater than a predetermined value, is removed from the eye candidate areas.
25. The method of claim 18, wherein, in the step (a4), the texture response of each area is calculated, and
wherein, in the step (a5), an area, the texture characteristic value of which is less than a predetermined value, is removed from the eye candidate areas.
26. The method of claim 18, wherein, in the step (a4), the horizontal edge of each area is detected, and
wherein, in the step (a5), an area, the horizontal edge characteristic value of which is less than a predetermined value, is removed from the eye candidate areas.
27. The method of claim 19, wherein the model template is formed of a rectangle including two circles indicative of the left and right eyes, in which the base of the rectangle is located between nose and mouth portions.
28. The method of claim 19, wherein the probability distance d is calculated by the following equation:
-
29. The method of claim 19, wherein, in the step (b3), a first horizontal edge size of the input RGB image corresponding to the mouth and nose positions estimated in the search template, and a second horizontal edge size of the input RGB image corresponding to an area matched to the search template, except the positions of eyes, nose and mouth, are detected, and the edge component ratio that is a ratio of the first horizontal edge size to the second horizontal edge size is calculated.
30. The method of claim 29, wherein the step (b3) further comprises the step of detecting the horizontal edge size of areas of the RGB image corresponding to the eyes, normalized by the size of the circles indicative of the eye positions, and
wherein, in the step (b4), a template, having the smallest sum of the normalized probability distance, the edge component ratio, and the normalized horizontal edge size of the areas of the RGB image corresponding to the eyes, which are each set with predetermined weights, is determined as an optimum search template.
31. The method of claim 19, wherein, in the step (b4), if an area that is formed by superimposing a plurality of search templates is located independently of an area formed by superimposing other search templates, optimum search templates are determined for the independent areas.
32. The method of claim 20, after the step (c1), further comprising the step of computing the size and moving speed of the minimum boundary rectangle to control the range of values increased or reduced by the shape memory renewal unit.
Specification