Multi-modal interface apparatus and method
Abstract
In the multi-modal interface apparatus of the present invention, a gaze object detection section always detects a user's gaze object. The user inputs at least one medium of sound information, character information, image information and operation information through a media input section. In order to effectively input and output information between the user and the apparatus, a personified image presentation section presents a personified image to the user based on the user's gaze object. A control section controls reception of the inputted media from the media input section based on the user's gaze object.
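The abstract names four cooperating sections. The following is a minimal structural sketch, in Python, of how those sections might be wired together; all class and method names are illustrative assumptions, not terms taken from the patent.

# Hypothetical section names; the loop only illustrates the data flow described
# in the abstract (gaze object -> personified image, gaze object -> input gating).

class GazeObjectDetector:
    """Continually reports which on-screen object the user is gazing at."""
    def current_gaze_object(self):
        raise NotImplementedError   # e.g. camera-based gaze estimation

class MediaInput:
    """Accepts sound, character, image, or operation information from the user."""
    def read(self):
        raise NotImplementedError

class PersonifiedImagePresenter:
    """Presents a personified (agent) image whose behavior depends on the gaze object."""
    def present(self, gaze_object):
        print(f"agent reacts to gaze on: {gaze_object}")

class ControlSection:
    """Accepts or ignores media input depending on the user's current gaze object."""
    def __init__(self, detector, media, presenter):
        self.detector, self.media, self.presenter = detector, media, presenter

    def step(self):
        gaze_object = self.detector.current_gaze_object()
        self.presenter.present(gaze_object)
        # Reception of inputted media is controlled by the gaze object:
        return self.media.read() if gaze_object is not None else None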
Claims
1. Multi-modal interface apparatus, comprising:
an image input unit configured to continually input an image of a user's entire face during the user's operation on a display;
a face image processing unit configured to extract a feature from the user's face image, the first time that the user's operation of a predetermined object on the display is detected;
a recognition decision unit configured to store the feature of the user's face image as a dictionary pattern and the user's operation selecting the predetermined object as an event, the dictionary pattern representing the user's gaze for the predetermined object, and to recognize the user's face image newly inputted through said image input unit by referring to the dictionary pattern; and
an object control unit configured to execute the event for the predetermined object on the display, when the user's face image newly inputted is recognized as the dictionary pattern, wherein said recognition decision unit deletes the dictionary pattern corresponding to the predetermined object, when said object control unit detects movement or deletion of the predetermined object on the display.
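Before the dependent claims, a hedged sketch of the claim 1 control flow: the first operation of an object registers a dictionary pattern and the selecting operation as an event, later face images are recognized against the stored patterns, and a pattern is deleted when its object is moved or deleted. The class layout and the caller-supplied match() predicate are assumptions for illustration only.

class RecognitionDecisionUnit:
    """Illustrative registry of dictionary patterns keyed by on-screen object."""

    def __init__(self):
        self.dictionaries = {}   # object_id -> (dictionary pattern, stored event)

    def register(self, object_id, face_feature, event):
        # First operation of the object: store the feature as the dictionary
        # pattern representing the user's gaze, plus the selecting operation.
        self.dictionaries[object_id] = (face_feature, event)

    def recognize(self, face_feature, match):
        # Compare a newly inputted face image against every stored pattern and
        # return the object and event to execute on a match.
        for object_id, (pattern, event) in self.dictionaries.items():
            if match(face_feature, pattern):
                return object_id, event
        return None, None

    def delete(self, object_id):
        # Movement or deletion of the object invalidates its dictionary pattern.
        self.dictionaries.pop(object_id, None)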
2. The multi-modal interface apparatus according to claim 1,
wherein said recognition decision unit indicates to said face image processing unit to collect a plurality of the user's face images when said object control unit first detects the user's operation of the predetermined object on the display.
3. The multi-modal interface apparatus according to claim 2,
wherein said recognition decision unit indicates to said face image processing unit to extract the feature from the plurality of the user's face images when a number corresponding to the plurality of the user's face images is equal to a predetermined number.
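Claims 2 and 3 describe buffering several face images on the first operation of an object and building the feature only once a predetermined number has been collected. A small sketch under those assumptions follows; the count of 10 and all names are illustrative, not values from the patent.

import numpy as np

REQUIRED_IMAGES = 10   # the "predetermined number" (assumed value)

class FaceImageCollector:
    """Accumulates face images until enough are available to build the dictionary."""

    def __init__(self):
        self.buffer = []

    def add(self, face_image):
        self.buffer.append(np.asarray(face_image, dtype=float))
        return len(self.buffer) >= REQUIRED_IMAGES   # True once extraction may start

    def extract_feature(self):
        # One simple feature: the normalized, flattened pixel values of each image,
        # stacked so a subspace can later be fitted to them (see claim 8).
        return np.stack([img.ravel() / np.linalg.norm(img) for img in self.buffer])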
4. The multi-modal interface apparatus according to claim 1,
wherein said recognition decision unit recognizes an inputted user's face image by comparing the feature of the inputted user's face image with the dictionary pattern, when said object control unit detects a mouse event other than focusing the predetermined object on the display and if the dictionary pattern corresponding to the predetermined object is already registered.
5. The multi-modal interface apparatus according to claim 4,
wherein said recognition decision unit recognizes an inputted user's face image by comparing the feature of the inputted user's face image with the dictionary pattern, when said object control unit detects the user's operation other than the mouse event on the display.
6. The multi-modal interface apparatus according to claim 5,
wherein said object control unit automatically executes the event corresponding to the dictionary pattern on the display, when said recognition decision unit determines that the feature of the inputted user's face image coincides with the dictionary pattern.
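Claims 4 to 6 tie recognition to mouse events: whenever a mouse event other than focusing is detected for an object whose dictionary pattern is already registered, the current face image is compared with that pattern, and on a coincidence the stored event is executed automatically. A sketch of that dispatch, reusing the hypothetical RecognitionDecisionUnit above; the handler name and event fields are assumptions.

def on_mouse_event(event, decision_unit, face_feature, execute_event, match):
    if event.kind == "focus":
        return                                  # claims 4-5: only non-focus operations trigger recognition
    entry = decision_unit.dictionaries.get(event.object_id)
    if entry is None:
        return                                  # no dictionary pattern registered yet for this object
    pattern, stored_event = entry
    if match(face_feature, pattern):            # claim 6: execute the stored event on coincidence
        execute_event(event.object_id, stored_event)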
7. The multi-modal interface apparatus according to claim 1,
wherein the predetermined object is a window on the display, and wherein the event is focusing of the window.
8. The multi-modal interface apparatus according to claim 1,
wherein said recognition decision unit utilizes a subspace method for generation of the dictionary pattern and for recognition of the newly inputted user's face image.
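Claim 8 relies on the subspace method. One common formulation, which may differ from the patent's exact procedure, fits an orthonormal basis to the collected face features and scores a new face by the squared length of its projection onto that basis; the subspace dimension and threshold below are assumed tuning parameters.

import numpy as np

def build_subspace(feature_vectors, dim=5):
    # feature_vectors: array of shape (n_samples, n_features), one row per face image
    X = np.asarray(feature_vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:dim]                      # (dim, n_features) orthonormal basis

def similarity(basis, x):
    x = np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x)
    proj = basis @ x                     # coordinates of x in the subspace
    return float(proj @ proj)            # in [0, 1]; near 1 means close to the dictionary

def matches(basis, x, threshold=0.9):    # threshold is an assumed value
    return similarity(basis, x) >= threshold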
9. The multi-modal interface apparatus according to claim 1,
wherein the feature includes pixel values of the user's face area, and wherein a distribution of the pixel values represents the user's face direction and pupil position.
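Claim 9 makes the feature the pixel values of the face area, whose distribution changes with face direction and pupil position. A minimal sketch of turning a detected face box into such a feature; the face detector supplying the box and the 16x16 grid are assumptions.

import numpy as np

def face_feature(gray_image, face_box, size=(16, 16)):
    x, y, w, h = face_box                        # face area from any detector (assumed)
    face = np.asarray(gray_image, dtype=float)[y:y + h, x:x + w]
    # Nearest-neighbour resample to a fixed small grid so features are comparable.
    rows = np.linspace(0, h - 1, size[0]).astype(int)
    cols = np.linspace(0, w - 1, size[1]).astype(int)
    patch = face[np.ix_(rows, cols)]
    v = patch.ravel()
    return v / np.linalg.norm(v)                 # normalized pixel-value feature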
10. A multi-modal interface method, comprising the steps of:
continually inputting an image of a user's entire face during the user's operation on a display;
extracting a feature from the user's face image, the first time that the user's operation of a predetermined object on the display is detected;
storing the feature of the user's face image as a dictionary pattern and the user's operation selecting the predetermined object as an event, the dictionary pattern representing the user's gaze for the predetermined object;
recognizing a newly inputted user's face image by referring to the dictionary pattern;
executing the event for the predetermined object on the display, when the newly inputted user's face image is recognized as the dictionary pattern; and
deleting the dictionary pattern corresponding to the predetermined object, when movement or deletion of the predetermined object on the display is detected.
11. The multi-modal interface method according to claim 10, further comprising the step of:
collecting a plurality of the user's face images, when the user's operation of the predetermined object is first detected.
12. The multi-modal interface method according to claim 11, further comprising the step of:
extracting the feature from the plurality of the user's face images, when a number corresponding to the plurality of the user's face images is equal to a predetermined number.
13. The multi-modal interface method according to claim 10, further comprising the step of:
recognizing an inputted user's face image by comparing the feature of the inputted user's face image with the dictionary pattern, when a mouse event other than focusing the predetermined object on the display is detected and if the dictionary pattern corresponding to the predetermined object is already registered.
14. The multi-modal interface method according to claim 13, further comprising the step of:
recognizing the inputted user's face image by comparing the feature of the inputted user's face image with the dictionary pattern, when the user's operation other than the mouse event on the display is detected.
15. The multi-modal interface method according to claim 14, further comprising the step of:
automatically executing the event corresponding to the dictionary pattern on the display, when the feature of the inputted user's face image coincides with the dictionary pattern.
16. The multi-modal interface method according to claim 10,
wherein the predetermined object is a window on the display, and wherein the event is focusing of the window.
17. The multi-modal interface method according to claim 10, further comprising the step of:
utilizing a subspace method when storing the dictionary pattern and recognizing the newly inputted user's face image.
18. The multi-modal interface method according to claim 10,
wherein the feature includes pixel values of the user's face area, and wherein a distribution of the pixel values represents the user's face direction and pupil position.
19. A computer readable memory containing computer readable instructions, comprising:
an instruction unit to continually input an image of a user's entire face during the user's operation on a display;
an instruction unit to extract a feature from the user's face image, the first time that the user's operation of a predetermined object on the display is detected;
an instruction unit to store the feature of the user's face image as a dictionary pattern and the user's operation selecting the predetermined object as an event, the dictionary pattern representing the user's gaze for the predetermined object;
an instruction unit to recognize a newly inputted user's face image by referring to the dictionary pattern; and
an instruction unit to execute the event for the predetermined object on the display, when the newly inputted user's face image is recognized as the dictionary pattern;
wherein said dictionary pattern corresponding to the predetermined object is deleted when movement or deletion of the predetermined object on the display is detected.
Specification