Multi-modal interface apparatus and method
Abstract
In the multi-modal interface apparatus of the present invention, a gaze object detection section always detects a user's gaze object. The user inputs at least one medium of sound information, character information, image information and operation information through a media input section. In order to effectively input and output information between the user and the apparatus, a personified image presentation section presents a personified image to the user based on the user's gaze object. A control section controls reception of the inputted media from the media input section based on the user's gaze object.
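The abstract names four cooperating sections. The following is a minimal structural sketch, in Python, of how those sections might be wired together; all class and method names are illustrative assumptions, not terms taken from the patent.

# Hypothetical section names; the loop only illustrates the data flow described
# in the abstract (gaze object -> personified image, gaze object -> input gating).

class GazeObjectDetector:
    """Continually reports which on-screen object the user is gazing at."""
    def current_gaze_object(self):
        raise NotImplementedError   # e.g. camera-based gaze estimation

class MediaInput:
    """Accepts sound, character, image, or operation information from the user."""
    def read(self):
        raise NotImplementedError

class PersonifiedImagePresenter:
    """Presents a personified (agent) image whose behavior depends on the gaze object."""
    def present(self, gaze_object):
        print(f"agent reacts to gaze on: {gaze_object}")

class ControlSection:
    """Accepts or ignores media input depending on the user's current gaze object."""
    def __init__(self, detector, media, presenter):
        self.detector, self.media, self.presenter = detector, media, presenter

    def step(self):
        gaze_object = self.detector.current_gaze_object()
        self.presenter.present(gaze_object)
        # Reception of inputted media is controlled by the gaze object:
        return self.media.read() if gaze_object is not None else None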
Claims
1. Multi-modal interface apparatus, comprising:
an image input unit configured to continually input an image of a user's entire face during the user's operation on a display;
a face image processing unit configured to extract a feature from the user's face image, the first time that the user's operation of a predetermined object on the display is detected;
a recognition decision unit configured to store the feature of the user's face image as a dictionary pattern and the user's operation selecting the predetermined object as an event, the dictionary pattern representing the user's gaze for the predetermined object, and to recognize the user's face image newly inputted through said image input unit by referring to the dictionary pattern; and
an object control unit configured to execute the event for the predetermined object on the display, when the user's face image newly inputted is recognized as the dictionary pattern, wherein said recognition decision unit deletes the dictionary pattern corresponding to the predetermined object, when said object control unit detects movement or deletion of the predetermined object on the display.
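Before the dependent claims, a hedged sketch of the claim 1 control flow: the first operation of an object registers a dictionary pattern and the selecting operation as an event, later face images are recognized against the stored patterns, and a pattern is deleted when its object is moved or deleted. The class layout and the caller-supplied match() predicate are assumptions for illustration only.

class RecognitionDecisionUnit:
    """Illustrative registry of dictionary patterns keyed by on-screen object."""

    def __init__(self):
        self.dictionaries = {}   # object_id -> (dictionary pattern, stored event)

    def register(self, object_id, face_feature, event):
        # First operation of the object: store the feature as the dictionary
        # pattern representing the user's gaze, plus the selecting operation.
        self.dictionaries[object_id] = (face_feature, event)

    def recognize(self, face_feature, match):
        # Compare a newly inputted face image against every stored pattern and
        # return the object and event to execute on a match.
        for object_id, (pattern, event) in self.dictionaries.items():
            if match(face_feature, pattern):
                return object_id, event
        return None, None

    def delete(self, object_id):
        # Movement or deletion of the object invalidates its dictionary pattern.
        self.dictionaries.pop(object_id, None)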
2. The multi-modal interface apparatus according to claim 1,
wherein said recognition decision unit indicates to said face image processing unit to collect a plurality of the user's face images when said object control unit first detects the user's operation of the predetermined object on the display.
3. The multi-modal interface apparatus according to claim 2,
wherein said recognition decision unit indicates to said face image processing unit to extract the feature from the plurality of the user's face images when a number corresponding to the plurality of the user's face images is equal to a predetermined number.
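Claims 2 and 3 describe buffering several face images on the first operation of an object and building the feature only once a predetermined number has been collected. A small sketch under those assumptions follows; the count of 10 and all names are illustrative, not values from the patent.

import numpy as np

REQUIRED_IMAGES = 10   # the "predetermined number" (assumed value)

class FaceImageCollector:
    """Accumulates face images until enough are available to build the dictionary."""

    def __init__(self):
        self.buffer = []

    def add(self, face_image):
        self.buffer.append(np.asarray(face_image, dtype=float))
        return len(self.buffer) >= REQUIRED_IMAGES   # True once extraction may start

    def extract_feature(self):
        # One simple feature: the normalized, flattened pixel values of each image,
        # stacked so a subspace can later be fitted to them (see claim 8).
        return np.stack([img.ravel() / np.linalg.norm(img) for img in self.buffer])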
4. The multi-modal interface apparatus according to claim 1,
wherein said recognition decision unit recognizes an inputted user's face image by comparing the feature of the inputted user's face image with the dictionary pattern, when said object control unit detects a mouse event other than focusing the predetermined object on the display and if the dictionary pattern corresponding to the predetermined object is already registered.
5. The multi-modal interface apparatus according to claim 4,
wherein said recognition decision unit recognizes an inputted user's face image by comparing the feature of the inputted user's face image with the dictionary pattern, when said object control unit detects the user's operation other than the mouse event on the display.
6. The multi-modal interface apparatus according to claim 5,
wherein said object control unit automatically executes the event corresponding to the dictionary pattern on the display, when said recognition decision unit determines that the feature of the inputted user's face image coincides with the dictionary pattern.
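Claims 4 to 6 tie recognition to mouse events: whenever a mouse event other than focusing is detected for an object whose dictionary pattern is already registered, the current face image is compared with that pattern, and on a coincidence the stored event is executed automatically. A sketch of that dispatch, reusing the hypothetical RecognitionDecisionUnit above; the handler name and event fields are assumptions.

def on_mouse_event(event, decision_unit, face_feature, execute_event, match):
    if event.kind == "focus":
        return                                  # claims 4-5: only non-focus operations trigger recognition
    entry = decision_unit.dictionaries.get(event.object_id)
    if entry is None:
        return                                  # no dictionary pattern registered yet for this object
    pattern, stored_event = entry
    if match(face_feature, pattern):            # claim 6: execute the stored event on coincidence
        execute_event(event.object_id, stored_event)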
7. The multi-modal interface apparatus according to claim 1,
wherein the predetermined object is a window on the display, and wherein the event is focusing of the window.
8. The multi-modal interface apparatus according to claim 1,
wherein said recognition decision unit utilizes a subspace method for generation of the dictionary pattern and for recognition of the newly inputted user's face image.
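Claim 8 relies on the subspace method. One common formulation, which may differ from the patent's exact procedure, fits an orthonormal basis to the collected face features and scores a new face by the squared length of its projection onto that basis; the subspace dimension and threshold below are assumed tuning parameters.

import numpy as np

def build_subspace(feature_vectors, dim=5):
    # feature_vectors: array of shape (n_samples, n_features), one row per face image
    X = np.asarray(feature_vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:dim]                      # (dim, n_features) orthonormal basis

def similarity(basis, x):
    x = np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x)
    proj = basis @ x                     # coordinates of x in the subspace
    return float(proj @ proj)            # in [0, 1]; near 1 means close to the dictionary

def matches(basis, x, threshold=0.9):    # threshold is an assumed value
    return similarity(basis, x) >= threshold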
9. The multi-modal interface apparatus according to claim 1,
wherein the feature includes pixel values of the user's face area, and wherein a distribution of the pixel values represents the user's face direction and pupil position.
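Claim 9 makes the feature the pixel values of the face area, whose distribution changes with face direction and pupil position. A minimal sketch of turning a detected face box into such a feature; the face detector supplying the box and the 16x16 grid are assumptions.

import numpy as np

def face_feature(gray_image, face_box, size=(16, 16)):
    x, y, w, h = face_box                        # face area from any detector (assumed)
    face = np.asarray(gray_image, dtype=float)[y:y + h, x:x + w]
    # Nearest-neighbour resample to a fixed small grid so features are comparable.
    rows = np.linspace(0, h - 1, size[0]).astype(int)
    cols = np.linspace(0, w - 1, size[1]).astype(int)
    patch = face[np.ix_(rows, cols)]
    v = patch.ravel()
    return v / np.linalg.norm(v)                 # normalized pixel-value feature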
10. A multi-modal interface method, comprising the steps of:
continually inputting an image of a user's entire face during the user's operation on a display;
extracting a feature from the user's face image, the first time that the user's operation of a predetermined object on the display is detected;
storing the feature of the user's face image as a dictionary pattern and the user's operation selecting the predetermined object as an event, the dictionary pattern representing the user's gaze for the predetermined object;
recognizing a newly inputted user's face image by referring to the dictionary pattern;
executing the event for the predetermined object on the display, when the newly inputted user's face image is recognized as the dictionary pattern; and
deleting the dictionary pattern corresponding to the predetermined object, when movement or deletion of the predetermined object on the display is detected.
11. The multi-modal interface method according to claim 10, further comprising the step of:
collecting a plurality of the user's face images, when the user's operation of the predetermined object is first detected.
12. The multi-modal interface method according to claim 11, further comprising the step of:
extracting the feature from the plurality of the user's face images, when a number corresponding to the plurality of the user's face images is equal to a predetermined number.
13. The multi-modal interface method according to claim 10, further comprising the step of:
recognizing an inputted user's face image by comparing the feature of the inputted user's face image with the dictionary pattern, when a mouse event other than focusing the predetermined object on the display is detected and if the dictionary pattern corresponding to the predetermined object is already registered.
14. The multi-modal interface method according to claim 13, further comprising the step of:
recognizing the inputted user's face image by comparing the feature of the inputted user's face image with the dictionary pattern, when the user's operation other than the mouse event on the display is detected.
15. The multi-modal interface method according to claim 14, further comprising the step of:
automatically executing the event corresponding to the dictionary pattern on the display, when the feature of the inputted user's face image coincides with the dictionary pattern.
16. The multi-modal interface method according to claim 10,
wherein the predetermined object is a window on the display, and wherein the event is focusing of the window.
17. The multi-modal interface method according to claim 10, further comprising the step of:
utilizing a subspace method when storing the dictionary pattern and recognizing the newly inputted user's face image.
18. The multi-modal interface method according to claim 10,
wherein the feature includes pixel values of the user's face area, and wherein a distribution of the pixel values represents the user's face direction and pupil position.
19. A computer readable memory containing computer readable instructions, comprising:
an instruction unit to continually input an image of a user's entire face during the user's operation on a display;
an instruction unit to extract a feature from the user's face image, the first time that the user's operation of a predetermined object on the display is detected;
an instruction unit to store the feature of the user's face image as a dictionary pattern and the user's operation selecting the predetermined object as an event, the dictionary pattern representing the user's gaze for the predetermined object;
an instruction unit to recognize a newly inputted user's face image by referring to the dictionary pattern; and
an instruction unit to execute the event for the predetermined object on the display, when the newly inputted user's face image is recognized as the dictionary pattern;
wherein said dictionary pattern corresponding to the predetermined object is deleted when movement or deletion of the predetermined object on the display is detected.
Specification