Multi-modal interface apparatus and method
Abstract
In the multi-modal interface apparatus of the present invention, a gaze object detection section continuously detects a user's gaze object. The user inputs at least one medium of sound information, character information, image information and operation information through a media input section. In order to effectively input and output information between the user and the apparatus, a personified image presentation section presents a personified image to the user based on the user's gaze object. A control section controls reception of the inputted media from the media input section based on the user's gaze object.
29 Claims
1. Multi-modal interface apparatus, comprising:

gaze object detection means for detecting a user's gaze object;

media input means for inputting at least one medium of sound information, character information, image information and operation information from the user;

personified image presentation means for displaying an agent image to the user, the agent image expressing predetermined gesture and looks as a non-language message, the predetermined gesture and looks being different based on a kind of the user's gaze object;

information output means for outputting at least one medium of sound information, character information and image information to the user;

interpretation rule memory means for previously storing a plurality of rules consisting of present status information, gaze object information, a kind of input information or output information, and interpretation result information, wherein the present status information represents a status of the apparatus, the gaze object information represents the kind of the user's gaze object, the kind of input information or output information represents a signal status of said media input means or said information output means, and the interpretation result information represents a next event of the apparatus;

control rule memory means for previously storing a plurality of rules consisting of the present status information, event condition information, action information, and next status information, wherein the event condition information corresponds to the interpretation result information of said interpretation rule memory means, the action information represents a processing list to execute, and the next status information represents a next desired status of the apparatus; and

control means for searching the interpretation result information from the interpretation rule memory means, based on the present status of the apparatus as the present status information, the user's gaze object as the gaze object information, and the signal status of said media input means or said information output means as the kind of input information or output information, when the user's gaze object is detected by said gaze object detection means; for searching the action information and the next status information from the control rule memory means, based on the present status of the apparatus as the present status information and the interpretation result information as the event condition information, if the interpretation result information is retrieved from said interpretation rule memory means; for controlling said media input means to execute at least one of start, end, interruption and restart of input processing based on the action information; for controlling said personified image presentation means to display the agent image whose gesture and looks represent a reception or a completion of the user's input in synchronization with the input processing of said media input means; and for controlling at least one of start, end, interruption and restart of the at least one medium outputted by said information output means based on the action information.

(Dependent claims 2, 3, 4, 5, 6 not shown.)
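As an illustration only (not part of the patent text), the two rule tables recited in claim 1 can be sketched as a minimal table-driven state machine: interpretation rules map (present status, gaze object, input/output signal) to an event, and control rules map (present status, event) to an action list and a next status. Every state name, gaze label, signal value and action name below is invented for this sketch; the claim does not specify concrete values.

```python
# Interpretation rules:
#   (present_status, gaze_object, io_signal) -> interpretation result (event)
# All entries are hypothetical examples.
INTERPRETATION_RULES = {
    ("idle", "agent_image", "mic_silent"): "input_request",
    ("receiving", "elsewhere", "mic_active"): "input_interrupt",
    ("receiving", "agent_image", "mic_silent"): "input_complete",
}

# Control rules:
#   (present_status, event) -> (processing list to execute, next_status)
CONTROL_RULES = {
    ("idle", "input_request"): (["start_input", "agent_show_reception"], "receiving"),
    ("receiving", "input_interrupt"): (["interrupt_input"], "suspended"),
    ("receiving", "input_complete"): (["end_input", "agent_show_completion"], "idle"),
}

def step(status, gaze_object, io_signal):
    """One control cycle: interpret the gaze/signal pair as an event, then
    look up the matching control rule. Returns (actions, next_status);
    when no rule matches, no action runs and the status is unchanged."""
    event = INTERPRETATION_RULES.get((status, gaze_object, io_signal))
    if event is None:
        return [], status
    actions, next_status = CONTROL_RULES.get((status, event), ([], status))
    return actions, next_status
```

For example, under these hypothetical tables, a user gazing at the agent image while the microphone is silent would start input processing and switch the agent to a "reception" expression, moving the apparatus from "idle" to "receiving".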
7. A multi-modal interface method, comprising the steps of:

detecting a user's gaze object;

inputting at least one medium of sound information, character information, image information and operation information from the user;

displaying an agent image to the user, the agent image expressing predetermined gesture and looks as a non-language message, the predetermined gesture and the looks being different based on a kind of the user's gaze object;

outputting at least one medium of sound information, character information and image information to the user;

storing a plurality of rules consisting of present status information, gaze object information, a kind of input information or output information, and interpretation result information in an interpretation rule memory, wherein the present status information represents a status of the apparatus, the gaze object information represents the kind of the user's gaze object, the kind of input information or output information represents a signal status of a media input means or an information output means, and the interpretation result information represents a next event of the apparatus;

storing a plurality of rules consisting of the present status information, event condition information, action information, and next status information in a control rule memory, wherein the event condition information corresponds to the interpretation result information of the interpretation rule memory, the action information represents a processing list to execute, and the next status information represents a next desired status of the apparatus;

searching the interpretation result information from said interpretation rule memory based on the present status of the apparatus as the present status information, the user's gaze object as the gaze object information, and the signal status of said media input means or said information output means as the kind of input information or output information, when the user's gaze object is detected by said detecting step;

searching the action information and the next status information from said control rule memory based on the present status of the apparatus as the present status information and the interpretation result information as the event condition information, if the interpretation result information is retrieved from said interpretation rule memory;

controlling the input processing to execute at least one of start, end, interruption and restart based on the action information;

controlling the gesture and looks of the agent image to represent a reception or a completion of the user's input in synchronization with the input processing; and

controlling at least one of start, end, interruption and restart of the at least one medium outputted by said information output means based on the action information.
8. A computer readable memory containing computer readable instructions, comprising:

instruction means for causing a computer to detect a user's gaze object;

instruction means for causing a computer to input at least one medium of sound information, character information, image information and operation information from the user;

instruction means for causing a computer to display an agent image to the user, the agent image expressing predetermined gesture and looks as a non-language message, the predetermined gesture and the looks being different based on a kind of the user's gaze object;

instruction means for causing a computer to output at least one medium of sound information, character information and image information to the user;

instruction means for causing a computer to store a plurality of rules consisting of present status information, gaze object information, a kind of input information or output information, and interpretation result information in an interpretation rule memory, wherein the present status information represents a status of the apparatus, the gaze object information represents the kind of the user's gaze object, the kind of input information or output information represents a signal status of a media input means or an information output means, and the interpretation result information represents a next event of the apparatus;

instruction means for causing a computer to store a plurality of rules consisting of the present status information, event condition information, action information, and next status information in a control rule memory, wherein the event condition information corresponds to the interpretation result information of the interpretation rule memory, the action information represents a processing list to execute, and the next status information represents a next desired status of the apparatus;

instruction means for causing a computer to search the interpretation result information from said interpretation rule memory based on the present status of the apparatus as the present status information, the user's gaze object as the gaze object information, and the signal status of said media input means or said information output means as the kind of input information or output information, when the user's gaze object is detected by said detection means;

instruction means for causing a computer to search the action information and the next status information from said control rule memory based on the present status of the apparatus as the present status information and the interpretation result information as the event condition information, if the interpretation result information is retrieved from said interpretation rule memory;

instruction means for causing a computer to control the input processing to execute at least one of start, end, interruption and restart based on the action information;

instruction means for causing a computer to control the gesture and looks of the agent image to represent a reception or a completion of the user's input in synchronization with the input processing; and

instruction means for causing a computer to control at least one of start, end, interruption and restart of the at least one medium outputted by said information output means based on the action information.
9. Multi-modal interface apparatus, comprising:

image input means for inputting a user's image including face and hand;

recognition means for extracting the face and hand from the user's image, and for recognizing a face position as the user's view position, a finger end position as the user's gesture part, and a reference object pointed to by the finger end as the user's gesture input in a world coordinate space;

location information memory means for storing a representative position and a direction by unit of label information in the world coordinate space, wherein the label information is one of a presentation position of a personified image, the face, the finger end and the reference object, a plurality of the presentation positions being predetermined, and the face, the finger end and the reference object being recognized by said recognition means;

control means for deciding a visual relation between the presentation position and the user's recognized position by referring to said location information memory means, wherein the user's recognized position includes the face position and the finger end position, and for generating the personified image expressing at least one of looks and action based on the visual relation; and

personified image presentation means for presenting the personified image at the presentation position to the user as feedback information of the gesture input;

wherein said control means decides whether the user's gesture part is watched from a current presentation position of the personified image and whether the personified image is watched from the user's view position, and generates the personified image which gazes at the user's gesture part if the user's gesture part is watched from the current presentation position of the personified image and the personified image is watched from the user's view position.

(Dependent claims 10, 11, 12, 13, 14, 15 not shown.)
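As an illustration only (not part of the patent text), the visibility decision in claim 9 can be sketched geometrically. A location memory stores, per label, a representative position and a facing direction in world coordinates; here "watched from" is approximated as "the target lies in front of the viewer's facing direction" via a dot product, ignoring occlusion. All labels and coordinates below are invented for this sketch.

```python
# Hypothetical label -> (representative position, facing direction)
# in a world coordinate space. The agent's presentation position faces
# the user; the user's face looks toward the display.
LOCATION_MEMORY = {
    "presentation_position": ((0.0, 1.0, 0.0), (0.0, -1.0, 0.0)),
    "face": ((0.0, -1.0, 0.0), (0.0, 1.0, 0.0)),
    "finger_end": ((0.2, -0.5, 0.0), (0.0, 0.0, 1.0)),
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def watched_from(viewer_label, target_label):
    """True if the target's representative position lies in the half-space
    in front of the viewer's facing direction (a crude visibility test)."""
    pos, direction = LOCATION_MEMORY[viewer_label]
    target_pos, _ = LOCATION_MEMORY[target_label]
    return dot(direction, sub(target_pos, pos)) > 0.0

def agent_should_gaze_at_gesture():
    """Claim 9's condition: the agent gazes at the user's gesture part only
    when the gesture part is watched from the presentation position AND the
    personified image is watched from the user's view position (the face)."""
    return (watched_from("presentation_position", "finger_end")
            and watched_from("face", "presentation_position"))
```

With these example coordinates both visibility tests pass, so the sketch would generate a personified image gazing at the finger end as feedback for the gesture input.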
16. A multi-modal interface method, comprising the steps of:

inputting a user's image including face and hand;

extracting the face and hand from the user's image;

recognizing a face position as the user's view position, a finger end position as the user's gesture part, and a reference object pointed to by the finger end as the user's gesture input in a world coordinate space;

storing a representative position and a direction as location information by unit of label information in the world coordinate space, wherein the label information is one of a presentation position of a personified image, the face, the finger end and the reference object, a plurality of the presentation positions being predetermined, and the face, the finger end and the reference object being recognized by said recognizing step;

deciding a visual relation between the presentation position and the user's recognized position by referring to the location information, wherein the user's recognized position includes the face position and the finger end position;

generating the personified image expressing at least one of looks and action based on the visual relation; and

presenting the personified image at the presentation position to the user as feedback information of the gesture input;

wherein said deciding step includes the step of deciding whether the user's gesture part is watched from a current presentation position of the personified image and whether the personified image is watched from the user's view position; and

wherein said generating step includes the step of generating the personified image which gazes at the user's gesture part if the user's gesture part is watched from the current presentation position of the personified image and the personified image is watched from the user's view position.

(Dependent claims 17, 18, 19, 20, 21, 22 not shown.)
23. A computer readable memory containing computer readable instructions, comprising:

instruction means for causing a computer to input a user's image including face and hand;

instruction means for causing a computer to extract the face and hand from the user's image;

instruction means for causing a computer to recognize a face position as the user's view position, a finger end position as the user's gesture part, and a reference object pointed to by the finger end as the user's gesture input in a world coordinate space;

instruction means for causing a computer to store a representative position and a direction as location information by unit of label information in the world coordinate space, wherein the label information is one of a presentation position of a personified image, the face, the finger end and the reference object, a plurality of the presentation positions being predetermined, and the face, the finger end and the reference object being recognized by said recognition means;

instruction means for causing a computer to decide a visual relation between the presentation position and the user's recognized position by referring to the location information, wherein said user's recognized position includes the face position and the finger end position;

instruction means for causing a computer to generate the personified image expressing at least one of looks and action based on the visual relation; and

instruction means for causing a computer to present the personified image at the presentation position to the user as feedback information of the gesture input;

wherein said instruction means for causing a computer to decide further comprises instruction means for causing a computer to decide whether the user's gesture part is watched from a current presentation position of the personified image and whether the personified image is watched from the user's view position; and

wherein said instruction means for causing a computer to generate further comprises instruction means for causing a computer to generate the personified image which gazes at the user's gesture part if the user's gesture part is watched from the current presentation position of the personified image and the personified image is watched from the user's view position.

(Dependent claims 24, 25, 26, 27, 28, 29 not shown.)