Method and apparatus for embodied conversational characters with multimodal input/output in an interface device
Abstract
Deliberative and reactive processing are combined to process multi-modal inputs and direct movements and speech of a synthetic character that operates as an interface between a user and a piece of equipment. The synthetic character is constructed as an ally, working with and helping the user learn and operate the equipment. The synthetic character interacts with both a virtual space where the character is displayed, and a physical space (domain) that includes the user. Real-time reactive processing provides lifelike and engaging responses to user queries and conversations. Deliberative processing provides responses to inputs that require more processing time (deep linguistic processing, for example). Knowledge bases are maintained for both dynamic (discourse model, for example) and static (e.g., knowledge about the domain or discourse plans) information types. A rule-based system is utilized against the knowledge bases and selected inputs from said user to determine input meanings, and follow a predetermined discourse plan.
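The abstract's split between fast reactive processing and slower deliberative processing can be sketched as a dispatch loop. This is a minimal illustration only; every name here (the rule table, `process_event`, the event types) is hypothetical and not taken from the patent:

```python
# Minimal sketch of combining reactive and deliberative processing.
# All identifiers and event types are illustrative, not from the patent.

REACTIVE_RULES = {
    # event type -> immediate character action
    "user_speech_start": "turn_gaze_to_user",
    "user_points": "look_at_pointed_object",
}

def process_event(event, discourse_model, deliberative_queue):
    """Return an immediate reactive action (or None) and queue the
    event for slower deliberative processing (e.g. deep parsing)."""
    deliberative_queue.append(event)          # deliberative path: handled later
    discourse_model.setdefault("history", []).append(event["type"])
    return REACTIVE_RULES.get(event["type"])  # reactive path: answered at once

model = {}
queue = []
action = process_event({"type": "user_speech_start"}, model, queue)
print(action)        # -> turn_gaze_to_user
print(len(queue))    # -> 1
```

The point of the split is that the reactive path returns in one table lookup, while the queued copy of the same event can later receive arbitrarily expensive linguistic analysis.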
32 Claims
-
1. An interactive user interface for allowing a user to interact with a system, comprising:
-
a display device, for use in displaying to the user a visible representation of a computer generated virtual space, including an animated virtual character therein;
a plurality of input devices, for use in accepting data defining a physical space domain, said physical space domain including the physical space occupied by the user;
an input manager for use in obtaining said data from said input devices, calibrating movements detected in said physical space domain to corresponding coordinates in said virtual space, and converting said data into an input frame representing a coherent understanding of said physical space domain and the action of said user within said physical space domain;
a knowledge base, for storing physical space domain data, including action inputs by the user within and in relation to said physical space domain, and for further storing actions by the virtual character within the virtual space;
a discourse model that contains state information about a dialogue with said user;
an understanding module for use in receiving inputs from the input manager, accessing knowledge about the domain inferred from the current discourse, and fusing all input modalities into a coherent understanding of the user's environment;
a reactive component for receiving updates from the input manager and understanding module, and using information about the domain and information inferred from the current discourse to determine a current action for said virtual character to perform;
a response planner for use in formulating plans or sequences of actions;
a generation module for use in realizing a complex action request from the reactive component by producing one or more coordinated primitive actions, and sending the actions to an action scheduler for performance; and
an action scheduler for taking multiple action requests from the reaction and generation modules and performing said requests. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 31, 32)
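Claim 1's input manager both calibrates physical-space movement into virtual-space coordinates and packs all channels into one "input frame." A minimal sketch follows; the linear calibration, the frame fields, and all function names are assumptions made for illustration, not details from the patent:

```python
# Illustrative sketch of the claimed input manager: calibrate a
# physical-space position into virtual-space coordinates and fuse the
# per-channel data into a single input frame.
# The linear (scale + offset) calibration is an assumed simplification.

def calibrate(physical_xy, scale=0.01, offset=(0.0, 0.0)):
    """Map a physical-space point (e.g. centimetres) to virtual units."""
    px, py = physical_xy
    return (px * scale + offset[0], py * scale + offset[1])

def make_input_frame(speech, gesture, user_position_cm):
    """Fuse per-channel data into one coherent input frame."""
    return {
        "speech": speech,
        "gesture": gesture,
        "user_position_virtual": calibrate(user_position_cm),
    }

frame = make_input_frame("look at that", "point_left", (250, 100))
print(frame["user_position_virtual"])   # -> (2.5, 1.0)
```

Downstream modules (understanding, reactive component) would then consume the frame rather than the raw device streams, which is what lets the claim speak of "a coherent understanding of said physical space domain."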
a microphone configured to capture speech of said user;
a video camera configured to capture at least one of gestures, body position, and gaze direction of said user;
a compass attached to said user and configured to capture a direction of said user; and
wherein said multiple input channels include a microphone input channel connected to said microphone, a video input channel connected to said video camera, and a direction input channel connected to said compass.
-
5. The interactive user interface according to claim 2, wherein said input frame synchronizes data from each of said user inputs with a virtual environment containing said character.
-
6. The interactive user interface according to claim 2, wherein said multiple input channels include at least one input channel connected to and configured to transmit data from at least one of a motion detector, cyberglove, cybergloves, and body tracking hardware to said input manager.
-
7. The interactive user interface according to claim 1, wherein:
said interaction by said synthetic character with said physical space includes at least one of pointing and gesturing by said synthetic character toward an object in said physical space.
-
8. The interactive user interface according to claim 1, wherein:
an operation of said display includes an interaction of objects in said physical space with said virtual environment that may include at least one of pointing and gesturing by an object or person in said physical space toward an object in said virtual environment.
-
31. The interface of claim 1, wherein said understanding module, said response planner, and said generation module are included within a deliberative component.
-
32. The interface of claim 31, wherein said reactive component and said deliberative component are included within a response processor.
-
9. A method of allowing a user to interact with a system, comprising the steps of:
-
displaying to the user, using a display device, a visible representation of a computer generated virtual environment, including an animated virtual character therein;
accepting, using a multi-modal input, data defining a physical space domain distinct from said virtual environment, said physical space domain including the physical space occupied by the user and the visible representation of said virtual environment;
mapping, in a knowledge base, physical space domain data, and actions by the user within the physical space domain, to an interaction with the virtual environment, and for mapping actions by the virtual character within the virtual environment to said visible representation, such that when displayed on the display device the actions of the virtual character are perceived by the user as interacting with the physical space occupied by the user; and
generating, in response to a user input to the system, an action to be performed by said virtual character, using both deliberative processing and reactive processing, wherein said deliberative processing includes the substeps of fusing portions of the user inputs into a coherent understanding of the physical environment and actions of the user, updating a discourse model reflecting current and past inputs retrieved from the user, and outputting to the reactive processing a frame describing the user inputs, and wherein said reactive processing includes the substeps of receiving updates of user inputs and frames concerning the user inputs from said deliberative processing, accessing data from a knowledge base about said domain and about a current discourse between the user, physical environment, and virtual space, and determining a current action for the virtual space. - View Dependent Claims (10, 11, 12, 13)
identifying a user gesture captured by said inputs; and
determining a meaning of a user speech captured at least one of contemporaneously and in close time proximity of said user gesture.
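The idea of interpreting speech captured "contemporaneously" or "in close time proximity" with a gesture can be sketched as a timestamp-window match: a deictic word ("that", "this") is resolved against the nearest gesture within the window. The 0.5-second threshold and all names below are assumptions for illustration, not values from the patent:

```python
# Sketch of resolving a deictic utterance against a gesture captured in
# close time proximity.  The 0.5 s window is an assumed threshold.

def resolve_deixis(speech_event, gesture_events, window_s=0.5):
    """Attach the nearest-in-time gesture target to a deictic utterance."""
    text = speech_event["text"]
    if "that" not in text and "this" not in text:
        return None                      # nothing deictic to resolve
    candidates = [g for g in gesture_events
                  if abs(g["t"] - speech_event["t"]) <= window_s]
    if not candidates:
        return None                      # no gesture close enough in time
    return min(candidates, key=lambda g: abs(g["t"] - speech_event["t"]))["target"]

speech = {"text": "turn that on", "t": 10.2}
gestures = [{"t": 9.9, "target": "radio"}, {"t": 12.0, "target": "lamp"}]
print(resolve_deixis(speech, gestures))   # -> radio
```

Here the pointing gesture 0.3 s before the word "that" wins, while the later gesture falls outside the window and is ignored.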
-
12. The method according to claim 11, wherein said step of generating an action further comprises the steps of:
-
scheduling an appropriate system response based on said understanding; and
updating a discourse model based on the retrieved inputs.
-
13. The method according to claim 12, wherein said step of updating comprises:
-
maintaining a tagged history of speech of said user, including the step of:
identifying said discourse model with at least one of a topic under conversation between said user and said virtual environment and other information received via either of said deliberative and reactive processing.
-
14. A user interface for a computer system that allows a user to communicate with the system in an interactive manner, comprising:
-
a display device, for use in displaying to the user a visible representation of a computer generated virtual environment, including an animated virtual character therein;
a multi-modal input, for use in accepting data defining a physical space domain distinct from said virtual environment, said physical space domain including the physical space occupied by the user and the visible representation of said virtual environment;
a knowledge base, for use in mapping physical space domain data, and actions by the user within the physical space domain, to an interaction with the virtual environment, and for mapping actions by the virtual character within the virtual environment to said visible representation, such that when displayed on the display device the actions of the virtual character are perceived by the user as interacting with the physical space occupied by the user; and
a response processor that integrates deliberative and reactive processing performed on user inputs received by said multi-modal input, wherein said response processor includes a scheduler for scheduling an appropriate system response based on an understanding of said physical space domain and actions of the user, a discourse model based on the retrieved inputs, a tagged history of speech of said user, and a response planner for identifying said discourse model with at least one of a topic under conversation between said user and said virtual environment, wherein said response processor further includes a deliberative component and a reactive component, wherein said deliberative component is configured to fuse portions of the user inputs into a coherent understanding of the physical environment and actions of the user, updating a discourse model reflecting current and past inputs retrieved from the user, and outputting to the reactive component a frame describing the user inputs, and wherein said reactive component is configured to receive updates of user inputs and frames concerning the user inputs from said deliberative processing, accessing data from a knowledge base about said domain and about a current discourse between the user, physical environment, and virtual space, and determining a current action for the virtual space. - View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
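Claim 14's response processor can be sketched as two cooperating objects: a deliberative component that fuses inputs into a frame and updates the discourse model, and a reactive component that consumes the frame and selects the character's next action. All class names, the frame format, and the action choices below are hypothetical:

```python
# Sketch of a response processor integrating deliberative and reactive
# components (claim 14's structure).  All identifiers are illustrative.

class DeliberativeComponent:
    def __init__(self):
        self.discourse_model = []           # current and past input frames

    def understand(self, inputs):
        """Fuse the modalities into a frame and record it."""
        frame = {"summary": " + ".join(sorted(inputs))}
        self.discourse_model.append(frame)  # update discourse model
        return frame                        # frame handed to reactive side

class ReactiveComponent:
    def act(self, frame):
        """Pick the character's current action from the fused frame."""
        if "speech" in frame["summary"]:
            return "respond_verbally"
        return "idle_behaviour"

class ResponseProcessor:
    def __init__(self):
        self.deliberative = DeliberativeComponent()
        self.reactive = ReactiveComponent()

    def handle(self, inputs):
        return self.reactive.act(self.deliberative.understand(inputs))

rp = ResponseProcessor()
print(rp.handle({"speech", "gaze"}))   # -> respond_verbally
```

The design choice worth noting is the one the claim itself makes: the reactive component never touches raw device data, only frames produced (and remembered) by the deliberative side.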
said multi-modal input comprises multiple input channels each configured to capture at least one of said user inputs, and an input manager configured to integrate the user inputs from said multiple input channels and provide said integrated user inputs to said deliberative processing component.
-
16. The interface according to claim 15, wherein said multiple input channels include at least one of speech, body position, gaze direction, gesture recognition, keyboard, mouse, user ID, and motion detection channels.
-
17. The interface according to claim 16, wherein each of said multi-modal input, said deliberative processing, and said reactive processing operate in parallel.
-
18. The interface according to claim 17, wherein said reactive processing provides reactive output to said output mechanism immediately upon receiving a predetermined input from said input device.
-
19. The interface according to claim 16, wherein said reactive processing provides reactive output to said output mechanism upon receiving a predetermined input from said deliberative processing.
-
20. The interface according to claim 19, wherein:
-
said predetermined input is speech by a user; and
said reactive output is a command to initiate a conversational gaze on said synthetic character.
-
21. The interface according to claim 17, wherein said deliberative processing component comprises:
-
an understanding module configured to fuse inputs from said multi-modal input and determine a coherent understanding of both a physical environment of the user and what the user is doing;
a response planner configured to plan a sequence of actions based on said coherent understanding; and
a response generation module configured to implement each real time response and complex action formulated by said reactive component.
-
22. The interface according to claim 17, wherein:
-
said multiple input channels include at least one channel configured to transmit a speech stream input by said user, and at least one channel configured to transmit any combination of at least one additional user input, including, body position, gaze direction, gesture recognition, keyboard, mouse, user ID, and motion detection; and
said deliberative processing component includes a relation module configured to relate at least one of the additional user inputs to said speech stream.
-
23. The interface according to claim 17, further comprising an action scheduler that controls said synthetic character according to said real time responses and said complex actions in a manner that interacts with said user.
-
24. The interface according to claim 21, wherein said response generation module initiates parallel operations of at least one of said responses by said synthetic character.
-
25. The interface according to claim 24, wherein the initiated parallel operations include at least one of speech output by said synthetic character, an action performed by said synthetic character, and a task to be performed on said computational system.
-
26. The interface according to claim 24, further comprising an action scheduler that controls an animation that displays said interface in a manner that interacts with said user and performs actions requested via said user inputs, wherein:
-
said user inputs include at least one of speech, gestures, orientation, body position, gaze direction, gesture recognition, keyboard, mouse, user ID, and motion detection;
said deliberative processing utilizes at least one of said user inputs to implement a complex action; and
said reactive processing formulates a real time response to predetermined of said user inputs.
-
27. The interface according to claim 24, wherein said deliberative processing component includes, as part of said deliberative processing,
a multi-modal understanding component configured to generate an understanding of speech and associated non-verbal ones of said user inputs, a response planning component configured to plan a sequence of communicative actions based on said understanding, and a multi-modal language generation component configured to generate sentences and associated gestures applicable to said sequence of communicative actions.
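Claim 27's multi-modal language generation, which pairs each planned communicative action with a sentence and an associated gesture, can be sketched as a lookup from action to (sentence, gesture) primitives handed to the scheduler. The action inventory and all strings below are invented for illustration:

```python
# Sketch of multi-modal language generation (claim 27's idea): each
# communicative action in the plan is realised as a sentence plus an
# associated gesture.  The action inventory is hypothetical.

GENERATION_TABLE = {
    "greet":        ("Hello, I can help you with this equipment.", "wave"),
    "refer_object": ("That switch controls the power.", "point_at_object"),
}

def generate(plan):
    """Turn a sequence of communicative actions into (sentence, gesture)
    primitives for an action scheduler."""
    return [GENERATION_TABLE[a] for a in plan if a in GENERATION_TABLE]

out = generate(["greet", "refer_object"])
print(out[0][1])   # -> wave
```

A real generator would of course compose sentences rather than look them up; the table stands in for that step while keeping the sentence-plus-gesture pairing the claim describes.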
-
28. A method for allowing a user to interface with a computer system, comprising:
-
displaying to the user, using a display device, a visible representation of a computer generated virtual environment, including an animated virtual character therein;
accepting, using a multi-modal input, data defining a physical space domain distinct from said virtual environment, said physical space domain including the physical space occupied by the user and the visible representation of said virtual environment;
mapping, in a knowledge base, physical space domain data, and actions by the user within the physical space domain, to an interaction with the virtual environment, and for mapping actions by the virtual character within the virtual environment to said visible representation, such that when displayed on the display device the actions of the virtual character are perceived by the user as interacting with the physical space occupied by the user; and
,generating, using a response processor, in response to a user input to the system, an action to be performed by said virtual character, including the substeps of interpreting said user input data and said physical space information data and generating a user input context associated with said input, determining a system response in response to said user input and said user input context, wherein said step of determining a system response includes the steps of formulating real time responses to a predetermined set of said inputs by reactive processing; and
formulating complex actions based on said inputs by deliberative processing, and wherein said deliberative processing includes the steps of determining an understanding of said physical space domain and actions of the user, scheduling an appropriate response based on said understanding, updating a discourse model based on the retrieved inputs, and communicating said understanding to said reactive processing;
wherein said step of determining an understanding comprises the steps of:
combining selected of said inputs into a frame providing a coherent understanding of said physical space domain and actions of said user, including the steps of accessing a static knowledge base about a physical space domain with reference to said selected inputs;
accessing at least one of a dynamic knowledge base and a discourse model to infer an understanding of a current discourse between said user and said virtual environment;
accessing a discourse model to infer an understanding of said current discourse; and
combining information from at least one of said static knowledge base, said dynamic knowledge base, and said discourse model to produce said frame. - View Dependent Claims (29, 30)
maintaining information identifying where said user is located within said physical space domain;
maintaining information identifying at least one of a position and orientation of at least one of a character and object displayed in said virtual environment; and
tracking placement in a plan being implemented by a programming of said virtual environment.
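The bookkeeping in claim 29, maintaining the user's location, the positions of virtual characters and objects, and the current placement within a plan, amounts to a small dynamic knowledge base. The structure and field names below are assumptions, not the patent's data model:

```python
# Sketch of claim 29's dynamic knowledge base: user location in the
# physical space domain, object/character poses in the virtual
# environment, and the active step of the current plan.
# Field names and plan steps are illustrative assumptions.

knowledge_base = {
    "user_location": None,        # position within the physical space domain
    "virtual_objects": {},        # name -> (position, orientation)
    "plan": ["greet", "explain_controls", "demonstrate"],
    "plan_index": 0,              # placement within the plan
}

def update_user_location(kb, xy):
    kb["user_location"] = xy

def advance_plan(kb):
    """Move to the next plan step, clamping at the final step."""
    kb["plan_index"] = min(kb["plan_index"] + 1, len(kb["plan"]) - 1)
    return kb["plan"][kb["plan_index"]]

update_user_location(knowledge_base, (1.2, 0.4))
print(advance_plan(knowledge_base))   # -> explain_controls
```

Claim 30's reactive processing would then read this structure (alongside the static domain knowledge base) when choosing the character's current action.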
-
30. The method according to claim 29, wherein said reactive processing includes the steps of:
-
receiving asynchronous updates of selected of said user inputs and understanding frames concerning said user inputs from said deliberative processing;
accessing data from a static knowledge base about said physical space domain and a dynamic knowledge base having inferred information about a current discourse between said user, said physical space domain, and said virtual environment; and
determining a current action for said virtual environment based on said asynchronous updates, understanding frames, and said data.
-
Specification