System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input
Abstract
Systems and methods are provided for performing focus detection, referential ambiguity resolution and mood classification in accordance with multi-modal input data, in varying operating conditions, in order to provide an effective conversational computing environment for one or more users.
32 Claims
1. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) be capable of making a determination of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
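Claim 1 combines multi-modal input, intent/focus/mood determination, a context memory, and referential-ambiguity resolution. The following is a minimal illustrative sketch of that combination; it is not part of the patent's disclosure, and every class, field, and method name below is an assumption made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Determination:
    """One result stored in the context memory (field names are assumed)."""
    intent: str
    focus: str   # the device the determination attaches to
    mood: str

class ConversationalSystem:
    def __init__(self, devices):
        self.devices = devices       # devices controllable by the system
        self.context_stack = []      # prior determinations, kept for reuse

    def resolve_reference(self, utterance, gaze_target):
        """Resolve an ambiguous reference ('that', 'it') using a second
        modality (here, a gaze estimate) and, failing that, the most
        recent determination stored in the context memory."""
        if "that" in utterance or "it" in utterance:
            if gaze_target in self.devices:
                return gaze_target
            if self.context_stack:
                return self.context_stack[-1].focus
        for device in self.devices:
            if device in utterance:  # the device was named explicitly
                return device
        return None

    def step(self, utterance, gaze_target, mood):
        focus = self.resolve_reference(utterance, gaze_target) or "unknown"
        det = Determination(intent=utterance, focus=focus, mood=mood)
        self.context_stack.append(det)  # stored for subsequent determinations
        return det
```

For example, "turn that off" while the user gazes at the lamp resolves to the lamp; a follow-up "make it brighter" with no usable gaze falls back to the focus stored in the context memory.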
2. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to at least one of effectuate the determined intent, effect the determined focus, and effect the determined mood of the one or more users.
3. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to request further user input to assist in making at least one of the determinations.
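Claim 3's closing clause (controlling a device to request further user input to assist a determination) can be sketched as a simple confidence gate. This is an illustration only: the threshold, the `speak` callback, and the candidate-list format are assumptions, not elements of the claim.

```python
def act_or_clarify(candidates, speak, threshold=0.6):
    """candidates: (device_name, confidence) pairs for a resolved reference.
    Execute when one candidate is confident enough; otherwise drive an
    output device (the `speak` callback) to request further user input."""
    name, score = max(candidates, key=lambda c: c[1])
    if score >= threshold:
        return ("execute", name)
    options = " or ".join(n for n, _ in candidates)
    speak(f"Did you mean the {options}?")  # request assisting input
    return ("await_input", None)
```

With two near-tied candidates the system asks rather than guesses; with one confident candidate it acts directly.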
4. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of the one or more actions comprises initiating a process to at least one of further complete, correct, and disambiguate what the system understands from previous input.
5. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the at least one processor is further configured to abstract the received multi-modal input data into one or more events prior to making the one or more determinations.
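Claim 5 adds abstracting the raw multi-modal input into events before any determination is made. A hedged sketch of what such abstraction might look like follows; the event fields and the raw-sample dictionary keys are assumptions, since the claim does not define an event format.

```python
from dataclasses import dataclass

@dataclass
class ModalityEvent:
    modality: str    # e.g. "speech" or "vision"
    kind: str        # e.g. "utterance", "gaze"
    payload: str
    timestamp: float

def abstract_input(raw_samples):
    """Turn sensor-specific samples into modality-neutral events, ordered
    in time, so downstream intent/focus/mood determinations need not know
    each sensor's native encoding."""
    events = []
    for s in raw_samples:
        if s["sensor"] == "microphone":
            events.append(ModalityEvent("speech", "utterance", s["text"], s["t"]))
        elif s["sensor"] == "camera":
            events.append(ModalityEvent("vision", "gaze", s["target"], s["t"]))
    return sorted(events, key=lambda e: e.timestamp)
```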
6. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the at least one processor is further configured to perform one or more recognition operations on the received multi-modal input data prior to making the one or more determinations.
7. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
an input/output manager module operatively coupled to the user interface subsystem and configured to abstract the multi-modal input data into one or more events;
one or more recognition engines operatively coupled to the input/output manager module and configured to perform, when necessary, one or more recognition operations on the abstracted multi-modal input data;
a dialog manager module operatively coupled to the one or more recognition engines and the input/output manager module and configured to:
(i) receive at least a portion of the abstracted multi-modal input data and, when necessary, the recognized multi-modal input data;
(ii) make a determination of an intent of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on the determined intent;
a focus and mood classification module operatively coupled to the one or more recognition engines and the input/output manager module and configured to:
(i) receive at least a portion of the abstracted multi-modal input data and, when necessary, the recognized multi-modal input data;
(ii) make a determination of at least one of a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined focus and mood; and
a context stack memory operatively coupled to the dialog manager module, the one or more recognition engines and the focus and mood classification module, which stores at least a portion of results associated with the intent, focus and mood determinations made by the dialog manager and the classification module for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
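Claim 7 names concrete modules: an input/output manager that abstracts input into events, recognition engines applied when necessary, a dialog manager for intent, a focus and mood classification module, and a context stack memory. The wiring below is a hypothetical sketch of how those modules might fit together; every interface, class name, and the stand-in recognition step are assumptions for illustration, not the patent's actual design.

```python
class IOManager:
    """Abstracts raw sensor input into events (claimed I/O manager)."""
    def abstract(self, raw):
        return [{"modality": s["sensor"], "data": s["data"]} for s in raw]

class RecognitionEngine:
    """Stand-in for e.g. a speech recognizer, applied only to audio events."""
    def recognize(self, event):
        if event["modality"] == "microphone":
            event = {**event, "data": event["data"].lower()}
        return event

class DialogManager:
    """Determines user intent from the (recognized) events."""
    def determine_intent(self, events):
        speech = [e for e in events if e["modality"] == "microphone"]
        return speech[0]["data"] if speech else None

class FocusMoodClassifier:
    """Determines focus (here from a camera event); mood estimation elided."""
    def classify(self, events):
        vision = [e for e in events if e["modality"] == "camera"]
        return (vision[0]["data"] if vision else None), "neutral"

def run_pipeline(raw, context_stack):
    events = [RecognitionEngine().recognize(e) for e in IOManager().abstract(raw)]
    intent = DialogManager().determine_intent(events)
    focus, mood = FocusMoodClassifier().classify(events)
    # Results go on the context stack for possible use in later determinations.
    context_stack.append({"intent": intent, "focus": focus, "mood": mood})
    return context_stack[-1]
```

The design point the claim captures is separation of concerns: recognition, intent determination, and focus/mood classification each consume the same abstracted events and share results only through the context stack.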
8. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
providing for a capability to make a determination of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
9. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
making a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the step of causing the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to at least one of effectuate the determined intent, effect the determined focus, and effect the determined mood of the one or more users.
10. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
making a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the step of causing the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to request further user input to assist in making at least one of the determinations.
11. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
making a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the step of causing the execution of the one or more actions comprises initiating a process to at least one of further complete, correct, and disambiguate what the system understands from previous input.
12. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
making a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
further comprising the step of abstracting the received multi-modal input data into one or more events prior to making the one or more determinations.
13. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
making a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood;
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination; and
performing one or more recognition operations on the received multi-modal input data prior to making the one or more determinations;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
14. An article of manufacture for performing conversational computing, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
providing for a capability to make a determination of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
15. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) be capable of making a determination of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
(Dependent claims 16, 17, 18, 19, 20, 21 and 22 not shown.)
23. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to at least one of effectuate the determined intent, effect the determined focus, and effect the determined mood of the one or more users.
24. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to request further user input to assist in making at least one of the determinations.
25. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the at least one processor is further configured to abstract the received multi-modal input data into one or more events prior to making the one or more determinations.
26. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the at least one processor is further configured to perform one or more recognition operations on the received multi-modal input data prior to making the one or more determinations.
(Dependent claims 27, 28 and 29 not shown.)
30. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of the one or more actions comprises initiating a process to at least one of further complete, correct, and disambiguate what the system understands from previous input.
31. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
an input/output manager module operatively coupled to the user interface subsystem and configured to abstract the multi-modal input data into one or more events;
one or more recognition engines operatively coupled to the input/output manager module and configured to perform, when necessary, one or more recognition operations on the abstracted multi-modal input data;
a dialog manager module operatively coupled to the one or more recognition engines and the input/output manager module and configured to:
(i) receive at least a portion of the abstracted multi-modal input data and, when necessary, the recognized multi-modal input data;
(ii) make a determination of an intent of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on the determined intent;
a focus and mood classification module operatively coupled to the one or more recognition engines and the input/output manager module and configured to:
(i) receive at least a portion of the abstracted multi-modal input data and, when necessary, the recognized multi-modal input data;
(ii) make a determination of at least one of a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined focus and mood; and
a context stack memory operatively coupled to the dialog manager module, the one or more recognition engines and the focus and mood classification module, which stores at least a portion of results associated with the intent, focus and mood determinations made by the dialog manager and the classification module for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
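Claim 31 lays out a concrete pipeline: an input/output manager abstracts input into events, a dialog manager determines intent (resolving referential ambiguity against the devices in the environment), and a context stack memory retains results for subsequent determinations. A minimal sketch of that flow, assuming a toy resolution rule in which an ambiguous referent such as "that" is bound to the device the user is looking at (all class names, the gaze-based rule, and the dict-shaped intent are hypothetical illustrations, not the patent's method):

```python
from collections import namedtuple

# Hypothetical event type produced by the input/output manager module.
Event = namedtuple("Event", ["modality", "payload"])

class ContextStack:
    """Stores prior determinations for possible use in subsequent ones."""
    def __init__(self):
        self._stack = []
    def push(self, result):
        self._stack.append(result)
    def peek(self):
        return self._stack[-1] if self._stack else None

class DialogManager:
    """Resolves referential ambiguity against the devices in the environment."""
    def __init__(self, devices, context):
        self.devices = devices
        self.context = context
    def determine_intent(self, events):
        # Toy resolution rule: an ambiguous referent ("that") is taken to be
        # the device the user is gazing at, read from an image-based event.
        command = next((e.payload for e in events if e.modality == "audio"), "")
        gaze = next((e.payload for e in events if e.modality == "image"), None)
        target = gaze if gaze in self.devices else None
        intent = {"command": command, "target": target}
        self.context.push(intent)  # retain result for later determinations
        return intent

ctx = ContextStack()
dm = DialogManager(devices={"lamp", "tv"}, context=ctx)
intent = dm.determine_intent([Event("audio", "turn that off"),
                              Event("image", "lamp")])
# "that" resolves to the gazed-at device, "lamp"
```

The context stack is what lets a follow-up utterance such as "turn it back on" be resolved against the previously determined target.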
32. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including at least audio-based data and image-based data;
providing for a capability to make a determination of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the obtained multi-modal data.
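The method claims repeatedly pair audio-based and image-based data to determine a user's focus and mood. A toy fusion sketch under stated assumptions: `speech_energy` and `smile_score` are features normalized to [0, 1], `gaze_on_system` comes from face tracking, and the weights and 0.5 threshold are purely illustrative, not values from the patent:

```python
def classify_focus_and_mood(speech_energy, gaze_on_system, smile_score):
    """Toy fusion of audio-derived and image-derived features.

    Assumes speech_energy and smile_score are normalized to [0, 1] and
    gaze_on_system is True when face tracking indicates the user is
    facing the system. Feature names and thresholds are illustrative.
    """
    # Focus: the user is attending if they face the system while speaking.
    focus = "attending" if gaze_on_system and speech_energy > 0.2 else "not attending"
    # Mood: a simple weighted combination of the two modalities.
    mood_score = 0.5 * speech_energy + 0.5 * smile_score
    mood = "positive" if mood_score >= 0.5 else "neutral-or-negative"
    return focus, mood

focus, mood = classify_focus_and_mood(0.8, True, 0.6)
# → ("attending", "positive")
```

A determined mood could then condition the executed action, e.g. softening a system response when the classification is negative.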
Specification