System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input
Abstract
Systems and methods are provided for performing focus detection, referential ambiguity resolution and mood classification in accordance with multi-modal input data, in varying operating conditions, in order to provide an effective conversational computing environment for one or more users.
32 Claims
1. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) be capable of making a determination of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
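Claim 1 combines multi-modal input, intent/focus/mood determination, a context memory, and referential-ambiguity resolution. The following is a minimal illustrative sketch of that combination; it is not part of the patent's disclosure, and every class, field, and method name below is an assumption made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Determination:
    """One result stored in the context memory (field names are assumed)."""
    intent: str
    focus: str   # the device the determination attaches to
    mood: str

class ConversationalSystem:
    def __init__(self, devices):
        self.devices = devices       # devices controllable by the system
        self.context_stack = []      # prior determinations, kept for reuse

    def resolve_reference(self, utterance, gaze_target):
        """Resolve an ambiguous reference ('that', 'it') using a second
        modality (here, a gaze estimate) and, failing that, the most
        recent determination stored in the context memory."""
        if "that" in utterance or "it" in utterance:
            if gaze_target in self.devices:
                return gaze_target
            if self.context_stack:
                return self.context_stack[-1].focus
        for device in self.devices:
            if device in utterance:  # the device was named explicitly
                return device
        return None

    def step(self, utterance, gaze_target, mood):
        focus = self.resolve_reference(utterance, gaze_target) or "unknown"
        det = Determination(intent=utterance, focus=focus, mood=mood)
        self.context_stack.append(det)  # stored for subsequent determinations
        return det
```

For example, "turn that off" while the user gazes at the lamp resolves to the lamp; a follow-up "make it brighter" with no usable gaze falls back to the focus stored in the context memory.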
2. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to at least one of effectuate the determined intent, effect the determined focus, and effect the determined mood of the one or more users.
3. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to request further user input to assist in making at least one of the determinations.
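Claim 3's closing clause (controlling a device to request further user input to assist a determination) can be sketched as a simple confidence gate. This is an illustration only: the threshold, the `speak` callback, and the candidate-list format are assumptions, not elements of the claim.

```python
def act_or_clarify(candidates, speak, threshold=0.6):
    """candidates: (device_name, confidence) pairs for a resolved reference.
    Execute when one candidate is confident enough; otherwise drive an
    output device (the `speak` callback) to request further user input."""
    name, score = max(candidates, key=lambda c: c[1])
    if score >= threshold:
        return ("execute", name)
    options = " or ".join(n for n, _ in candidates)
    speak(f"Did you mean the {options}?")  # request assisting input
    return ("await_input", None)
```

With two near-tied candidates the system asks rather than guesses; with one confident candidate it acts directly.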
4. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of the one or more actions comprises initiating a process to at least one of further complete, correct, and disambiguate what the system understands from previous input.
5. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the at least one processor is further configured to abstract the received multi-modal input data into one or more events prior to making the one or more determinations.
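Claim 5 adds abstracting the raw multi-modal input into events before any determination is made. A hedged sketch of what such abstraction might look like follows; the event fields and the raw-sample dictionary keys are assumptions, since the claim does not define an event format.

```python
from dataclasses import dataclass

@dataclass
class ModalityEvent:
    modality: str    # e.g. "speech" or "vision"
    kind: str        # e.g. "utterance", "gaze"
    payload: str
    timestamp: float

def abstract_input(raw_samples):
    """Turn sensor-specific samples into modality-neutral events, ordered
    in time, so downstream intent/focus/mood determinations need not know
    each sensor's native encoding."""
    events = []
    for s in raw_samples:
        if s["sensor"] == "microphone":
            events.append(ModalityEvent("speech", "utterance", s["text"], s["t"]))
        elif s["sensor"] == "camera":
            events.append(ModalityEvent("vision", "gaze", s["target"], s["t"]))
    return sorted(events, key=lambda e: e.timestamp)
```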
6. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the at least one processor is further configured to perform one or more recognition operations on the received multi-modal input data prior to making the one or more determinations.
7. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
an input/output manager module operatively coupled to the user interface subsystem and configured to abstract the multi-modal input data into one or more events;
one or more recognition engines operatively coupled to the input/output manager module and configured to perform, when necessary, one or more recognition operations on the abstracted multi-modal input data;
a dialog manager module operatively coupled to the one or more recognition engines and the input/output manager module and configured to:
(i) receive at least a portion of the abstracted multi-modal input data and, when necessary, the recognized multi-modal input data;
(ii) make a determination of an intent of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on the determined intent;
a focus and mood classification module operatively coupled to the one or more recognition engines and the input/output manager module and configured to:
(i) receive at least a portion of the abstracted multi-modal input data and, when necessary, the recognized multi-modal input data;
(ii) make a determination of at least one of a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined focus and mood; and
a context stack memory operatively coupled to the dialog manager module, the one or more recognition engines and the focus and mood classification module, which stores at least a portion of results associated with the intent, focus and mood determinations made by the dialog manager and the classification module for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
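Claim 7 names concrete modules: an input/output manager that abstracts input into events, recognition engines applied when necessary, a dialog manager for intent, a focus and mood classification module, and a context stack memory. The wiring below is a hypothetical sketch of how those modules might fit together; every interface, class name, and the stand-in recognition step are assumptions for illustration, not the patent's actual design.

```python
class IOManager:
    """Abstracts raw sensor input into events (claimed I/O manager)."""
    def abstract(self, raw):
        return [{"modality": s["sensor"], "data": s["data"]} for s in raw]

class RecognitionEngine:
    """Stand-in for e.g. a speech recognizer, applied only to audio events."""
    def recognize(self, event):
        if event["modality"] == "microphone":
            event = {**event, "data": event["data"].lower()}
        return event

class DialogManager:
    """Determines user intent from the (recognized) events."""
    def determine_intent(self, events):
        speech = [e for e in events if e["modality"] == "microphone"]
        return speech[0]["data"] if speech else None

class FocusMoodClassifier:
    """Determines focus (here from a camera event); mood estimation elided."""
    def classify(self, events):
        vision = [e for e in events if e["modality"] == "camera"]
        return (vision[0]["data"] if vision else None), "neutral"

def run_pipeline(raw, context_stack):
    events = [RecognitionEngine().recognize(e) for e in IOManager().abstract(raw)]
    intent = DialogManager().determine_intent(events)
    focus, mood = FocusMoodClassifier().classify(events)
    # Results go on the context stack for possible use in later determinations.
    context_stack.append({"intent": intent, "focus": focus, "mood": mood})
    return context_stack[-1]
```

The design point the claim captures is separation of concerns: recognition, intent determination, and focus/mood classification each consume the same abstracted events and share results only through the context stack.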
8. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
providing for a capability to make a determination of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
9. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
making a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the step of causing the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to at least one of effectuate the determined intent, effect the determined focus, and effect the determined mood of the one or more users.
10. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
making a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the step of causing the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to request further user input to assist in making at least one of the determinations.
11. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
making a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the step of causing the execution of the one or more actions comprises initiating a process to at least one of further complete, correct, and disambiguate what the system understands from previous input.
12. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
making a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
further comprising the step of abstracting the received multi-modal input data into one or more events prior to making the one or more determinations.
13. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
making a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood;
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination; and
performing one or more recognition operations on the received multi-modal input data prior to making the one or more determinations;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
14. An article of manufacture for performing conversational computing, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including data associated with a first modality input sensor and data associated with at least a second modality input sensor;
providing for a capability to make a determination of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
15. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) be capable of making a determination of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
(Dependent claims 16, 17, 18, 19, 20, 21 and 22 not shown.)
23. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to at least one of effectuate the determined intent, effect the determined focus, and effect the determined mood of the one or more users.
24. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of one or more actions in the environment comprises controlling at least one of the one or more devices in the environment to request further user input to assist in making at least one of the determinations.
25. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the at least one processor is further configured to abstract the received multi-modal input data into one or more events prior to making the one or more determinations.
26. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the at least one processor is further configured to perform one or more recognition operations on the received multi-modal input data prior to making the one or more determinations.
(Dependent claims 27, 28 and 29 not shown.)
30. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
at least one processor, the at least one processor being operatively coupled to the user interface subsystem and being configured to:
(i) receive at least a portion of the multi-modal input data from the user interface subsystem;
(ii) make a determination of at least one of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
memory, operatively coupled to the at least one processor, which stores at least a portion of results associated with the intent, focus and mood determinations made by the processor for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data;
wherein the execution of the one or more actions comprises initiating a process to at least one of further complete, correct, and disambiguate what the system understands from previous input.
31. A multi-modal conversational computing system, the system comprising:
a user interface subsystem, the user interface subsystem being configured to input multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal data including at least audio-based data and image-based data, and the environment including one or more users and one or more devices which are controllable by the multi-modal system;
an input/output manager module operatively coupled to the user interface subsystem and configured to abstract the multi-modal input data into one or more events;
one or more recognition engines operatively coupled to the input/output manager module and configured to perform, when necessary, one or more recognition operations on the abstracted multi-modal input data;
a dialog manager module operatively coupled to the one or more recognition engines and the input/output manager module and configured to:
(i) receive at least a portion of the abstracted multi-modal input data and, when necessary, the recognized multi-modal input data;
(ii) make a determination of an intent of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on the determined intent;
a focus and mood classification module operatively coupled to the one or more recognition engines and the input/output manager module and configured to:
(i) receive at least a portion of the abstracted multi-modal input data and, when necessary, the recognized multi-modal input data;
(ii) make a determination of at least one of a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and
(iii) cause execution of one or more actions to occur in the environment based on at least one of the determined focus and mood; and
a context stack memory operatively coupled to the dialog manager module, the one or more recognition engines and the focus and mood classification module, which stores at least a portion of results associated with the intent, focus and mood determinations made by the dialog manager and the classification module for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the received multi-modal data.
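Claim 31 lays out a concrete pipeline: an input/output manager abstracts input into events, a dialog manager determines intent (resolving referential ambiguity against the devices in the environment), and a context stack memory retains results for subsequent determinations. A minimal sketch of that flow, assuming a toy resolution rule in which an ambiguous referent such as "that" is bound to the device the user is looking at (all class names, the gaze-based rule, and the dict-shaped intent are hypothetical illustrations, not the patent's method):

```python
from collections import namedtuple

# Hypothetical event type produced by the input/output manager module.
Event = namedtuple("Event", ["modality", "payload"])

class ContextStack:
    """Stores prior determinations for possible use in subsequent ones."""
    def __init__(self):
        self._stack = []
    def push(self, result):
        self._stack.append(result)
    def peek(self):
        return self._stack[-1] if self._stack else None

class DialogManager:
    """Resolves referential ambiguity against the devices in the environment."""
    def __init__(self, devices, context):
        self.devices = devices
        self.context = context
    def determine_intent(self, events):
        # Toy resolution rule: an ambiguous referent ("that") is taken to be
        # the device the user is gazing at, read from an image-based event.
        command = next((e.payload for e in events if e.modality == "audio"), "")
        gaze = next((e.payload for e in events if e.modality == "image"), None)
        target = gaze if gaze in self.devices else None
        intent = {"command": command, "target": target}
        self.context.push(intent)  # retain result for later determinations
        return intent

ctx = ContextStack()
dm = DialogManager(devices={"lamp", "tv"}, context=ctx)
intent = dm.determine_intent([Event("audio", "turn that off"),
                              Event("image", "lamp")])
# "that" resolves to the gazed-at device, "lamp"
```

The context stack is what lets a follow-up utterance such as "turn it back on" be resolved against the previously determined target.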
32. A computer-based conversational computing method, the method comprising the steps of:
obtaining multi-modal data from an environment including one or more users and one or more controllable devices, the multi-modal data including at least audio-based data and image-based data;
providing for a capability to make a determination of an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data;
causing execution of one or more actions to occur in the environment based on at least one of the determined intent, the determined focus and the determined mood; and
storing at least a portion of results associated with the intent, focus and mood determinations for possible use in a subsequent determination;
wherein the intent determination comprises resolving referential ambiguity associated with the one or more users and the one or more devices in the environment based on at least a portion of the obtained multi-modal data.
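The method claims repeatedly pair audio-based and image-based data to determine a user's focus and mood. A toy fusion sketch under stated assumptions: `speech_energy` and `smile_score` are features normalized to [0, 1], `gaze_on_system` comes from face tracking, and the weights and 0.5 threshold are purely illustrative, not values from the patent:

```python
def classify_focus_and_mood(speech_energy, gaze_on_system, smile_score):
    """Toy fusion of audio-derived and image-derived features.

    Assumes speech_energy and smile_score are normalized to [0, 1] and
    gaze_on_system is True when face tracking indicates the user is
    facing the system. Feature names and thresholds are illustrative.
    """
    # Focus: the user is attending if they face the system while speaking.
    focus = "attending" if gaze_on_system and speech_energy > 0.2 else "not attending"
    # Mood: a simple weighted combination of the two modalities.
    mood_score = 0.5 * speech_energy + 0.5 * smile_score
    mood = "positive" if mood_score >= 0.5 else "neutral-or-negative"
    return focus, mood

focus, mood = classify_focus_and_mood(0.8, True, 0.6)
# → ("attending", "positive")
```

A determined mood could then condition the executed action, e.g. softening a system response when the classification is negative.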
Specification