System and method for a cooperative conversational voice user interface
First Claim
1. A computer-implemented method of facilitating natural language system responses using short-term knowledge generated based on one or more prior multi-modal device interactions, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
receiving, by the computer system during a first conversation, a first voice input via a first input device, the first voice input comprising a first natural language utterance;
receiving, by the computer system, a second voice input comprising the first natural language utterance via a second input device;
comparing, by the computer system, the first voice input with the second voice input;
filtering, by the computer system, sound from the first voice input and the second voice input based on the comparison;
obtaining, by the computer system during the first conversation, a user interface state related to one or more non-voice inputs associated with the first voice input, the one or more non-voice inputs comprising at least a first non-voice input;
generating, by the computer system, the short-term knowledge based on at least the first voice input and the first non-voice input;
determining, by the computer system, based on the short-term knowledge, a first context for the first natural language utterance;
determining, by the computer system, based on the first context, an interpretation of the first natural language utterance; and
generating, by the computer system, based on the interpretation of the first natural language utterance, a first response to the first natural language utterance.
Abstract
A cooperative conversational voice user interface is provided. The cooperative conversational voice user interface may build upon short-term and long-term shared knowledge to generate one or more explicit and/or implicit hypotheses about an intent of a user utterance. The hypotheses may be ranked based on varying degrees of certainty, and an adaptive response may be generated for the user. Responses may be worded based on the degrees of certainty and to frame an appropriate domain for a subsequent utterance. In one implementation, misrecognitions may be tolerated, and conversational course may be corrected based on subsequent utterances and/or responses.
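The ranking and certainty-dependent wording described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Hypothesis` class, the numeric certainty scores, the two thresholds, and the response templates are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate reading of the user's intent (hypothetical structure)."""
    intent: str
    certainty: float  # assumed score in [0.0, 1.0]; the source gives no scale


def rank_hypotheses(hypotheses):
    """Order hypotheses from most to least certain."""
    return sorted(hypotheses, key=lambda h: h.certainty, reverse=True)


def word_response(best, high=0.8, low=0.4):
    """Choose wording from the certainty of the top hypothesis.

    The thresholds `high` and `low` are illustrative assumptions.
    """
    if best.certainty >= high:
        # Confident: act on the hypothesis and simply confirm.
        return f"OK, {best.intent}."
    if best.certainty >= low:
        # Unsure: ask a question that frames the next utterance.
        return f"Did you mean {best.intent}?"
    # Very unsure: tolerate the misrecognition and ask again.
    return "Sorry, could you rephrase that?"


hypotheses = [
    Hypothesis("navigate to Jasper", 0.30),
    Hypothesis("play jazz", 0.55),
]
ranked = rank_hypotheses(hypotheses)
print(word_response(ranked[0]))
```

A mid-certainty top hypothesis produces a clarifying question rather than an action, which leaves room for a later utterance to correct the course of the conversation.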
28 Claims
1. A computer-implemented method of facilitating natural language system responses using short-term knowledge generated based on one or more prior multi-modal device interactions, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
receiving, by the computer system during a first conversation, a first voice input via a first input device, the first voice input comprising a first natural language utterance;
receiving, by the computer system, a second voice input comprising the first natural language utterance via a second input device;
comparing, by the computer system, the first voice input with the second voice input;
filtering, by the computer system, sound from the first voice input and the second voice input based on the comparison;
obtaining, by the computer system during the first conversation, a user interface state related to one or more non-voice inputs associated with the first voice input, the one or more non-voice inputs comprising at least a first non-voice input;
generating, by the computer system, the short-term knowledge based on at least the first voice input and the first non-voice input;
determining, by the computer system, based on the short-term knowledge, a first context for the first natural language utterance;
determining, by the computer system, based on the first context, an interpretation of the first natural language utterance; and
generating, by the computer system, based on the interpretation of the first natural language utterance, a first response to the first natural language utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A system for facilitating natural language system responses via short-term knowledge generated based on one or more prior multi-modal device interactions, the system comprising:
one or more physical processors programmed with one or more computer program instructions which, when executed, cause the one or more physical processors to:
receive, during a first conversation, a first voice input via a first input device, the first voice input comprising a first natural language utterance;
receive a second voice input comprising the first natural language utterance via a second input device;
compare the first voice input with the second voice input;
filter sound from the first voice input and the second voice input based on the comparison;
obtain, during the first conversation, a user interface state related to one or more non-voice inputs associated with the first voice input, the one or more non-voice inputs comprising at least a first non-voice input;
generate the short-term knowledge based on at least the first voice input and the first non-voice input;
determine, based on the short-term knowledge, a first context for the first natural language utterance;
determine, based on the first context, an interpretation of the first natural language utterance; and
generate, based on the interpretation of the first natural language utterance, a first response to the first natural language utterance.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
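The later steps recited in claims 1 and 16 (short-term knowledge, context, interpretation, response) can be pictured with a toy sketch. It deliberately omits the comparing and filtering of the two voice inputs and assumes the utterance has already been transcribed. Every name below (`ShortTermKnowledge`, the `active_screen` and `selected_item` keys, the keyword rule) is a hypothetical chosen for illustration, not something the claims specify.

```python
from dataclasses import dataclass, field


@dataclass
class ShortTermKnowledge:
    """Knowledge gathered during the current conversation (hypothetical)."""
    utterances: list = field(default_factory=list)
    ui_state: dict = field(default_factory=dict)


def generate_short_term_knowledge(utterance, non_voice_input):
    """Combine the voice input with the user interface state."""
    return ShortTermKnowledge(utterances=[utterance],
                              ui_state=dict(non_voice_input))


def determine_context(knowledge):
    """Assumed rule: the active screen names the context."""
    return knowledge.ui_state.get("active_screen", "general")


def interpret(utterance, context, knowledge):
    """Resolve 'this' against the item selected by a non-voice input."""
    selected = knowledge.ui_state.get("selected_item")
    if "this" in utterance.lower().split() and selected:
        return {"domain": context, "action": "describe", "target": selected}
    return {"domain": context, "action": "unknown", "target": None}


def respond(interpretation):
    """Generate a response from the interpretation."""
    if interpretation["action"] == "describe":
        return f"Here is more about {interpretation['target']}."
    return "Sorry, I did not catch that."


utterance = "Tell me about this"
ui_state = {"active_screen": "map", "selected_item": "Pike Place Market"}
knowledge = generate_short_term_knowledge(utterance, ui_state)
context = determine_context(knowledge)
print(respond(interpret(utterance, context, knowledge)))
```

The point of the sketch is only that the non-voice input supplies the referent that the utterance alone leaves ambiguous.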
Specification