System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
Abstract
In certain implementations, intent prediction is provided for a natural language utterance based on a portion of the natural language utterance prior to a system detection of an end of the natural language utterance. In some implementations, a first portion of a natural language utterance of a user may be received. Speech recognition may be performed on the first portion of the natural language utterance to recognize one or more words of the first portion of the natural language utterance. Context information for the natural language utterance may be obtained. Prior to a detection of an end of the natural language utterance, a first intent may be predicted based on the one or more words of the first portion and the context information. One or more user requests may be determined based on the first predicted intent.
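By way of non-limiting illustration, the prediction step summarized above may be sketched as follows. The sketch assumes a simple keyword-overlap scorer, and it assumes the context information takes the form of request types that are expected to come next. The table `INTENT_KEYWORDS`, the function `predict_intent`, and the `boost` weight are hypothetical names introduced only for this example; they are not taken from the disclosure.

```python
# Hypothetical keyword table; a deployed system would likely use a trained model.
INTENT_KEYWORDS = {
    "get_directions": {"directions", "navigate", "route"},
    "make_reservation": {"book", "reserve", "table"},
    "check_weather": {"weather", "forecast", "rain"},
}


def predict_intent(partial_words, expected_types=(), boost=0.5):
    """Predict an intent from the words recognized so far.

    partial_words: words recognized from the first portion of the utterance.
    expected_types: context information, assumed here to be request types
        that are expected to follow the previous request.
    boost: assumed score bonus for an intent that matches the context.
    Returns the best-scoring intent, or None if nothing scores above zero.
    """
    words = {w.lower() for w in partial_words}
    best_intent, best_score = None, 0.0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = float(len(words & keywords))
        if intent in expected_types:
            score += boost
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent
```

In this sketch the function can be called each time the recognizer emits more words, so a prediction is available before any end of the utterance is detected. With no keyword match, the context alone decides the outcome.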
31 Claims
1. A method of determining an intent prediction for a natural language utterance, prior to a system detection of an end of the natural language utterance, based on a portion of the natural language utterance and statistical information that correlates requests that are linked together in that one type of request statistically follows another type of request, the method being implemented on a computer system that includes one or more physical processors executing computer program instructions which, when executed by the one or more physical processors, perform the method, the method comprising:
receiving, by the computer system, a first natural language utterance of a user;
determining, by the computer system, a first type of user request based on the first natural language utterance;
identifying, by the computer system, at least a second type of user request based on the first type of user request and the statistical information that indicates that the first type of request is made via one or more first spoken words followed by the second type of request, wherein the statistical information indicates that users other than the user have made the second type of request after the first type of request, and wherein identifying the second type of user request comprises determining that the users other than the user made the second type of request after having made the first type of request;
receiving, by the computer system, a first portion of a second natural language utterance of the user;
performing, by the computer system, speech recognition on the first portion of the second natural language utterance to recognize one or more words of the first portion of the second natural language utterance;
receiving, by the computer system, visual inputs provided by the user, wherein the visual inputs are streamed to the computer system and processed in parallel;
determining, by the computer system, prior to a detection of an end of the second natural language utterance, a first intent associated with the second natural language utterance based on the first portion of the second natural language utterance and the identified type of second user request;
determining, by the computer system, an intent associated with the visual inputs provided by the user; and
generating, by the computer system, at least one response for presentation to the user, utilizing a pre-fetched result related to at least one of the first intent and the intent associated with the visual inputs.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
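By way of non-limiting illustration, the statistical information recited in claim 1 may be sketched as a table of follow-on counts built from the request histories of other users. The function names `build_follow_stats` and `identify_second_types`, the `min_count` threshold, and the request-type labels are hypothetical and are introduced only for this example.

```python
from collections import Counter, defaultdict


def build_follow_stats(histories, exclude_user):
    """Count how often one request type directly follows another.

    histories: mapping of user id -> ordered list of request types.
    exclude_user: the current user, whose own history is left out so the
        counts reflect only users other than the user.
    """
    stats = defaultdict(Counter)
    for user_id, requests in histories.items():
        if user_id == exclude_user:
            continue
        # Pair each request with the one that came right after it.
        for first, second in zip(requests, requests[1:]):
            stats[first][second] += 1
    return stats


def identify_second_types(stats, first_type, min_count=1):
    """Return request types seen after first_type, most frequent first."""
    followers = stats.get(first_type, Counter())
    return [t for t, n in followers.most_common() if n >= min_count]
```

A usage example under the same assumptions:

```python
histories = {
    "u1": ["find_restaurant", "get_directions"],
    "u2": ["find_restaurant", "get_directions", "check_weather"],
    "u3": ["find_restaurant", "make_reservation"],
    "me": ["find_restaurant", "play_music"],
}
stats = build_follow_stats(histories, exclude_user="me")
likely_next = identify_second_types(stats, "find_restaurant")
```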
13. A system for determining an intent prediction for a natural language utterance, prior to a system detection of an end of the natural language utterance, based on a portion of the natural language utterance and statistical information that correlates requests that are linked together in that one type of request statistically follows another type of request, the system comprising:
one or more physical processors programmed with computer program instructions which, when executed, cause the one or more physical processors to:
receive a first natural language utterance of a user;
determine a first type of user request based on the first natural language utterance;
identify at least a second type of user request based on the first type of user request and the statistical information that indicates that the first type of request is made via one or more first spoken words followed by the second type of request, wherein the statistical information indicates that users other than the user have made the second type of request after the first type of request, and wherein identifying the second type of user request comprises determining that the users other than the user made the second type of request after having made the first type of request;
receive a first portion of a second natural language utterance of the user;
perform speech recognition on the first portion of the second natural language utterance to recognize one or more words of the first portion of the second natural language utterance;
receive visual inputs provided by the user, wherein the visual inputs are streamed to the computer system and processed in parallel;
determine, prior to a detection of an end of the second natural language utterance, a first intent associated with the second natural language utterance based on the first portion of the second natural language utterance and the identified second type of user request;
determine an intent associated with the visual inputs provided by the user; and
generate at least one response for presentation to the user, utilizing a pre-fetched result related to at least one of the first intent and the intent associated with the visual inputs.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22)
23. A method of determining an intent prediction for a natural language utterance, prior to a system detection of an end of the natural language utterance, based on a portion of the natural language utterance and statistical information that correlates requests that are linked together in that one type of request statistically follows another type of request, the method being implemented on a computer system that includes one or more physical processors executing computer program instructions which, when executed by the one or more physical processors, perform the method, the method comprising:
receiving, by the computer system, a first natural language utterance of a user;
determining, by the computer system, a first type of user request based on the first natural language utterance;
identifying, by the computer system, at least a second type of user request based on the first type of user request and the statistical information that indicates that the first type of request is made via one or more first spoken words followed by the second type of request, wherein the statistical information indicates that users other than the user have made the second type of request after the first type of request, and wherein identifying the second type of user request comprises determining that the users other than the user made the second type of request after having made the first type of request;
receiving, by the computer system, a first portion of a second natural language utterance of a user;
performing, by the computer system, speech recognition on the first portion of the second natural language utterance to recognize one or more words of the first portion of the second natural language utterance;
obtaining, by the computer system, context information for the second natural language utterance;
receiving, by the computer system, visual inputs provided by the user, wherein the visual inputs are streamed to the computer system and processed in parallel;
determining, by the computer system, an intent associated with the visual inputs provided by the user;
determining, by the computer system, prior to the detection of the end of the second natural language utterance, a first intent based on the one or more words of the first portion, the context information, and the identified second type of user request;
determining, by the computer system, prior to the detection of the end of the second natural language utterance, one or more inferred words that the user will utter in the second natural language utterance based on the first determined intent;
providing, by the computer system, the one or more words of the first portion and the one or more inferred words for user selection;
receiving, at the computer system, a user selection of the one or more inferred words; and
determining, by the computer system, at least one response for presentation to the user, utilizing a pre-fetched result related to at least one of the first intent, the one or more inferred words and the intent associated with the visual inputs.
- View Dependent Claims (24, 25, 26, 27)
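By way of non-limiting illustration, the inferred-word step of claim 23 may be sketched as a lookup of candidate phrases for the predicted intent, with the remaining words of each matching phrase offered for user selection. The phrase table `INTENT_PHRASES` and the function `infer_words` are hypothetical and are introduced only for this example; the claim does not prescribe how the inferred words are derived.

```python
# Hypothetical candidate phrases per intent; assumed for illustration only.
INTENT_PHRASES = {
    "get_directions": [
        "get directions to the restaurant",
        "get directions home",
    ],
    "check_weather": ["what is the weather today"],
}


def infer_words(partial_words, intent):
    """Return candidate continuations of the words recognized so far.

    Each candidate is the list of words the user may utter next, taken from
    a phrase for the predicted intent that starts with the recognized words.
    """
    prefix = [w.lower() for w in partial_words]
    candidates = []
    for phrase in INTENT_PHRASES.get(intent, []):
        tokens = phrase.split()
        # Keep only phrases that extend the recognized prefix.
        if tokens[:len(prefix)] == prefix and len(tokens) > len(prefix):
            candidates.append(tokens[len(prefix):])
    return candidates
```

Under these assumptions, the recognized words and each returned continuation could be displayed together, and the continuation the user selects would then inform the response.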
28. A method of determining an intent prediction for a natural language utterance, prior to a system detection of an end of the natural language utterance, based on a portion of the natural language utterance and statistical information that correlates requests that are linked together in that one type of request statistically follows another type of request, the method being implemented on a computer system that includes one or more physical processors executing computer program instructions which, when executed by the one or more physical processors, perform the method, the method comprising:
receiving, by the computer system, a first natural language utterance of a user;
determining, by the computer system, a first type of user request based on the first natural language utterance;
identifying, by the computer system, at least a second type of user request based on the first type of user request and the statistical information that indicates that the first type of request is made via one or more first spoken words followed by the second type of request, wherein the statistical information indicates that users other than the user have made the second type of request after the first type of request, and wherein identifying the second type of user request comprises determining that the users other than the user made the second type of request after having made the first type of request;
receiving, by the computer system, a first portion of a second natural language utterance of the user;
performing, by the computer system, speech recognition on the first portion of the second natural language utterance to recognize one or more words of the first portion of the second natural language utterance;
receiving, by the computer system, visual inputs provided by the user, wherein the visual inputs are streamed to the computer system and processed in parallel;
determining, by the computer system, prior to a detection of an end of the second natural language utterance, a first intent associated with the second natural language utterance based on the first portion of the second natural language utterance and the identified second type of user request;
determining, by the computer system, an intent associated with the visual inputs provided by the user;
determining, by the computer system, a first user request based on the first intent; and
obtaining, by the computer system, at least one response for presentation to the user, utilizing a pre-fetched result related to at least one of the first user request and the intent associated with the visual inputs.
- View Dependent Claims (29, 30, 31)
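By way of non-limiting illustration, the use of a pre-fetched result recited in claim 28 may be sketched as follows. The sketch assumes that a fetch is started in the background as soon as a user request is predicted, and that the stored result is reused if the request is confirmed once the utterance ends. The class `Prefetcher` and its methods are hypothetical and are introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


class Prefetcher:
    """Starts fetching results for predicted requests ahead of time."""

    def __init__(self, fetch):
        # fetch: callable that takes a request and returns its result
        # (an assumed stand-in for a search or service call).
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending = {}

    def prefetch(self, request):
        """Begin fetching in the background if not already in progress."""
        if request not in self._pending:
            self._pending[request] = self._pool.submit(self._fetch, request)

    def respond(self, request):
        """Return the pre-fetched result if one exists, else fetch now."""
        future = self._pending.pop(request, None)
        if future is None:
            return self._fetch(request)
        return future.result()
```

In this sketch a wrong prediction costs only an unused background fetch, while a correct one lets the response be produced without waiting for the fetch to start after the utterance ends.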
Specification