Hierarchical methods and apparatus for extracting user intent from spoken utterances
3 Assignments
0 Petitions
Abstract
A technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a class is determined after a first iteration and a sub-class of the class is determined after a second iteration. The class and the sub-class of the class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. The user intent extracting step may further determine a sub-class of the sub-class of the class after a third iteration, such that the class, the sub-class of the class, and the sub-class of the sub-class of the class are hierarchically indicative of the intent of the user.
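The iterative extraction summarized above can be illustrated with a minimal sketch. It assumes a hand-built class hierarchy and a simple keyword-overlap scorer; the `Node` type, the `TAXONOMY` contents, and the scoring rule are all hypothetical and are not taken from the patent, which does not prescribe a particular classifier here. The point shown is only the control flow: the same decoded text is classified once per stage, and each stage chooses among the sub-classes of the class picked by the previous stage.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One class in a hypothetical intent hierarchy."""
    label: str
    keywords: frozenset
    children: list = field(default_factory=list)


def classify(tokens, candidates):
    """Return the candidate with the largest keyword overlap, or None if nothing matches."""
    best, best_score = None, 0
    for node in candidates:
        score = len(node.keywords & tokens)
        if score > best_score:
            best, best_score = node, score
    return best


def extract_intent(decoded_text, roots, max_stages=3):
    """Walk the hierarchy one stage at a time, re-reading the same decoding at every stage."""
    tokens = frozenset(decoded_text.lower().split())
    path, candidates = [], roots
    for _ in range(max_stages):
        node = classify(tokens, candidates)
        if node is None:
            break  # no class at this level matches; stop with what we have
        path.append(node.label)
        candidates = node.children  # next stage is restricted to this class's sub-classes
        if not candidates:
            break
    return path


# Hypothetical taxonomy, invented purely for illustration.
TAXONOMY = [
    Node("climate", frozenset({"temperature", "warmer", "cooler", "fan", "heat"}), [
        Node("temperature", frozenset({"temperature", "warmer", "cooler", "degrees"}), [
            Node("increase", frozenset({"warmer", "up", "raise"})),
            Node("decrease", frozenset({"cooler", "down", "lower"})),
        ]),
        Node("fan", frozenset({"fan", "airflow"})),
    ]),
    Node("audio", frozenset({"radio", "volume", "music", "song"}), [
        Node("volume", frozenset({"volume", "louder", "quieter"})),
        Node("source", frozenset({"radio", "cd", "station"})),
    ]),
]

if __name__ == "__main__":
    print(extract_intent("could you make it a little warmer in here", TAXONOMY))
    # ['climate', 'temperature', 'increase']
```

In this sketch the first iteration yields the class, the second a sub-class of that class, and the optional third a sub-class of the sub-class, mirroring the three levels the abstract describes. A production system would likely replace the keyword scorer with a trained statistical classifier per level.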
49 Citations
30 Claims
1. A method for determining an intended action of a user of a computing system environment, the computing system environment comprising a voice system, the intended action being specified via a spoken input of the user, wherein the method comprises acts of:
obtaining a decoding of the spoken input of the user, wherein the voice system has a precise machine-based grammar to allow the user to invoke the intended action by speaking one or more predetermined voice commands and wherein the spoken input is a free form voice instruction that is different than the precise machine-based grammar; and
extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple hierarchically dependent semantic stages, comprising:
determining a first level of classification of the intended action from the decoding of the spoken input during a first semantic stage of the iterative hierarchical extraction process, the first level of classification having a plurality of sub-classifications associated with the first level of classification; and
determining, from among the plurality of sub-classifications associated with the first level of classification, a second level of classification of the intended action from the same decoding of the spoken input during a second semantic stage of the iterative hierarchical extraction process.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. At least one computer readable storage device encoded with a plurality of instructions that, when executed, cause at least one processor to perform a method for determining an intended action of a user of a computing system environment, the computing system environment comprising a voice system, the intended action being specified via a spoken utterance input of the user, wherein the method comprises acts of:
obtaining a decoding of the spoken input of the user, wherein the voice system has a precise machine-based grammar to allow the user to invoke the intended action by speaking one or more predetermined voice commands and wherein the spoken input is a free form voice instruction that is different than the precise machine-based grammar; and
extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple hierarchically dependent semantic stages, comprising:
determining a first level of classification of the intended action from the decoding of the spoken input during a first semantic stage of the iterative hierarchical extraction process, the first level of classification having a plurality of sub-classifications associated with the first level of classification; and
determining, from among the plurality of sub-classifications associated with the first level of classification, a second level of classification of the intended action from the same decoding of the spoken input during a second semantic stage of the iterative hierarchical extraction process.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. An apparatus comprising:
at least one processor programmed to determine an intended action specified via a spoken input of a user of a computing system environment comprising a voice system by:
obtaining a decoding of the spoken input of the user, wherein the voice system has a precise machine-based grammar to allow the user to invoke the intended action by speaking one or more predetermined voice commands and wherein the spoken input is a free form voice instruction that is different than the precise machine-based grammar; and
extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple hierarchically dependent semantic stages, comprising:
determining a first level of classification of the intended action from the decoding of the spoken input during a first semantic stage of the iterative hierarchical extraction process, the first level of classification having a plurality of sub-classifications associated with the first level of classification; and
determining, from among the plurality of sub-classifications associated with the first level of classification, a second level of classification of the intended action from the same decoding of the spoken input during a second semantic stage of the iterative hierarchical extraction process.
Dependent claims: 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification