Hierarchical methods and apparatus for extracting user intent from spoken utterances
First Claim
1. A method for determining an intended action of a user of a computing system environment, the computing system environment comprising a voice system, the intended action being specified via a spoken input of the user, the method comprising:
obtaining a decoding of the spoken input of the user; and
extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple hierarchically dependent semantic stages, comprising;
determining a first level of classification of the intended action from the decoding of the spoken input during a first semantic stage of the iterative hierarchical extraction process, the first level of classification having a plurality of sub-classifications associated with the first level of classification; and
determining, from among the plurality of sub-classifications associated with the first level of classification, a second level of classification of the intended action from the same decoding of the spoken input during a second semantic stage of the iterative hierarchical extraction process, wherein determining the intended action further comprises utilizing information about the user or the user's environment.
2 Assignments
0 Petitions
Abstract
Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target.
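The two-iteration extraction described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the keyword-overlap scoring, the class names (`radio`, `climate`), the sub-class names, and the function names `best_match` and `extract_intent` are all hypothetical, chosen only to show a first class being picked in one pass and a sub-class of that class being picked in a second pass over the same decoded text.

```python
# Hypothetical first-level classes (targets) and the keywords that suggest them.
TARGET_KEYWORDS = {
    "radio": {"radio", "station", "music", "volume"},
    "climate": {"temperature", "warmer", "cooler", "fan", "air"},
}

# Hypothetical sub-classes, grouped under the first-level class they belong to.
HIERARCHY = {
    "radio": {
        "tune_station": {"station", "tune", "fm"},
        "volume": {"volume", "louder", "quieter"},
    },
    "climate": {
        "set_temperature": {"temperature", "degrees", "warmer", "cooler"},
        "fan_speed": {"fan", "blower"},
    },
}


def best_match(tokens, candidates):
    """Return the candidate label with the most keyword overlap, or None."""
    scores = {label: len(tokens & keywords) for label, keywords in candidates.items()}
    label = max(scores, key=scores.get)
    return label if scores[label] > 0 else None


def extract_intent(decoded):
    """Classify the same decoded utterance twice: class first, then sub-class."""
    tokens = set(decoded.lower().split())
    first = best_match(tokens, TARGET_KEYWORDS)  # first iteration
    if first is None:
        return None
    # Second iteration: only sub-classes of the chosen first class are considered.
    second = best_match(tokens, HIERARCHY[first])
    return (first, second)
```

Under these assumptions, `extract_intent("please make it a bit warmer in here")` yields the pair `("climate", "set_temperature")`. Restricting the second pass to the sub-classes of the first result is what makes the stages hierarchically dependent; a real system would presumably use statistical classifiers rather than keyword sets.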
40 Citations
20 Claims
1. A method for determining an intended action of a user of a computing system environment, the computing system environment comprising a voice system, the intended action being specified via a spoken input of the user, the method comprising:
obtaining a decoding of the spoken input of the user; and
extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple hierarchically dependent semantic stages, comprising;
determining a first level of classification of the intended action from the decoding of the spoken input during a first semantic stage of the iterative hierarchical extraction process, the first level of classification having a plurality of sub-classifications associated with the first level of classification; and
determining, from among the plurality of sub-classifications associated with the first level of classification, a second level of classification of the intended action from the same decoding of the spoken input during a second semantic stage of the iterative hierarchical extraction process, wherein determining the intended action further comprises utilizing information about the user or the user's environment.
Dependent claims: 2, 3, 4, 5, 6, 7
8. At least one computer readable storage device encoded with a plurality of instructions that, when executed, cause at least one processor to perform a method for determining an intended action of a user of a computing system environment, the computing system environment comprising a voice system, the intended action being specified via a spoken input of the user, wherein the method comprises acts of:
obtaining a decoding of the spoken input of the user; and
extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple hierarchically dependent semantic stages, comprising;
determining a first level of classification of the intended action from the decoding of the spoken input during a first semantic stage of the iterative hierarchical extraction process, the first level of classification having a plurality of sub-classifications associated with the first level of classification; and
determining, from among the plurality of sub-classifications associated with the first level of classification, a second level of classification of the intended action from the same decoding of the spoken input during a second semantic stage of the iterative hierarchical extraction process, wherein determining the intended action further comprises utilizing information about the user or the user's environment.
Dependent claims: 9, 10, 11, 12, 13, 14
15. An apparatus comprising:
at least one processor programmed to determine an intended action specified via a spoken input of a user of a computing system environment comprising a voice system by;
obtaining a decoding of the spoken input of the user; and
extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple hierarchically dependent semantic stages, comprising;
determining a first level of classification of the intended action from the decoding of the spoken input during a first semantic stage of the iterative hierarchical extraction process, the first level of classification having a plurality of sub-classifications associated with the first level of classification; and
determining, from among the plurality of sub-classifications associated with the first level of classification, a second level of classification of the intended action from the same decoding of the spoken input during a second semantic stage of the iterative hierarchical extraction process wherein determining the intended action further comprises utilizing information about the user or the user's environment.
Dependent claims: 16, 17, 18, 19, 20
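Each independent claim ends with the limitation that determining the intended action uses information about the user or the user's environment. One way this could work is sketched below, with the caveat that the claims do not say how such information is combined with the decoding. The cabin-temperature signal, the 26 and 16 degree thresholds, the 0.5 bonus, and the names `context_bonus` and `pick_subclass` are assumptions made purely for illustration.

```python
# Hypothetical sub-classes of a first-level "climate" class.
CLIMATE_SUBCLASSES = {
    "heat_up": {"warm", "warmer", "heat", "cold"},
    "cool_down": {"cool", "cooler", "hot"},
}


def context_bonus(cabin_temp_c):
    """Turn an assumed environment reading into a small per-sub-class bonus."""
    if cabin_temp_c >= 26:
        return {"cool_down": 0.5}
    if cabin_temp_c <= 16:
        return {"heat_up": 0.5}
    return {}


def pick_subclass(decoded, subclasses, bonus):
    """Score sub-classes by keyword overlap plus the environment bonus."""
    tokens = set(decoded.lower().split())
    scores = {
        name: len(tokens & keywords) + bonus.get(name, 0.0)
        for name, keywords in subclasses.items()
    }
    return max(scores, key=scores.get)
```

In this sketch the bonus is smaller than one keyword match, so the environment only breaks ties: an utterance such as "adjust the temperature please" is resolved by the cabin reading, while "i am cold" selects `heat_up` even in a warm cabin.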
Specification