Hierarchical methods and apparatus for extracting user intent from spoken utterances

US 8,265,939 B2
Filed: 08/31/2005
Issued: 09/11/2012
Est. Priority Date: 08/31/2005
Status: Active Grant


3 Assignments

0 Petitions

Abstract

A technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a class is determined after a first iteration and a sub-class of the class is determined after a second iteration. The class and the sub-class of the class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. The user intent extracting step may further determine a sub-class of the sub-class of the class after a third iteration, such that the class, the sub-class of the class, and the sub-class of the sub-class of the class are hierarchically indicative of the intent of the user.
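The iterative class, sub-class, sub-sub-class extraction described in the abstract can be pictured with a minimal sketch. It is an illustration only, not the patented implementation: the `Node` type, the `TAXONOMY` contents, and the keyword-overlap `score` function are all hypothetical stand-ins for whatever classifier a real system would use at each stage. The point shown is that every stage looks at the same decoded utterance, and each stage only chooses among the children of the label chosen by the previous stage.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One label in a hypothetical intent hierarchy."""
    label: str
    keywords: set
    children: list = field(default_factory=list)


def score(node, tokens):
    # Stand-in classifier (an assumption): count keyword overlap with the utterance.
    return len(node.keywords & tokens)


def extract_intent(decoding, roots, max_depth=3):
    """Walk the hierarchy one stage at a time, reusing the same decoding at each stage."""
    tokens = frozenset(decoding.lower().split())
    path, candidates = [], roots
    for _ in range(max_depth):
        if not candidates:
            break  # the chosen label has no sub-classifications
        best = max(candidates, key=lambda n: score(n, tokens))
        if score(best, tokens) == 0:
            break  # nothing matched at this stage; stop descending
        path.append(best.label)
        candidates = best.children  # next stage is restricted to this label's children
    return path


# Hypothetical taxonomy: target -> target action -> data.
TAXONOMY = [
    Node("radio", {"radio", "station", "music"}, [
        Node("tune", {"tune", "listen", "station"}, [
            Node("jazz", {"jazz"}),
            Node("news", {"news"}),
        ]),
        Node("volume", {"volume", "loud", "quiet"}),
    ]),
    Node("climate", {"cold", "hot", "temperature"}, [
        Node("raise_temperature", {"cold", "warmer"}),
        Node("lower_temperature", {"hot", "cooler"}),
    ]),
]

print(extract_intent("I would like to listen to a jazz station", TAXONOMY))
print(extract_intent("it is too cold in here", TAXONOMY))
```

Under these assumptions the first utterance resolves to three levels (`radio`, `tune`, `jazz`) and the second stops after two (`climate`, `raise_temperature`) because that label has no further sub-classifications.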

49 Citations

30 Claims

1. A method for determining an intended action of a user of a computing system environment, the computing system environment comprising a voice system, the intended action being specified via a spoken input of the user, wherein the method comprises acts of:
- obtaining a decoding of the spoken input of the user, wherein the voice system has a precise machine-based grammar to allow the user to invoke the intended action by speaking one or more predetermined voice commands and wherein the spoken input is a free form voice instruction that is different than the precise machine-based grammar; and
  
  extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple hierarchically dependent semantic stages, comprising:
  
  determining a first level of classification of the intended action from the decoding of the spoken input during a first semantic stage of the iterative hierarchical extraction process, the first level of classification having a plurality of sub-classifications associated with the first level of classification; and
  
  determining, from among the plurality of sub-classifications associated with the first level of classification, a second level of classification of the intended action from the same decoding of the spoken input during a second semantic stage of the iterative hierarchical extraction process.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method of claim 1, wherein the second level of classification has a plurality of sub-classifications associated with the second level of classification, and wherein extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple semantic stages further comprises determining, from among the plurality of sub-classifications associated with the second level of classification, a third level of classification of the intended action from the same decoding of the spoken input during a third semantic stage of the iterative hierarchical extraction process.
  - 3. The method of claim 2, wherein the first level of classification represents a target associated with the intended action, the second level of classification represents a target action associated with the target, and the third level of classification represents data associated with the target action.
  - 4. The method of claim 2, wherein the act of extracting the intended action further comprises selecting a number of top scoring labels at each semantic stage utilizing confidence scores and at least one rejection criterion.
  - 5. The method of claim 4, wherein selecting a number of top scoring labels comprises determining a relative importance of at least a portion of the spoken input based at least in part on at least one auditory characteristic of the portion of the spoken input unrelated to semantic content of the spoken input.
  - 6. The method of claim 1, further comprising providing one or more commands to the voice system based, at least in part, on the determination of the first level of classification and the second level of classification.
  - 7. The method of claim 1, wherein the precise machine-based grammar is hierarchically arranged, and wherein the first level of classification and the second level of classification correspond to different levels within the grammar.
  - 8. The method of claim 1, wherein the method comprises extracting a value for an attribute at each of the first semantic stage and the second semantic stage of the iterative hierarchical extraction process.
  - 9. The method of claim 1, wherein the act of extracting the intended action comprises considering the decoding of the spoken input in its entirety during each of the first semantic stage and the second semantic stage of the iterative hierarchical extraction process.
  - 10. The method of claim 1, wherein neither the first semantic stage nor the second semantic stage involves tagging each word of the decoding of the spoken input or attaching a semantic label.
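Claim 4 recites selecting a number of top scoring labels at each semantic stage using confidence scores and at least one rejection criterion. A rough sketch of one way such a selection step could look follows. The function name `select_top_labels`, the parameter `n`, and the use of a fixed minimum-confidence threshold as the rejection criterion are assumptions made for illustration; the claims do not specify them.

```python
def select_top_labels(confidences, n=2, min_confidence=0.3):
    """Return up to n (label, confidence) pairs, best first.

    confidences: dict mapping label -> confidence score for one semantic stage.
    Labels below min_confidence are rejected (hypothetical rejection criterion).
    An empty result means every candidate at this stage was rejected.
    """
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, conf) for label, conf in ranked[:n] if conf >= min_confidence]


# Hypothetical first-stage scores for one utterance.
stage_scores = {"radio": 0.6, "climate": 0.35, "phone": 0.05}
print(select_top_labels(stage_scores))
print(select_top_labels(stage_scores, min_confidence=0.5))
```

Keeping more than one label per stage lets a later stage recover from a near-miss at an earlier one, at the cost of evaluating more sub-classifications.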

11. At least one computer readable storage device encoded with a plurality of instructions that, when executed, cause at least one processor to perform a method for determining an intended action of a user of a computing system environment, the computing system environment comprising a voice system, the intended action being specified via a spoken utterance input of the user, wherein the method comprises acts of:
- obtaining a decoding of the spoken input of the user, wherein the voice system has a precise machine-based grammar to allow the user to invoke the intended action by speaking one or more predetermined voice commands and wherein the spoken input is a free form voice instruction that is different than the precise machine-based grammar; and
  
  extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple hierarchically dependent semantic stages, comprising:
  
  determining a first level of classification of the intended action from the decoding of the spoken input during a first semantic stage of the iterative hierarchical extraction process, the first level of classification having a plurality of sub-classifications associated with the first level of classification; and
  
  determining, from among the plurality of sub-classifications associated with the first level of classification, a second level of classification of the intended action from the same decoding of the spoken input during a second semantic stage of the iterative hierarchical extraction process.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
- - 12. The at least one computer readable storage device of claim 11, wherein the second level of classification has a plurality of sub-classifications associated with the second level of classification, and wherein extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple semantic stages further comprises determining, from among the plurality of sub-classifications associated with the second level of classification, a third level of classification of the intended action from the same decoding of the spoken input during a third semantic stage of the iterative hierarchical extraction process.
  - 13. The at least one computer readable storage device of claim 12, wherein the first level of classification represents a target associated with the intended action, the second level of classification represents a target action associated with the target, and the third level of classification represents data associated with the target action.
  - 14. The at least one computer readable storage device of claim 12, wherein the act of extracting the intended action further comprises selecting a number of top scoring labels at each semantic stage utilizing confidence scores and at least one rejection criterion.
  - 15. The at least one computer readable storage device of claim 14, wherein selecting a number of top scoring labels comprises determining a relative importance of at least a portion of the spoken input based at least in part on at least one auditory characteristic of the portion of the spoken input unrelated to semantic content of the spoken input.
  - 16. The at least one computer readable storage device of claim 11, wherein the method further comprises providing one or more commands to the voice system based, at least in part, on the determination of the first level of classification and the second level of classification.
  - 17. The at least one computer readable storage device of claim 11, wherein the precise machine-based grammar is hierarchically arranged, and wherein the first level of classification and the second level of classification correspond to different levels within the grammar.
  - 18. The at least one computer readable storage device of claim 11, wherein the method comprises extracting a value for an attribute at each of the first semantic stage and the second semantic stage of the iterative hierarchical extraction process.
  - 19. The at least one computer readable storage device of claim 11, wherein the act of extracting the intended action comprises considering the decoding of the spoken input in its entirety during each of the first semantic stage and the second semantic stage of the iterative hierarchical extraction process.

20. An apparatus comprising:
- at least one processor programmed to determine an intended action specified via a spoken input of a user of a computing system environment comprising a voice system by:
  
  obtaining a decoding of the spoken input of the user, wherein the voice system has a precise machine-based grammar to allow the user to invoke the intended action by speaking one or more predetermined voice commands and wherein the spoken input is a free form voice instruction that is different than the precise machine-based grammar; and
  
  extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple hierarchically dependent semantic stages, comprising:
  
  determining a first level of classification of the intended action from the decoding of the spoken input during a first semantic stage of the iterative hierarchical extraction process, the first level of classification having a plurality of sub-classifications associated with the first level of classification; and
  
  determining, from among the plurality of sub-classifications associated with the first level of classification, a second level of classification of the intended action from the same decoding of the spoken input during a second semantic stage of the iterative hierarchical extraction process.
- Dependent Claims (21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
- - 21. The apparatus of claim 20, wherein the second level of classification has a plurality of sub-classifications associated with the second level of classification, and wherein extracting the intended action from the decoding of the spoken input using an iterative hierarchical extraction process comprising analyzing the decoding of the spoken input in multiple semantic stages further comprises determining, from among the plurality of sub-classifications associated with the second level of classification, a third level of classification of the intended action from the same decoding of the spoken input during a third semantic stage of the iterative hierarchical extraction process.
  - 22. The apparatus of claim 21, wherein the first level of classification represents a target associated with the intended action, the second level of classification represents a target action associated with the target, and the third level of classification represents data associated with the target action.
  - 23. The apparatus of claim 21, wherein extracting the intended action further comprises selecting a number of top scoring labels at each semantic stage utilizing confidence scores and at least one rejection criterion.
  - 24. The apparatus of claim 23, wherein selecting a number of top scoring labels comprises determining a relative importance of at least a portion of the spoken input based at least in part on at least one auditory characteristic of the portion of the spoken input unrelated to semantic content of the spoken input.
  - 25. The apparatus of claim 20, wherein the at least one processor is further programmed to provide one or more commands to the voice system based, at least in part, on the determining of the first level of classification and the second level of classification.
  - 26. The apparatus of claim 20, wherein the at least one processor is further programmed to generate one or more questions of the user and to use answers to the one or more questions to facilitate determining the intended action.
  - 27. The apparatus of claim 20, wherein the at least one processor is further programmed to gather information from one or more sensors and use the information gathered from the one or more sensors to facilitate determining the intended action.
  - 28. The apparatus of claim 20, wherein the precise machine-based grammar is hierarchically arranged, and wherein the first level of classification and the second level of classification correspond to different levels within the grammar.
  - 29. The apparatus of claim 20, wherein extracting the intended action comprises extracting a value for an attribute at each of the first semantic stage and the second semantic stage of the iterative hierarchical extraction process.
  - 30. The apparatus of claim 20, wherein extracting the intended action comprises considering the decoding of the spoken input in its entirety during each of the first semantic stage and the second semantic stage of the iterative hierarchical extraction process.
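Claim 26 adds generating questions for the user and using the answers to help determine the intended action. One plausible trigger, sketched below purely as an assumption, is a small confidence gap between the two best labels at a stage. The function name `clarifying_question`, the `margin` parameter, and the question wording are hypothetical and do not come from the patent text.

```python
def clarifying_question(ranked, margin=0.1):
    """Return a question when the top two labels are too close to call, else None.

    ranked: list of (label, confidence) pairs sorted best first.
    margin: hypothetical minimum confidence gap needed to accept the top label outright.
    """
    if len(ranked) >= 2 and ranked[0][1] - ranked[1][1] < margin:
        return f"Did you mean {ranked[0][0]} or {ranked[1][0]}?"
    return None


print(clarifying_question([("radio", 0.45), ("climate", 0.40)]))
print(clarifying_question([("radio", 0.80), ("climate", 0.10)]))
```

In a fuller system the user's answer would presumably be fed back into the same stage to pick one of the two labels before the next stage runs.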

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Kanevsky, Dimitri; Reisinger, Joseph Simon; Sicconi, Robert; Viswanathan, Mahesh
Primary Examiner(s): He, Jialong
Application Number: US11/216,483
Publication Number: US 20070055529A1
Time in Patent Office: 2,568 Days
Field of Search: 704/9, 704/257, 704/275, 704/270, 704/270.1
US Class Current: 704/275
CPC Class Codes:
G10L 15/1815   Semantic context, e.g. disa...
G10L 15/1822   Parsing for meaning underst...
G10L 2015/226   using non-speech characteri...
