Structured models of repetition for speech recognition

US 8,965,765 B2
Filed: 09/19/2008
Issued: 02/24/2015
Est. Priority Date: 09/19/2008
Status: Active Grant

Abstract

Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on at least some of the corresponding word sequences (as recognized by one or more recognizers) and the associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may repeat the first utterance using the exact words, or may be another structural transformation of it, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.
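
The joint analysis described in the abstract can be pictured with a small sketch. It is illustrative only and is not taken from the patent: it assumes each utterance comes with an n-best list mapping a word sequence to a log score that already combines acoustic and language-model evidence, and it uses a hypothetical, unnormalised repetition prior (`repetition_log_prior`, with an assumed `p_exact` of 0.8) that simply favours an exact repeat.

```python
import math
from itertools import product

def repetition_log_prior(first, second, p_exact=0.8):
    """Toy stand-in for P(second | first): favour an exact repeat.

    Hypothetical and unnormalised; a real model would cover extensions,
    truncations and spellings as well.
    """
    return math.log(p_exact) if first == second else math.log(1.0 - p_exact)

def joint_decode(nbest_first, nbest_second):
    """Return the (first, second, log_score) triple with the best joint score."""
    best = None
    for (w1, s1), (w2, s2) in product(nbest_first.items(), nbest_second.items()):
        score = s1 + s2 + repetition_log_prior(w1, w2)
        if best is None or score > best[2]:
            best = (w1, w2, score)
    return best

# Made-up recognizer scores for an utterance and its repetition.
nbest_first = {
    ("sea", "tax"): math.log(0.5),
    ("seatac",): math.log(0.4),
    ("c", "tack"): math.log(0.1),
}
nbest_second = {
    ("seatac",): math.log(0.45),
    ("see", "tack"): math.log(0.55),
}

first, second, score = joint_decode(nbest_first, nbest_second)
print(first, second, round(math.exp(score), 4))
```

With these made-up scores, decoding each utterance on its own would pick "sea tax" and then "see tack", while the joint score prefers "seatac" for both, because the hypothesis that appears in both lists is rewarded by the repetition prior.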

19 Claims

1. In a computing environment, a method, comprising, receiving two or more adjacent utterances, in which a later utterance is structurally related to an earlier utterance by repetition, using a structured model of repetition to determine an intention associated with at least one of the utterances, recognizing the utterances as separate sets of word sequences, and wherein using the structured model of repetition comprises performing a joint probability analysis on the word sequences and associated acoustic data, and using word sequences common to the sets of word sequences to select only a subset of the word sequences for the joint probability analysis.
  - 2. The method of claim 1 wherein using the structured model of repetition to determine the intention comprises attempting to determine exact words spoken by a user, or selecting at least one entry from among a fixed set of database entries, or both attempting to determine exact words spoken by a user and selecting at least one entry from among a fixed set of database entries.
  - 3. The method of claim 1 wherein the later utterance is a repeated utterance relative to the earlier utterance that occurs in a session having utterances.
  - 4. The method of claim 3 further comprising using data from a separate session for recognizing the later utterance.
  - 5. The method of claim 1 further comprising, using phonetic similarity to select only a subset of the word sequences for the joint probability analysis.
  - 6. The method of claim 5 further comprising, using a transduction process that takes phonemes as input and produces words as output to determine the subset.
  - 7. The method of claim 6 wherein the transduction process uses a language model on the output, in which the language model is built from a set of listings, transcribed utterances, or decoded utterances, or any combination of a set of listings, transcribed utterances, or decoded utterances.
  - 8. The method of claim 1 further comprising, using a statistical user model that makes one or more inferences about at least one guess corresponding to a later utterance that was made regarding a misrecognized previous utterance.
  - 9. The method of claim 1 wherein recognizing the utterances comprises using at least one speech recognizer that is different from a speech recognizer used in recognizing the earlier utterance.
  - 10. The method of claim 1 wherein using the structured model comprises determining that the second utterance is an extension of the first utterance, including that the second utterance adds at least one word before the first utterance, or adds at least one word after the first utterance, or both adds at least one word before the first utterance and adds at least one word after the first utterance.
  - 11. The method of claim 1 wherein using the structured model comprises determining that the second utterance is a truncation of the first utterance, including that the second utterance has removed at least one word before the first utterance, or removed at least one word after the first utterance, or both removed at least one word before the first utterance and removed at least one word after the first utterance.
  - 12. The method of claim 1 wherein using the structured model comprises determining that the second utterance spells at least part of one word that was spoken in the first utterance.
  - 13. The method of claim 1 wherein structured model of repetition comprises a set of one or more features used in a generative probabilistic model, or a set of one or more features used in a maximum entropy model.
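
The structural relations named in claims 10 to 12 (extension, truncation, spelling) plus exact repetition can be sketched as a simple classifier over recognized word sequences. This is a rough illustration under stated assumptions, not the claimed method: it works on already-recognized, lower-cased word lists, treats a spelling as a run of single-letter tokens that spell the start of a word in the first utterance, and the function names are hypothetical.

```python
def _find_sub(needle, haystack):
    """Index where `needle` occurs contiguously in `haystack`, or -1."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1

def classify_repetition(first, second):
    """Label how `second` relates structurally to `first`."""
    first, second = list(first), list(second)
    if first == second:
        return "exact"
    # Extension: the first utterance survives intact, with words added
    # before it, after it, or both.
    if first and _find_sub(first, second) >= 0:
        return "extension"
    # Truncation: the second utterance is a contiguous piece of the first.
    if second and _find_sub(second, first) >= 0:
        return "truncation"
    # Spelling: single-letter tokens that spell all or part of a word.
    if second and all(len(tok) == 1 for tok in second):
        spelled = "".join(second)
        if any(word.startswith(spelled) for word in first):
            return "spelling"
    return "other"

print(classify_repetition(["starbucks"], ["starbucks", "coffee"]))
print(classify_repetition(["joes", "pizza", "seattle"], ["joes", "pizza"]))
print(classify_repetition(["seatac", "airport"], ["s", "e", "a"]))
```

A real system would make this decision probabilistically over competing hypotheses rather than on a single recognized string; the labels here could serve as features in such a model.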

14. In a computing environment, a system comprising, at least one processor, a memory communicatively coupled to the at least one processor and including components comprising, a repeat analysis mechanism that processes speech recognition results differently based on whether input speech is an initial input, or is repeated input speech that includes a structural transformation of the initial input, and, when the input speech is the repeated input speech, the repeat analysis mechanism configured to combine recognition data corresponding to the repeated input speech with recognition data corresponding to the prior input speech to provide a recognition result for that repeated input speech, the recognition result based upon one or more structural features corresponding to the repeated input speech in relation to the prior input speech, wherein the repeat analysis mechanism dynamically limits the recognition data corresponding to the repeated input speech that is combined with the recognition data corresponding to the prior input speech.
  - 15. The system of claim 14 wherein the repeat analysis mechanism is coupled to an automatic speech recognizer that provides recognition data for the initial input, and a different automatic speech recognizer that provides recognition data corresponding to the repeated input speech.
  - 16. The system of claim 14 wherein the repeat analysis mechanism selects a recognition result by selecting at least one listing from a finite set of listings or selecting at least one most probable set of one or more words corresponding the repeated input speech.
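
One way to picture how the recognition data that gets combined might be limited (claim 14), in the spirit of claim 1's use of common word sequences to select a subset, is to keep only hypothesis pairs that share at least one word n-gram. The sketch below is an assumption for illustration: the helper names and the n-gram overlap criterion are hypothetical, and the claims do not prescribe this particular rule.

```python
def shared_ngrams(a, b, n=1):
    """Set of word n-grams that occur in both sequences."""
    def grams(seq):
        return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
    return grams(a) & grams(b)

def prune_pairs(nbest_first, nbest_second, n=1):
    """Keep only hypothesis pairs with at least one n-gram in common."""
    return [
        (h1, h2)
        for h1 in nbest_first
        for h2 in nbest_second
        if shared_ngrams(h1, h2, n)
    ]

# Made-up hypotheses for an initial input and its repetition.
nbest_first = [("joes", "pizza"), ("rose", "plaza"), ("joes", "plaza")]
nbest_second = [("joes", "pizza", "seattle"), ("rose", "visa")]

kept = prune_pairs(nbest_first, nbest_second)
for pair in kept:
    print(pair)
```

Here three of the six possible pairs survive, so a subsequent joint analysis would score half as many combinations. Raising `n` makes the filter stricter.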

17. One or more computer-readable storage media having computer-executable instructions, which when executed perform steps, comprising, receiving an utterance, determining if the utterance is a structural transformation comprising at least one of an extension, a truncation, or at least a partial spelling of a prior utterance from a same speaker as the utterance, and if so, using word sequence data corresponding to recognition of the prior utterance in combination with word sequence data corresponding to recognition of the utterance to select a recognition result for the utterance comprising performing a joint probability analysis on the word sequence data corresponding to recognition of the utterance and associated acoustic data and using the word sequence data corresponding to recognition of the prior utterance and the word sequence data corresponding to recognition of the utterance to select a subset of the word sequences for the joint probability analysis and wherein at least one speech recognizer that is different from a speech recognizer used in recognizing the prior utterance.
  - 18. The one or more computer-readable storage media of claim 17 wherein selecting a recognition result comprises selecting at least one listing from a finite set of listings, or selecting at least one most probable set of one or more words corresponding to the second utterance.
  - 19. The one or more computer-readable storage media of claim 17 wherein the utterance is a repeated utterance relative to a prior utterance that occurs in a session having two or more utterances, and having computer-executable instructions comprising, using data from a separate session as part of selecting the recognition result for the repeated utterance.
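
Selecting a listing from a finite set (claim 18) with a maximum entropy model (one of the two model families named in claim 13) can be sketched as a log-linear scorer over listings. Everything specific here is an assumption for illustration: the three overlap features, the weights, and the function names are hypothetical, and listings are assumed to be non-empty strings of space-separated words.

```python
import math

def features(listing, hyp_first, hyp_second):
    """Word-overlap features of a listing against both utterances."""
    words = set(listing.split())  # assumes a non-empty listing
    f1 = len(words & set(hyp_first)) / len(words)
    f2 = len(words & set(hyp_second)) / len(words)
    # The product rewards listings supported by both utterances.
    return [f1, f2, f1 * f2]

def listing_posteriors(listings, hyp_first, hyp_second, weights):
    """Softmax over weighted feature sums, one probability per listing."""
    scores = [
        sum(w * f for w, f in zip(weights, features(item, hyp_first, hyp_second)))
        for item in listings
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {item: e / total for item, e in zip(listings, exps)}

listings = ["joes pizza", "rose plaza", "joes plaza"]
post = listing_posteriors(
    listings,
    hyp_first=("rose", "pizza"),
    hyp_second=("joes", "pizza", "seattle"),
    weights=[1.0, 1.0, 2.0],  # illustrative values, not trained
)
best_listing = max(post, key=post.get)
print(best_listing)
```

In practice the weights would be trained on labeled sessions; the hand-set values above only show how evidence from the prior utterance and the repetition is pooled into a single decision.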

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Zweig, Geoffrey G.; Li, Xiao; Bohus, Dan; Acero, Alejandro; Horvitz, Eric J.
Primary Examiner(s): Armstrong, Angela A
Application Number: US12/233,826
Publication Number: US 20100076765A1
Time in Patent Office: 2,349 Days
Field of Search: 704/231, 704/251, 704/255, 704/257
US Class Current: 704/251
CPC Class Codes: G10L 15/1822 Parsing for meaning underst...
