System and method for estimating the reliability of alternate speech recognition hypotheses in real time

US 9,653,066 B2
Filed: 10/23/2009
Issued: 05/16/2017
Est. Priority Date: 10/23/2009
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and computer-readable storage media for estimating reliability of alternate speech recognition hypotheses. A system configured to practice the method receives an N-best list of speech recognition hypotheses and features describing the N-best list, determines a first probability of correctness for each hypothesis in the N-best list based on the received features, determines a second probability that the N-best list does not contain a correct hypothesis, and uses the first probability and the second probability in a spoken dialog. The features can describe properties of at least one of a lattice, a word confusion network, and a garbage model. In one aspect, the N-best lists are not reordered according to reranking scores. The determination of the first probability of correctness can include a first stage of training a probabilistic model and a second stage of distributing mass over items in a tail of the N-best list.
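The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Hypothesis` class, the `features` and `score_nbest` functions, the weights, and the choice of a softmax over the N hypotheses plus one extra "none correct" class are all assumptions made for illustration. The three features mirror those named in the claims (word count, acoustic score, and an indication of problematic words).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical container)."""
    words: List[str]
    acoustic_scores: List[float]  # one acoustic score per word
    problematic: List[bool]       # per-word flag from a reliability estimator


def features(h: Hypothesis) -> List[float]:
    """Word count, mean acoustic score, fraction of problematic words."""
    n = len(h.words)
    mean_acoustic = sum(h.acoustic_scores) / n if n else 0.0
    frac_problematic = sum(h.problematic) / n if n else 0.0
    return [float(n), mean_acoustic, frac_problematic]


def score_nbest(nbest: List[Hypothesis],
                weights: List[float],
                none_bias: float) -> Tuple[List[float], float]:
    """Return (first probabilities, second probability).

    Assumption: a linear score per hypothesis and a fixed bias for the
    "list contains no correct hypothesis" outcome, normalized by softmax.
    """
    logits = [sum(w * x for w, x in zip(weights, features(h))) for h in nbest]
    logits.append(none_bias)              # extra class: none correct
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # shift for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs[:-1], probs[-1]
```

Because the extra class shares the normalization, the per-hypothesis probabilities and the "none correct" probability sum to one, and the list order is left unchanged, in line with the aspect in which N-best lists are not reordered.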

20 Claims

1. A method comprising:
- receiving an N-best list of speech recognition hypotheses from a speech utterance, wherein the N-best list of speech recognition hypotheses comprises words recognized from the speech utterance;
  
  receiving an acoustic score of each word in the N-best list of speech recognition hypotheses;
  
  receiving a count indicating a number of words associated with each hypothesis in the speech recognition hypotheses;
  
  receiving an indication of problematic words in the each hypothesis in the N-best list of speech recognition hypotheses, wherein the indication is determined by a reliability estimator;
  
  determining, via a processor and based on a feature set evaluated by an algorithm, a first probability of correctness for the each hypothesis in the N-best list of speech recognition hypotheses, the feature set evaluated by the algorithm comprising the count, the acoustic score, and the indication of problematic words;
  
  determining, via the processor, a second probability that the N-best list of speech recognition hypotheses does not contain a correct hypothesis using the reliability estimator; and
  
  using the first probability and the second probability in a spoken dialog.
- - 2. The method of claim 1, wherein the speech recognition hypotheses are stored in a word confusion network.
  - 3. The method of claim 1, wherein the processor is configured to perform speech language generation.
  - 4. The method of claim 1, wherein determining the first probability of correctness comprises two stages.
  - 5. The method of claim 4, wherein a first stage of the two stages comprises training a discriminative model P_a.
  - 6. The method of claim 5, wherein a second stage of the two stages comprises distributing mass over items in a tail of the N-best list.
  - 7. The method of claim 1, wherein the processor is configured to perform spoken language understanding.
  - 8. The method of claim 1, wherein the processor is configured to perform automatic speech recognition, and further comprising using the first probability and the second probability in the automatic speech recognition.
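Claims 4 to 6 describe a two-stage determination: a discriminative model P_a, followed by distributing mass over items in the tail of the N-best list. The sketch below illustrates only the second stage, under stated assumptions. It supposes that the first stage has already produced three numbers (the probability that the top item is correct, that the correct item lies somewhere in the tail, and that it is absent from the list), and it spreads the tail mass with a geometric decay. The function name `distribute_tail`, the three-way split, the `decay` parameter, and the geometric shape are hypothetical; the claims do not specify them.

```python
from typing import List, Tuple


def distribute_tail(p_top: float, p_tail: float, p_none: float,
                    n: int, decay: float = 0.5) -> Tuple[List[float], float]:
    """Spread stage-one tail mass over N-best positions 2..n.

    Returns (first probabilities for positions 1..n, second probability).
    Assumption: tail weights fall off geometrically with rank.
    """
    if n < 1:
        raise ValueError("N-best list must contain at least one hypothesis")
    if n == 1:
        # No tail exists; as an assumption, treat tail mass as "not in list".
        return [p_top], p_none + p_tail
    weights = [decay ** k for k in range(n - 1)]   # ranks 2..n
    norm = sum(weights)
    tail = [p_tail * w / norm for w in weights]
    return [p_top] + tail, p_none
```

For example, with stage-one outputs of 0.6, 0.3, and 0.1 and a list of three hypotheses, the tail mass of 0.3 is split 2:1 between the second and third items.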

9. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising:
  
  receiving an N-best list of speech recognition hypotheses from a speech utterance, wherein the N-best list of speech recognition hypotheses comprises words recognized from the speech utterance;
  
  receiving an acoustic score of each word in the N-best list of speech recognition hypotheses;
  
  receiving a count indicating a number of words associated with each hypothesis in the speech recognition hypotheses;
  
  receiving an indication of problematic words in the each hypothesis in the N-best list of speech recognition hypotheses, wherein the indication is determined by a reliability estimator;
  
  determining, based on a feature set evaluated by an algorithm, a first probability of correctness for the each hypothesis in the N-best list of speech recognition hypotheses, the feature set evaluated by the algorithm comprising the count, the acoustic score, and the indication of problematic words;
  
  determining a second probability that the N-best list of speech recognition hypotheses does not contain a correct hypothesis using the reliability estimator; and
  
  using the first probability and the second probability in a spoken dialog.
- - 10. The system of claim 9, wherein the speech recognition hypotheses are stored in a garbage model.
  - 11. The system of claim 9, wherein the processor is configured to perform speech language generation.
  - 12. The system of claim 9, wherein determining the first probability of correctness comprises two stages.
  - 13. The system of claim 12, wherein a first stage of the two stages comprises training a discriminative model P_a.
  - 14. The system of claim 13, wherein a second stage of the two stages comprises distributing mass over items in a tail of the N-best list.

15. A computer-readable storage medium having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- receiving an N-best list of speech recognition hypotheses from a speech utterance, wherein the N-best list of speech recognition hypotheses comprises words recognized from the speech utterance;
  
  receiving an acoustic score of each word in the N-best list of speech recognition hypotheses;
  
  receiving a count indicating a number of words associated with each hypothesis in the speech recognition hypotheses;
  
  receiving an indication of problematic words in the each hypothesis in the N-best list of speech recognition hypotheses, wherein the indication is determined by a reliability estimator;
  
  determining, based on a feature set evaluated by an algorithm, a first probability of correctness for the each hypothesis in the N-best list of speech recognition hypotheses, the feature set evaluated by the algorithm comprising the count, the acoustic score, and the indication of problematic words;
  
  determining a second probability that the N-best list of speech recognition hypotheses does not contain a correct hypothesis using the reliability estimator; and
  
  using the first probability and the second probability in a spoken dialog.
- - 16. The computer-readable storage medium of claim 15, wherein the speech recognition hypotheses are stored in one of a word confusion network and a garbage model.
  - 17. The computer-readable storage medium of claim 15, wherein the computing device is configured to perform speech language generation.
  - 18. The computer-readable storage medium of claim 15, wherein determining the first probability of correctness comprises two stages.
  - 19. The computer-readable storage medium of claim 18, wherein a first stage of the two stages comprises training a discriminative model P_a.
  - 20. The computer-readable storage medium of claim 19, wherein a second stage of the two stages comprises distributing mass over items in a tail of the N-best list.
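Each independent claim ends with using the first probability and the second probability in a spoken dialog, without saying how. One plausible use, sketched here purely as an assumption, is a dialog policy that accepts, confirms, or re-prompts depending on the two probabilities. The function name `choose_dialog_action`, the action labels, and the threshold values are hypothetical.

```python
from typing import List, Optional, Tuple


def choose_dialog_action(nbest: List[str],
                         first: List[float],
                         second: float,
                         accept: float = 0.8,
                         confirm: float = 0.4) -> Tuple[str, Optional[str]]:
    """Pick a dialog move from per-hypothesis and "none correct" probabilities.

    Assumed policy: re-prompt when "none correct" is at least as likely as
    the best hypothesis; otherwise accept or confirm by threshold.
    """
    if not nbest or second >= max(first, default=0.0):
        return ("reprompt", None)
    best = max(range(len(first)), key=first.__getitem__)
    if first[best] >= accept:
        return ("accept", nbest[best])
    if first[best] >= confirm:
        return ("confirm", nbest[best])
    return ("reprompt", None)
```

A dialog manager could equally carry the full distribution forward across turns rather than thresholding it; the thresholded policy is shown only because it is the simplest way to make both probabilities affect behavior.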

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Williams, Jason; Balakrishnan, Suhrid
Primary Examiner(s)
Godbold, Douglas
Assistant Examiner(s)
Villena, Mark

Application Number

US12/604,650
Publication Number

US 20110099012A1
Time in Patent Office

2,762 Days
Field of Search

704/275, 704/240, 704/236, 704/255
CPC Class Codes

G10L 15/01   Assessment or evaluation of...

G10L 15/04   Segmentation; Word boundary...

G10L 15/08   Speech classification or se...

G10L 15/083   Recognition networks G10L15...

G10L 15/14   using statistical models, e...

G10L 15/22   Procedures used during a sp...

G10L 15/28   Constructional details of s...
