System and method for speech recognition using dynamically adjusted confidence measure

US 5,710,866 A
Filed: 05/26/1995
Issued: 01/20/1998
Est. Priority Date: 05/26/1995
Status: Expired due to Term

Abstract

A computer-implemented method of recognizing an input speech utterance compares the input speech utterance to a plurality of hidden Markov models to obtain a constrained acoustic score that reflects the probability that the hidden Markov model matches the input speech utterance. The method computes a confidence measure for each hidden Markov model that reflects the probability of the constrained acoustic score being correct. The computed confidence measure is then used to adjust the constrained acoustic score. Preferably, the confidence measure is computed based on a difference between the constrained acoustic score and an unconstrained acoustic score that is computed independently of any language context. In addition, a new confidence measure preferably is computed for each input speech frame from the input speech utterance so that the constrained acoustic score is adjusted for each input speech frame.
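
As a rough illustration of the confidence computation described in the abstract, the following Python sketch derives a confidence value from the gap between a constrained and an unconstrained log score, then uses it to adjust the constrained score. The exponential mapping, the `scale` parameter, and both function names are assumptions made for illustration; the abstract states only that the measure is based on the difference between the two scores.

```python
import math


def confidence_measure(constrained: float, unconstrained: float,
                       scale: float = 1.0) -> float:
    """Map the gap between two log scores onto the interval (0, 1].

    Assumption: both scores are log probabilities, and a larger gap
    (unconstrained minus constrained) means lower confidence. The
    exponential form is a hypothetical choice, not taken from the patent.
    """
    gap = max(unconstrained - constrained, 0.0)
    return math.exp(-scale * gap)


def adjust_score(constrained: float, confidence: float) -> float:
    """Lower the constrained log score when confidence is below 1.

    Assumption: the adjustment adds the log of the confidence value.
    """
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must lie in (0, 1]")
    return constrained + math.log(confidence)


if __name__ == "__main__":
    conf = confidence_measure(constrained=-12.0, unconstrained=-10.0)
    print(conf, adjust_score(-12.0, conf))
```

With this mapping, a model whose constrained score equals the unconstrained score keeps its score unchanged, and the adjustment grows as the two scores diverge.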

127 Citations

30 Claims

1. A computer-implemented method of recognizing an input speech utterance, comprising:
- receiving the input speech utterance;
  
  comparing the input speech utterance with a plurality of stored acoustic utterance models, each stored acoustic utterance model corresponding to a linguistic expression, the comparing including an output probability analysis and a transition probability analysis of the input speech utterance with respect to each of the stored acoustic utterance models and resulting in a constrained acoustic score for each of the stored acoustic utterance models;
  
  for each stored acoustic utterance model, computing a confidence measure based on the constrained acoustic score for the stored acoustic utterance model, the confidence measure reflecting a probability that the constrained acoustic score correctly reflects how accurately the stored acoustic utterance model matches the input speech utterance;
  
  for each stored acoustic utterance model, computing a combined score based on the constrained acoustic score and the confidence measure computed for the stored acoustic utterance model; and
  
  determining which of the linguistic expressions corresponding to the stored acoustic utterance models most closely matches the input speech utterance based on the combined scores, thereby recognizing the input speech utterance.
- - 2. The method of claim 1, further including:
    - comparing the input speech utterance with a plurality of context-independent acoustic utterance models without analyzing transition probabilities;
      
      determining which of the context-independent acoustic utterance models most closely matches the input speech utterance, thereby resulting in an unconstrained acoustic score for the most closely matching context-independent acoustic utterance model; and
      
      wherein the step of computing a confidence measure for each acoustic utterance model includes computing the confidence measure based on a difference between the constrained acoustic score for the acoustic utterance model and the unconstrained acoustic score.
  - 3. The method of claim 1 wherein the comparing step includes:
    - for each stored acoustic utterance model, adjusting the constrained acoustic score for the stored acoustic utterance model based on the confidence measure determined for the stored acoustic utterance model.
  - 4. The method of claim 1 wherein the input speech utterance includes a plurality of speech frames and the comparing step includes:
    - comparing each speech frame of the input speech utterance to each of the stored acoustic utterance models, thereby resulting in an incremental constrained acoustic score for each speech frame and for each stored acoustic utterance model; and
      
      summing the incremental constrained acoustic scores to obtain the constrained acoustic score.
  - 5. The method of claim 1, further comprising:
    - for each stored acoustic utterance model, computing a linguistic score that reflects a probability that the stored acoustic utterance model corresponds to a linguistic model in a stored lexicon of linguistic models; and
      
      wherein the step of computing a combined score for each stored acoustic utterance model includes computing the combined score based on the linguistic score and a language weight that is varied based on the confidence measure computed for the stored acoustic utterance model.
  - 6. The method of claim 1, further including:
    - for each stored acoustic utterance model;
      
      penalizing the combined score by an insertion penalty that is adjusted based on how many words there are in the linguistic expression corresponding to the stored acoustic utterance model; and
      
      adjusting the insertion penalty based on the confidence measure computed for the stored acoustic utterance model.
  - 14. The method of claim 1 wherein each stored acoustic utterance model is a hidden Markov model.
  - 15. The method of claim 1, further comprising:
    - for each stored acoustic utterance model, computing a linguistic score that reflects a probability that the stored acoustic utterance model corresponds to a linguistic model in a stored lexicon of linguistic models; and
      
      wherein the step of computing a combined score for each stored acoustic utterance model includes computing the combined score based on the linguistic score and a language weight computed for the stored acoustic utterance model.
  - 16. The method of claim 1 wherein the step of computing a combined score for each stored acoustic utterance model includes, for each stored acoustic utterance model, computing the combined score based on a language weight that is varied based on the confidence measure computed for the stored acoustic utterance model.
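
Claims 5, 6, and 16 describe a combined score in which the language weight and the insertion penalty vary with the confidence measure. A minimal sketch of that idea follows, assuming log-domain scores and linear mappings from confidence to weight and penalty. The `Hypothesis` type, the numeric ranges, and the direction of each mapping are hypothetical; the claims do not specify them.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    expression: str    # linguistic expression tied to the acoustic model
    acoustic: float    # constrained acoustic log score
    linguistic: float  # linguistic (language model) log score
    confidence: float  # confidence measure, assumed to lie in [0, 1]


def language_weight(confidence: float, lw_min: float = 4.0,
                    lw_max: float = 8.0) -> float:
    # Assumption: lean harder on the language model when the acoustic
    # evidence is less trustworthy (low confidence gives a high weight).
    return lw_max - (lw_max - lw_min) * confidence


def insertion_penalty(confidence: float, base: float = 1.0) -> float:
    # Assumption: the per-word penalty grows as confidence drops.
    return base * (2.0 - confidence)


def combined_score(h: Hypothesis) -> float:
    n_words = len(h.expression.split())
    return (h.acoustic
            + language_weight(h.confidence) * h.linguistic
            - insertion_penalty(h.confidence) * n_words)


def recognize(hypotheses: list[Hypothesis]) -> str:
    # Pick the linguistic expression with the highest combined score.
    return max(hypotheses, key=combined_score).expression


if __name__ == "__main__":
    candidates = [
        Hypothesis("recognize speech", -100.0, -5.0, 0.9),
        Hypothesis("wreck a nice beach", -98.0, -9.0, 0.3),
    ]
    print(recognize(candidates))
```

In this example the second hypothesis has the better raw acoustic score, but its low confidence raises both its language weight and its insertion penalty, so the first hypothesis wins.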

7. A computer-implemented method of recognizing an input speech utterance that includes a sequence of input speech frames, the method comprising:
- receiving the sequence of input speech frames;
  
  for each input speech frame;
  
  comparing the input speech frame to a plurality of acoustic hidden Markov models, each hidden Markov model corresponding to a linguistic expression, the comparing including an output probability analysis and a transition probability analysis of the input speech frame with respect to each of the hidden Markov models and resulting in a constrained acoustic score for each of the hidden Markov models; and
  
  for each hidden Markov model;
  
  computing a confidence measure based on the constrained acoustic score for the hidden Markov model, the confidence measure reflecting a probability that the constrained acoustic score correctly reflects how accurately the hidden Markov model matches the input speech frame;
  
  adjusting the constrained acoustic score for the hidden Markov model based on the confidence measure computed for the hidden Markov model; and
  
  determining which of the linguistic expressions corresponding to the hidden Markov models most closely matches the input speech utterance based on the adjusted constrained acoustic scores, thereby recognizing the input speech utterance.
- - 8. The method of claim 7, further including:
    - for each input speech frame;
      
      comparing the input speech frame with a plurality of context-independent acoustic utterance models without analyzing transition probabilities;
      
      determining which of the context-independent acoustic utterance models most closely matches the input speech frame, thereby resulting in an unconstrained acoustic score for the most closely matching context-independent acoustic utterance model; and
      
      wherein the step of computing a confidence measure for each hidden Markov model includes computing the confidence measure based on a difference between the constrained acoustic score for the hidden Markov model and the unconstrained acoustic score.
  - 9. The method of claim 7, further including, for each hidden Markov model, computing a combined score based on a language weight that is varied based on the confidence measures computed for the hidden Markov model.
  - 10. The method of claim 7, further including:
    - for each hidden Markov model;
      
      penalizing the combined score by an insertion penalty that is adjusted based on how many words there are in the linguistic expression corresponding to the hidden Markov model; and
      
      adjusting the insertion penalty based on the confidence measure computed for the hidden Markov model.
  - 17. The method of claim 7, further comprising:
    - for each hidden Markov model, computing a linguistic score that reflects a probability that the hidden Markov model corresponds to a linguistic model in a stored lexicon of linguistic models; and
      
      computing a combined score for each hidden Markov model based on the linguistic score and a language weight computed for the hidden Markov model.
  - 18. The method of claim 7, further comprising:
    - for each input speech frame and for each hidden Markov model;
      
      computing a language weight based on the confidence measure computed for the hidden Markov model for the input speech frame;
      
      computing a linguistic score that reflects a probability that the hidden Markov model corresponds to a linguistic model in a stored lexicon of linguistic models; and
      
      computing an incremental combined score based on the adjusted constrained acoustic score, the linguistic score and the language weight computed for the hidden Markov model for the input speech frame; and
      
      for each hidden Markov model, summing the incremental combined acoustic scores computed for the hidden Markov model to obtain a total combined acoustic score for the hidden Markov model;
      
      wherein the determining step includes recognizing the input speech utterance based on the total combined acoustic scores computed for the hidden Markov models.
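
The frame-by-frame procedure of claims 7 and 18 can be sketched as a loop that computes a fresh confidence measure, language weight, and incremental combined score for every frame, then sums the increments for one hidden Markov model. This is only an illustration under stated assumptions: the per-frame tuple layout, the exponential confidence mapping, and the language weight range are hypothetical and do not come from the claims.

```python
import math

# One entry per input speech frame for a single hidden Markov model:
# (incremental constrained score, incremental unconstrained score,
#  linguistic score), all assumed to be log probabilities.
Frame = tuple[float, float, float]


def total_combined_score(frames: list[Frame], lw_min: float = 4.0,
                         lw_max: float = 8.0) -> float:
    total = 0.0
    for constrained, unconstrained, linguistic in frames:
        # Confidence for this frame, from the gap between the two scores.
        gap = max(unconstrained - constrained, 0.0)
        confidence = math.exp(-gap)
        # Adjust the constrained score; subtracting the gap is the same
        # as adding log(confidence) under the mapping assumed above.
        adjusted = constrained - gap
        # Language weight recomputed per frame from the confidence.
        weight = lw_max - (lw_max - lw_min) * confidence
        # Incremental combined score, accumulated across frames.
        total += adjusted + weight * linguistic
    return total


if __name__ == "__main__":
    frames = [(-1.0, -1.0, -0.5), (-3.0, -2.0, -0.5)]
    print(total_combined_score(frames))
```

Recognition would then compare the totals across all hidden Markov models and select the linguistic expression with the highest one.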

11. A computer system for recognizing an input speech utterance that includes a sequence of input speech frames, the system comprising:
- a hidden Markov model storage unit storing a plurality of context-dependent hidden Markov models, each hidden Markov model corresponding to a linguistic expression;
  
  a context-independent acoustic model storage unit storing a plurality of context-independent acoustic models;
  
  means for comparing the sequence of input speech frames to the hidden Markov models, including means for performing an output probability analysis and means for performing a transition probability analysis of the sequence of input speech frames with respect to each of the hidden Markov models and resulting in a constrained acoustic score for each of the hidden Markov models; and
  
  means for comparing each input speech frame to the context-independent acoustic models, for computing an incremental unconstrained acoustic score for each input speech frame that reflects a probability that the input frame matches whichever context-independent acoustic model most closely matches the input speech frame, and for accumulating the incremental unconstrained acoustic scores into an unconstrained acoustic score for the sequence of input speech frames;
  
  means for computing a confidence measure for each hidden Markov model based on a difference between the constrained acoustic score for the hidden Markov model and the unconstrained acoustic score;
  
  means for adjusting the constrained acoustic score for each hidden Markov model based on the confidence measure computed for the hidden Markov model; and
  
  means for determining which of the linguistic expressions corresponding to the hidden Markov models most closely matches the input speech utterance based on the adjusted constrained acoustic scores, thereby recognizing the input speech utterance.
- - 12. The system of claim 11 wherein the means for determining includes means that, for each hidden Markov model, computes a combined score based on a language weight that is varied based on the confidence measure computed for the hidden Markov model, and the means for determining further includes means for determining which of the hidden Markov models has the highest combined score and for recognizing the linguistic expression associated with the hidden Markov model with the highest combined score as the input speech utterance.
  - 13. The system of claim 12, further including:
    - means that, for each of the hidden Markov models, computes a separate insertion penalty based on the confidence measure computed for the hidden Markov model; and
      
      means that, for each of the hidden Markov models, penalizes the combined score for the hidden Markov model by the insertion penalty computed for the hidden Markov model.
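
The unconstrained acoustic score in claim 11 is accumulated frame by frame from whichever context-independent model matches each frame best, with no transition probabilities involved. A minimal sketch of that accumulation follows, assuming each context-independent model is a one-dimensional Gaussian over a scalar frame feature. The model representation and the names `gaussian_log_pdf` and `unconstrained_score` are hypothetical simplifications chosen for illustration.

```python
import math


def gaussian_log_pdf(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian, standing in for an output probability."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def unconstrained_score(frames: list[float],
                        models: dict[str, tuple[float, float]]) -> float:
    """Sum, over frames, the best log output probability among all models.

    Each frame is scored independently, so no model sequence or
    transition probability constrains the choice.
    """
    total = 0.0
    for x in frames:
        total += max(gaussian_log_pdf(x, mean, var)
                     for mean, var in models.values())
    return total


if __name__ == "__main__":
    # Hypothetical context-independent models: name -> (mean, variance).
    models = {"aa": (0.0, 1.0), "iy": (5.0, 1.0)}
    print(unconstrained_score([0.1, 4.8, 5.2], models))
```

Because every frame is free to pick its best model, this total is at least as high as the sum of output probabilities along any fixed sequence of the same models, which is what makes the difference from the constrained score usable as a confidence signal.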

19. A computer-implemented method of recognizing an input speech utterance, comprising:
- receiving the input speech utterance;
  
  for each of a plurality of stored acoustic utterance models, computing a context-based constrained acoustic score based on context information of the stored acoustic utterance model, each stored acoustic utterance model corresponding to a linguistic expression;
  
  for each stored acoustic utterance model, computing a confidence measure based on the constrained acoustic score for the stored acoustic utterance model, the confidence measure reflecting a probability that the constrained acoustic score correctly reflects how accurately the stored acoustic utterance model matches the input speech utterance;
  
  for each stored acoustic utterance model, computing a combined score based on the constrained acoustic score and the confidence measure computed for the stored acoustic utterance model; and
  
  determining which of the linguistic expressions corresponding to the stored acoustic utterance models most closely matches the input speech utterance based on the combined scores, thereby recognizing the input speech utterance.
- - 20. The method of claim 19, further including:
    - computing an unconstrained acoustic score for the input speech utterance without regard for the context information of the stored acoustic utterance models; and
      
      wherein the step of computing a confidence measure for each acoustic utterance model includes computing the confidence measure based on a difference between the constrained acoustic score for the acoustic utterance model and the unconstrained acoustic score.

21. A computer-implemented method of recognizing an input speech utterance, comprising:
- receiving the input speech utterance;
  
  for each of a plurality of stored acoustic utterance models, computing a context-based constrained acoustic score based on context information of the stored acoustic utterance model, each stored acoustic utterance model corresponding to a linguistic expression;
  
  computing an unconstrained acoustic score for the input speech utterance without regard for the context information of the stored acoustic utterance models;
  
  for each stored acoustic utterance model, computing a confidence measure based on the unconstrained acoustic score and the constrained acoustic score for the stored acoustic utterance model, the confidence measure reflecting a probability that the constrained acoustic score correctly reflects how accurately the stored acoustic utterance model matches the input speech utterance;
  
  for each stored acoustic utterance model, adjusting the constrained acoustic score for the stored acoustic utterance model based on the confidence measure determined for the stored acoustic utterance model; and
  
  determining which of the linguistic expressions corresponding to the stored acoustic utterance models most closely matches the input speech utterance based on the constrained acoustic scores, thereby recognizing the input speech utterance.
- - 22. The method of claim 21, further comprising:
    - for each stored acoustic utterance model, computing a linguistic score that reflects a probability that the stored acoustic utterance model corresponds to a linguistic model in a stored lexicon of linguistic models; and
      
      for each stored acoustic utterance model, computing a combined score based on the constrained acoustic score and the linguistic score computed for the stored acoustic utterance model and on a language weight that is varied based on the confidence measure computed for the stored acoustic utterance model.

23. A computer storage medium for controlling a computer to recognize an input speech utterance received by the computer, the storage medium comprising:
- a plurality of acoustic utterance models, each acoustic utterance model corresponding to a linguistic expression;
  
  computer instructions for comparing the input speech utterance with each of the acoustic utterance models to obtain a constrained acoustic score for each of the acoustic utterance models;
  
  computer instructions that, for each of the acoustic utterance models, causes the computer to compute a confidence measure based on the constrained acoustic score for the stored acoustic utterance model, the confidence measure reflecting a probability that the constrained acoustic score correctly reflects how accurately the acoustic utterance model matches the input speech utterance;
  
  computer instructions that, for each of the acoustic utterance models, causes the computer to compute a combined score based on the constrained acoustic score and the confidence measure computed for the acoustic utterance model; and
  
  computer instructions for determining which of the linguistic expressions corresponding to the acoustic utterance models most closely matches the input speech utterance based on the combined scores, thereby recognizing the input speech utterance.
- - 24. The storage medium of claim 23 wherein each acoustic utterance model is a hidden Markov model.
  - 25. The storage medium of claim 23, further including:
    - a plurality of context-independent acoustic utterance models;
      
      computer instructions for comparing the input speech utterance with each of the context-independent acoustic utterance models;
      
      computer instructions for determining which of the context-independent acoustic utterance models most closely matches the input speech utterance, thereby resulting in an unconstrained acoustic score for the most closely matching context-independent acoustic utterance model; and
      
      wherein the computer instructions that cause the computer to compute the confidence measure for each acoustic utterance model include computer instructions for computing the confidence measure based on a difference between the constrained acoustic score for the acoustic utterance model and the unconstrained acoustic score.
  - 26. The storage medium of claim 23, further including:
    - computer instructions that, for each acoustic utterance model, cause the computer to adjust the constrained acoustic score for the acoustic utterance model based on the confidence measure determined for the acoustic utterance model.
  - 27. The storage medium of claim 23 wherein the input speech utterance includes a plurality of speech frames and the computer instructions for comparing include:
    - computer instructions for comparing each speech frame of the input speech utterance to each of the acoustic utterance models, thereby resulting in an incremental constrained acoustic score for each speech frame and for each of the acoustic utterance models; and
      
      computer instructions that, for each of the acoustic utterance models, causes the computer to sum the incremental constrained acoustic scores computed for the acoustic utterance model and thereby obtain the constrained acoustic score for the acoustic utterance model.
  - 28. The storage medium of claim 23, further comprising:
    - a lexicon that includes a plurality of linguistic models;
      
      computer instructions that, for each of the acoustic utterance models, causes the computer to compute a linguistic score that reflects a probability that the acoustic utterance model corresponds to one of the linguistic models in the lexicon; and
      
      wherein the computer instructions that cause the computer to compute combined scores includes computer instructions that, for each of the acoustic utterance models, causes the computer to compute the combined score for the acoustic utterance model based on the linguistic score computed for the acoustic utterance model and a language weight computed for the acoustic utterance model.
  - 29. The storage medium of claim 23 wherein the computer instructions for causing the computer to compute a combined score for each of the acoustic utterance models include computer instructions that, for each of the acoustic utterance models, causes the computer to compute the combined score for the acoustic utterance model based on a language weight that is varied based on the confidence measure computed for the acoustic utterance model.
  - 30. The storage medium of claim 23, further comprising:
    - computer instructions that, for each of the acoustic utterance models;
      
      computes an insertion penalty based on the confidence measure computed for the acoustic utterance model; and
      
      adjusts the combined score using the insertion penalty computed for the acoustic utterance model.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Huang, Xuedong; Beeferman, Douglas H.; Alleva, Fileno A.
Primary Examiner(s): TSANG, FAN S
Application Number: US08/452,141
Time in Patent Office: 970 Days
Field of Search: 395/2.6, 395/2.64, 395/2.65, 395/2.66, 395/2.55
US Class Current: 704/256.4
CPC Class Codes:
G10L 15/10 using distance or distortio...
G10L 15/142 Hidden Markov Models [HMMs]
