Confidence estimation based on frequency

US 10,152,298 B1
Filed: 06/29/2015
Issued: 12/11/2018
Est. Priority Date: 06/29/2015
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Devices, systems and methods are disclosed for estimating a prior probability for speech recognition by taking into account a number of observations of a particular word and a prior probability for a group of words having a similar number of observations. For example, a prior probability may be determined by combining a number of correct results and a number of observations for a group of words and calculating a prior probability of the entire group. Further, a prior probability may be determined for a word that was not previously observed by determining a prior probability for a group of words that have been observed once. The prior probability for a particular word may be determined differently as the number of observations increases and may transition from the group prior probability to an individual prior probability when the number of observations exceeds a threshold.
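
The scheme in the abstract can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the patent: n_w is the number of times word w was observed (predicted) during ASR testing, c_w is the number of those observations that were correct, G is a group of words with a similar number of observations, and T is the threshold. The hard switch at T is one reading of the abstract; dependent claims 12 and 20 also describe a weighted average of the two priors below the threshold.

```latex
P_G = \frac{\sum_{w \in G} c_w}{\sum_{w \in G} n_w},
\qquad
P_w =
\begin{cases}
P_G, & n_w \le T \\[4pt]
\dfrac{c_w}{n_w}, & n_w > T
\end{cases}
```

For a word that was never observed (n_w = 0), the abstract takes G to be the group of words that were observed once.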

103 Citations

24 Claims

1. A computer-implemented method of determining a confidence score for a potential speech recognition result, the method comprising:
- determining a first group of words, wherein each word of the first group of words was predicted a first number of times during automatic speech recognition (ASR) testing;
  
  determining a first total number of predictions for the first group, the first total number of predictions corresponding to the first number multiplied by a number of words in the first group of words;
  
  determining a first total number of correct results for the first group, the first total number of correct results corresponding to a cumulative number of times each word of the first group was correctly predicted during the ASR testing;
  
  determining a first group prior probability for the first group by dividing the first total number of correct results by the first total number of predictions;
  
  determining a first prior probability for a first word using the first group prior probability;
  
  receiving audio data corresponding to a first utterance; and
  
  performing first ASR processing on the audio data to generate a hypothesis, the hypothesis including the first word, wherein performing first ASR processing further comprises determining a confidence score for the hypothesis using the first prior probability for the first word.
- Dependent claims (2, 3, 4):
- - 2. The computer-implemented method of claim 1, further comprising:
    - determining the first number of times is equal to one;
      
      determining that the first word was not predicted during the ASR testing; and
      
      determining to use the first group prior probability for the first word.
  - 3. The computer-implemented method of claim 1, further comprising:
    - performing second ASR processing on a second utterance, the second ASR processing predicting the first word;
      
      determining a total number of first predictions for the first word by adding a number of times the first word was predicted during the ASR testing and a number of times the first word was predicted during the first ASR processing;
      
      determining that the total number of first predictions for the first word exceeds a threshold;
      
      determining a first number of correct results for the first word, the first number of correct results corresponding to a number of times the first word was correctly predicted during the ASR testing and the first ASR processing; and
      
      determining a second prior probability for the first word by dividing the first number of correct results by the total number of first predictions, wherein determining the confidence score further comprises determining the confidence score for the hypothesis using the second prior probability for the first word.
  - 4. The computer-implemented method of claim 1, further comprising:
    - receiving a second group of words, wherein the second group of words are grouped based on a language frequency;
      
      determining a second total number of predictions for the second group, the second total number of predictions corresponding to a cumulative number of times each of the second group of words was predicted during the ASR testing;
      
      determining a second total number of correct results for the second group, the second total number of correct results corresponding to a cumulative number of times each word of the second group was correctly predicted during the ASR testing;
      
      determining a second group prior probability for the second group by dividing the second total number of correct results by the second total number of predictions;
      
      determining that a second number of predictions in the ASR testing for a second word of the second group of words is below a threshold;
      
      determining a second prior probability for the second word using the second group prior probability;
      
      receiving second audio data corresponding to a second utterance; and
      
      performing second ASR processing on the second audio data to generate a second hypothesis, the second hypothesis including the second word, wherein performing second ASR processing further comprises determining a confidence score for the hypothesis using the second prior probability for the second word.
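
The grouping steps of claim 1 and the unseen-word fallback of claim 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `test_stats` layout (word mapped to a pair of times predicted and times correct) and the sample counts are all assumptions made for the example.

```python
from collections import defaultdict


def group_priors_by_count(test_stats):
    """Return {n: group prior} for each group of words predicted exactly n times.

    test_stats maps word -> (times_predicted, times_correct) from ASR testing.
    (Hypothetical data layout, chosen for this sketch.)
    """
    groups = defaultdict(list)
    for word, (predicted, correct) in test_stats.items():
        if predicted > 0:
            groups[predicted].append(correct)

    priors = {}
    for n, correct_counts in groups.items():
        # Claim 1: total predictions = the shared count n times the group size.
        total_predictions = n * len(correct_counts)
        total_correct = sum(correct_counts)
        priors[n] = total_correct / total_predictions
    return priors


def prior_for_word(word, test_stats, priors):
    """Look up the group prior for a word; returns None if no group exists."""
    predicted, _ = test_stats.get(word, (0, 0))
    # Claim 2: a word never predicted in testing borrows the prior of the
    # group of words that were predicted once.
    n = predicted if predicted > 0 else 1
    return priors.get(n)


# Made-up testing counts for illustration only.
stats = {
    "alpha": (1, 1), "bravo": (1, 0), "charlie": (1, 1), "delta": (1, 0),
    "echo": (3, 2), "foxtrot": (3, 3),
}
priors = group_priors_by_count(stats)
unseen_prior = prior_for_word("zulu", stats, priors)  # "zulu" is not in stats
```

With these counts the once-predicted group has 2 correct results out of 4 predictions, so an unseen word such as "zulu" receives that group's prior.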

5. A computer-implemented method, the method comprising:
- receiving audio data corresponding to a first utterance;
  
  performing first automatic speech recognition (ASR) processing on the audio data using a language model to generate a hypothesis, the hypothesis including a first word, wherein:
  
  the language model was trained using a first prior probability for the first word,
  
  the first prior probability based on a group prior probability for a plurality of words,
  
  the group prior probability based on a total number of predictions of the plurality of words during ASR testing and a total number of correct predictions of the plurality of words during the ASR testing,
  
  at least one of the plurality of words is not included in the hypothesis, and
  
  performing the first ASR processing on the audio data comprises:
  
  identifying the first word, and
  
  determining a confidence score for the hypothesis using the language model; and
  
  generating output data using the hypothesis.
- Dependent claims (6, 7, 8, 9, 10, 11, 12, 21, 22, 23):
- - 6. The computer-implemented method of claim 5, further comprising:
    - identifying a first number of predictions of the first word during ASR testing;
      
      determining that the first number is below a threshold; and
      
      determining the first prior probability, wherein determining the first prior probability further comprises using the group prior probability for the plurality of words as the first prior probability.
  - 7. The computer-implemented method of claim 5, further comprising:
    - determining the plurality of words;
      
      determining the total number of predictions of the plurality of words during ASR testing;
      
      determining the total number of correct predictions of the plurality of words during the ASR testing; and
      
      determining the group prior probability for the plurality of words by dividing the total number of correct predictions by the total number of predictions.
  - 8. The computer-implemented method of claim 5, wherein the plurality of words are grouped based on a language frequency.
  - 9. The computer-implemented method of claim 5, further comprising:
    - determining that a first number of predictions of the first word during ASR testing is equal to zero, indicating that the first word was not predicted during the ASR testing;
      
      determining the plurality of words, wherein each of the plurality of words was predicted during the ASR testing more than once;
      
      determining the total number of predictions of the plurality of words during ASR testing;
      
      determining the total number of correct predictions of the plurality of words during the ASR testing;
      
      determining the group prior probability for the plurality of words using the total number of predictions and the total number of correct predictions; and
      
      determining the first prior probability, wherein determining the first prior probability further comprises determining the group prior probability for the plurality of words as the first prior probability.
  - 10. The computer-implemented method of claim 5, further comprising:
    - receiving feedback indicating whether the first word was correctly predicted during the first ASR processing;
      
      determining a total number of first predictions for the first word by adding a first number of predictions during ASR testing and a second number of times the first word was predicted during the first ASR processing;
      
      determining a third number of correct results for the first word, the third number of correct results corresponding to a number of times the first word was correctly predicted during the ASR testing and the first ASR processing; and
      
      determining an individual prior probability for the first word by dividing the third number of correct results and the total number of first predictions.
  - 11. The computer-implemented method of claim 10, further comprising:
    - determining that the total number of first predictions exceeds a threshold; and
      
      determining the first prior probability for the first word using the individual prior probability.
  - 12. The computer-implemented method of claim 10, further comprising:
    - determining that the total number of first predictions is below a threshold; and
      
      determining the first prior probability, wherein determining the first prior probability further comprises determining a weighted average of the individual prior probability and the group prior probability.
  - 21. The computer-implemented method of claim 5, wherein the plurality of words is selected based on a number of times that the first word was predicted during ASR testing.
  - 22. The computer-implemented method of claim 5, wherein identifying the first word comprises using at least the language model to determine that the first word potentially corresponds to the audio data.
  - 23. The computer-implemented method of claim 5, wherein generating the output data comprises causing a command represented by the hypothesis to be executed.
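
Claims 10 to 12 describe updating a word's counts from feedback and then choosing between the individual prior, the group prior, or a weighted average of the two. A rough sketch follows. The claims do not say how the average is weighted, so the linear weight `predictions / threshold` used here is purely an assumption, as are the class and function names and the sample numbers.

```python
from dataclasses import dataclass


@dataclass
class WordStats:
    """Running counts for one word (hypothetical container for this sketch)."""
    predictions: int = 0
    correct: int = 0

    def record(self, was_correct: bool) -> None:
        # Claim 10: feedback says whether the latest prediction was correct.
        self.predictions += 1
        if was_correct:
            self.correct += 1

    def individual_prior(self):
        if self.predictions == 0:
            return None
        return self.correct / self.predictions


def blended_prior(stats: WordStats, group_prior: float, threshold: int) -> float:
    """Pick a prior for a word; threshold is assumed to be a positive integer."""
    if stats.predictions == 0:
        return group_prior
    individual = stats.individual_prior()
    if stats.predictions > threshold:
        # Claim 11: enough observations, so use the individual prior.
        return individual
    # Claim 12: weighted average below the threshold.
    # ASSUMPTION: the weight grows linearly with the number of predictions.
    weight = stats.predictions / threshold
    return weight * individual + (1.0 - weight) * group_prior


# Made-up numbers: 4 predictions (3 correct) in testing, then one more
# correct prediction reported through feedback.
word = WordStats(predictions=4, correct=3)
word.record(True)
prior = blended_prior(word, group_prior=0.5, threshold=10)
```

Here the word ends at 4 correct out of 5 predictions (individual prior 0.8), and with a threshold of 10 the assumed linear weight is 0.5, so the blended prior falls halfway between 0.8 and the group prior of 0.5.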

13. A system, comprising:
- at least one processor;
  
  memory including instructions operable to be executed by the at least one processor to cause the system to:
  
  receive audio data corresponding to a first utterance;
  
  perform first automatic speech recognition (ASR) processing on the audio data using a language model to generate a hypothesis, the hypothesis including a first word, wherein:
  
  the language model was trained using a first prior probability for the first word,
  
  the first prior probability based on a group prior probability for a plurality of words,
  
  the group prior probability based on a total number of predictions of the plurality of words during ASR testing and a total number of correct predictions of the plurality of words during the ASR testing,
  
  at least one of the plurality of words is not included in the hypothesis, and
  
  performing the first ASR processing on the audio data comprises:
  
  identifying the first word, and
  
  determining a confidence score for the hypothesis using the language model; and
  
  generate output data using the hypothesis.
- Dependent claims (14, 15, 16, 17, 18, 19, 20):
- - 14. The system of claim 13, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to:
    - identify a first number of predictions of the first word during ASR testing;
      
      determine that the first number is below a threshold; and
      
      determine the first prior probability, wherein determining the first prior probability further comprises using the group prior probability for the plurality of words as the first prior probability.
  - 15. The system of claim 13, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to:
    - determine the plurality of words;
      
      determine the total number of predictions of the plurality of words during ASR testing;
      
      determine the total number of correct predictions of the plurality of words during the ASR testing; and
      
      determine the group prior probability for the plurality of words by dividing the total number of correct predictions by the total number of predictions.
  - 16. The system of claim 13, wherein the plurality of words are grouped based on a language frequency.
  - 17. The system of claim 13, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to:
    - determine that a first number of predictions of the first word during ASR testing is equal to zero, indicating that the first word was not predicted during the ASR testing;
      
      determine the plurality of words, wherein each of the plurality of words was predicted during the ASR testing more than once;
      
      determine the total number of predictions of the plurality of words during ASR testing;
      
      determine the total number of correct predictions of the plurality of words during the ASR testing;
      
      determine the group prior probability for the plurality of words using the total number of predictions and the total number of correct predictions; and
      
      determine the first prior probability, wherein determining the first prior probability further comprises determining the group prior probability for the plurality of words as the first prior probability.
  - 18. The system of claim 13, the set of actions further comprising:
    - receiving feedback indicating whether the first word was correctly predicted during the first ASR processing;
      
      determining a total number of first predictions for the first word by adding a first number of predictions during ASR testing and a second number of times the first word was predicted during the first ASR processing;
      
      determining a first number of correct results for the first word, the first number of correct results corresponding to a number of times the first word was correctly predicted during the ASR testing and the first ASR processing; and
      
      determining an individual prior probability for the first word by dividing the first number of correct results and the total number of first predictions.
  - 19. The system of claim 18, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to:
    - determine that the total number of first predictions exceeds a threshold; and
      
      determine the first prior probability for the first word using the individual prior probability.
  - 20. The system of claim 18, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to:
    - determine that the total number of first predictions is below a threshold; and
      
      determine the first prior probability, wherein determining the first prior probability further comprises determining a weighted average of the individual prior probability and the group prior probability.
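
Claims 8 and 16 group words by language frequency rather than by how often they were predicted in testing. The claims do not define how the frequency groups are formed, so the sketch below assumes one simple possibility: bucketing words by the order of magnitude of a corpus count. The bucketing rule, the function names and all counts are hypothetical.

```python
from collections import defaultdict


def frequency_bucket(corpus_count: int) -> int:
    """ASSUMED bucketing rule: order of magnitude of the corpus count.

    Counts 1-9 map to 0, 10-99 to 1, and so on; non-positive counts map to -1.
    """
    if corpus_count <= 0:
        return -1
    return len(str(corpus_count)) - 1


def priors_by_language_frequency(test_stats, corpus_counts):
    """Return {bucket: group prior}, pooling ASR testing counts per bucket.

    test_stats maps word -> (times_predicted, times_correct);
    corpus_counts maps word -> occurrences in some text corpus.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [predictions, correct]
    for word, (predicted, correct) in test_stats.items():
        bucket = frequency_bucket(corpus_counts.get(word, 0))
        totals[bucket][0] += predicted
        totals[bucket][1] += correct
    # Group prior = total correct / total predictions, as in claims 7 and 15.
    return {b: c / p for b, (p, c) in totals.items() if p > 0}


# Made-up counts for illustration only.
corpus_counts = {"the": 50000, "of": 30000, "zebra": 40, "quokka": 12}
test_stats = {"the": (200, 190), "of": (100, 95), "zebra": (3, 1), "quokka": (1, 1)}
freq_priors = priors_by_language_frequency(test_stats, corpus_counts)
rare_word_prior = freq_priors[frequency_bucket(corpus_counts["quokka"])]
```

In this example "quokka" was predicted only once in testing, so rather than trusting its own single result it would take the pooled prior of its frequency bucket, which it shares with "zebra".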

24. A computer-implemented method, the method comprising:
- receiving audio data corresponding to a first utterance;
  
  performing first automatic speech recognition (ASR) processing on the audio data using a language model to generate a hypothesis, the hypothesis including a first word, wherein:
  
  the language model was trained using a first prior probability for the first word,
  
  the first prior probability determined by:
  
  determining that a first number of predictions of the first word during ASR testing is equal to zero, indicating that the first word was not predicted during the ASR testing,
  
  determining a plurality of words, wherein each of the plurality of words was predicted during the ASR testing more than once,
  
  determining a total number of predictions of the plurality of words during ASR testing,
  
  determining a total number of correct predictions of the plurality of words during the ASR testing,
  
  determining a group prior probability for the plurality of words using the total number of predictions and the total number of correct predictions, and
  
  setting the group prior probability as the first prior probability,
  
  at least one of the plurality of words is not included in the hypothesis, and
  
  performing first ASR processing on the audio data comprises:
  
  identifying the first word, and
  
  determining a confidence score for the hypothesis using the language model; and
  
  generating output data using the hypothesis.

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Salvador, Stan Weidner
Primary Examiner(s): Thomas-Homescu, Anne L
Application Number: US14/754,181
Time in Patent Office: 1,261 Days
CPC Class Codes:
G06F 3/167   Audio in a user interface, ...
G10L 15/14   using statistical models, e...
G10L 15/183   using context dependencies,...
