Speech recognition accuracy with multi-confidence thresholds

US 7,657,433 B1
Filed: 09/08/2006
Issued: 02/02/2010
Est. Priority Date: 09/08/2006
Status: Active Grant

3 Assignments

0 Petitions

Abstract

A speech recognition system uses multiple confidence thresholds to improve the quality of speech recognition results. The choice of which confidence threshold to use for a particular utterance may be based on one or more features relating to the utterance. In one particular implementation, the speech recognition system includes a speech recognition engine that provides speech recognition results and a confidence score for an input utterance. The system also includes a threshold selection component that determines, based on the received input utterance, a threshold value corresponding to the input utterance. The system further includes a threshold component that accepts the recognition results based on a comparison of the confidence score to the threshold value.
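
The flow described in the abstract can be illustrated with a minimal Python sketch. The names (`RecognitionResult`, `select_threshold`, `accept`), the choice of caller age group as the feature, the fallback threshold, and every numeric value are assumptions made for illustration; none of them come from the patent.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    text: str          # recognized words
    confidence: float  # engine's confidence score (assumed to lie in [0, 1])


# Hypothetical per-partition thresholds; the values are invented for this sketch.
THRESHOLDS = {"child": 0.70, "adult": 0.50, "senior": 0.60}
DEFAULT_THRESHOLD = 0.60  # assumed fallback when the feature value is unknown


def select_threshold(age_group: str) -> float:
    """Threshold selection: return the threshold tied to the utterance's partition."""
    return THRESHOLDS.get(age_group, DEFAULT_THRESHOLD)


def accept(result: RecognitionResult, age_group: str) -> bool:
    """Threshold component: accept when the score is above the selected threshold."""
    return result.confidence > select_threshold(age_group)


result = RecognitionResult(text="check my balance", confidence=0.65)
print(accept(result, "adult"))  # True: 0.65 is above 0.50
print(accept(result, "child"))  # False: 0.65 is not above 0.70
```

The same recognition result is accepted for one partition and rejected for another, which is the effect of using more than one confidence threshold.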

256 Citations

26 Claims

1. A system for processing an input utterance, comprising:
- a processor;
  
  a speech recognition engine that causes the processor to provide a recognition result corresponding to the input utterance and a confidence score corresponding to a confidence level in the recognition result;
  
  a threshold selection component that selects, based on the input utterance, a threshold value corresponding to the input utterance;
  
  wherein, the selected threshold value is determined based on classification of the input utterance into a partition of multiple partitions in a set of training data;
  
  wherein, each of the multiple partitions is associated with a threshold value;
  
  wherein, the selected threshold value corresponding to the input utterance is the threshold value associated with the partition into which the input utterance is classified.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The system of claim 1, further comprising, a threshold component configured to accept the recognition result based on a comparison of the confidence score to the selected threshold value;
    - wherein the threshold component accepts the recognition result when the confidence score is above the selected threshold value and rejects the recognition result when the confidence score is below the selected threshold value.
  - 3. The system of claim 1, wherein the feature is a continuous feature.
  - 4. The system of claim 3, wherein the continuous feature includes, one or more of, an utterance audio duration, a latency of recognition by the speech recognition engine, a word count in the recognition results, or a time of day.
  - 5. The system of claim 3, wherein the set of partitions for the continuous feature is defined by boundary values automatically determined during training of the speech recognition system using the training data.
  - 6. The system of claim 1, wherein the feature is a discrete feature.
  - 7. The system of claim 6, wherein the discrete feature includes, one or more of, gender, geographical area, or age group of callers.
  - 8. The system of claim 1, wherein the threshold selection component determines the threshold value from a plurality of predetermined possible threshold values.
  - 9. The system of claim 1, wherein the speech recognition system is an interactive voice response system.
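
For a continuous feature such as utterance audio duration (claims 3 to 5), classification into a partition amounts to locating the feature value among a sorted list of boundary values. The sketch below is one possible way to do that with Python's standard `bisect` module; the boundary values, the thresholds, and the function names are hypothetical and would in practice come from training.

```python
from bisect import bisect_right

# Hypothetical boundaries (seconds) that split duration into three partitions:
#   partition 0: duration < 1.5
#   partition 1: 1.5 <= duration < 4.0
#   partition 2: duration >= 4.0
BOUNDARIES = [1.5, 4.0]

# One threshold per partition, so len(THRESHOLDS) == len(BOUNDARIES) + 1.
# The values are invented for this sketch.
THRESHOLDS = [0.70, 0.55, 0.45]


def classify(duration_s: float) -> int:
    """Return the index of the partition that contains the duration."""
    return bisect_right(BOUNDARIES, duration_s)


def select_threshold(duration_s: float) -> float:
    """Return the threshold associated with the utterance's partition."""
    return THRESHOLDS[classify(duration_s)]


for duration in (0.8, 1.5, 5.0):
    print(duration, classify(duration), select_threshold(duration))
```

A discrete feature (claims 6 and 7) needs no boundaries: a dictionary keyed by the feature value plays the same role as `BOUNDARIES` plus `THRESHOLDS`.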

10. A computer-readable medium having stored thereon a set of instructions which when executed causes a processor to perform a method of processing input information, the method, comprising:
- generating a recognition result corresponding to the input information;
  
  determining a confidence score corresponding to a confidence level in the accuracy of the speech recognition result;
  
  classifying the input information into one of a plurality of partitions defined from training data based on a feature relating to the input information or to the user;
  
  wherein, each of the multiple partitions is associated with each of a plurality of threshold values;
  
  determining a threshold value from the plurality of threshold values;
  
  wherein, the determined threshold value is one of the plurality of threshold values that is associated with the partition into which the input utterance is classified;
  
  determining whether to accept or reject the recognition result based on the determined threshold value and the confidence score.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. The method of claim 10, wherein determining whether to accept or reject the recognition result includes:
    - accepting the recognition result when the confidence score is above the determined threshold value and rejecting the recognition result when the confidence score is below the determined threshold value.
  - 12. The method of claim 10, wherein boundary values that define the plurality of partitions are automatically determined based on the training data.
  - 13. The method of claim 10, wherein the plurality of threshold values are automatically determined based on the training data.
  - 14. The method of claim 10, wherein the feature is a continuous feature.
  - 15. The method of claim 14, wherein the continuous feature is one or more of an utterance audio duration, a latency of producing speech recognition results, or a time of day.
  - 16. The method of claim 10, wherein the feature is a discrete feature.
  - 17. The method of claim 16, wherein the discrete feature is one or more of gender, geographical area, or age group of users.
  - 18. The device of claim 10, wherein the input information includes an audible utterance from the user and the recognition result includes a speech recognition result of the utterance.

19. A computer-readable medium having stored thereon a set of instructions which when executed causes a processor to perform a method comprising:
- obtaining training data;
  
  defining partitions for the training data based on a feature associated with the training data; and
  
  determining a confidence threshold for each partition based on the feature, wherein, in run-time operation, input information is converted into recognition results and the input information detected as having the feature is classified into one of the partitions of the training data defined using the feature and accepted or rejected as valid recognition results based on a comparison of the confidence threshold corresponding to the one of the partitions defined using the feature.
- View Dependent Claims (20, 21, 22, 23)
  - 20. The method of claim 19, wherein automatically determining the confidence threshold for each partition is based on a determination of confidence threshold values that maximize, for the training data, a correct acceptance rate of the pattern recognition system while maintaining a false acceptance rate below a preset rate.
  - 21. The method of claim 19, wherein the feature is defined over a continuous range and the partitions are defined automatically as ranges within the continuous range.
  - 22. The method of claim 21, wherein the automatically defined ranges within the continuous range are based on a greedy iterative partitioning technique.
  - 23. The method of claim 19, wherein the training data includes speech utterances, speech recognition results, and a confidence score for each of the speech recognition results.
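
The criterion in claim 20 can be sketched as a small search over candidate thresholds for each partition. This is an illustration under stated assumptions, not the patent's procedure: the rate definitions (fractions of all samples in the partition), the use of observed scores as candidate thresholds, the "at or below" treatment of the cap, and the names `rates`, `best_threshold`, and `train` are all hypothetical, and the training data is invented.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, bool]  # (confidence score, recognition result was correct)


def rates(samples: List[Sample], threshold: float) -> Tuple[float, float]:
    """Correct-acceptance and false-acceptance rates at a given threshold.

    Assumed definitions: both are fractions of all samples in the partition.
    """
    n = len(samples)
    ca = sum(1 for score, ok in samples if score > threshold and ok) / n
    fa = sum(1 for score, ok in samples if score > threshold and not ok) / n
    return ca, fa


def best_threshold(samples: List[Sample], max_fa: float) -> float:
    """Pick the threshold with the highest CA rate whose FA rate is <= max_fa."""
    # Candidates: 0.0 plus every observed score. The highest score accepts
    # nothing, so at least one candidate always satisfies the FA cap.
    candidates = sorted({0.0} | {score for score, _ in samples})
    best_t, best_ca = candidates[-1], -1.0
    for t in candidates:
        ca, fa = rates(samples, t)
        if fa <= max_fa and ca > best_ca:
            best_t, best_ca = t, ca
    return best_t


def train(data: Dict[str, List[Sample]], max_fa: float) -> Dict[str, float]:
    """Determine one confidence threshold per partition."""
    return {name: best_threshold(samples, max_fa) for name, samples in data.items()}


# Invented training data, already grouped into two partitions.
data = {
    "short": [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False)],
    "long": [(0.9, True), (0.7, False), (0.6, False), (0.4, True)],
}
print(train(data, max_fa=0.25))
```

With this data the two partitions end up with different thresholds, because the scores of incorrect results sit in different places in each partition.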

24. A computer-readable medium having stored thereon a set of instructions which when executed causes a processor to perform a method of generating a recognition result from input information, the method, comprising:
- defining multiple partitions for training data based on features associated with the training data;
  
  generating multiple threshold values each corresponding to each of the multiple partitions of the training data;
  
  wherein, the input information having a particular feature is classified into a partition of the multiple partitions that is defined using the particular feature;
  
  wherein, the recognition result is accepted or rejected based on comparison of the confidence threshold generated for the partition.
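
Claims 21, 22, and 24 refer to defining partitions from training data, and claim 22 names a greedy iterative partitioning technique without detailing it here. The sketch below shows one generic greedy scheme, offered only as an assumption about what such a technique might look like: it repeatedly adds the single boundary that most increases the number of correctly accepted training samples, and stops when no boundary helps. The scoring rule, the per-partition false-acceptance cap, the stopping rule, and all names and data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float, bool]  # (feature value, confidence score, was correct)


def best_ca_count(samples: List[Sample], max_fa: float) -> int:
    """Most correct acceptances one threshold can reach in this partition
    while false acceptances stay at or below max_fa of its samples."""
    if not samples:
        return 0
    limit = max_fa * len(samples)
    best = 0
    for t in sorted({0.0} | {score for _, score, _ in samples}):
        accepted = [ok for _, score, ok in samples if score > t]
        false_accepts = len(accepted) - sum(accepted)
        if false_accepts <= limit:
            best = max(best, sum(accepted))
    return best


def score(samples: List[Sample], boundaries: List[float], max_fa: float) -> int:
    """Total correct acceptances when each partition gets its own best threshold."""
    edges = [float("-inf")] + sorted(boundaries) + [float("inf")]
    return sum(
        best_ca_count([s for s in samples if lo <= s[0] < hi], max_fa)
        for lo, hi in zip(edges, edges[1:])
    )


def greedy_boundaries(samples: List[Sample], max_fa: float,
                      max_partitions: int = 4) -> List[float]:
    """Greedily add boundaries while the total score keeps improving."""
    boundaries: List[float] = []
    current = score(samples, boundaries, max_fa)
    candidates = sorted({feature for feature, _, _ in samples})
    while len(boundaries) + 1 < max_partitions:
        trials = [(score(samples, boundaries + [b], max_fa), b)
                  for b in candidates if b not in boundaries]
        if not trials:
            break
        best_score, best_b = max(trials)
        if best_score <= current:
            break  # no candidate boundary improves the score
        boundaries.append(best_b)
        current = best_score
    return sorted(boundaries)


# Invented data: feature is utterance duration in seconds.
samples = [
    (0.5, 0.9, True), (0.8, 0.4, True), (1.0, 0.3, True), (1.2, 0.2, False),
    (3.0, 0.9, True), (3.5, 0.8, True), (4.0, 0.7, False), (4.5, 0.6, False),
]
print(greedy_boundaries(samples, max_fa=0.0))
```

Several boundaries tie for the best score on this data; the sketch breaks ties by taking the largest boundary value, which is an arbitrary choice.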

25. A system, comprising:
- means for, defining multiple partitions for training data based on features associated with the training data;
  
  means for, generating multiple threshold values each corresponding to each of the multiple partitions of the training data;
  
  means for, generating a recognition result from input information;
  
  means for, classifying the input information having a particular feature into a partition of the multiple partitions that is defined using the particular feature;
  
  wherein, the recognition result is accepted or rejected based on comparison of the confidence threshold generated for the partition of the multiple partitions.

26. A system, comprising:
- means for, receiving an input utterance, to provide recognition results corresponding to the input utterance, and to provide a confidence score corresponding to a confidence level in the recognition results;
  
  means for, determining, based on the input utterance, a threshold value corresponding to the input utterance;
  
  wherein, the threshold value is determined by classification of the input utterance into one of a set of partitions defined from training data;
  
  wherein, the classification of the received input utterance is performed based on a feature associated with the input utterance;
  
  means for, accepting the recognition result based on a comparison of the confidence score to the threshold value;
  
  wherein the threshold component accepts the recognition result when the confidence score is above the threshold value and rejects the recognition result when the confidence score is below the threshold value.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Tellme Networks Incorporated ([24]7.ai, Inc.)
Inventors: Chang, Shuangyu
Primary Examiner(s): Vo; Huyen X.
Application Number: US11/530,212
Time in Patent Office: 1,243 Days
Field of Search: 704/240, 704/231, 704/246, 704/236, 704/248, 704/238, 704/239, 704/270, 704/252, 704/255, 704/245
US Class Current: 704/252
CPC Class Codes: G10L 15/08 Speech classification or se...
