Speech recognition with parallel recognition tasks

US 8,364,481 B2
Filed: 07/02/2008
Issued: 01/29/2013
Est. Priority Date: 07/02/2008
Status: Active Grant

Abstract

The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not completed generating a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
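
The flow summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `Result`, `fake_srs`, `recognize`, `threshold` and `max_wait` are hypothetical, each SRS is simulated by a timed coroutine, confidence is assumed to lie on a 0 to 1 scale, and the stop condition (any completed result meeting one shared threshold) is a simplification of the per-SRS threshold test in the claims.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    srs: str           # which recognizer produced the result (hypothetical label)
    text: str          # recognition result: possible speech in the audio
    confidence: float  # confidence value, assumed to be on a 0..1 scale


async def fake_srs(name, text, confidence, latency):
    """Hypothetical stand-in for one SRS: waits, then reports a result."""
    await asyncio.sleep(latency)
    return Result(name, text, confidence)


async def recognize(srs_calls, threshold, max_wait):
    """Run all SRS calls in parallel and abort the rest once a result meets the threshold."""
    pending = {asyncio.create_task(call) for call in srs_calls}
    completed = []
    loop = asyncio.get_running_loop()
    deadline = loop.time() + max_wait
    while pending:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break  # overall time budget exhausted
        done, pending = await asyncio.wait(
            pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
        )
        if not done:
            break  # the wait itself timed out
        completed.extend(task.result() for task in done)
        if any(r.confidence >= threshold for r in completed):
            break  # confident enough: stop waiting for slower recognizers
    aborted = len(pending)
    for task in pending:
        task.cancel()  # abort tasks that have not produced a result
    await asyncio.gather(*pending, return_exceptions=True)
    # Assumed selection rule: highest confidence among completed results.
    best = max(completed, key=lambda r: r.confidence, default=None)
    return best, aborted
```

Calling `asyncio.run(recognize([...], threshold=0.8, max_wait=1.0))` with a list of `fake_srs(...)` coroutines returns the selected result (or `None` if nothing completed in time) together with the number of tasks that were aborted.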

257 Citations

21 Claims

1. A computer-implemented method comprising:
- receiving an audio signal;
  
  sending, by a computer system, a plurality of signals, the plurality of signals configured to initiate execution of a speech recognition task by each speech recognition system (SRS) from a plurality of SRS's, each speech recognition task including generation of (i) a recognition result specifying possible speech included in the audio signal and (ii) a confidence value indicating a probability that the recognition result generated by that SRS is correct;
  
  in response to detecting that a portion of the plurality of speech recognition tasks have completed by at least one SRS, generating, by the computer system, (i) one or more recognition results for each of the completed speech recognition tasks and (ii) one or more confidence values for each of the one or more recognition results;
  
  determining whether each confidence value from the one or more confidence values for the completed portion of the speech recognition tasks meets a threshold for the at least one SRS;
  
  in response to determining that the one or more confidence values for the completed portion of the speech recognition tasks meet the threshold for the at least one SRS, sending a signal configured to cause at least one other remaining SRS from the plurality of SRS's to abort a speech recognition task that has not completed and that has not generated a confidence value; and
  
  outputting a final recognition result based at least in part on at least one of the generated one or more recognition results from the at least one SRS.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
  - 2. The method of claim 1, further comprising sending a signal configured to cause the at least one other remaining SRS from the plurality of SRS's to abort a particular speech recognition task if the particular speech recognition task exceeds a particular threshold specifying a maximum period of time permitted for completion of the particular speech recognition task.
  - 3. The method of claim 1, further comprising sending a signal configured to cause the at least one other remaining SRS from the plurality of SRS's to abort an uncompleted speech recognition task if a completed speech recognition task is associated with a low confidence value in a generated recognition result.
  - 4. The method of claim 1, wherein at least one of the plurality of SRS's generates a plurality of recognition results.
  - 5. The method of claim 4, further comprising ranking a top N recognition results of the plurality of recognition results based on confidence values associated with the top N recognition results, wherein N represents any positive integer.
  - 6. The method of claim 5, further comprising determining a running average for confidence values associated with the top N recognition results based on confidence values generated by different SRS's from the plurality of SRS's.
  - 7. The method of claim 6, further comprising selecting the final recognition result from the top N results based on which recognition result is associated with a highest confidence value from among the confidence values.
  - 8. The method of claim 1, further comprising selecting the final recognition result based on which recognition result is associated with the greatest confidence value.
  - 9. The method of claim 1, further comprising selecting the final recognition result based on a number of SRS's from the plurality of SRS's that generated the final recognition result and at least one confidence value associated with the final recognition result.
  - 10. The method of claim 1, further comprising sending a signal configured to cause the at least one other remaining SRS from the plurality of SRS's to abort each of the plurality of speech recognition tasks if a maximum time period is exceeded.
  - 11. The method of claim 10, further comprising prompting of a user to repeat an utterance.
  - 12. The method of claim 10, further comprising transmitting a request that a human operator receive the audio signal.
  - 13. The method of claim 1, further comprising selecting the final recognition result based on a distribution of the confidence values on a normalized scale.
  - 14. The method of claim 13, further comprising selecting a recognition result from the one or more recognition results as the final recognition result if the recognition result is associated with a high confidence value and at least a portion of the other recognition results from the one or more recognition results are clustered together within a relatively small range of low confidence values.
  - 15. The method of claim 1, further comprising weighting a particular recognition result from the one or more recognition results in the selection of the final recognition result based on a correlation between at least two SRS from the plurality of SRS's that generated the particular recognition result.
  - 16. The method of claim 1, wherein at least a portion of the plurality of SRS's comprise different language models or acoustic models.
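
Several of the dependent claims describe ways to pick the final result: ranking a top N by confidence (claim 5), averaging confidence values from different SRS's (claim 6), and taking into account how many SRS's produced the same result (claim 9). The sketch below combines these ideas under stated assumptions. The function names `rank_top_n` and `select_final`, the use of a plain mean in place of a running average, and the tie-break by vote count are all hypothetical choices; the claims do not prescribe a formula.

```python
from collections import defaultdict


def rank_top_n(results, n):
    """Rank candidate transcripts from several recognizers.

    results: iterable of (srs_name, text, confidence) tuples.
    Returns up to n tuples of (text, mean_confidence, vote_count),
    ordered by mean confidence, then by how many SRS's agreed.
    """
    by_text = defaultdict(list)
    for _srs_name, text, confidence in results:
        by_text[text].append(confidence)
    scored = [
        (text, sum(values) / len(values), len(values))
        for text, values in by_text.items()
    ]
    # Assumed ordering: average confidence first, agreement count as tie-break.
    scored.sort(key=lambda item: (item[1], item[2]), reverse=True)
    return scored[:n]


def select_final(results, n=3):
    """Pick the top-ranked transcript, or None if there are no results."""
    ranked = rank_top_n(results, n)
    return ranked[0][0] if ranked else None
```

A weighted variant (for example, discounting agreement between highly correlated recognizers, as claim 15 suggests) would replace the plain mean with a weighted one.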

17. A computer-implemented method comprising:
- receiving an audio signal;
  
  initiating speech recognition tasks by a plurality of speech recognition systems (SRS's), each configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the recognition result;
  
  completing a portion of the speech recognition tasks by at least one SRS comprising generating one or more recognition results and one or more confidence values for the one or more recognition results;
  
  determining whether the one or more confidence values for the completed portion of the speech recognition tasks meets a confidence threshold for the at least one SRS;
  
  pausing a remaining portion of the speech recognition tasks for other remaining SRS's that have not completed and that have not generated a confidence value if the one or more confidence values meets the confidence threshold for the at least one SRS; and
  
  outputting a final recognition result based on at least one of the generated one or more recognition results from the at least one SRS.
- Dependent claims (18, 21)
  - 18. The method of claim 17, further comprising resuming the paused remaining portion of speech recognition tasks if an indication is received that the final recognition result is incorrect.
  - 21. The method of claim 17, further comprising receiving an indication that the final recognition result is correct; and in response to receiving the indication, aborting the remaining portion of the speech recognition tasks for the other remaining SRS's that have not completed.
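
Claims 17, 18 and 21 pause unfinished tasks instead of aborting them outright, then resume or abort depending on feedback about the final result. One way to picture this is cooperative pausing between chunks of work. The sketch below is only an illustration: `PausableTask`, its chunked processing model, the timings and the `demo` function are hypothetical and are not taken from the specification.

```python
import asyncio


class PausableTask:
    """Hypothetical recognition task that works in chunks and can be paused.

    Create instances inside a running event loop (as demo() does).
    """

    def __init__(self, name, chunks):
        self.name = name
        self.chunks = chunks
        self.processed = 0
        self._may_run = asyncio.Event()
        self._may_run.set()  # start in the running state

    def pause(self):
        self._may_run.clear()

    def resume(self):
        self._may_run.set()

    async def run(self):
        while self.processed < self.chunks:
            await self._may_run.wait()  # blocks here while paused
            await asyncio.sleep(0.01)   # simulated work on one audio chunk
            self.processed += 1
        return f"{self.name}: done"


async def demo():
    slow = PausableTask("slow-srs", chunks=20)
    worker = asyncio.create_task(slow.run())
    await asyncio.sleep(0.025)  # assume a faster SRS met its threshold here
    slow.pause()                # claim 17: pause the remaining task
    await asyncio.sleep(0.03)   # let the chunk already in flight finish
    paused_at = slow.processed
    await asyncio.sleep(0.05)
    stayed_paused = slow.processed == paused_at
    # Claim 18: the final result was reported incorrect, so resume.
    # (For claim 21, a "correct" indication would call worker.cancel() instead.)
    slow.resume()
    outcome = await worker
    return stayed_paused, paused_at, outcome, slow.processed
```

Pausing keeps the partial work available, so a wrong early answer costs only the remaining recognition time rather than a full restart.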

19. A system comprising:
- a plurality of speech recognition systems (SRS's) that initiate execution of a plurality of speech recognition tasks to identify possible speech encoded in a received audio signal, each speech recognition task including generation of (i) a recognition result specifying possible speech included in the received audio signal and (ii) a confidence value indicating a probability that the recognition result generated by that SRS is correct;
  
  a recognition managing module to:
  
  receive, for a portion of the plurality of speech recognition tasks that have completed by at least one SRS, one or more recognition results and one or more confidence values associated with the one or more recognition results;
  
  determine whether each confidence value from the one or more confidence values for the completed portion of the speech recognition tasks meets a threshold for the at least one SRS; and
  
  in response to determining that the one or more confidence values for the completed portion of the speech recognition tasks meet the threshold for the at least one SRS, send a signal configured to cause at least one other remaining SRS from the plurality of SRS's to abort a speech recognition task that has not completed and that has not generated a confidence value; and
  
  an interface for transmitting a final recognition result selected based at least in part on the one or more confidence values of the one or more recognition results from the at least one SRS.

20. A system comprising:
- a plurality of speech recognition systems (SRS's) that initiate execution of a plurality of speech recognition tasks to identify possible speech encoded in a received audio signal, each speech recognition task including generation of (i) a recognition result specifying possible speech included in the received audio signal and (ii) a confidence value indicating a probability that the recognition result generated by that SRS is correct;
  
  means for:
  
  receiving, for a portion of the plurality of speech recognition tasks that have completed by at least one SRS, one or more recognition results and one or more confidence values associated with the one or more recognition results;
  
  determine whether each confidence value from the one or more confidence values for the completed portion of the speech recognition tasks meets a threshold for the at least one SRS; and
  
  in response to determining that the one or more confidence values for the completed portion of the speech recognition tasks meet the threshold for the at least one SRS, sending a signal configured to cause at least one other remaining SRS from the plurality of SRS's to abort a speech recognition task that has not completed and that has not generated a confidence value; and
  
  an interface for transmitting a final recognition result selected based at least in part on the one or more confidence values of the one or more recognition results from the at least one SRS, wherein the final recognition result represents possible speech within the received audio signal.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Strope, Brian; Beaufays, Francoise; Siohan, Olivier
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Sirjani, Fariba
Application Number: US12/166,822
Publication Number: US 20100004930A1
Time in Patent Office: 1,672 Days
Field of Search: 704/240, 704/251, 704/E15.001, 704/231
US Class Current: 704/231
CPC Class Codes:
G10L 15/00   Speech recognition G10L17/0...
G10L 15/01   Assessment or evaluation of...
G10L 15/26   Speech to text systems G10L...
G10L 15/32   Multiple recognisers used i...