Speech recognition using multiple recognizers (selectively) applied to the same input sample

US 6,122,613 A
Filed: 01/30/1997
Issued: 09/19/2000
Est. Priority Date: 01/30/1997
Status: Expired due to Term

Abstract

A speech sample is recognized with a computer system by processing the speech sample with at least two speech recognizers, each of which has a different performance characteristic. One speech recognizer may be a large-vocabulary, continuous speech recognizer optimized for real-time responsiveness and another speech recognizer may be an offline recognizer optimized for high accuracy. The speech content of the sample is recognized based on processing results from the speech recognizers. The speaker is provided with a real-time, yet potentially error-laden, text display corresponding to the speech sample while, subsequently, a human transcriptionist may use recognition results from multiple recognizers to produce an essentially error-free transcription. The performance characteristics of the recognizers may be based on style or subject matter, and the recognizers may operate serially or in parallel. Sets of candidates produced by the two recognizers may be combined by merging the scores to generate a combined set of candidates that corresponds to the union of the two sets. Offline processing may be performed based on input from a human operator, cost, processing times, confidence levels, or importance. Uncertainty for a candidate may occur when a difference between a score for a best scoring candidate and a score for a second best scoring candidate is less than a threshold value. A graphic user interface may allow the user to selectively transmit the speech sample to another speech recognizer (or restrict such transmission), based on document type or availability of the second speech recognizer.
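
The score-margin test for uncertainty described in the abstract can be illustrated with a minimal sketch. The function name `is_uncertain`, the dictionary representation of candidates, and the convention that a higher score is better are all assumptions made for illustration; the patent text shown here does not specify a data format or a threshold value.

```python
from typing import Dict


def is_uncertain(scores: Dict[str, float], threshold: float) -> bool:
    """Return True when the two best-scoring candidates are closer than `threshold`.

    Assumption: higher scores indicate a better match.
    """
    if len(scores) < 2:
        # Assumption: with no runner-up there is nothing to compare against.
        return False
    best, second = sorted(scores.values(), reverse=True)[:2]
    return (best - second) < threshold
```

For example, candidates scored 0.52 and 0.48 would be flagged with a threshold of 0.1, while candidates scored 0.9 and 0.2 would not.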

394 Citations

51 Claims

1. A computer-based method of speech recognition comprising:
- receiving a speech sample;
  
  processing the speech sample with a first speech recognizor running on a first processor and at least a second speech recognizor running on a second processor, the speech recognizors having different performance characteristics;
  
  wherein the processing by the first speech recognizor comprises real-time continuous speech recognition; and
  
  recognizing speech content of the speech sample based on the processing by the speech recognizors.
- - 2. The method of claim 1 in which the first and second speech recognizors are optimized for their respective performance characteristics.
  - 3. The method of claim 2 in which the optimized characteristic for the first speech recognizor comprises real-time responsiveness.
  - 4. The method of claim 3 in which the optimized characteristic for the second speech recognizor comprises recognition accuracy.
  - 5. The method of claim 1 in which the performance characteristics are based on style.
  - 6. The method of claim 1 in which the performance characteristics are based on subject matter.
  - 7. The method of claim 1 in which the processing by the first speech recognizor comprises real-time processing.
  - 8. The method of claim 7 in which the processing by the second speech recognizor comprises offline processing.
  - 9. The method of claim 1 in which the processing comprises performing a first recognition analysis with the first speech recognizor and a second recognition analysis with the second speech recognizor in parallel.
  - 10. The method of claim 1 in which the processing comprises performing a first recognition analysis with the first speech recognizor and a second recognition analysis with the second speech recognizor serially.
  - 11. The method of claim 1 in which the processing by the first speech recognizor further comprises providing a real-time text display corresponding to the speech sample.
  - 12. The method of claim 1 in which the processing by the second speech recognizor comprises performing large vocabulary continuous speech recognition on the speech sample.
  - 13. The method of claim 1 in which the processing comprises:
    - the first speech recognizor identifying a first set of candidates that likely match the speech sample and calculating a corresponding first set of scores, the scores based on a likelihood of matching the speech sample; and
      
      the second speech recognizor identifying a second set of candidates that likely match the speech sample and calculating a corresponding second set of scores, the scores based on a likelihood of matching the speech sample.
  - 14. The method of claim 13 in which the processing further comprises:
    - combining the first set of candidates and the second set of candidates to generate a combined set of candidates; and
      
      merging the first set of scores and the second set of scores to generate a combined set of scores.
  - 15. The method of claim 14 in which the combining comprises finding the union of the first and second sets of candidates.
  - 16. The method of claim 14 in which the merging comprises calculating a weighted average from corresponding pairs of scores in the first and second sets of scores.
  - 17. The method of claim 14 further comprising presenting the combined set of candidates to a transcriptionist in an order of priority determined by the candidates' respective combined scores.
  - 18. The method of claim 13 further comprising determining, for each set of candidates, that a recognition uncertainty exists if a difference between a score for a best scoring candidate and a score for a second best scoring candidate is less than a threshold value.
  - 19. The method of claim 1 further comprising presenting results of the processing to a transcriptionist at a second computer.
  - 20. The method of claim 1 in which the recognizing comprises receiving feedback from a transcriptionist regarding whether the speech content was correctly recognized.
  - 21. The method of claim 20 further comprising adapting speech models used by the first and second speech recognizors based on the feedback received from the transcriptionist.
  - 22. The method of claim 1 further comprising determining whether a recognition uncertainty exists based on the processing by the first and second speech recognizors.
  - 23. The method of claim 22 further comprising identifying a recognition uncertainty to a transcriptionist.
  - 24. The method of claim 22 in which a recognition uncertainty is determined to exist if a recognition result from the first speech recognizor disagrees with a recognition result from the second speech recognizor.
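
Claims 13 to 17 describe combining two candidate sets by union, merging their scores by weighted average, and ordering the result for a transcriptionist. The following is a rough sketch of that idea, not the patented implementation. The function names, the `weight_first` parameter, and the treatment of a candidate that appears in only one set (it receives `missing_score` from the other recognizer) are assumptions; the claims do not say how unpaired candidates are scored.

```python
from typing import Dict, List, Tuple


def merge_scores(
    first: Dict[str, float],
    second: Dict[str, float],
    weight_first: float = 0.5,
    missing_score: float = 0.0,
) -> Dict[str, float]:
    """Union the two candidate sets and take a weighted average of their scores."""
    combined: Dict[str, float] = {}
    for candidate in first.keys() | second.keys():  # union of both candidate sets
        s1 = first.get(candidate, missing_score)    # assumption: default for unpaired candidates
        s2 = second.get(candidate, missing_score)
        combined[candidate] = weight_first * s1 + (1.0 - weight_first) * s2
    return combined


def rank_for_transcriptionist(combined: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order candidates by combined score, best first (assumes higher is better)."""
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

With equal weights, a candidate proposed by both recognizers tends to outrank one proposed by only one of them, which is one plausible reason to merge scores rather than simply concatenate the lists.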

25. A computer-based method of speech recognition comprising:
- receiving a speech sample;
  
  processing the speech sample with a first speech recognizor, the first speech recognizor providing interactive, real-time, continuous speech recognition;
  
  selectively performing offline, non-interactive, non-real-time processing of the speech sample using a second speech recognizor; and
  
  recognizing speech content of the speech sample based on the processing by the speech recognizors.
- - 26. The method of claim 25 in which the selective performing comprises deciding whether to perform offline processing based on input from a human operator.
  - 27. The method of claim 25 in which the selective performing comprises deciding whether to perform offline processing based on predetermined criteria.
  - 28. The method of claim 27 in which the predetermined criteria comprise costs associated with offline processing.
  - 29. The method of claim 27 in which the predetermined criteria comprise processing times associated with offline processing.
  - 30. The method of claim 27 in which the predetermined criteria comprise a confidence level of recognition results from the first speech recognizor.
  - 31. The method of claim 27 in which the predetermined criteria comprise an importance level associated with the speech sample.
  - 32. The method of claim 25 in which the offline processing includes the processing by the second speech recognizor.
  - 33. The method of claim 25 in which the offline processing comprises recognition error correction by a transcriptionist.
  - 45. The speech recognition system of claim 30 further comprising a transcription station, coupled to the processor, for use by a transcriptionist to correct recognition errors.
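
Claims 26 to 31 list the inputs that may drive the decision to perform offline processing: operator input, cost, processing time, first-pass confidence, and importance. One way such a decision could be expressed is sketched below. Every name and default value here (`OfflinePolicy`, `should_process_offline`, the limits) is hypothetical, and the way the criteria are combined is an assumption; the claims name the criteria but not how they interact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OfflinePolicy:
    """Hypothetical limits; the claims do not give concrete values."""
    min_confidence: float = 0.85        # first-pass confidence below this is doubtful
    max_cost: float = 5.0               # highest acceptable offline cost
    max_turnaround_hours: float = 24.0  # longest acceptable offline processing time
    min_importance: int = 2             # importance at or above this justifies offline work


def should_process_offline(
    policy: OfflinePolicy,
    *,
    confidence: float,
    cost: float,
    turnaround_hours: float,
    importance: int,
    operator_request: Optional[bool] = None,
) -> bool:
    """Decide whether to send a sample for offline, non-real-time recognition."""
    if operator_request is not None:
        # Assumption: an explicit operator choice overrides the automatic criteria.
        return operator_request
    if cost > policy.max_cost or turnaround_hours > policy.max_turnaround_hours:
        return False
    return confidence < policy.min_confidence or importance >= policy.min_importance
```

In this sketch cost and turnaround act as vetoes, while low confidence or high importance act as triggers. Other combinations would be equally consistent with the claim language.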

34. A computer-based method of speech recognition comprising:
- receiving a speech sample;
  
  processing the speech sample with at least two speech recognizors, each of which is optimized for a different recognition characteristic;
  
  comparing results of the processing by the recognizors; and
  
  determining that a recognition uncertainty exists when a best result produced by a first recognizor differs from a best result produced by a second recognizor.
- - 35. The method of claim 34 further comprising identifying a portion of the speech sample as corresponding to the recognition uncertainty.
  - 36. The method of claim 34 further comprising presenting an indicator of the recognition uncertainty to a transcriptionist.
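
Claims 34 and 35 treat a disagreement between the recognizers' best results as a recognition uncertainty and identify the portion of the sample involved. A minimal sketch of that comparison follows, assuming each best result is available as a plain text string. It aligns the two word sequences with Python's standard `difflib` module and reports the word spans that differ. Mapping those words back to positions in the audio is not shown, and the function name `find_disagreements` is hypothetical.

```python
import difflib
from typing import List, Tuple


def find_disagreements(first_best: str, second_best: str) -> List[Tuple[List[str], List[str]]]:
    """Return pairs of differing word spans (first recognizer, second recognizer).

    An empty list means the two best results agree word for word.
    """
    a, b = first_best.split(), second_best.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted, or deleted spans
    ]
```

Each returned pair could then be highlighted for a transcriptionist as an indicator of uncertainty, in the spirit of claim 36.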

37. A speech recognition system comprising:
- an input device configured to receive a speech sample to be recognized;
  
  a first speech recognizor, coupled to the input device, for performing speech recognition on the speech sample;
  
  at least one other speech recognizor, coupled to the first speech recognizor, for performing offline, non-interactive, non-real-time speech recognition on the speech sample; and
  
  a processor configured to receive and process recognition results from the speech recognizors.
- - 38. The speech recognition system of claim 37 in which the first and second characteristics comprise complementary properties.
  - 39. The speech recognition system of claim 37 in which the first characteristic comprises real-time responsiveness.
  - 40. The speech recognition system of claim 39 in which the second characteristic comprises high recognition accuracy.
  - 41. The speech recognition system of claim 37 further comprising a computer system for controlling the first recognizor, the computer system comprising a graphic user interface for interacting with a user.
  - 42. The speech recognition system of claim 41 in which the graphic user interface allows the user to revise a recognition result from the first speech recognizor.
  - 43. The speech recognition system of claim 41 in which the graphic user interface allows the user to selectively restrict the speech sample from being transmitted to the at least one other speech recognizor.
  - 44. The speech recognition system of claim 41 further comprising a transcription station, and in which the graphic user interface allows the user to selectively transmit the speech sample to the transcription station.

46. A computer-based method of speech recognition comprising:
- receiving a speech sample;
  
  processing the speech sample with a first speech recognizor running on a first processor;
  
  determining whether a predetermined criterion based on input from a user is satisfied;
  
  transmitting the speech sample to a second speech recognizor running on a second processor for additional processing only if the predetermined criterion is satisfied; and
  
  if the predetermined criterion is not satisfied, outputting results of the processing with the first speech recognizor without transmitting the speech sample to the second recognizor.

47. A computer-based method of speech recognition comprising:
- receiving a speech sample;
  
  processing the speech sample with a first speech recognizor running on a first processor;
  
  determining whether a predetermined criterion based on a document type associated with the speech sample is satisfied;
  
  transmitting the speech sample to a second speech recognizor running on a second processor for additional processing only if the predetermined criterion is satisfied; and
  
  if the predetermined criterion is not satisfied, outputting results of the processing with the first speech recognizor without transmitting the speech sample to the second recognizor.

48. A computer-based method of speech recognition comprising:
- receiving a speech sample;
  
  processing the speech sample with a first speech recognizor running on a first processor;
  
  determining whether a predetermined criterion based on a cost associated with the second speech recognizor is satisfied;
  
  transmitting the speech sample to a second speech recognizor running on a second processor for additional processing only if the predetermined criterion is satisfied; and
  
  if the predetermined criterion is not satisfied, outputting results of the processing with the first speech recognizor without transmitting the speech sample to the second recognizor.
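
Claims 46 to 48 share one control flow: the first recognizer always runs, and the sample is transmitted to the second recognizer only if a predetermined criterion (based on user input, document type, or cost) is satisfied. The sketch below shows that gating under stated assumptions. The recognizers are modelled as plain callables, the criterion as a zero-argument function, and all names are hypothetical.

```python
from typing import Callable, Optional, Tuple

Recognizer = Callable[[bytes], str]  # assumption: takes raw audio, returns text


def recognize_with_gate(
    sample: bytes,
    first: Recognizer,
    second: Recognizer,
    criterion: Callable[[], bool],
) -> Tuple[str, Optional[str]]:
    """Run the first recognizer, then the second only if the criterion holds.

    Returns (first_result, second_result); second_result is None when the
    sample was never transmitted to the second recognizer.
    """
    first_result = first(sample)
    if not criterion():
        # Criterion not satisfied: output the first result, transmit nothing.
        return first_result, None
    return first_result, second(sample)
```

A criterion for claim 47 might be `lambda: document_type in {"discharge summary"}`, where `document_type` and the set contents are placeholders chosen by the caller.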

49. A computer-based method of speech recognition comprising:
- receiving a speech sample;
  
  processing the speech sample with a first speech recognizor running on a first processor and at least a second speech recognizor running on a second processor, the speech recognizors having different performance characteristics; and
  
  recognizing speech content of the speech sample based on the processing by the speech recognizors;
  
  wherein the processing comprises:
  
  the first speech recognizor identifying a first set of candidates that likely match the speech sample and calculating a corresponding first set of scores, the scores based on a likelihood of matching the speech sample; and
  
  the second speech recognizor identifying a second set of candidates that likely match the speech sample and calculating a corresponding second set of scores, the scores based on a likelihood of matching the speech sample.

50. A computer-based method of speech recognition comprising:
- receiving a speech sample;
  
  processing the speech sample with a first speech recognizor running on a first processor and at least a second speech recognizor running on a second processor, the speech recognizors having different performance characteristics;
  
  recognizing speech content of the speech sample based on the processing by the speech recognizors;
  
  determining whether a recognition uncertainty exists based on the processing by the first and second speech recognizors; and
  
  identifying a recognition uncertainty to a transcriptionist.

51. A computer-based method of speech recognition comprising:
- receiving a speech sample;
  
  processing the speech sample with a first speech recognizor running on a first processor and at least a second speech recognizor running on a second processor, the speech recognizors having different performance characteristics; and
  
  recognizing speech content of the speech sample based on the processing by the speech recognizors; and
  
  determining whether a recognition uncertainty exists based on the processing by the first and second speech recognizors, wherein the recognition uncertainty is determined to exist if a recognition result from the first speech recognizor disagrees with a recognition result from the second speech recognizor.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Dragon Systems, Inc. (Microsoft Corporation)
Inventors: Baker, James K.
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Storm, Donald L.
Application Number: US08/791,680
Time in Patent Office: 1,328 Days
Field of Search: 704/200, 704/235, 704/252, 704/231, 704/236, 704/251, 704/270, 704/277
US Class Current: 704/235
CPC Class Codes:
G10L 15/22 Procedures used during a sp...
G10L 15/32 Multiple recognisers used i...
