Speech recognition using multiple recognizers (selectively) applied to the same input sample
Abstract
A speech sample is recognized with a computer system by processing the speech sample with at least two speech recognizers, each of which has a different performance characteristic. One speech recognizer may be a large-vocabulary, continuous speech recognizer optimized for real-time responsiveness, and another may be an offline recognizer optimized for high accuracy. The speech content of the sample is recognized based on processing results from the speech recognizers. The speaker is provided with a real-time, yet potentially error-laden, text display corresponding to the speech sample while, subsequently, a human transcriptionist may use recognition results from multiple recognizers to produce an essentially error-free transcription. The performance characteristics of the recognizers may be based on style or subject matter, and the recognizers may operate serially or in parallel. Sets of candidates produced by the two recognizers may be combined by merging the scores to generate a combined set of candidates that corresponds to the union of the two sets. Offline processing may be performed based on input from a human operator, cost, processing times, confidence levels, or importance. Uncertainty for a candidate may occur when the difference between the score for the best-scoring candidate and the score for the second-best-scoring candidate is less than a threshold value. A graphical user interface may allow the user to selectively transmit the speech sample to another speech recognizer (or restrict such transmission), based on document type or availability of the second speech recognizer.
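The candidate merging and the score-margin uncertainty test described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the weighted-average merge rule, the `missing` default for a candidate that only one recognizer proposed, and the convention that a higher score means a better match. The abstract only states that scores are merged, that the combined set is the union of the two sets, and that uncertainty arises when the top two scores differ by less than a threshold.

```python
from typing import Dict


def merge_candidates(first: Dict[str, float], second: Dict[str, float],
                     weight: float = 0.5, missing: float = 0.0) -> Dict[str, float]:
    """Return the union of both candidate sets with a merged score per candidate.

    Assumption: scores are merged by a weighted average, and a candidate that
    one recognizer did not propose gets the `missing` score from that recognizer.
    """
    merged: Dict[str, float] = {}
    for text in first.keys() | second.keys():
        a = first.get(text, missing)
        b = second.get(text, missing)
        merged[text] = weight * a + (1.0 - weight) * b
    return merged


def is_uncertain(scores: Dict[str, float], threshold: float) -> bool:
    """True when the best and second-best scores differ by less than threshold.

    Assumption: higher scores are better. With fewer than two candidates there
    is no margin to compare, so no uncertainty is reported.
    """
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        return False
    return (ranked[0] - ranked[1]) < threshold


if __name__ == "__main__":
    realtime = {"recognize speech": 0.8, "wreck a nice beach": 0.6}
    offline = {"recognize speech": 0.9, "recognise peach": 0.4}
    combined = merge_candidates(realtime, offline)
    print(sorted(combined), is_uncertain(combined, threshold=0.1))
```

A real system would likely normalize the two recognizers' scores to a common scale before merging; the sketch assumes they are already comparable.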
51 Claims
1. A computer-based method of speech recognition comprising:
receiving a speech sample;
processing the speech sample with a first speech recognizer running on a first processor and at least a second speech recognizer running on a second processor, the speech recognizers having different performance characteristics;
wherein the processing by the first speech recognizer comprises real-time continuous speech recognition; and
recognizing speech content of the speech sample based on the processing by the speech recognizers.
Dependent claims: 2-24.
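As a rough illustration of claim 1's arrangement, the sketch below applies two recognizers to the same sample concurrently. It is only a sketch under stated assumptions: `fast_recognizer` and `accurate_recognizer` are hypothetical stubs that return fixed strings, and worker threads stand in for the separate processors named in the claim.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Recognizer = Callable[[bytes], str]


def fast_recognizer(sample: bytes) -> str:
    # Hypothetical stand-in for a real-time continuous recognizer.
    return "draft transcript"


def accurate_recognizer(sample: bytes) -> str:
    # Hypothetical stand-in for a slower, higher-accuracy recognizer.
    return "careful transcript"


def recognize_with_all(sample: bytes, recognizers: List[Recognizer]) -> List[str]:
    """Run every recognizer on the same sample; results keep the input order."""
    if not recognizers:
        return []
    # Threads are an assumption standing in for separate processors.
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = [pool.submit(recognizer, sample) for recognizer in recognizers]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(recognize_with_all(b"\x00\x01", [fast_recognizer, accurate_recognizer]))
```

How the individual results are then reconciled into recognized speech content is left open here, as the claim itself does not fix a particular method.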
25. A computer-based method of speech recognition comprising:
receiving a speech sample;
processing the speech sample with a first speech recognizer, the first speech recognizer providing interactive, real-time, continuous speech recognition;
selectively performing offline, non-interactive, non-real-time processing of the speech sample using a second speech recognizer; and
recognizing speech content of the speech sample based on the processing by the speech recognizers.
Dependent claims: 26-33, 45.
34. A computer-based method of speech recognition comprising:
receiving a speech sample;
processing the speech sample with at least two speech recognizers, each of which is optimized for a different recognition characteristic;
comparing results of the processing by the recognizers; and
determining that a recognition uncertainty exists when a best result produced by a first recognizer differs from a best result produced by a second recognizer.
Dependent claims: 35, 36.
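The disagreement test in claim 34 can be sketched in a few lines. The representation is an assumption: each recognizer's output is modeled as a mapping from candidate text to score, higher scores are taken to be better, and the function names are hypothetical.

```python
from typing import Dict, Optional


def best_result(candidates: Dict[str, float]) -> Optional[str]:
    """Return the highest-scoring candidate, or None if there are no candidates."""
    if not candidates:
        return None
    return max(candidates, key=lambda text: candidates[text])


def recognition_uncertain(first: Dict[str, float], second: Dict[str, float]) -> bool:
    """Flag an uncertainty when the two recognizers' best results differ."""
    return best_result(first) != best_result(second)


if __name__ == "__main__":
    realtime = {"their": 0.6, "there": 0.4}
    offline = {"there": 0.7, "their": 0.3}
    print(recognition_uncertain(realtime, offline))
```

Comparing only the top candidates keeps the check cheap; it deliberately ignores how close the runner-up scores are.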
37. A speech recognition system comprising:
an input device configured to receive a speech sample to be recognized;
a first speech recognizer, coupled to the input device, for performing speech recognition on the speech sample;
at least one other speech recognizer, coupled to the first speech recognizer, for performing offline, non-interactive, non-real-time speech recognition on the speech sample; and
a processor configured to receive and process recognition results from the speech recognizers.
Dependent claims: 38-44.
46. A computer-based method of speech recognition comprising:
receiving a speech sample;
processing the speech sample with a first speech recognizer running on a first processor;
determining whether a predetermined criterion based on input from a user is satisfied;
transmitting the speech sample to a second speech recognizer running on a second processor for additional processing only if the predetermined criterion is satisfied; and
if the predetermined criterion is not satisfied, outputting results of the processing with the first speech recognizer without transmitting the speech sample to the second recognizer.
47. A computer-based method of speech recognition comprising:
receiving a speech sample;
processing the speech sample with a first speech recognizer running on a first processor;
determining whether a predetermined criterion based on a document type associated with the speech sample is satisfied;
transmitting the speech sample to a second speech recognizer running on a second processor for additional processing only if the predetermined criterion is satisfied; and
if the predetermined criterion is not satisfied, outputting results of the processing with the first speech recognizer without transmitting the speech sample to the second recognizer.
48. A computer-based method of speech recognition comprising:
receiving a speech sample;
processing the speech sample with a first speech recognizer running on a first processor;
determining whether a predetermined criterion based on a cost associated with the second speech recognizer is satisfied;
transmitting the speech sample to a second speech recognizer running on a second processor for additional processing only if the predetermined criterion is satisfied; and
if the predetermined criterion is not satisfied, outputting results of the processing with the first speech recognizer without transmitting the speech sample to the second recognizer.
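Claims 46 to 48 share one control flow and differ only in the criterion: user input, document type, or cost. A minimal sketch of that gate follows. All names (`Job`, `Result`, `process`, and the three criterion factories) are hypothetical, and returning the first recognizer's draft alongside the second recognizer's output is an assumption; the claims specify only that the sample is transmitted to the second recognizer when, and only when, the criterion is satisfied.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set

Recognizer = Callable[[bytes], str]


@dataclass
class Job:
    sample: bytes
    document_type: str
    user_requested_offline: bool = False


@dataclass
class Result:
    draft: str                      # output of the first recognizer
    offline: Optional[str] = None   # output of the second recognizer, if used


Criterion = Callable[[Job], bool]


def by_user_request() -> Criterion:
    # Claim 46 style: criterion based on input from a user.
    return lambda job: job.user_requested_offline


def by_document_type(offline_types: Set[str]) -> Criterion:
    # Claim 47 style: criterion based on the document type of the sample.
    return lambda job: job.document_type in offline_types


def by_cost(offline_cost: float, budget: float) -> Criterion:
    # Claim 48 style: criterion based on the cost of the second recognizer.
    return lambda job: offline_cost <= budget


def process(job: Job, first: Recognizer, second: Recognizer,
            criterion: Criterion) -> Result:
    """Always run the first recognizer; send the sample on only if allowed."""
    draft = first(job.sample)
    if not criterion(job):
        # The sample is never transmitted to the second recognizer.
        return Result(draft=draft)
    return Result(draft=draft, offline=second(job.sample))
```

Passing the criterion in as a function keeps the three variants interchangeable; a deployment could also combine them, though the claims recite each one separately.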
49. A computer-based method of speech recognition comprising:
receiving a speech sample;
processing the speech sample with a first speech recognizer running on a first processor and at least a second speech recognizer running on a second processor, the speech recognizers having different performance characteristics; and
recognizing speech content of the speech sample based on the processing by the speech recognizers;
wherein the processing comprises:
the first speech recognizer identifying a first set of candidates that likely match the speech sample and calculating a corresponding first set of scores, the scores based on a likelihood of matching the speech sample; and
the second speech recognizer identifying a second set of candidates that likely match the speech sample and calculating a corresponding second set of scores, the scores based on a likelihood of matching the speech sample.
50. A computer-based method of speech recognition comprising:
receiving a speech sample;
processing the speech sample with a first speech recognizer running on a first processor and at least a second speech recognizer running on a second processor, the speech recognizers having different performance characteristics;
recognizing speech content of the speech sample based on the processing by the speech recognizers;
determining whether a recognition uncertainty exists based on the processing by the first and second speech recognizers; and
identifying a recognition uncertainty to a transcriptionist.
51. A computer-based method of speech recognition comprising:
receiving a speech sample;
processing the speech sample with a first speech recognizer running on a first processor and at least a second speech recognizer running on a second processor, the speech recognizers having different performance characteristics; and
recognizing speech content of the speech sample based on the processing by the speech recognizers; and
determining whether a recognition uncertainty exists based on the processing by the first and second speech recognizers, wherein the recognition uncertainty is determined to exist if a recognition result from the first speech recognizer disagrees with a recognition result from the second speech recognizer.
Specification