System and method for latency reduction for automatic speech recognition using partial multi-pass results
First Claim
1. A computer-implemented method for reducing latency in an automatic speech recognition (ASR) system, the method comprising:
transcribing via a processor speech data using a first ASR pass, which operates at a first transcription rate near real time, to produce first transcription data;
transcribing said speech data using a second ASR pass slower than said first ASR pass to produce second transcription data, wherein said second transcription data is more accurate than said first transcription data;
transcribing said speech data using a third ASR pass based on the speech data and transcribed speech data from the second ASR pass to produce third transcription data;
displaying via a display a part of said first transcription data, which corresponds to a portion of said speech data, prior to transcription of said portion of said speech data by said second ASR pass;
updating said displayed part of said first transcription data with one or more of said second transcription data and said third transcription data upon completion of the transcription of said portion of said speech data by said second or third ASR pass; and
displaying with transcription data an indication of how many additional transcription passes are forthcoming that will update the transcription data.
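The display behavior recited in the claim above can be illustrated with a minimal sketch. It is not the patented implementation: the `SegmentDisplay` class, its method names, the bracketed indicator format, and the sample transcripts are all hypothetical, and the ASR passes themselves are stood in for by plain strings. The sketch only shows how a displayed portion might be overwritten as slower, more accurate passes finish, together with a count of passes still forthcoming.

```python
from dataclasses import dataclass


@dataclass
class SegmentDisplay:
    """Displayed text for one portion of speech (hypothetical helper)."""

    total_passes: int = 3   # first, second, and third ASR pass
    text: str = ""          # transcription currently shown
    completed_pass: int = 0  # highest pass whose result is shown

    def update(self, pass_index: int, transcript: str) -> bool:
        """Show a pass result unless an equal or later pass is already shown."""
        if pass_index <= self.completed_pass or pass_index > self.total_passes:
            return False
        self.text = transcript
        self.completed_pass = pass_index
        return True

    @property
    def passes_remaining(self) -> int:
        """Number of additional passes that may still update the text."""
        return self.total_passes - self.completed_pass

    def render(self) -> str:
        """Text plus an indication of how many passes are forthcoming."""
        return f"{self.text} [{self.passes_remaining} more pass(es) pending]"


if __name__ == "__main__":
    display = SegmentDisplay()
    # Made-up transcripts standing in for the output of each pass.
    for index, result in enumerate(
        ["wreck a nice beach", "recognize speech", "recognize speech."], start=1
    ):
        display.update(index, result)
        print(display.render())
```

Ignoring a result from an earlier pass once a later one is shown is a design choice of this sketch, made so that out-of-order completion never replaces more accurate text with less accurate text.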
Abstract
A system and method is provided for reducing latency for automatic speech recognition. In one embodiment, intermediate results produced by multiple search passes are used to update a display of transcribed text.
20 Claims
1. A computer-implemented method for reducing latency in an automatic speech recognition (ASR) system, the method comprising:
transcribing via a processor speech data using a first ASR pass, which operates at a first transcription rate near real time, to produce first transcription data;
transcribing said speech data using a second ASR pass slower than said first ASR pass to produce second transcription data, wherein said second transcription data is more accurate than said first transcription data;
transcribing said speech data using a third ASR pass based on the speech data and transcribed speech data from the second ASR pass to produce third transcription data;
displaying via a display a part of said first transcription data, which corresponds to a portion of said speech data, prior to transcription of said portion of said speech data by said second ASR pass;
updating said displayed part of said first transcription data with one or more of said second transcription data and said third transcription data upon completion of the transcription of said portion of said speech data by said second or third ASR pass; and
displaying with transcription data an indication of how many additional transcription passes are forthcoming that will update the transcription data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A tangible computer-readable storage medium that stores a program for controlling a computer device to perform a method to reduce latency in an automatic speech recognition (ASR) system, the method comprising:
transcribing speech data via a processor in the computer device using a first ASR pass, which operates at a first transcription rate near real time, to produce first transcription data;
transcribing said speech data using a second ASR pass slower than said first ASR pass to produce second transcription data, wherein said second transcription data is more accurate than said first transcription data;
transcribing said speech data using a third ASR pass based on the speech data and transcribed speech data from the second ASR pass to produce third transcription data;
displaying on a display a part of said first transcription data, which corresponds to a portion of said speech data, prior to transcription of said portion of said speech data by said second ASR pass;
updating said displayed part of said first transcription data with one or more of said second transcription data and said third transcription data upon completion of the transcription of said portion of said speech data by said second or third ASR pass; and
displaying with transcription data an indication of how many additional transcription passes are forthcoming that will update the transcription data.
11. An automatic speech recognition (ASR) system using a method of reducing latency, the method comprising:
transcribing, via a processor in the ASR system, speech data using a first ASR pass, which operates at a first transcription rate near real time, to produce first transcription data;
transcribing said speech data using a second ASR pass slower than said first ASR pass to produce second transcription data, wherein said second transcription data is more accurate than said first transcription data;
transcribing said speech data using a third ASR pass based on the speech data and transcribed speech data from the second ASR pass to produce third transcription data;
displaying a part of said first transcription data, which corresponds to a portion of said speech data, prior to transcription of said portion of said speech data by said second ASR pass;
updating said displayed part of said first transcription data with one or more of said second transcription data and said third transcription data upon completion of the transcription of said portion of said speech data by said second or third ASR pass; and
displaying with transcription data an indication of how many additional transcription passes are forthcoming that will update the transcription data.
12. A computer-implemented method of reducing latency in the display of transcribed data generated by automatic speech recognition (ASR) process, the method comprising:
transcribing via a processor a segment of speech data using a plurality of normalized ASR passes, said plurality of normalized ASR passes having varying levels of accuracy and speed, wherein a second normalized ASR pass estimates a gender and a vocal tract of a speaker based on audio and a first transcription data obtained prior to the transcribing using the plurality of normalized ASR passes;
incrementally updating a display of transcribed text as more accurate text is generated by one of said plurality of normalized ASR passes;
displaying an indicator that communicates a relative accuracy of words in said displayed text; and
displaying with transcription data an indication of how many additional transcription passes are forthcoming that will update the transcription data.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
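One way to picture the "indicator that communicates a relative accuracy of words" is a per-word marker tied to the pass that last produced each word. The sketch below is an assumption for illustration only: the `annotate` function, the trailing `?` markers, and the sample words are hypothetical, and the claim does not say how the indicator is rendered. Each word is paired with the index of the most recent pass that transcribed it, and words from earlier, less accurate passes carry more marks.

```python
from typing import List, Tuple


def annotate(words: List[Tuple[str, int]], total_passes: int = 3) -> str:
    """Render words with one '?' per pass that has not yet refined them.

    `words` pairs each word with the index (1-based) of the latest pass
    that produced it. The marker style is a hypothetical choice.
    """
    rendered = []
    for word, pass_index in words:
        if not 1 <= pass_index <= total_passes:
            raise ValueError(f"pass index {pass_index} is out of range")
        # Fewer marks mean the word came from a later, more accurate pass.
        rendered.append(word + "?" * (total_passes - pass_index))
    return " ".join(rendered)


if __name__ == "__main__":
    # Made-up segment: the final pass has reached only the first word.
    print(annotate([("recognize", 3), ("speech", 2), ("today", 1)]))
```

A graphical display could map the same per-word pass index to color or font weight instead of text markers; that mapping is likewise an assumption here.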
Specification