System and Method for Latency Reduction for Automatic Speech Recognition Using Partial Multi-Pass Results
Abstract
A system and method are provided for reducing latency in automatic speech recognition. In one embodiment, intermediate results produced by multiple search passes are used to update a display of transcribed text.
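The display-update idea summarized in the abstract can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the patented implementation: `first_pass`, `adapt_model`, `second_pass`, `Display`, the toy word-graph format (`Slot`) and `DOMAIN_LEXICON` are assumed names, and the two passes run one after the other here, whereas a real system would presumably run them concurrently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Toy word-graph slot: candidate words with scores (assumed format).
Slot = List[Tuple[str, float]]

# Assumption: a tiny lexicon standing in for real model adaptation data.
DOMAIN_LEXICON = {"recognize", "speech"}


@dataclass
class PassResult:
    """Transcription text and word graph produced by one pass."""
    text: str
    word_graph: List[Slot]


@dataclass
class Display:
    """Hypothetical display with one text slot per portion of speech."""
    parts: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def show(self, index: int, text: str) -> None:
        # Append a new portion or overwrite an already displayed one.
        if index == len(self.parts):
            self.parts.append(text)
        else:
            self.parts[index] = text
        self.history.append(" ".join(self.parts))


def first_pass(portion: List[Slot]) -> PassResult:
    """Fast stand-in pass: take the best-scoring candidate per slot."""
    words = [max(slot, key=lambda c: c[1])[0] for slot in portion]
    return PassResult(" ".join(words), portion)


def adapt_model(word_graph: List[Slot]) -> Dict[str, float]:
    """Toy adaptation: boost first-pass candidates found in the lexicon."""
    return {w: 0.5 for slot in word_graph for w, _ in slot
            if w in DOMAIN_LEXICON}


def second_pass(portion: List[Slot], model: Dict[str, float]) -> PassResult:
    """Stand-in for the slower pass: rescore candidates with the adapted model."""
    words = [max(slot, key=lambda c: c[1] + model.get(c[0], 0.0))[0]
             for slot in portion]
    return PassResult(" ".join(words), portion)


def transcribe_with_updates(portions: List[List[Slot]]) -> Display:
    display = Display()
    first_results = []
    # First pass: display its text before the second pass has seen the portion.
    for i, portion in enumerate(portions):
        result = first_pass(portion)
        first_results.append(result)
        display.show(i, result.text)
    # Second pass: adapt on the first word graph, then update the displayed part.
    for i, (portion, first) in enumerate(zip(portions, first_results)):
        model = adapt_model(first.word_graph)
        display.show(i, second_pass(portion, model).text)
    return display


portions = [
    [[("wreck", 0.6), ("recognize", 0.5)], [("beach", 0.6), ("speech", 0.5)]],
    [[("today", 0.9)]],
]
print(transcribe_with_updates(portions).history)
```

In this toy run the display first shows the first-pass text "wreck beach today" and is then overwritten with the second-pass text "recognize speech today".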
20 Claims
1. A method comprising:
transcribing, via a processor, speech data using a first automatic speech recognition pass, which operates at a first transcription rate near real time, to produce a first transcription data and a first word graph;
adapting a model for a second automatic speech recognition pass that uses the first word graph, wherein the second automatic speech recognition pass produces a second transcription data and a second word graph, and wherein the second automatic speech recognition pass is slower than the first automatic speech recognition pass;
displaying at least part of the first transcription data corresponding to a portion of the speech data, prior to transcription of the portion of the speech data by the second automatic speech recognition pass, to yield a displayed part; and
updating the displayed part with at least the second transcription data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8

9. A system comprising:
a processor;
a memory storing instructions for controlling the processor to perform steps comprising:
transcribing speech data using a first automatic speech recognition pass, which operates at a first transcription rate near real time, to produce a first transcription data and a first word graph;
adapting a model for a second automatic speech recognition pass that uses the first word graph, wherein the second automatic speech recognition pass produces a second transcription data and a second word graph, and wherein the second automatic speech recognition pass is slower than the first automatic speech recognition pass;
displaying at least part of the first transcription data corresponding to a portion of the speech data, prior to transcription of the portion of the speech data by the second automatic speech recognition pass, to yield a displayed part; and
updating the displayed part with at least the second transcription data.
Dependent claims: 10, 11, 12, 13, 14, 16

15. The system of claim 15, wherein the low confidence portion of the displayed part is displayed in a darker shade as compared to the high confidence portion of the displayed data.
17. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device, cause the computing device to perform steps comprising:
transcribing speech data using a first automatic speech recognition pass, which operates at a first transcription rate near real time, to produce a first transcription data and a first word graph;
adapting a model for a second automatic speech recognition pass that uses the first word graph, wherein the second automatic speech recognition pass produces a second transcription data and a second word graph, and wherein the second automatic speech recognition pass is slower than the first automatic speech recognition pass;
displaying at least part of the first transcription data corresponding to a portion of the speech data, prior to transcription of the portion of the speech data by the second automatic speech recognition pass, to yield a displayed part; and
updating the displayed part with at least the second transcription data.
Dependent claims: 18, 19, 20

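Claim 15 above recites displaying a low confidence portion in a darker shade than a high confidence portion. One way this might look is sketched below. The threshold, the two grey values and the name `shade_words` are assumptions made for illustration only; the claims do not specify how confidence is computed or which shades are used.

```python
from typing import List, Tuple

LOW_CONF_SHADE = "#303030"   # darker grey (assumed value)
HIGH_CONF_SHADE = "#909090"  # lighter grey (assumed value)


def shade_words(words: List[Tuple[str, float]],
                threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Pair each displayed word with a shade chosen from its confidence.

    Words below the (hypothetical) threshold get the darker shade.
    """
    return [(word, LOW_CONF_SHADE if conf < threshold else HIGH_CONF_SHADE)
            for word, conf in words]


shaded = shade_words([("recognize", 0.9), ("beach", 0.2)])
print(shaded)
```

Here "beach", with confidence below the assumed threshold, is paired with the darker shade, marking it as a likely candidate for replacement by a later pass.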
Specification