System and method for latency reduction for automatic speech recognition using partial multi-pass results

US 7,729,912 B1
Filed: 12/23/2003
Issued: 06/01/2010
Est. Priority Date: 12/23/2003
Status: Active Grant

5 Assignments

0 Petitions

Abstract

A system and method is provided for reducing latency for automatic speech recognition. In one embodiment, intermediate results produced by multiple search passes are used to update a display of transcribed text.

36 Citations

20 Claims

1. A computer-implemented method for reducing latency in an automatic speech recognition (ASR) system, the method comprising:
- transcribing via a processor speech data using a first ASR pass, which operates at a first transcription rate near real time, to produce first transcription data;
  
  transcribing said speech data using a second ASR pass slower than said first ASR pass to produce second transcription data, wherein said second transcription data is more accurate than said first transcription data;
  
  transcribing said speech data using a third ASR pass based on the speech data and transcribed speech data from the second ASR pass to produce third transcription data;
  
  displaying via a display a part of said first transcription data, which corresponds to a portion of said speech data, prior to transcription of said portion of said speech data by said second ASR pass;
  
  updating said displayed part of said first transcription data with one or more of said second transcription data and said third transcription data upon completion of the transcription of said portion of said speech data by said second or third ASR pass; and
  
  displaying with transcription data an indication of how many additional transcription passes are forthcoming that will update the transcription data.
- Dependent claims:
  - 2. The method of claim 1, wherein said first non-normalized ASR pass operates at real time.
  - 3. The method of claim 1, wherein said first non-normalized ASR pass operates at greater than real time.
  - 4. The method of claim 1, wherein said displaying comprises displaying an indicator that signifies that more accurate transcription data is being generated.
  - 5. The method of claim 1, wherein said displayed data is in a different color or shade as compared to said updated displayed data.
  - 6. The method of claim 1, wherein portions of displayed data having a relatively lower confidence score are distinctly displayed as compared to displayed data having a relatively higher confidence score.
  - 7. The method of claim 6, wherein the displayed data having a relatively lower confidence score is displayed in a darker shade as compared to displayed data having a relatively higher confidence score.
  - 8. The method of claim 6, wherein said portions of displayed data having a relatively lower confidence score enable a user to listen to the corresponding portions of speech data.
  - 9. The method of claim 1, further comprising transcribing said speech data using one or more normalized ASR passes beyond said second normalized ASR pass.
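
The display behaviour recited in claim 1 (show first-pass text early, replace it as slower passes finish, and indicate how many passes are still to come) could be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the names `Segment`, `TranscriptDisplay`, `on_result`, and `render`, the `TOTAL_PASSES` constant, and the bracketed indicator format are all assumptions introduced here, and the ASR passes themselves are not modelled.

```python
from dataclasses import dataclass, field

TOTAL_PASSES = 3  # assumption: three passes, matching the passes named in claim 1


@dataclass
class Segment:
    """Displayed state for one portion of the speech data (hypothetical type)."""
    text: str = ""
    latest_pass: int = 0  # 0 means no pass has produced text for this portion yet

    @property
    def passes_remaining(self) -> int:
        # Number of forthcoming passes that may still update this portion.
        return TOTAL_PASSES - self.latest_pass


@dataclass
class TranscriptDisplay:
    """Holds the text currently shown for each portion (hypothetical type)."""
    segments: dict = field(default_factory=dict)

    def on_result(self, segment_id: int, pass_number: int, text: str) -> None:
        """Show or update a portion when a pass finishes transcribing it."""
        seg = self.segments.setdefault(segment_id, Segment())
        # Ignore results from a pass that is not later than the one already shown.
        if pass_number <= seg.latest_pass:
            return
        seg.text = text
        seg.latest_pass = pass_number

    def render(self) -> list:
        """Return one display line per portion, with a pending-pass indicator."""
        return [
            f"{seg.text} [{seg.passes_remaining} more pass(es) pending]"
            for _, seg in sorted(self.segments.items())
        ]


if __name__ == "__main__":
    display = TranscriptDisplay()
    display.on_result(0, 1, "wreck a nice beach")  # fast first pass, shown at once
    display.on_result(1, 1, "today")
    display.on_result(0, 2, "recognize speech")    # slower second pass replaces it
    for line in display.render():
        print(line)
```

In this sketch the first-pass text for a portion is visible before the second pass reaches that portion, and each rendered line carries the count of passes that have not yet updated it.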

10. A tangible computer-readable storage medium that stores a program for controlling a computer device to perform a method to reduce latency in an automatic speech recognition (ASR) system, the method comprising:
- transcribing speech data via a processor in the computer device using a first ASR pass, which operates at a first transcription rate near real time, to produce first transcription data;
  
  transcribing said speech data using a second ASR pass slower than said first ASR pass to produce second transcription data, wherein said second transcription data is more accurate than said first transcription data;
  
  transcribing said speech data using a third ASR pass based on the speech data and transcribed speech data from the second ASR pass to produce third transcription data;
  
  displaying on a display a part of said first transcription data, which corresponds to a portion of said speech data, prior to transcription of said portion of said speech data by said second ASR pass;
  
  updating said displayed part of said first transcription data with one or more of said second transcription data and said third transcription data upon completion of the transcription of said portion of said speech data by said second or third ASR pass; and
  
  displaying with transcription data an indication of how many additional transcription passes are forthcoming that will update the transcription data.

11. An automatic speech recognition (ASR) system using a method of reducing latency, the method comprising:
- transcribing, via a processor in the ASR system, speech data using a first ASR pass, which operates at a first transcription rate near real time, to produce first transcription data;
  
  transcribing said speech data using a second ASR pass slower than said first ASR pass to produce second transcription data, wherein said second transcription data is more accurate than said first transcription data;
  
  transcribing said speech data using a third ASR pass based on the speech data and transcribed speech data from the second ASR pass to produce third transcription data;
  
  displaying a part of said first transcription data, which corresponds to a portion of said speech data, prior to transcription of said portion of said speech data by said second ASR pass;
  
  updating said displayed part of said first transcription data with one or more of said second transcription data and said third transcription data upon completion of the transcription of said portion of said speech data by said second or third ASR pass; and
  
  displaying with transcription data an indication of how many additional transcription passes are forthcoming that will update the transcription data.

12. A computer-implemented method of reducing latency in the display of transcribed data generated by automatic speech recognition (ASR) process, the method comprising:
- transcribing via a processor a segment of speech data using a plurality of normalized ASR passes, said plurality of normalized ASR passes having varying levels of accuracy and speed, wherein a second normalized ASR pass estimates a gender and a vocal tract of a speaker based on audio and a first transcription data obtained prior to the transcribing using the plurality of normalized ASR passes;
  
  incrementally updating a display of transcribed text as more accurate text is generated by one of said plurality of normalized ASR passes;
  
  displaying an indicator that communicates a relative accuracy of words in said displayed text; and
  
  displaying with transcription data an indication of how many additional transcription passes are forthcoming that will update the transcription data.
- Dependent claims:
  - 13. The method of claim 12, wherein at least one of said plurality of normalized ASR passes operates at near real time.
  - 14. The method of claim 12, wherein at least one of said plurality of normalized ASR passes operates at greater than real time.
  - 15. The method of claim 12, wherein said updating comprises updating an entire set of displayed text.
  - 16. The method of claim 12, wherein said updating comprises updating only a portion of an entire set of displayed text.
  - 17. The method of claim 12, wherein said indicator is based on word confidence scores.
  - 18. The method of claim 12, wherein said indicator is a color.
  - 19. The method of claim 12, wherein said indicator is a shade.
  - 20. The method of claim 12, wherein said indicator is a number.
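
Claims 17 to 20 describe an accuracy indicator that is based on word confidence scores and rendered as a color, a shade, or a number. One possible mapping is sketched below, purely as an illustration: the function names `confidence_indicator` and `annotate`, the 0.5 color threshold, the grayscale hex encoding, and the assumption that scores lie in the range 0 to 1 are not taken from the patent. The shade branch follows the convention of claim 7, where lower confidence is shown in a darker shade.

```python
def confidence_indicator(score: float, kind: str = "number") -> str:
    """Map a word confidence score in [0, 1] to a display indicator.

    The scale, threshold, and output formats are illustrative assumptions.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if kind == "number":
        # Claim 20 style: show the confidence as a percentage.
        return f"{round(score * 100)}%"
    if kind == "shade":
        # Claim 19 style: grayscale level, darker for lower confidence.
        level = round(255 * score)
        return f"#{level:02x}{level:02x}{level:02x}"
    if kind == "color":
        # Claim 18 style: two colors split at an assumed 0.5 threshold.
        return "red" if score < 0.5 else "green"
    raise ValueError(f"unknown indicator kind: {kind}")


def annotate(words, kind: str = "number") -> str:
    """Join (word, score) pairs into text with an indicator after each word."""
    return " ".join(
        f"{word}({confidence_indicator(score, kind)})" for word, score in words
    )


if __name__ == "__main__":
    print(annotate([("recognize", 0.75), ("speech", 0.25)]))
    print(annotate([("recognize", 0.75), ("speech", 0.25)], kind="color"))
```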

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors: Amento, Brian Scott; Bacchiani, Michiel Adriaan Unico
Primary Examiner(s): Vo, Huyen X.
Application Number: US10/742,852
Time in Patent Office: 2,352 Days
Field of Search: 704/231, 704/235, 704/246, 704/251, 704/270, 704/252, 704/229, 704/243, 704/236, 704/244, 704/275, 704/239, 704/255, 704/276, 704/240
US Class Current: 704/252
CPC Class Codes: G10L 15/32 Multiple recognisers used i...
