Timely speech recognition

US 9,099,090 B2
Filed: 10/01/2012
Issued: 08/04/2015
Est. Priority Date: 08/22/2008
Status: Active Grant

Abstract

An automatic speech recognition engine may generate text or tokens that correspond to audio data. For example, the automatic speech recognition engine may generate first text or first speech tokens corresponding to a first portion of audio data. The automatic speech recognition engine may further generate second text or second speech tokens that correspond to a first portion of the audio data and a second portion of the audio data. The text or speech tokens generated by the automatic speech recognition engine may be provided to a device for presentation thereon. In some embodiments, the automatic speech recognition engine generates the second text or second speech tokens substantially while the first text or first speech tokens are presented on the device.
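
The flow in the abstract can be illustrated with a minimal sketch in which each new portion of audio triggers a fresh transcription of everything received so far, so the second text covers both the first and the second portion. The class `IncrementalTranscriber`, the method `add_portion`, and the stand-in `toy_recognizer` are hypothetical names chosen for illustration and do not come from the patent; a real system would call an actual speech recognition engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IncrementalTranscriber:
    """Accumulates audio portions and re-transcribes the whole buffer each time."""

    # Hypothetical stand-in for an automatic speech recognition engine.
    recognize: Callable[[bytes], str]
    audio: bytearray = field(default_factory=bytearray)

    def add_portion(self, portion: bytes) -> str:
        # Append the new portion, then transcribe all audio received so far.
        self.audio.extend(portion)
        return self.recognize(bytes(self.audio))


def toy_recognizer(audio: bytes) -> str:
    # Assumption for the sketch: the "audio" is simply UTF-8 encoded text.
    return audio.decode("utf-8").strip()


transcriber = IncrementalTranscriber(recognize=toy_recognizer)
first_text = transcriber.add_portion(b"recognize speech")
second_text = transcriber.add_portion(b" in real time")
```

Here `first_text` could be presented on the device while the second portion is still arriving, and `second_text` would then replace or extend it.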

18 Claims

1. A computer-implemented method, the computer-implemented method comprising:
- under control of a computing device configured with specific computer-executable instructions, receiving a first portion of audio data;
  
  generating, with an automatic speech recognition engine, first transcribed text corresponding to the first portion of the audio data;
  
  determining a confidence level for transcription accuracy of the first transcribed text;
  
  transmitting the first transcribed text to a first device for presentation on the first device;
  
  transmitting the confidence level to the first device, the confidence level associated with a cue for presentation on the first device, wherein the cue indicates the confidence level for transcription accuracy of the first transcribed text, and wherein the cue is distinct from the first transcribed text;
  
  substantially while the first transcribed text is being presented on the first device, receiving a second portion of the audio data; and
  
  generating, with the automatic speech recognition engine, second transcribed text corresponding to the first portion of the audio data and the second portion of the audio data; and
  
  transmitting the second transcribed text to the first device for presentation on the first device.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The computer-implemented method of claim 1, wherein the confidence level is based at least in part on at least one of:
    - a background noise level of the first portion of the audio data;
      
      or a volume of the first portion of the audio data.
  - 3. The computer-implemented method of claim 1, wherein the confidence level is transmitted to the first device with the first transcribed text.
  - 4. The computer-implemented method of claim 1, wherein:
    - the first portion of the audio data and the second portion of the audio data are received as a stream; and
      
      the second portion of the audio data is received after the first portion of the audio data is received.
  - 5. The computer-implemented method of claim 4, wherein the second portion of the audio data is received substantially immediately after the first portion of the audio data is received.
  - 6. The computer-implemented method of claim 1, wherein the first transcribed text and the second transcribed text each comprise at least one of a syllable, a word, a phrase, or a sentence.
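
Claim 2 ties the confidence level to the background noise level or the volume of the audio, without saying how. One possible reading, offered only as a sketch, is to map a signal-to-noise ratio onto a score between 0 and 1. The function names, the linear mapping, and the 30 dB threshold below are all assumptions made for illustration; none of them is taken from the patent.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def confidence_from_audio(
    speech: Sequence[float],
    noise: Sequence[float],
    full_confidence_snr_db: float = 30.0,  # assumed threshold, not from the patent
) -> float:
    """Map volume and background noise to a confidence value in [0, 1]."""
    volume = rms(speech)
    background = rms(noise)
    if volume == 0.0:
        return 0.0  # silent input: nothing to trust
    if background == 0.0:
        return 1.0  # no measurable noise
    snr_db = 20.0 * math.log10(volume / background)
    # Linear ramp from 0 dB (no confidence) to the threshold (full confidence).
    return min(1.0, max(0.0, snr_db / full_confidence_snr_db))


quiet_room = confidence_from_audio([1.0, -1.0], [0.1, -0.1])
noisy_room = confidence_from_audio([1.0, -1.0], [0.5, -0.5])
```

Under these assumptions, the same speech volume yields a lower score as the background noise rises.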

7. A system comprising:
- an electronic data store configured to store one or more algorithms that, when executed, implement an automatic speech recognition engine; and
  
  a computing device in communication with the electronic data store, the computing device configured to:
  
  receive a first portion of audio data;
  
  generate, with the automatic speech recognition engine, first transcribed text corresponding to the first portion of the audio data, determine a first confidence level for transcription accuracy of the first transcribed text;
  
  transmit the first transcribed text to a first device for presentation on the first device;
  
  transmit the first confidence level to the first device, the first confidence level associated with a cue for presentation on the first device, wherein the cue indicates the first confidence level for transcription accuracy of the first transcribed text, and wherein the cue is distinct from the first transcribed text;
  
  substantially while the first transcribed text is presented on the first device, receive a second portion of the audio data; and
  
  generate, with the automatic speech recognition engine, second transcribed text corresponding to the first portion of the audio data and the second portion of the audio data; and
  
  transmit the second transcribed text to the first device for presentation on the first device.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The system of claim 7, wherein the first portion of the audio data and the second portion of the audio data are received from the first device.
  - 9. The system of claim 7, wherein the first portion of the audio data and the second portion of the audio data are received from a second device.
  - 10. The system of claim 7, wherein the first transcribed text comprises at least one of a syllable, a word, a phrase, or a sentence.
  - 11. The system of claim 7, wherein the computing device is further configured to:
    - generate, with the automatic speech recognition engine, additional first transcribed text corresponding to the first portion of the audio data;
      
      determine a confidence level for transcription accuracy of the additional first transcribed text; and
      
      select a portion of the additional first transcribed text with a second confidence level greater than the first confidence level.
  - 12. The system of claim 11, wherein the computing device is further configured to transmit, to the first device, each of the additional first transcribed text.
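
The selection step in claim 11 can be sketched as a filter over alternative transcriptions of the same audio, keeping one whose confidence exceeds that of the first transcription. The `Hypothesis` type and the `select_better` function are hypothetical names for illustration, and picking the single highest-scoring alternative is an assumption; the claim only requires a confidence greater than the first.

```python
from typing import List, NamedTuple, Optional


class Hypothesis(NamedTuple):
    """A candidate transcription paired with its confidence level."""
    text: str
    confidence: float


def select_better(
    first: Hypothesis, additional: List[Hypothesis]
) -> Optional[Hypothesis]:
    """Return the most confident alternative that beats the first hypothesis."""
    better = [h for h in additional if h.confidence > first.confidence]
    if not better:
        return None  # no alternative improves on the first transcription
    return max(better, key=lambda h: h.confidence)


first = Hypothesis("wreck a nice beach", 0.55)
alternatives = [
    Hypothesis("recognize speech", 0.90),
    Hypothesis("wreck an ice beach", 0.40),
]
chosen = select_better(first, alternatives)
```

Returning `None` when nothing beats the first hypothesis leaves the caller free to keep showing the original text.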

13. A non-transitory computer-readable storage medium having stored thereon a computer-executable module configured to execute in one or more processors, the computer-executable module being further configured to:
- obtain a first portion of audio data;
  
  transmit the first portion of the audio data to a remote computing device;
  
  receive, from the remote computing device, transcribed text corresponding to the first portion of the audio data;
  
  cause presentation of the transcribed text;
  
  receive, from the remote computing device, a first confidence level for transcription accuracy of the first transcribed text, the first confidence level associated with a first cue, wherein the first cue indicates the first confidence level for transcription accuracy of the first transcribed text, and wherein the first cue is distinct from the first transcribed text;
  
  cause presentation of the first cue;
  
  substantially while the first transcribed text is caused to be presented, obtain a second portion of the audio data;
  
  transmit the second portion of the audio data to the remote computing device; and
  
  receive, from the remote computing device, second transcribed text corresponding to the first portion of the audio data and the second portion of the audio data; and
  
  cause presentation of the second transcribed text.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The non-transitory computer-readable storage medium of claim 13, wherein the first cue comprises at least one of an auditory cue, a verbal cue, an optical cue, a vibratory cue, or a graphical cue.
  - 15. The non-transitory computer-readable storage medium of claim 13, wherein the first cue is presented substantially while the transcribed text is presented.
  - 16. The non-transitory computer-readable storage medium of claim 13, wherein the computer-executable component is further configured to:
    - receive, from the remote computing device, a second confidence level corresponding to transcription accuracy of the transcribed text;
      
      compare the first confidence level to the second confidence level;
      
      select a second cue based at least in part on the comparison; and
      
      cause presentation of the second cue.
  - 17. The non-transitory computer-readable storage medium of claim 13, wherein the computer-executable component is further configured to:
    - receive, from the remote computing device, one or more alternate transcribed texts corresponding to the first portion of the audio data; and
      
      cause presentation of the one or more alternate transcribed texts in conjunction with the transcribed text.
  - 18. The non-transitory computer-readable storage medium of claim 13, wherein the computer-executable component is further configured to:
    - receive, from the remote computing device, one or more alternate transcribed texts corresponding to the first portion of the audio data and the second portion of the audio data; and
      
      cause presentation of the one or more alternate transcribed texts in conjunction with the transcribed text.
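
On the device side, claim 16 describes comparing a first and a second confidence level and selecting a second cue from the comparison. A minimal sketch of that comparison follows. The `Cue` values, the `select_second_cue` function, and the 0.05 tolerance are assumptions made for illustration; the patent does not specify how the comparison maps to a cue.

```python
from enum import Enum


class Cue(Enum):
    """Hypothetical graphical cues a device might present."""
    RISING = "confidence improved"
    FALLING = "confidence dropped"
    STEADY = "confidence unchanged"


def select_second_cue(
    first_confidence: float,
    second_confidence: float,
    tolerance: float = 0.05,  # assumed dead band, not from the patent
) -> Cue:
    """Pick a cue by comparing the second confidence level to the first."""
    delta = second_confidence - first_confidence
    if delta > tolerance:
        return Cue.RISING
    if delta < -tolerance:
        return Cue.FALLING
    return Cue.STEADY


cue = select_second_cue(0.5, 0.8)
```

The dead band keeps the cue from flickering when successive confidence levels differ only slightly.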

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Canyon IP Holdings LLC (Intellectual Ventures LLC)
Inventors: Paden, Scott Edward
Primary Examiner(s): Han, Qi
Application Number: US13/632,962
Publication Number: US 20130275129A1
Time in Patent Office: 1,037 Days
Field of Search: 704/235, 704/231, 704/251, 704/255, 704/270, 704/270.1, 704/275
US Class Current: 1/1
CPC Class Codes:
G10L 15/22 Procedures used during a sp...
G10L 15/26 Speech to text systems G10L...
