Corrective feedback loop for automated speech recognition

US 8,352,264 B2
Filed: 03/19/2009
Issued: 01/08/2013
Est. Priority Date: 03/19/2008
Status: Active Grant

6 Assignments

0 Petitions

Abstract

A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier.

92 Citations

26 Claims

1. A method for facilitating updating of a language model, the method comprising:
- as implemented by a client device configured with specific computer-executable instructions, receiving an audio message comprising speech of a user;
  
  communicating the audio message to a first remote server;
  
  receiving, from the first remote server, a transcription generated by an automatic speech recognition engine from the audio message; and
  
  an alternative result matrix generated by the automatic speech recognition engine from the audio message;
  
  receiving an affirmation of the transcription from the user;
  
  storing the transcription with an identifier corresponding to the audio message; and
  
  communicating the identifier and the transcription to a second remote server.
- Dependent Claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein the first remote server and the second remote server are the same server.
  - 3. The method of claim 1, wherein the identifier comprises the audio message.
  - 4. The method of claim 1, wherein the identifier comprises location information for the audio message.
  - 5. The method of claim 1, wherein the alternative result matrix comprises one or more alternative results, each alternative result comprising at least one word.
  - 6. The method of claim 5, wherein each alternative result has a confidence value satisfying a threshold.
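
Claims 1, 5 and 6 describe the client receiving a transcription together with an alternative result matrix, showing only alternatives whose confidence satisfies a threshold, and storing the transcription under an audio identifier once the user affirms it. A minimal Python sketch of that client-side bookkeeping follows; the class and function names, the `>=` reading of "satisfying a threshold", and the 0.0 to 1.0 confidence scale are all assumptions, not details taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class AlternativeResult:
    """One entry of a hypothetical alternative result matrix (cf. claim 5)."""
    words: list[str]   # each alternative carries at least one word
    confidence: float  # assumed to be a score in [0.0, 1.0]

    def __post_init__(self) -> None:
        if not self.words:
            raise ValueError("an alternative result must contain at least one word")


@dataclass
class RecognitionResponse:
    """What the client is assumed to receive back from the first remote server."""
    transcription: str
    alternatives: list[AlternativeResult] = field(default_factory=list)


def alternatives_above_threshold(response: RecognitionResponse,
                                 threshold: float) -> list[AlternativeResult]:
    """Keep alternatives whose confidence satisfies the threshold (cf. claim 6).

    "Satisfying" is taken here to mean >= threshold; the claims do not say.
    """
    return [alt for alt in response.alternatives if alt.confidence >= threshold]


def store_if_affirmed(store: dict[str, str], identifier: str,
                      response: RecognitionResponse, affirmed: bool) -> bool:
    """Store the transcription under the audio identifier only after affirmation.

    The identifier could be the audio message itself or location information
    for it (claims 3-4); a plain string key is used here for simplicity.
    """
    if not affirmed:
        return False
    store[identifier] = response.transcription
    return True
```

For example, with alternatives scored 0.42 and 0.91 and a threshold of 0.5, only the 0.91 alternative would be shown to the user.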

7. A non-transitory computer-readable medium having a computer-executable component configured for execution by one or more processors of a client device, the computer-executable component being further configured to:
- receive an audio message comprising speech of a user;
  
  communicate the audio message to a first remote server;
  
  receive, from the first remote server, a transcription of the audio message generated by an automatic speech recognition engine;
  
  receive an affirmation of the transcription from the user;
  
  store the transcription with an identifier corresponding to the audio message; and
  
  communicate the transcription and the identifier to a second remote server, wherein the transcription and the identifier are communicated to the second remote server in response to at least one of a user instruction, an API call, or a next contact between the client device and the second remote server.
- Dependent Claims (8, 9, 10, 11, 12):
  - 8. The non-transitory computer-readable medium of claim 7, wherein the first remote server and the second remote server are the same server.
  - 9. The non-transitory computer-readable medium of claim 7, wherein the identifier comprises the audio message.
  - 10. The non-transitory computer-readable medium of claim 7, wherein the identifier comprises location information for the audio message.
  - 11. The non-transitory computer-readable medium of claim 7, wherein the computer-executable component is further configured to:
    - receive, from the first remote server, an alternative result matrix generated from the audio message by the automatic speech recognition engine, the alternative result matrix comprising one or more alternative results, each alternative result comprising at least one word; and
      
      cause the client device to display at least one alternative result.
  - 12. The non-transitory computer-readable medium of claim 11, wherein the displayed at least one alternative result has a confidence value satisfying a threshold.
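
Claim 7 defers delivery to the second remote server until a user instruction, an API call, or the next contact between the client and that server. One way to picture this is a queue that is flushed only when one of those triggers occurs. The sketch below assumes a hypothetical `send` callable standing in for the network transport; `Trigger` and `PendingUploads` are illustrative names, not the patented implementation.

```python
from collections import deque
from enum import Enum
from typing import Callable


class Trigger(Enum):
    """The three delivery triggers recited in claim 7."""
    USER_INSTRUCTION = "user_instruction"
    API_CALL = "api_call"
    NEXT_CONTACT = "next_contact"


class PendingUploads:
    """Hypothetical client-side queue of (identifier, transcription) pairs
    waiting to be sent to the second remote server."""

    def __init__(self, send: Callable[[str, str], bool]) -> None:
        # `send` is an assumed transport callable: it delivers one pair and
        # returns True on success, False if the server could not be reached.
        self._queue: deque[tuple[str, str]] = deque()
        self._send = send

    def add(self, identifier: str, transcription: str) -> None:
        self._queue.append((identifier, transcription))

    def flush(self, trigger: Trigger) -> int:
        """Send queued pairs in order when a recognised trigger occurs.

        Stops at the first failed send so the remaining pairs stay queued for
        the next trigger. Returns the number of pairs delivered.
        """
        if not isinstance(trigger, Trigger):
            raise TypeError("flush must be caused by a recognised trigger")
        delivered = 0
        while self._queue:
            identifier, transcription = self._queue[0]
            if not self._send(identifier, transcription):
                break
            self._queue.popleft()
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._queue)
```

Stopping at the first failure keeps the pairs in their original order, so a later trigger resumes exactly where the previous attempt left off.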

13. A method for facilitating the updating of a language model, the method comprising:
- receiving, at a client device, an audio message corresponding to speech of a user;
  
  communicating, to a first remote server, the audio message;
  
  receiving, at the client device, a result and alternative result matrix transcribed, at the first remote server using an automatic speech recognition engine, from the audio message;
  
  receiving, at the client device from the user, a manual correction of the result;
  
  storing, at the client device, the corrected result in association with an identifier corresponding to the audio message; and
  
  communicating, to a second remote server, the stored result together with the identifier.
- Dependent Claims (14, 15, 16, 17, 18, 19):
  - 14. The method of claim 13, wherein the manual correction comprises an affirmation of an alternative fragment result of the alternative result matrix.
  - 15. The method of claim 13, wherein the manual correction comprises text input manually by the user via a keypad.
  - 16. The method of claim 13, wherein the manual correction comprises text input manually by the user via a touchscreen.
  - 17. The method of claim 13, wherein the identifier corresponding to the audio message comprises the audio message.
  - 18. The method of claim 13, wherein the identifier corresponding to the audio message comprises location information for the original audio message.
  - 19. The method of claim 13, wherein the first remote server and the second remote server are the same remote server.
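
Claims 13 to 16 cover a manual correction, made either by affirming an alternative fragment from the matrix or by typing replacement text. The sketch assumes the result is held as a list of words and that each alternative fragment names the word span it replaces; `FragmentAlternative` and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentAlternative:
    """Hypothetical alternative for the word span [start, end) of the result."""
    start: int
    end: int
    words: tuple[str, ...]


def apply_fragment_choice(result_words: list[str],
                          choice: FragmentAlternative) -> list[str]:
    """Return the corrected word list after the user affirms an alternative
    fragment (cf. claim 14); the original list is left unchanged."""
    if not 0 <= choice.start <= choice.end <= len(result_words):
        raise ValueError("fragment span lies outside the result")
    return result_words[:choice.start] + list(choice.words) + result_words[choice.end:]


def apply_typed_correction(typed_text: str) -> list[str]:
    """Text typed on a keypad or touchscreen (cf. claims 15-16) replaces the
    result outright; splitting on whitespace is an assumption."""
    return typed_text.split()
```

Either path yields the corrected result that claim 13 then stores with the audio identifier.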

20. A method for facilitating the updating of a language model, the method comprising:
- receiving, at a client device, a first audio message corresponding to speech of a user;
  
  communicating, to a first remote server, the first audio message;
  
  receiving, at the client device, a first result, transcribed at the first remote server using an automatic speech recognition engine, from the first audio message;
  
  receiving, at the client device from the user, a disapproval of the first result;
  
  receiving, at the client device, a second audio message corresponding to speech of the user;
  
  communicating, to the first remote server, the second audio message;
  
  receiving, at the client device, a second result, transcribed at the first remote server using the automatic speech recognition engine, from the second audio message;
  
  receiving, at the client device from the user, an affirmation of the second result;
  
  storing, at the client device, the second result in association with an identifier corresponding to the second audio message; and
  
  communicating, to a second remote server, the stored second result together with the identifier.
- Dependent Claim (21):
  - 21. The method of claim 20, wherein the first remote server and the second remote server are the same remote server.
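
Claim 20 describes a retry path: the user rejects the first result, speaks again, and the affirmed second result is what gets stored with an identifier for the second audio message. A loop with injected callables captures that sequence; every callable, and the `max_attempts` cap, is a hypothetical stand-in rather than something the claim specifies.

```python
from typing import Callable, Optional


def transcribe_until_affirmed(
    capture_audio: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    user_affirms: Callable[[str], bool],
    make_identifier: Callable[[bytes], str],
    max_attempts: int = 3,
) -> Optional[tuple[str, str]]:
    """Re-prompt the user after each disapproval (cf. claim 20).

    Returns (identifier, affirmed_result) for the audio message whose
    transcription the user accepted, or None if every attempt was rejected.
    The callables stand in for the microphone, the first remote server,
    the user interface, and the identifier scheme.
    """
    for _ in range(max_attempts):
        audio = capture_audio()
        result = transcribe(audio)
        if user_affirms(result):
            # Key the stored result to the audio message that produced it,
            # not to any earlier, rejected message.
            return make_identifier(audio), result
    return None
```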

22. A system comprising:
- an electronic data store configured to store:
  
  one or more algorithms that, when executed, implement an automatic speech recognition engine; and
  
  a language model; and
  
  a computing device in communication with the electronic data store, the computing device configured to:
  
  receive an audio message from a client device, the audio message comprising speech;
  
  based at least in part on the language model, generate a transcription of the audio message with the automatic speech recognition engine;
  
  based at least in part on the language model, generate one or more alternate results for the audio message with the automatic speech recognition engine;
  
  transmit the transcription and the one or more alternate results to the client device;
  
  receive a response and an identifier of the audio message from the client device; and
  
  based at least in part on the response, update the language model to generate an updated language model.
- Dependent Claims (23, 24, 25, 26):
  - 23. The system of claim 22, wherein the response comprises an affirmation of the transcription.
  - 24. The system of claim 22, wherein the response comprises a selection of an alternate result.
  - 25. The system of claim 22, wherein the response comprises a typed correction of the transcription.
  - 26. The system of claim 22, wherein the one or more alternate results have a confidence value satisfying a threshold.
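
On the server side, claims 22 to 25 update the language model from the client's response, whether that response is an affirmation, a selected alternate, or a typed correction. The patent does not say what kind of language model is used, so the sketch below swaps in a toy add-one-smoothed bigram model purely to show the shape of the update; the response format, the weighting, and the once-per-identifier rule are assumptions.

```python
from collections import Counter, defaultdict


class BigramLanguageModel:
    """Toy add-one-smoothed bigram model; a stand-in, since the patent does not
    specify the language model type."""

    def __init__(self) -> None:
        self.context_counts: Counter[str] = Counter()
        self.bigrams: defaultdict[str, Counter[str]] = defaultdict(Counter)
        self.vocab: set[str] = set()  # every token seen as a successor

    def update(self, text: str, weight: int = 1) -> None:
        """Add weighted counts from one confirmed transcript."""
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            self.context_counts[prev] += weight
            self.bigrams[prev][cur] += weight
            self.vocab.add(cur)

    def probability(self, prev: str, cur: str) -> float:
        """Add-one smoothed P(cur | prev), reserving one slot for unseen words."""
        v = len(self.vocab) + 1
        return (self.bigrams[prev][cur] + 1) / (self.context_counts[prev] + v)


def apply_feedback(model: BigramLanguageModel, seen_ids: set[str],
                   identifier: str, response: dict[str, str]) -> bool:
    """Hypothetical server handler for one client response (cf. claims 22-25).

    `response` is assumed to hold a 'kind' of 'affirm', 'select' or 'typed'
    and the confirmed 'text'. Each audio identifier is applied at most once.
    """
    if identifier in seen_ids:
        return False
    if response["kind"] not in {"affirm", "select", "typed"}:
        raise ValueError(f"unknown response kind: {response['kind']!r}")
    # Weighting corrections above plain affirmations is an assumption.
    weight = 1 if response["kind"] == "affirm" else 2
    model.update(response["text"], weight)
    seen_ids.add(identifier)
    return True
```

Applying a typed correction of "Call mom" raises P(mom | call) above P(tom | call), and re-sending the same identifier leaves the counts unchanged.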

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Canyon IP Holdings LLC (Intellectual Ventures LLC)
Inventors
White, Marc; Jablokov, Igor Roditis; Jablokov, Victor Roditis
Primary Examiner(s)
RIDER, JUSTIN W

Application Number

US12/407,502
Publication Number

US 20090240488A1
Time in Patent Office

1,391 Days
Field of Search

704/255, 704/257
US Class Current

704/255
CPC Class Codes

G06F 3/0236   using selection techniques ...

G10L 15/183   using context dependencies,...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 2015/0631   Creating reference template...
