Mechanism for providing user guidance and latency concealment for automatic speech recognition systems

US 8,457,963 B2
Filed: 03/30/2010
Issued: 06/04/2013
Est. Priority Date: 03/31/2009
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Audio input to a user device is captured in a buffer and played back to the user while being sent to and recognized by an automatic speech recognition (ASR) system. Overlapping the playback with the speech recognition processing masks a portion of the true latency of the ASR system, thus improving the user's perception of the ASR system's responsiveness. Further, upon hearing the playback, the user is intuitively guided to self-correct for any defects in the captured audio.
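
The concealment argument in the abstract reduces to simple interval arithmetic. The sketch below is illustrative only: `split_latency_ms` is a hypothetical helper, and it assumes (the patent does not say this) that the ASR round trip and the playback start delay are both measured from the moment capture stops. It returns the part of the round trip that overlaps playback and the part the user still experiences as waiting.

```python
def split_latency_ms(round_trip_ms, playback_delay_ms, playback_ms):
    """Split an ASR round trip into (overlapped by playback, not overlapped).

    Assumption: all times share one origin, the end of audio capture.
    Playback occupies the interval [playback_delay_ms, playback_delay_ms + playback_ms].
    """
    # Portion of the round trip that elapses after playback has started,
    # capped at the length of the playback itself.
    overlap = min(playback_ms, max(0.0, round_trip_ms - playback_delay_ms))
    # Everything else (the initial delay, plus any wait after playback ends)
    # is still perceived as waiting.
    return overlap, round_trip_ms - overlap
```

For example, with a 2,000 ms round trip, a 100 ms playback delay, and 1,500 ms of recorded speech, 1,500 ms of the round trip overlaps playback and 500 ms remains unmasked (the 100 ms delay plus 400 ms after playback ends).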

20 Citations

16 Claims

1. An automatic speech recognition method comprising:
  receiving from a user of a mobile phone a signal to start capturing a user spoken audio utterance;
  
  begin capturing the user spoken audio utterance in a buffer of the mobile phone in response to the received signal to start capturing the user spoken audio utterance;
  
  receiving from the user of the mobile phone a signal to stop capturing the user spoken audio utterance;
  
  end capturing the user spoken audio utterance in the buffer of the mobile phone in response to the received signal to stop capturing the user spoken audio utterance;
  
  and performing the following steps in sequence;
  
  in a first step, begin sending the captured user spoken audio utterance from the buffer of the mobile phone to an automatic speech recognition system of a server located across a network from the mobile phone;
  
  in a second step, begin playing back the captured user spoken audio utterance from the buffer of the mobile phone to the user while the captured user spoken audio utterance is being sent to and/or recognized by the automatic speech recognition system;
  
  in a third step, receiving at the mobile phone a recognized version of the captured user spoken audio utterance from the automatic speech recognition system of the server located across the network from the mobile phone; and
  
  in a fourth step, rendering at the mobile phone the recognized version of the captured user spoken audio utterance.

2. The method of claim 1 wherein the first step occurs before the end capturing the user spoken audio utterance in the buffer of the mobile phone.

3. The method of claim 1 wherein the first step occurs at the end capturing the user spoken audio utterance in the buffer of the mobile phone.

4. The method of claim 1 wherein the first step occurs after the end capturing the user spoken audio utterance in the buffer of the mobile phone.

5. The method of claim 1 wherein the second step occurs a predetermined period of time after the end capturing the user spoken audio utterance in the buffer of the mobile phone.

6. The method of claim 5 wherein the predefined period of time is 100 milliseconds.
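
Claim 1 fixes only the order in which the four steps begin, not how they are implemented. A minimal sketch using Python's `asyncio`, assuming hypothetical stand-ins `send_to_asr` and `play_back` for the network round trip and the loudspeaker, and using the 100 ms delay of claim 6 before playback. The sleep durations are invented for illustration, and the delay is measured from the call, which is assumed to coincide with the end of capture.

```python
import asyncio
from typing import List

PLAYBACK_DELAY_S = 0.1  # claim 6's example value: 100 milliseconds

async def send_to_asr(audio: bytes, log: List[str]) -> str:
    """Hypothetical stand-in for uploading to, and waiting on, a server-side ASR."""
    log.append("send started")
    await asyncio.sleep(0.4)  # invented network + recognition latency
    return f"<transcript of {len(audio)} bytes>"  # placeholder result

async def play_back(audio: bytes, log: List[str]) -> None:
    """Hypothetical stand-in for playing the buffered audio through the loudspeaker."""
    log.append("playback started")
    await asyncio.sleep(0.15)  # invented playback duration
    log.append("playback finished")

async def recognize_with_playback(audio: bytes, log: List[str]) -> str:
    # Step 1: begin sending the buffered utterance to the ASR server.
    asr_task = asyncio.create_task(send_to_asr(audio, log))
    # Step 2: after the predetermined delay, begin playback while the
    # utterance is being sent and/or recognized.
    await asyncio.sleep(PLAYBACK_DELAY_S)
    playback_task = asyncio.create_task(play_back(audio, log))
    # Step 3: receive the recognized version from the server.
    text = await asr_task
    log.append("result received")
    # Step 4: render it (appending to the log stands in for updating the UI).
    log.append(f"rendered: {text}")
    await playback_task  # do not leave the playback task dangling
    return text
```

Running `asyncio.run(recognize_with_playback(b"\x00" * 3200, log))` records send started, playback started, playback finished, result received, and then the rendered text. The claims do not say whether playback must finish before the result is rendered; in this sketch it simply happens to, because the simulated round trip is longer than the playback.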

7. An apparatus comprising:
  a microphone configured to receive audio from a user of the apparatus;
  
  a buffer configured to store the received audio;
  
  an utterance gating control configured to start and stop the buffer storing the received audio;
  
  a loudspeaker;
  
  and a processor configured to perform the following steps in sequence;
  
  in a first step, begin sending the stored audio across a network to an automatic speech recognition system;
  
  in a second step, after waiting a predefined period of time after the utterance gating control has stopped the buffer storing the received audio, control play back of the stored audio through the loudspeaker while the stored audio is being sent to and/or recognized by the automatic speech recognition system;
  
  in a third step, receive across the network from the automatic speech recognition system a recognized version of the sent audio; and
  
  in a fourth step, render at the apparatus the received recognized version of the sent audio.

8. The apparatus of claim 7 wherein the apparatus is a mobile phone.

9. The apparatus of claim 7 wherein utterance gating control is a push-to-talk button.

10. The apparatus of claim 7 wherein the network is a cellular telephone network.

11. The apparatus of claim 7 wherein the network is the Internet.
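
Claim 7 separates the buffer from the utterance gating control that starts and stops it. One way to model that, assuming a hypothetical `GatedUtteranceBuffer` class fed by a microphone callback: frames are kept only while the gate (for example a held push-to-talk button, as in claim 9) is open, and the same stored bytes can then be handed to both the ASR upload and local playback.

```python
from typing import List

class GatedUtteranceBuffer:
    """Hypothetical model of claim 7's buffer plus utterance gating control."""

    def __init__(self) -> None:
        self._frames: List[bytes] = []
        self._open = False

    def start(self) -> None:
        """Gate opens (e.g. push-to-talk pressed): begin storing received audio."""
        self._frames.clear()  # assumption: each press starts a fresh utterance
        self._open = True

    def stop(self) -> None:
        """Gate closes (e.g. push-to-talk released): stop storing received audio."""
        self._open = False

    def on_audio(self, frame: bytes) -> None:
        """Microphone callback; frames arriving while the gate is closed are dropped."""
        if self._open:
            self._frames.append(frame)

    def utterance(self) -> bytes:
        """Stored audio, usable both for sending to the ASR system and for playback."""
        return b"".join(self._frames)
```

Clearing the buffer when the gate reopens is a choice made for this sketch; the claim only requires that the control start and stop the storing of received audio.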

12. A non-transitory computer readable storage medium having embodied thereon a program, the program executable by a processor to perform a method for automatic speech recognition, the method comprising:
  receiving from a user of a mobile phone a signal to start capturing a user spoken audio utterance;
  
  begin capturing the user spoken audio utterance in a buffer of the mobile phone in response to the received signal to start capturing the user spoken audio utterance;
  
  receiving from the user of the mobile phone a signal to stop capturing the user spoken audio utterance;
  
  end capturing the user spoken audio utterance in the buffer of the mobile phone in response to the received signal to stop capturing the user spoken audio utterance;
  
  and performing the following steps in sequence;
  
  in a first step, begin sending the captured user spoken audio utterance from the buffer of the mobile phone to an automatic speech recognition system of a server located across a network from the mobile phone;
  
  in a second step, begin playing back the captured user spoken audio utterance from the buffer of the mobile phone to the user while the captured user spoken audio utterance is being sent to and/or recognized by the automatic speech recognition system;
  
  in a third step, receiving at the mobile phone a recognized version of the captured user spoken audio utterance from the automatic speech recognition system of the server located across the network from the mobile phone; and
  
  in a fourth step, rendering at the mobile phone the recognized version of the captured user spoken audio utterance.

13. The non-transitory computer readable storage medium of claim 12 wherein in the method the first step occurs before the end capturing the user spoken audio utterance in the buffer of the mobile phone.

14. The non-transitory computer readable storage medium of claim 12 wherein in the method the first step occurs at the end capturing the user spoken audio utterance in the buffer of the mobile phone.

15. The non-transitory computer readable storage medium of claim 12 wherein in the method the first step occurs after the end capturing the user spoken audio utterance in the buffer of the mobile phone.

16. The non-transitory computer readable storage medium of claim 12 wherein in the method the second step occurs a predetermined period of time after the end capturing the user spoken audio utterance in the buffer of the mobile phone.
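
Claims 13 to 15 (like claims 2 to 4) differ only in when the first step starts relative to the end of capture. Starting before capture ends (claim 13) implies streaming: chunks leave the buffer while the user is still speaking. A hedged sketch using an `asyncio.Queue` with a `None` sentinel for the stop signal; `capture`, `stream_to_asr`, and `run_streaming_demo` are hypothetical names, and appending to `sent` stands in for a real network write.

```python
import asyncio
from typing import List, Optional

async def capture(queue: asyncio.Queue, frames: List[bytes], events: List[str]) -> None:
    """Simulated microphone: buffer one frame every 20 ms, then signal stop."""
    for frame in frames:
        await asyncio.sleep(0.02)  # simulated frame interval
        events.append("captured")
        await queue.put(frame)
    events.append("capture ended")
    await queue.put(None)  # sentinel standing in for the user's stop signal

async def stream_to_asr(queue: asyncio.Queue, sent: List[bytes], events: List[str]) -> None:
    """Hypothetical uploader: forward each chunk as soon as it is buffered."""
    while True:
        chunk: Optional[bytes] = await queue.get()
        if chunk is None:
            break
        events.append("sent")
        sent.append(chunk)  # stands in for writing to a network stream

async def run_streaming_demo(frames: List[bytes]):
    queue: asyncio.Queue = asyncio.Queue()
    sent: List[bytes] = []
    events: List[str] = []
    # Capture and upload run concurrently, so sending begins mid-utterance.
    await asyncio.gather(capture(queue, frames, events),
                         stream_to_asr(queue, sent, events))
    return sent, events
```

Starting the upload at or after the stop signal (claims 14 and 15) would instead send the whole buffered utterance once capture has ended.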

Current Assignee
Promptu Systems Corporation
Original Assignee
Promptu Systems Corporation
Inventors
Charriere, Laurent
Primary Examiner(s)
Kasraian, Allahyar

Application Number

US12/750,653
Publication Number

US 20100248786A1
Time in Patent Office

1,162 Days
Field of Search

455/563, 455/412.1-413, 704/231, 704/243, 704/246-248, 704/251-253, 704/200, 704/258, 704/267, 704/270.1
US Class Current

704/246
CPC Class Codes

G10L 15/30 Distributed recognition, e....
