Mechanism for providing user guidance and latency concealment for automatic speech recognition systems
First Claim
1. An automatic speech recognition method comprising:
receiving from a user of a mobile phone a signal to start capturing a user spoken audio utterance;
begin capturing the user spoken audio utterance in a buffer of the mobile phone in response to the received signal to start capturing the user spoken audio utterance;
receiving from the user of the mobile phone a signal to stop capturing the user spoken audio utterance;
end capturing the user spoken audio utterance in the buffer of the mobile phone in response to the received signal to stop capturing the user spoken audio utterance;
and performing the following steps in sequence;
in a first step, begin sending the captured user spoken audio utterance from the buffer of the mobile phone to an automatic speech recognition system of a server located across a network from the mobile phone;
in a second step, begin playing back the captured user spoken audio utterance from the buffer of the mobile phone to the user while the captured user spoken audio utterance is being sent to and/or recognized by the automatic speech recognition system;
in a third step, receiving at the mobile phone a recognized version of the captured user spoken audio utterance from the automatic speech recognition system of the server located across the network from the mobile phone; and
in a fourth step, rendering at the mobile phone the recognized version of the captured user spoken audio utterance.
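The ordering recited in the four steps can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: `recognize_remote`, `play_back`, and `run_pipeline` are hypothetical names, the network round trip and the loudspeaker are both replaced by `asyncio.sleep`, and the timings are arbitrary. It shows that sending begins first, playback begins while the request is still in flight, and rendering happens only after the result arrives.

```python
import asyncio

async def recognize_remote(audio: bytes, events: list, latency_s: float) -> str:
    """Hypothetical stand-in for sending audio to a remote ASR server."""
    events.append("send_started")
    await asyncio.sleep(latency_s)  # simulated upload plus recognition time
    events.append("result_received")
    return f"<transcript of {len(audio)} bytes>"

async def play_back(audio: bytes, events: list, duration_s: float) -> None:
    """Hypothetical stand-in for playing the buffered audio to the user."""
    events.append("playback_started")
    await asyncio.sleep(duration_s)  # simulated playback duration
    events.append("playback_finished")

async def run_pipeline(audio: bytes, asr_latency_s: float = 0.05,
                       playback_s: float = 0.01):
    events = []
    # First step: begin sending the captured audio.
    send_task = asyncio.create_task(recognize_remote(audio, events, asr_latency_s))
    await asyncio.sleep(0)  # yield once so sending has actually begun
    # Second step: begin playback while the audio is being sent/recognized.
    play_task = asyncio.create_task(play_back(audio, events, playback_s))
    # Third step: receive the recognized version.
    transcript = await send_task
    await play_task
    # Fourth step: render the result (here, just record the event).
    events.append("rendered")
    return transcript, events

if __name__ == "__main__":
    text, order = asyncio.run(run_pipeline(b"\x00" * 320))
    print(order)
    print(text)
```

The single `await asyncio.sleep(0)` is a design choice of this sketch: it hands control to the event loop so that the send task records its start before the playback task is created, mirroring the "in sequence" language of the claim.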
Abstract
Audio input to a user device is captured in a buffer and played back to the user while being sent to and recognized by an automatic speech recognition (ASR) system. Overlapping the playback with the speech recognition processing masks a portion of the true latency of the ASR system, thus improving the user's perception of the ASR system's responsiveness. Further, upon hearing the playback, the user is intuitively guided to self-correct for any defects in the captured audio.
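The latency-masking effect described in the abstract can be made concrete with a little arithmetic. The model below is an assumption made for illustration and is not stated in the source: all times are measured from the moment sending begins, and the user is taken to perceive only the silent gap between the end of playback and the arrival of the result. The function name `perceived_wait` and its parameters are hypothetical.

```python
def perceived_wait(asr_latency_s: float, playback_s: float,
                   pre_playback_delay_s: float = 0.0) -> float:
    """Silent time left after playback ends and before the result arrives.

    Assumed model: playback ends at pre_playback_delay_s + playback_s, and
    the result arrives at asr_latency_s, both measured from the start of
    sending. Any latency that overlaps with playback is treated as masked.
    """
    playback_end = pre_playback_delay_s + playback_s
    return max(0.0, asr_latency_s - playback_end)

if __name__ == "__main__":
    # A 3 s round trip with 2 s of playback leaves 1 s of unmasked waiting.
    print(perceived_wait(3.0, 2.0))
    # If playback outlasts the round trip, no waiting remains in this model.
    print(perceived_wait(1.5, 2.0))
```

Under this simplified model, only a portion of the true latency is hidden whenever the round trip takes longer than the playback, which is consistent with the abstract's wording that playback "masks a portion of the true latency".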
20 Citations
16 Claims
1. An automatic speech recognition method comprising:
receiving from a user of a mobile phone a signal to start capturing a user spoken audio utterance;
begin capturing the user spoken audio utterance in a buffer of the mobile phone in response to the received signal to start capturing the user spoken audio utterance;
receiving from the user of the mobile phone a signal to stop capturing the user spoken audio utterance;
end capturing the user spoken audio utterance in the buffer of the mobile phone in response to the received signal to stop capturing the user spoken audio utterance;
and performing the following steps in sequence;
in a first step, begin sending the captured user spoken audio utterance from the buffer of the mobile phone to an automatic speech recognition system of a server located across a network from the mobile phone;
in a second step, begin playing back the captured user spoken audio utterance from the buffer of the mobile phone to the user while the captured user spoken audio utterance is being sent to and/or recognized by the automatic speech recognition system;
in a third step, receiving at the mobile phone a recognized version of the captured user spoken audio utterance from the automatic speech recognition system of the server located across the network from the mobile phone; and
in a fourth step, rendering at the mobile phone the recognized version of the captured user spoken audio utterance.
Dependent claims: 2, 3, 4, 5, 6
7. An apparatus comprising:
a microphone configured to receive audio from a user of the apparatus;
a buffer configured to store the received audio;
an utterance gating control configured to start and stop the buffer storing the received audio;
a loudspeaker; and
a processor configured to perform the following steps in sequence;
in a first step, begin sending the stored audio across a network to an automatic speech recognition system;
in a second step, after waiting a predefined period of time after the utterance gating control has stopped the buffer storing the received audio, control play back of the stored audio through the loudspeaker while the stored audio is being sent to and/or recognized by the automatic speech recognition system;
in a third step, receive across the network from the automatic speech recognition system a recognized version of the sent audio; and
in a fourth step, render at the apparatus the received recognized version of the sent audio.
Dependent claims: 8, 9, 10, 11
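The buffer and utterance gating control recited in the apparatus claim can be pictured as a buffer that stores microphone audio only while a gate is open. The following is a minimal sketch under that reading. The class `GatedBuffer` and its method names are hypothetical and do not come from the patent, and microphone input is simulated by passing byte chunks to `feed`.

```python
class GatedBuffer:
    """Stores incoming audio chunks only between start() and stop()."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []
        self._open = False

    def start(self) -> None:
        # Gate opens: discard any earlier utterance and begin storing.
        self._chunks.clear()
        self._open = True

    def stop(self) -> None:
        # Gate closes: later chunks are ignored until start() is called again.
        self._open = False

    def feed(self, chunk: bytes) -> None:
        # Called for every chunk the microphone delivers (simulated here).
        if self._open:
            self._chunks.append(chunk)

    def audio(self) -> bytes:
        # The stored utterance, ready to be sent and played back.
        return b"".join(self._chunks)

if __name__ == "__main__":
    buf = GatedBuffer()
    buf.feed(b"aa")   # ignored: gate not yet open
    buf.start()
    buf.feed(b"bb")
    buf.feed(b"cc")
    buf.stop()
    buf.feed(b"dd")   # ignored: gate closed
    print(buf.audio())
```

In this sketch the same stored bytes would serve both the network send and the loudspeaker playback, so the user hears the audio that the recognizer receives.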
12. A non-transitory computer-readable storage medium having embodied thereon a program, the program executable by a processor to perform a method for automatic speech recognition, the method comprising:
receiving from a user of a mobile phone a signal to start capturing a user spoken audio utterance;
begin capturing the user spoken audio utterance in a buffer of the mobile phone in response to the received signal to start capturing the user spoken audio utterance;
receiving from the user of the mobile phone a signal to stop capturing the user spoken audio utterance;
end capturing the user spoken audio utterance in the buffer of the mobile phone in response to the received signal to stop capturing the user spoken audio utterance;
and performing the following steps in sequence;
in a first step, begin sending the captured user spoken audio utterance from the buffer of the mobile phone to an automatic speech recognition system of a server located across a network from the mobile phone;
in a second step, begin playing back the captured user spoken audio utterance from the buffer of the mobile phone to the user while the captured user spoken audio utterance is being sent to and/or recognized by the automatic speech recognition system;
in a third step, receiving at the mobile phone a recognized version of the captured user spoken audio utterance from the automatic speech recognition system of the server located across the network from the mobile phone; and
in a fourth step, rendering at the mobile phone the recognized version of the captured user spoken audio utterance.
Dependent claims: 13, 14, 15, 16
Specification