Reducing speech recognition latency
Abstract
In an automatic speech recognition (ASR) processing system, ASR processing may be configured to reduce a latency of returning speech results to a user. The latency may be determined by comparing a time stamp of an utterance in process to a current time. Latency may also be estimated based on an endpoint of the utterance or other considerations such as how difficult the utterance may be to process. To improve latency the ASR system may be configured to adjust various processing parameters, such as graph pruning factors, path weights, ASR models, etc. Latency checks and corrections may occur dynamically for a particular utterance while it is being processed, thus allowing the ASR system to adjust to rapidly changing latency conditions.
23 Claims
1. A method for dynamically adjusting speech recognition processing to reduce latency, the method comprising:
receiving a first portion of an audio input corresponding to an utterance;
identifying a time stamp associated with the first portion;
performing speech recognition processing on the first portion using a first graph pruning factor;
identifying a current time of processing of the first portion;
determining a current latency of the utterance by comparing the time stamp to the current time;
determining a property of a second portion of the audio input prior to performing speech recognition processing on the second portion, the property comprising an estimated difficulty of speech recognition processing, the estimated difficulty based on a percentage of the second portion of the audio input that has a signal to noise ratio below a threshold;
determining an estimated latency based at least in part on the property of the second portion and the current latency;
comparing the estimated latency to a target latency;
determining a second graph pruning factor based at least in part on the comparing;
performing additional speech recognition processing on the second portion using the second graph pruning factor; and
outputting speech processing results.
Dependent claims: 2, 3
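The control loop recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the linear latency model, the 0.5 s penalty, the multiplicative step, and the reading of the "graph pruning factor" as a beam width (smaller value means more pruning) are all assumptions made for illustration.

```python
def current_latency(time_stamp: float, now: float) -> float:
    """Latency so far: the gap between the portion's time stamp and the current time."""
    return now - time_stamp


def estimate_latency(current: float, difficulty_pct: float,
                     penalty_s: float = 0.5) -> float:
    """Assumed linear model: fully noisy audio (100%) adds `penalty_s` seconds."""
    return current + penalty_s * difficulty_pct / 100.0


def second_pruning_factor(first_factor: float, estimated: float, target: float,
                          step: float = 0.8, floor: float = 4.0) -> float:
    """Assumed policy: tighten the beam when the estimate exceeds the target,
    otherwise keep the first factor. `floor` stops pruning from becoming so
    aggressive that recognition accuracy collapses."""
    if estimated > target:
        return max(floor, first_factor * step)
    return first_factor


if __name__ == "__main__":
    # Hypothetical numbers: first portion stamped at t=10.0 s, checked at t=10.9 s.
    latency = current_latency(time_stamp=10.0, now=10.9)
    estimate = estimate_latency(latency, difficulty_pct=40.0)
    factor = second_pruning_factor(first_factor=10.0, estimated=estimate, target=1.0)
    print(f"current={latency:.2f}s estimated={estimate:.2f}s next factor={factor}")
```

The second portion would then be decoded with the returned factor. A real system would likely derive the latency model from measurements rather than from a fixed penalty.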
4. A computing device, comprising:
at least one processor;
a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor:
to receive a first portion of audio data;
to perform, beginning at a first time and with a first frame, speech processing on the first portion using a first value of a speech processing parameter;
to determine, at a second time, a current location, in the first portion of the audio data, of data being processed during the speech processing;
to determine a second frame at the current location;
to determine a first number of frames between the first frame and the second frame;
to determine a first processing rate based at least in part on the first number of frames, the first time and the second time;
to estimate, based on the current location and the first processing rate, a speech processing latency corresponding to processing of a second portion of the audio data;
to set the speech processing parameter to a second value based at least in part on the speech processing latency; and
to perform speech processing on the second portion of the audio data using the second value of the speech processing parameter.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
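The rate-based estimate in claim 4 amounts to simple arithmetic, sketched below. The claim does not specify how the latency is computed from the current location and the processing rate, so dividing the frames still to be decoded by the measured rate is an assumption, as are the function names, the `last_frame` argument, and the two-value parameter choice.

```python
def processing_rate(first_frame: int, second_frame: int,
                    first_time: float, second_time: float) -> float:
    """Frames decoded per second of wall-clock time between the two checks."""
    elapsed = second_time - first_time
    if elapsed <= 0:
        raise ValueError("second_time must be later than first_time")
    return (second_frame - first_frame) / elapsed


def estimate_processing_latency(current_frame: int, last_frame: int,
                                rate: float) -> float:
    """Assumed model: seconds needed to decode the remaining frames at `rate`."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return (last_frame - current_frame) / rate


def choose_parameter(latency_s: float, target_s: float,
                     first_value: float, faster_value: float) -> float:
    """Switch to the faster setting only when the estimate exceeds the target."""
    return faster_value if latency_s > target_s else first_value


if __name__ == "__main__":
    # Hypothetical run: 150 frames decoded in 2.0 s, 150 frames still to go.
    rate = processing_rate(first_frame=0, second_frame=150,
                           first_time=0.0, second_time=2.0)
    latency = estimate_processing_latency(current_frame=150, last_frame=300, rate=rate)
    print(rate, latency, choose_parameter(latency, 1.0, 12.0, 8.0))
```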
15. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device, comprising:
program code to receive a first portion of audio data;
program code to perform speech processing on the first portion using a first value of a speech processing parameter;
program code to determine a property of a second portion of the audio data, the property comprising an estimated difficulty of speech recognition processing, the estimated difficulty based on a percentage of the second portion of the audio data that has a signal to noise ratio below a threshold;
program code to estimate, based at least in part on the property, a speech processing latency corresponding to processing of the audio data;
program code to set the speech processing parameter to a second value based at least in part on the speech processing latency; and
program code to perform speech processing on a second portion of the audio data using the second value of the speech processing parameter.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23
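The difficulty property in claim 15 is a percentage of the audio whose signal to noise ratio falls below a threshold. A minimal sketch of that calculation follows. It assumes the portion has already been split into frames with a per-frame SNR in decibels; the 10 dB default threshold and the function names are illustrative and do not come from the patent.

```python
import math
from typing import Sequence


def snr_db(signal_power: float, noise_power: float) -> float:
    """Standard SNR in decibels from signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)


def estimated_difficulty_pct(frame_snrs_db: Sequence[float],
                             threshold_db: float = 10.0) -> float:
    """Percentage (0-100) of frames whose SNR is below `threshold_db`."""
    if not frame_snrs_db:
        return 0.0  # assumed convention: an empty portion is not difficult
    noisy = sum(1 for snr in frame_snrs_db if snr < threshold_db)
    return 100.0 * noisy / len(frame_snrs_db)


if __name__ == "__main__":
    # Hypothetical per-frame SNR values for a second portion of audio.
    frames = [20.0, 5.0, 8.0, 15.0]
    print(estimated_difficulty_pct(frames))
```

Because this needs only per-frame SNR values, it can run before any decoding of the second portion, which is what lets the parameter be adjusted ahead of time.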
Specification