Local speech recognition of frequent utterances
First Claim
1. A system for performing speech recognition comprising a local device and a remote device, the system configured to perform actions comprising:
receiving a plurality of spoken utterances by a local device during a period of use of the local device;
determining a first frequently spoken utterance and a second frequently spoken utterance from the plurality of spoken utterances, wherein the determining is based on a number of times each of the first frequently spoken utterance and the second frequently spoken utterance were received by the local device during the period of use;
creating a first model for the first frequently spoken utterance and a second model for the second frequently spoken utterance;
receiving a first spoken utterance by the local device;
sending a representation of the first spoken utterance from the local device to a remote device;
determining, by the local device, that the first spoken utterance corresponds to the first frequently spoken utterance, wherein the determining is based at least in part on the first model and the second model;
sending, by the local device, a cancellation request to the remote device in response to determining, by the local device, that the first spoken utterance corresponds to the first frequently spoken utterance, wherein the cancellation request indicates that the remote device need not perform speech recognition on the representation of the first spoken utterance;
performing an action corresponding to the first spoken utterance;
receiving a second spoken utterance by the local device;
determining, by the local device, that the second spoken utterance does not correspond to the first frequently spoken utterance and that the second spoken utterance does not correspond to the second frequently spoken utterance, wherein the determining is based at least in part on the first model and the second model;
sending a representation of the second spoken utterance from the local device to the remote device;
performing speech recognition on the representation of the second spoken utterance by the remote device; and
performing an action corresponding to the second spoken utterance.
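The determining step in the claim above selects frequently spoken utterances by how many times each was received during the period of use. A minimal sketch of that counting follows. It assumes the utterance history is already available as text transcripts, which the claim does not specify. The names `normalize`, `frequent_utterances`, `top_n`, and `min_count` are hypothetical and do not come from the patent.

```python
from collections import Counter


def normalize(transcript: str) -> str:
    """Lowercase and collapse whitespace so repeats of a command count together."""
    return " ".join(transcript.lower().split())


def frequent_utterances(history, top_n=2, min_count=2):
    """Return up to top_n utterances heard at least min_count times.

    Assumption: frequency is a raw count over the period of use; both
    thresholds are illustrative parameters, not values from the claim.
    """
    counts = Counter(normalize(t) for t in history)
    return [utt for utt, count in counts.most_common(top_n) if count >= min_count]


# Hypothetical history collected during a period of use.
history = [
    "play music",
    "Play  music",
    "what time is it",
    "play music",
    "what time is it",
    "call mom",
]
top_two = frequent_utterances(history)
```

Here `top_two` holds the first and second frequently spoken utterances, each of which would then get its own local model.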
Abstract
In a distributed automated speech recognition (ASR) system, speech models may be employed on a local device to allow the local device to process frequently spoken utterances while passing other utterances to a remote device for processing. Upon receiving an audio signal, the local device compares the audio signal to the speech models of the frequently spoken utterances to determine whether the audio signal matches one of the speech models. When the audio signal matches one of the speech models, the local device processes the utterance, for example by executing a command. When the audio signal does not match one of the speech models, the local device transmits the audio signal to a second device for ASR processing. This reduces latency and the amount of audio signals that are sent to the second device for ASR processing.
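The routing decision described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: each speech model is stood in for by a fixed feature vector, matching uses cosine similarity, and the 0.9 threshold is arbitrary. The names `cosine`, `route`, and `local_models` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(features, local_models, threshold=0.9):
    """Return ("local", command) on a confident match, else ("remote", None)."""
    best_command, best_score = None, 0.0
    for command, template in local_models.items():
        score = cosine(features, template)
        if score > best_score:
            best_command, best_score = command, score
    if best_score >= threshold:
        return ("local", best_command)   # process on the local device
    return ("remote", None)              # pass the audio on for remote ASR


# Assumed toy models: one template vector per frequently spoken utterance.
local_models = {"play music": [1.0, 0.0, 0.2], "stop": [0.0, 1.0, 0.1]}
hit = route([0.9, 0.1, 0.2], local_models)
miss = route([0.5, 0.5, 0.9], local_models)
```

Only the `miss` case would result in audio being transmitted, which is where the latency and traffic savings described in the abstract come from.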
20 Claims
1. A system for performing speech recognition comprising a local device and a remote device, the system configured to perform actions comprising:
receiving a plurality of spoken utterances by a local device during a period of use of the local device;
determining a first frequently spoken utterance and a second frequently spoken utterance from the plurality of spoken utterances, wherein the determining is based on a number of times each of the first frequently spoken utterance and the second frequently spoken utterance were received by the local device during the period of use;
creating a first model for the first frequently spoken utterance and a second model for the second frequently spoken utterance;
receiving a first spoken utterance by the local device;
sending a representation of the first spoken utterance from the local device to a remote device;
determining, by the local device, that the first spoken utterance corresponds to the first frequently spoken utterance, wherein the determining is based at least in part on the first model and the second model;
sending, by the local device, a cancellation request to the remote device in response to determining, by the local device, that the first spoken utterance corresponds to the first frequently spoken utterance, wherein the cancellation request indicates that the remote device need not perform speech recognition on the representation of the first spoken utterance;
performing an action corresponding to the first spoken utterance;
receiving a second spoken utterance by the local device;
determining, by the local device, that the second spoken utterance does not correspond to the first frequently spoken utterance and that the second spoken utterance does not correspond to the second frequently spoken utterance, wherein the determining is based at least in part on the first model and the second model;
sending a representation of the second spoken utterance from the local device to the remote device;
performing speech recognition on the representation of the second spoken utterance by the remote device; and
performing an action corresponding to the second spoken utterance.
Dependent claims: 2, 3, 4
5. A computer-implemented method, comprising:
receiving a plurality of spoken utterances during a period of use of a local device;
storing, by the local device, a speech model corresponding to a frequently spoken utterance, the frequently spoken utterance comprising one of the plurality of spoken utterances and being determined based on a number of times the frequently spoken utterance was received by the local device during the period of use;
receiving, by the local device, first audio data comprising first speech;
transmitting, by the local device, a representation of the first audio data to a remote device;
determining, by the local device, that the first speech includes the frequently spoken utterance based at least in part on the speech model;
sending, by the local device, a cancellation request to the remote device in response to determining that the first speech includes the frequently spoken utterance, wherein the cancellation request indicates that the remote device need not perform speech recognition on the representation of the first audio data;
receiving, by the local device, second audio data comprising second speech;
determining, by the local device, that the second speech does not include the frequently spoken utterance; and
transmitting, by the local device, a representation of the second audio data to the remote device for processing, wherein the remote device performs speech recognition on the second audio data.
Dependent claims: 6, 7, 8, 9, 10
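The transmit-then-cancel sequence recited in claim 5 can be illustrated with a small in-process simulation. This is a sketch under assumptions, not the claimed implementation: the remote device is replaced by a stand-in class, audio is represented by plain strings, and for simplicity every request is submitted before the local check runs. `RemoteRecognizer`, `handle`, and `matcher` are hypothetical names.

```python
class RemoteRecognizer:
    """Stand-in for the remote device; tracks pending recognition jobs."""

    def __init__(self):
        self.pending = {}     # request_id -> audio awaiting recognition
        self.cancelled = []   # request ids withdrawn by the local device

    def submit(self, request_id, audio):
        self.pending[request_id] = audio

    def cancel(self, request_id):
        # Cancellation request: the remote need not recognize this audio.
        if self.pending.pop(request_id, None) is not None:
            self.cancelled.append(request_id)

    def recognize(self, request_id):
        audio = self.pending.pop(request_id)
        return f"remote transcript of {audio}"


def handle(request_id, audio, local_matcher, remote):
    """Transmit audio, check it locally, and cancel remote work on a match."""
    remote.submit(request_id, audio)   # transmit a representation of the audio
    command = local_matcher(audio)     # compare against the stored speech model
    if command is not None:
        remote.cancel(request_id)      # local match: send the cancellation request
        return ("local", command)
    return ("remote", remote.recognize(request_id))


remote = RemoteRecognizer()
# Assumed matcher: recognizes exactly one frequently spoken utterance.
matcher = lambda audio: "play music" if audio == "audio:play-music" else None
first = handle(1, "audio:play-music", matcher, remote)
second = handle(2, "audio:weather-tomorrow", matcher, remote)
```

Submitting before the local check lets the remote device begin work while the local comparison runs, and the cancellation discards that work only when the local model matches.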
11. A computing device, comprising:
at least one processor;
a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the processor:
to receive a plurality of spoken utterances during a period of use of the device;
to store a speech model corresponding to a frequently spoken utterance, the frequently spoken utterance being one of the plurality of spoken utterances and being determined based on a number of times the frequently spoken utterance was received by the device during the period of use;
to receive first audio data comprising first speech;
to transmit a representation of the first audio data to a remote device;
to determine that the first speech includes the frequently spoken utterance based at least in part on the speech model;
to send a cancellation request to the remote device in response to determining that the first speech includes the frequently spoken utterance, wherein the cancellation request indicates that the remote device need not perform speech recognition on the representation of the first audio data;
to receive second audio data comprising second speech;
to determine that the second speech does not include the frequently spoken utterance; and
to transmit a representation of the second audio data to the remote device for processing, wherein the remote device performs speech recognition on the second audio data.
Dependent claims: 12, 13, 14, 15
16. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device, comprising:
program code to receive a plurality of spoken utterances during a period of use of the device;
program code to store a speech model corresponding to a frequently spoken utterance, the frequently spoken utterance being one of the plurality of spoken utterances and being determined based on a number of times the frequently spoken utterance was received by the device during the period of use;
program code to receive first audio data comprising first speech;
program code to transmit a representation of the first audio data to a remote device;
program code to determine that the first speech includes the frequently spoken utterance based at least in part on the speech model;
program code to send a cancellation request to the remote device in response to determining that the first speech includes the frequently spoken utterance, wherein the cancellation request indicates that the remote device need not perform speech recognition on the representation of the first audio data;
program code to receive second audio data comprising second speech;
program code to determine that the second speech does not include the frequently spoken utterance; and
program code to transmit a representation of the second audio data to the remote device for processing, wherein the remote device performs speech recognition on the second audio data.
Dependent claims: 17, 18, 19, 20
Specification