Speech model retrieval in distributed speech recognition systems
First Claim
1. A system comprising:
a computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to:
receive, from a client device, audio data comprising a user utterance;
determine that an additional speech recognition model is not available;
perform first speech recognition processing on the audio data using a base speech recognition model to produce first speech recognition results;
request the additional speech recognition model from a network-accessible data store, wherein the request is initiated prior to completion of the first speech recognition processing;
receive the additional speech recognition model from the network-accessible data store;
perform second speech recognition processing using the additional speech recognition model and using at least one of the audio data or the first speech recognition results;
transmit a response to the client device based at least in part on the second speech recognition processing;
remove the additional speech recognition model from a cache;
determine a predicted time at which the client device is predicted to initiate a subsequent speech recognition session; and
pre-cache, in the cache, the additional speech recognition model at substantially the predicted time.
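The ordering at the heart of claim 1 can be sketched as follows. This is a minimal, hypothetical illustration: the toy n-best "audio", the functions `fetch_additional_model`, `first_pass`, `second_pass` and `handle_utterance`, and the word-bonus rescoring are all assumptions made for the example, not the patent's implementation. It shows the key constraint: the request for the missing model is submitted before the base-model pass runs, so network latency overlaps with first-pass decoding, and the second pass then rescores the first-pass results once the model arrives.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for audio: candidate transcripts with acoustic scores.
AUDIO = {"play the beetles": 0.52, "play the beatles": 0.48}


def fetch_additional_model(user_id):
    """Hypothetical network data store call; the sleep simulates latency."""
    time.sleep(0.05)
    # Assumed user-specific model: per-word score bonuses (e.g. from a music library).
    return {"beatles": 0.2}


def first_pass(audio):
    """Base-model pass: rank hypotheses by acoustic score alone."""
    return sorted(audio.items(), key=lambda kv: kv[1], reverse=True)


def second_pass(nbest, model):
    """Rescore the first-pass n-best list with the additional model's word bonuses."""
    rescored = [(text, score + sum(model.get(w, 0.0) for w in text.split()))
                for text, score in nbest]
    return sorted(rescored, key=lambda kv: kv[1], reverse=True)


def handle_utterance(audio, user_id, cache):
    with ThreadPoolExecutor(max_workers=1) as pool:
        model = cache.get(user_id)
        future = None
        if model is None:
            # The request starts before the first pass, so the two overlap.
            future = pool.submit(fetch_additional_model, user_id)
        nbest = first_pass(audio)
        if future is not None:
            model = future.result()  # wait for the model if it is still in flight
            cache[user_id] = model
        return second_pass(nbest, model)[0][0]


cache = {}
print(handle_utterance(AUDIO, "user-42", cache))  # play the beatles
```

On a second call for the same user the model is already in `cache`, so no request is made and the second pass runs immediately after the first.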
Abstract
Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received, or after an utterance has first been processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously, so that they can be used to update the models and data as they become available. The updated models and data may be used immediately to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to use the system. Models and data may be pre-cached based on such predictions.
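The abstract does not say how the time of a user's next session is predicted. One simple heuristic, chosen here purely for illustration and not taken from the patent, is to add the median gap between past session start times to the most recent start. The function name `predict_next_session` is hypothetical.

```python
from statistics import median


def predict_next_session(session_times):
    """Predict the next session start as last start + median gap between past starts.

    session_times: past session start times in seconds (any common epoch).
    Returns None when fewer than two sessions have been observed.
    """
    if len(session_times) < 2:
        return None
    times = sorted(session_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return times[-1] + median(gaps)


print(predict_next_session([0, 3600, 7200, 10800]))  # 14400
```

The median is used because a single unusually long gap (for example, a weekend away) barely moves it; a production system might instead model time-of-day or day-of-week patterns.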
29 Claims
1. A system comprising: (full text as given under First Claim above)
Dependent claims: 2, 3, 4, 5.
6. A computer-implemented method comprising:
under control of one or more computing devices configured with specific computer executable instructions,
performing, by the one or more computing devices, first speech processing on audio data regarding an utterance of a user to produce speech processing results;
requesting, by the one or more computing devices, speech processing data from a network-accessible data store, wherein the request is initiated prior to completion of the first speech processing;
receiving, by the one or more computing devices, the speech processing data from the network-accessible data store;
performing, by the one or more computing devices, second speech processing using the speech processing data and at least one of the audio data or the speech processing results;
removing, by the one or more computing devices, the speech processing data from a storage of the one or more computing devices;
determining, by the one or more computing devices, a predicted time at which the user is predicted to initiate a subsequent speech recognition session; and
receiving, by the one or more computing devices, the speech processing data at substantially the predicted time.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18.
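The last three steps of claim 6 (removing the data, predicting the next session, and receiving the data again at substantially the predicted time) can be sketched as a small cache with a time-ordered fetch schedule. The `ModelCache` class, the injected `fetch` callable, and the fixed `lead_seconds` margin used here to interpret "substantially the predicted time" are all hypothetical; the clock is passed in explicitly so the behavior is deterministic.

```python
import heapq


class ModelCache:
    """Hypothetical model cache: evict after a session, re-fetch before the next one."""

    def __init__(self, fetch, lead_seconds=30.0):
        self._fetch = fetch            # assumed callable: model_id -> model data
        self._lead = lead_seconds      # assumed margin ahead of the predicted start
        self._models = {}
        self._schedule = []            # min-heap of (fetch_time, model_id)

    def put(self, model_id, model):
        self._models[model_id] = model

    def get(self, model_id):
        return self._models.get(model_id)

    def end_session(self, model_id, predicted_start):
        """Remove the model from local storage and schedule its pre-caching."""
        self._models.pop(model_id, None)
        heapq.heappush(self._schedule, (predicted_start - self._lead, model_id))

    def tick(self, now):
        """Fetch every model whose scheduled time has arrived; return their ids."""
        fetched = []
        while self._schedule and self._schedule[0][0] <= now:
            _, model_id = heapq.heappop(self._schedule)
            self._models[model_id] = self._fetch(model_id)
            fetched.append(model_id)
        return fetched


# Usage: evict after a session, then pre-cache 30 s before the predicted start.
cache = ModelCache(fetch=lambda model_id: {"id": model_id})
cache.put("user-42", {"id": "user-42"})
cache.end_session("user-42", predicted_start=1000.0)
print(cache.get("user-42"))   # None
print(cache.tick(900.0))      # []
print(cache.tick(975.0))      # ['user-42']
```

Fetching slightly ahead of the prediction, rather than exactly at it, means the model is usually already cached when the session begins, while storage is freed during the idle period in between.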
19. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a computing device to perform a process comprising:
performing first speech recognition processing on audio data regarding an utterance of a user to produce speech recognition results;
requesting speech recognition data from a network-accessible data store, wherein the request is initiated prior to completion of the first speech recognition processing;
receiving the speech recognition data from a network-accessible data store;
performing second speech recognition processing using the speech recognition data and at least one of the audio data or the speech recognition results;
removing the speech recognition data from a storage of the computing device;
determining a predicted time at which the user is predicted to initiate a subsequent speech recognition session; and
receiving the speech recognition data at substantially the predicted time.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29.
Specification