SPEECH MODEL RETRIEVAL IN DISTRIBUTED SPEECH RECOGNITION SYSTEMS
First Claim
1. A system comprising:
a computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to:
receive, from a client device, audio data comprising a user utterance;
determine that an additional speech recognition model is not available;
perform first speech recognition processing on the audio data using a base speech recognition model to produce first speech recognition results;
request the additional speech recognition model from a network-accessible data store, wherein the request is initiated prior to completion of the first speech recognition processing;
receive the additional speech recognition model from the network-accessible data store;
perform second speech recognition processing using the additional speech recognition model and using at least one of the audio data or the speech recognition results; and
transmit a response to the client device based at least in part on the second speech recognition processing.
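The ordering recited in claim 1 (issue the model request before the first pass finishes, then run a second pass once the model arrives) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in and not taken from the patent: `fetch_additional_model`, `recognize`, and `handle_utterance` are illustrative, the data store is a plain dictionary, and the sleeps only simulate network and decoding latency.

```python
import asyncio


async def fetch_additional_model(store, model_id, events):
    """Hypothetical fetch from a network-accessible data store (here a dict)."""
    events.append("request_started")
    await asyncio.sleep(0.01)  # stands in for network latency
    events.append("model_received")
    return store[model_id]


async def recognize(audio, model_name):
    """Hypothetical recognition pass; tags the audio with the model used."""
    await asyncio.sleep(0.02)  # stands in for decoding time
    return f"{model_name}:{audio}"


async def handle_utterance(audio, user_id, local_models, store, events):
    """Return (first_results, response) for one utterance."""
    if user_id in local_models:
        # The additional model is already available: one pass is enough.
        response = await recognize(audio, local_models[user_id])
        return None, response

    # The additional model is not available. Schedule the request first so
    # that it is in flight while the base-model pass is still running.
    fetch_task = asyncio.create_task(
        fetch_additional_model(store, user_id, events)
    )
    first_results = await recognize(audio, "base")
    events.append("first_pass_done")

    # Wait for the model (it may already have arrived), keep it locally,
    # then run the second pass on the original audio.
    model = await fetch_task
    local_models[user_id] = model
    response = await recognize(audio, model)
    events.append("second_pass_done")
    return first_results, response
```

In this sketch the second pass reuses the audio data; the claim equally allows it to start from the first-pass results instead. The `events` list exists only to make the ordering observable.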
Abstract
Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that they may be used to update the models and data as they become available. The updated models and data may be used immediately to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
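The caching and pre-caching ideas in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how usage is predicted or how the cache evicts entries, so the hour-of-day counter, the `min_count` threshold, and the least-recently-used eviction policy are all hypothetical choices, as are the names `ModelCache`, `UsageTracker`, and `precache`.

```python
from collections import Counter, OrderedDict


class ModelCache:
    """Small LRU cache for retrieved models (capacity is an assumed parameter)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, model):
        self._items[key] = model
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


class UsageTracker:
    """Counts past interactions per user and hour of day (assumed heuristic)."""

    def __init__(self):
        self._by_hour = {}

    def record(self, user_id, hour):
        self._by_hour.setdefault(user_id, Counter())[hour] += 1

    def likely_active(self, user_id, hour, min_count=2):
        return self._by_hour.get(user_id, Counter())[hour] >= min_count


def precache(cache, tracker, store, upcoming_hour):
    """Load models for users predicted to be active in the upcoming hour."""
    loaded = []
    for user_id, model in store.items():
        if tracker.likely_active(user_id, upcoming_hour) and cache.get(user_id) is None:
            cache.put(user_id, model)
            loaded.append(user_id)
    return loaded
```

A scheduler could call `precache` shortly before each hour so that models for likely users are already local when their utterances arrive; a production system would presumably use a richer predictor than a per-hour count.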
237 Citations
29 Claims
1. A system comprising:
a computer-readable memory storing executable instructions; and
one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to:
receive, from a client device, audio data comprising a user utterance;
determine that an additional speech recognition model is not available;
perform first speech recognition processing on the audio data using a base speech recognition model to produce first speech recognition results;
request the additional speech recognition model from a network-accessible data store, wherein the request is initiated prior to completion of the first speech recognition processing;
receive the additional speech recognition model from the network-accessible data store;
perform second speech recognition processing using the additional speech recognition model and using at least one of the audio data or the speech recognition results; and
transmit a response to the client device based at least in part on the second speech recognition processing.
Dependent claims: 2, 3, 4, 5
6. A computer-implemented method comprising:
under control of one or more computing devices configured with specific computer executable instructions,
performing first speech processing on audio data regarding an utterance of a user to produce speech processing results;
requesting speech processing data from a network-accessible data store, wherein the request is initiated prior to completion of the first speech processing;
receiving the speech processing data from the network-accessible data store; and
performing second speech processing using the speech processing data and at least one of the audio data or the speech processing results.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a computing device to perform a process comprising:
performing first speech recognition processing on audio data regarding an utterance of a user to produce speech recognition results;
requesting speech recognition data from a network-accessible data store, wherein the request is initiated prior to completion of the first speech recognition processing;
receiving the speech recognition data from a network-accessible data store; and
performing second speech recognition processing using the speech recognition data and at least one of the audio data or the speech recognition results.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
Specification