Speech model retrieval in distributed speech recognition systems
Abstract
Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that they may be used to update the models and data as they become available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
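The caching and pre-caching behavior described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: `ModelCache`, `fake_fetch`, the least-recently-used eviction policy, and the string stand-ins for models are all assumptions made for illustration, and the remote data store is simulated with a short sleep.

```python
import asyncio
from collections import OrderedDict


class ModelCache:
    """Hypothetical LRU cache for user-specific speech models."""

    def __init__(self, fetch, capacity=2):
        self._fetch = fetch            # async callable: user_id -> model
        self._capacity = capacity      # assumed limit on cached models
        self._models = OrderedDict()   # user_id -> model, least recent first
        self.fetch_count = 0           # number of remote retrievals so far

    async def get(self, user_id):
        # Cache hit: mark the entry as recently used and return it.
        if user_id in self._models:
            self._models.move_to_end(user_id)
            return self._models[user_id]
        # Cache miss: retrieve asynchronously, then store the result.
        self.fetch_count += 1
        model = await self._fetch(user_id)
        self._models[user_id] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)  # evict least recently used
        return model

    async def precache(self, user_ids):
        # Load models for users who are predicted to speak soon.
        await asyncio.gather(*(self.get(u) for u in user_ids))


async def fake_fetch(user_id):
    # Stand-in for a network-accessible data store (assumption).
    await asyncio.sleep(0.01)
    return f"model-for-{user_id}"


def demo_cache():
    async def run():
        cache = ModelCache(fake_fetch, capacity=2)
        await cache.precache(["alice"])    # prediction says alice is next
        model = await cache.get("alice")   # served from the cache
        return model, cache.fetch_count
    return asyncio.run(run())
```

In this sketch, calling `demo_cache()` retrieves the model once during pre-caching and serves the later request from memory, which is the latency benefit the abstract attributes to pre-caching.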
19 Claims
1. A system comprising:
- a computer-readable memory storing executable instructions; and
- one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
  - receive audio data from a client computing device separate from the system, wherein the audio data comprises data regarding an utterance of a user;
  - produce first speech processing results using a base speech processing model and the audio data, wherein the base speech processing model is stored at the system;
  - obtain a specialized speech processing model from a network-accessible data store separate from the system and separate from the client computing device, wherein the obtaining is initiated by the system subsequent to receipt of the audio data and prior to completion of producing the first speech processing results;
  - determine, based at least partly on a time at which the specialized speech processing model is obtained, that the system is to produce second speech processing results using the specialized speech processing model subsequent to initiating production of the first speech processing results; and
  - produce the second speech processing results using the specialized speech processing model and at least one of the audio data or the first speech processing results.
Dependent claims: 2, 3, 4, 5, 6
7. A computer-implemented method comprising:
under control of a server system comprising one or more computing devices configured with specific computer executable instructions,
- receiving audio data from a client device separate from the server system, wherein the audio data comprises data regarding an utterance of a user;
- producing first speech processing results using a base speech processing model and the audio data, wherein the base speech processing model is stored at the server system;
- obtaining a specialized speech processing model from a network-accessible data store separate from the server system and separate from the client device, wherein the obtaining is initiated based at least partly on an attribute of the specialized speech processing model and prior to completion of producing the first speech processing results;
- determining, based at least partly on a time at which the specialized speech processing model is obtained, that the server system is to produce second speech processing results using the specialized speech processing model subsequent to initiating production of the first speech processing results; and
- producing the second speech processing results using the specialized speech processing model and at least one of the audio data or the first speech processing results.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A non-transitory computer readable medium comprising executable code that, when executed by one or more processors of a server system, causes the server system to perform a process comprising:
- receiving audio data from a client computing device separate from the server system, wherein the audio data comprises data regarding an utterance of a user;
- producing first speech processing results using a base speech processing model and the audio data, wherein the base speech processing model is stored at the server system;
- obtaining a specialized speech processing model from a network-accessible data store separate from the server system and separate from the client computing device, wherein the obtaining is initiated subsequent to receipt of the audio data and prior to completion of producing the first speech processing results;
- determining, based at least partly on a time at which the specialized speech processing model is obtained, that the server system is to produce second speech processing results using the specialized speech processing model subsequent to initiating production of the first speech processing results; and
- producing the second speech processing results using the specialized speech processing model and at least one of the audio data or the first speech processing results.
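The flow shared by the independent claims (start retrieving a specialized model while the base pass is still running, then decide from its arrival time whether to run a second pass) might look roughly like the following Python sketch. Every name here is hypothetical, including `base_decode`, `fetch_specialized_model`, `recognize`, and the `deadline` parameter; the claims say only that the determination is based at least partly on when the model is obtained, so the fixed latency budget is an assumed policy, and strings stand in for real models and results.

```python
import asyncio
import time


async def base_decode(audio, duration):
    # Stand-in for the first pass with the locally stored base model.
    await asyncio.sleep(duration)
    return f"base:{audio}"


async def fetch_specialized_model(user_id, latency):
    # Stand-in for retrieval from a separate network-accessible data store.
    await asyncio.sleep(latency)
    return f"specialized-model:{user_id}"


async def recognize(audio, user_id, decode_time, fetch_latency, deadline):
    start = time.monotonic()
    # Retrieval begins after the audio is received and before the
    # first pass has completed.
    fetch_task = asyncio.create_task(
        fetch_specialized_model(user_id, fetch_latency)
    )
    first = await base_decode(audio, decode_time)
    # Assumed policy: use the specialized model only if it is obtained
    # within `deadline` seconds of receiving the audio.
    remaining = deadline - (time.monotonic() - start)
    try:
        model = await asyncio.wait_for(fetch_task, timeout=max(remaining, 0))
    except asyncio.TimeoutError:
        return first  # model arrived too late: keep first-pass results
    # Second pass: here it reuses the first results with the new model.
    return f"rescored[{model}]:{first}"


def run_recognize(decode_time, fetch_latency, deadline):
    return asyncio.run(
        recognize("hello", "u1", decode_time, fetch_latency, deadline)
    )
```

With a fast retrieval, for example `run_recognize(0.05, 0.01, 0.5)`, the second pass runs; with a slow one, for example `run_recognize(0.01, 1.0, 0.05)`, the sketch returns the first-pass results. A real system could instead save the late model for later utterances, as the abstract suggests.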
Specification