Speech model retrieval in distributed speech recognition systems

US 9,190,057 B2
Filed: 12/12/2012
Issued: 11/17/2015
Est. Priority Date: 12/12/2012
Status: Active Grant

Abstract

Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that they may be used to update the models and data as they become available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
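
The asynchronous flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: `fetch_model`, `recognize`, `handle_utterance`, and the in-memory `_cache` are hypothetical names, and the recognizer is a placeholder that only records which model it was given. The point is the ordering: retrieval of the additional model starts before the first pass with the base model completes, and the second pass reuses the first-pass results.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for illustration; none of these names come from the patent.
_cache = {}

def fetch_model(user_id):
    """Simulate a slow fetch from a network-accessible data store."""
    time.sleep(0.05)
    return f"user-model:{user_id}"

def recognize(audio, model, prior=None):
    """Placeholder recognizer that only records which model was used."""
    return {"model": model, "audio": audio, "rescored": prior is not None}

def handle_utterance(user_id, audio, executor):
    model = _cache.get(user_id)
    if model is not None:
        # The additional model is already cached, so use it directly.
        return recognize(audio, model)
    # Start retrieval before the first pass has finished.
    pending = executor.submit(fetch_model, user_id)
    first = recognize(audio, "base-model")
    model = pending.result()
    _cache[user_id] = model
    # The second pass reuses the first-pass results.
    return recognize(audio, model, prior=first)
```

Calling `handle_utterance` twice for the same user with a shared `ThreadPoolExecutor` would take the two-pass path on the first call and the cached path on the second.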

29 Claims

1. A system comprising:
- a computer-readable memory storing executable instructions; and
  
  one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to:
  
  receive, from a client device, audio data comprising a user utterance;
  
  determine that an additional speech recognition model is not available;
  
  perform first speech recognition processing on the audio data using a base speech recognition model to produce first speech recognition results;
  
  request the additional speech recognition model from a network-accessible data store, wherein the request is initiated prior to completion of the first speech recognition processing;
  
  receive the additional speech recognition model from the network-accessible data store;
  
  perform second speech recognition processing using the additional speech recognition model and using at least one of the audio data or the first speech recognition results;
  
  transmit a response to the client device based at least in part on the second speech recognition processing;
  
  remove the additional speech recognition model from a cache;
  
  determine a predicted time at which the client device is predicted to initiate a subsequent speech recognition session; and
  
  pre-cache, in the cache, the additional speech recognition model at substantially the predicted time.
  - 2. The system of claim 1, wherein the base speech recognition model comprises at least one of a general acoustic model, a gender-specific acoustic model, or a general language model, and wherein the additional speech recognition model is selected based at least in part on a characteristic of a user associated with the user utterance.
  - 3. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to:
    - receive, from the client device, second audio data comprising a second user utterance;
      
      determine that the additional speech recognition model is available; and
      
      perform speech recognition processing on the second audio data using the additional speech recognition model.
  - 4. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to use multi-threaded processing to retrieve the additional speech recognition model in parallel with performance of the first speech recognition processing.
  - 5. The system of claim 1, wherein the executable instructions that program the one or more processors to determine the predicted time comprise instructions to detect a pattern in data regarding times of prior speech recognition sessions.
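
Claim 5 ties the predicted time to a pattern in the times of prior sessions but does not say how the pattern is detected. One simple possibility, offered only as an assumption, is to pick the hour of day at which most prior sessions started and return its next occurrence. The function name `predict_next_session` and the hour-of-day heuristic are illustrative and do not come from the patent.

```python
from collections import Counter
from datetime import datetime, timedelta

def predict_next_session(prior_sessions, now):
    """Return the next occurrence of the most frequent session hour, or None.

    Assumed heuristic: the user tends to start sessions at the same hour of day.
    """
    if not prior_sessions:
        return None
    hour, _ = Counter(s.hour for s in prior_sessions).most_common(1)[0]
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # That hour has already passed today, so predict tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

A production system would likely weigh day of week, recency, and confidence as well; this sketch only shows the shape of the computation.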

6. A computer-implemented method comprising:
- under control of one or more computing devices configured with specific computer executable instructions,
  
  performing, by the one or more computing devices, first speech processing on audio data regarding an utterance of a user to produce speech processing results;
  
  requesting, by the one or more computing devices, speech processing data from a network-accessible data store, wherein the request is initiated prior to completion of the first speech processing;
  
  receiving, by the one or more computing devices, the speech processing data from the network-accessible data store;
  
  performing, by the one or more computing devices, second speech processing using the speech processing data and at least one of the audio data or the speech processing results;
  
  removing, by the one or more computing devices, the speech processing data from a storage of the one or more computing devices;
  
  determining, by the one or more computing devices, a predicted time at which the user is predicted to initiate a subsequent speech recognition session; and
  
  receiving, by the one or more computing devices, the speech processing data at substantially the predicted time.
  - 7. The computer-implemented method of claim 6, further comprising:
    - selecting speech processing data to request based at least in part on a characteristic of the user.
  - 8. The computer-implemented method of claim 7, wherein the characteristic of the user comprises a gender, age, regional accent, or identity of the user.
  - 9. The computer-implemented method of claim 6, wherein the speech processing data comprises at least one of an acoustic model, a language model, language model statistics, Constrained Maximum-Likelihood Linear Regression (“CMLLR”) transforms, Vocal Tract Length Normalization (“VTLN”) warping factors, Cepstral mean and variance data, an intent model, a named entity model, or a gazetteer.
  - 10. The computer-implemented method of claim 9, further comprising:
    - requesting statistics for updating the speech processing data, wherein the request for statistics is initiated prior to completion of the first speech processing.
  - 11. The computer-implemented method of claim 10, further comprising updating the speech processing data based at least in part on the statistics and results of the second speech processing.
  - 12. The computer-implemented method of claim 6, further comprising:
    - receiving second audio data regarding a second utterance of the user;
      
      retrieving the speech processing data from a cache; and
      
      performing speech processing on the second audio data using the speech processing data.
  - 13. The computer-implemented method of claim 6, further comprising storing the speech processing data at a cache server separate from the network-accessible data store at a time preceding the predicted time.
  - 14. The computer-implemented method of claim 6, wherein receiving the speech processing data at substantially the predicted time comprises retrieving a cached copy of the speech processing data.
  - 15. The computer-implemented method of claim 6, further comprising:
    - receiving the audio data from a client device operated by the user; and
      
      transmitting a response to the client device, the response based at least in part on the second speech recognition processing.
  - 16. The computer-implemented method of claim 6, further comprising:
    - performing an action based at least in part on the second speech recognition processing.
  - 17. The computer-implemented method of claim 6, further comprising detecting a pattern in data regarding times of prior speech processing sessions, wherein determining the predicted time is based at least partly on the pattern.
  - 18. The computer-implemented method of claim 6, wherein a first computing device of the one or more computing devices performs the second speech processing using the speech processing data, and wherein a second computing device of the one or more computing devices receives the speech processing data at substantially the predicted time.
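
The removal and pre-caching steps of claims 6, 13, and 14 can be sketched as a small store that drops a user's data after a session and reloads it shortly before the predicted time. The class `PrecachingStore`, its `tick` polling method, and the five-minute lead are all assumptions made for illustration; the claims only require that the data be received "at substantially the predicted time".

```python
from datetime import datetime, timedelta

class PrecachingStore:
    """Hypothetical cache that reloads evicted data ahead of a predicted session."""

    def __init__(self, fetch, lead=timedelta(minutes=5)):
        self._fetch = fetch      # callable: user_id -> speech processing data
        self._lead = lead        # how early to pre-cache (assumed value)
        self._cache = {}
        self._predicted = {}     # user_id -> predicted session time

    def evict(self, user_id, predicted_time):
        """Drop the data after a session and remember when to reload it."""
        self._cache.pop(user_id, None)
        self._predicted[user_id] = predicted_time

    def tick(self, now):
        """Reload data for every user whose predicted time is within the lead."""
        loaded = []
        for user_id, when in list(self._predicted.items()):
            if now >= when - self._lead:
                self._cache[user_id] = self._fetch(user_id)
                del self._predicted[user_id]
                loaded.append(user_id)
        return loaded

    def get(self, user_id):
        return self._cache.get(user_id)
```

A scheduler or periodic task would call `tick` with the current time; polling is chosen here only because it keeps the sketch free of timers and threads.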

19. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a computing device to perform a process comprising:
- performing first speech recognition processing on audio data regarding an utterance of a user to produce speech recognition results;
  
  requesting speech recognition data from a network-accessible data store, wherein the request is initiated prior to completion of the first speech recognition processing;
  
  receiving the speech recognition data from a network-accessible data store;
  
  performing second speech recognition processing using the speech recognition data and at least one of the audio data or the speech recognition results;
  
  removing the speech recognition data from a storage of the computing device;
  
  determining a predicted time at which the user is predicted to initiate a subsequent speech recognition session; and
  
  receiving the speech recognition data at substantially the predicted time.
  - 20. The non-transitory computer readable medium of claim 19, wherein the process further comprises:
    - selecting speech recognition data to request based at least in part on one of a date or time at which the audio data is received.
  - 21. The non-transitory computer readable medium of claim 19, wherein the process further comprises:
    - selecting speech recognition data to request based at least in part on a characteristic associated with the user.
  - 22. The non-transitory computer readable medium of claim 21, wherein the characteristic associated with the user comprises one of a gender, age, regional accent, identity of the user, or identity of a group with which the user is associated.
  - 23. The non-transitory computer readable medium of claim 19, wherein the speech recognition data comprises an acoustic model, a language model, language model statistics, Constrained Maximum-Likelihood Linear Regression (“CMLLR”) transforms, Vocal Tract Length Normalization (“VTLN”) warping factors, Cepstral mean and variance data, an intent model, a named entity model, or a gazetteer.
  - 24. The non-transitory computer readable medium of claim 23, wherein the process further comprises:
    - requesting statistics for updating the speech recognition data, wherein the request for statistics is initiated prior to completion of the first speech recognition processing.
  - 25. The non-transitory computer readable medium of claim 24, wherein the process further comprises:
    - updating the speech recognition data based at least in part on the statistics and results of the second speech recognition processing.
  - 26. The non-transitory computer readable medium of claim 19, wherein receiving the speech recognition data at substantially the predicted time comprises retrieving a cached copy of the speech recognition data.
  - 27. The non-transitory computer readable medium of claim 19, wherein the process further comprises:
    - receiving the audio data from a client device operated by the user; and
      
      transmitting a response to the client device, the response based at least in part on the second speech recognition processing.
  - 28. The non-transitory computer readable medium of claim 19, wherein the process further comprises:
    - performing an action based at least in part on the second speech recognition processing.
  - 29. The non-transitory computer readable medium of claim 19, wherein receiving the speech recognition data at substantially the predicted time comprises receiving the speech recognition data from a second computing device separate from the computing device.
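
Claims 24 and 25 describe retrieving statistics and using them, together with the second-pass results, to update the speech recognition data. The claims do not specify the update rule. As one hedged example, cepstral mean and variance data (listed in claim 23) can be kept as per-dimension sufficient statistics and merged with statistics from a new utterance. The `CmvnStats` class and `stats_from_frames` helper below are hypothetical and use plain Python lists in place of real feature vectors.

```python
from dataclasses import dataclass

@dataclass
class CmvnStats:
    """Assumed representation: frame count plus per-dimension sums."""
    count: int
    total: list      # per-dimension sum of feature values
    total_sq: list   # per-dimension sum of squared feature values

    def merge(self, other):
        """Combine stored statistics with statistics from new audio."""
        return CmvnStats(
            self.count + other.count,
            [a + b for a, b in zip(self.total, other.total)],
            [a + b for a, b in zip(self.total_sq, other.total_sq)],
        )

    def mean_and_variance(self):
        mean = [t / self.count for t in self.total]
        var = [sq / self.count - m * m for sq, m in zip(self.total_sq, mean)]
        return mean, var

def stats_from_frames(frames):
    """Accumulate statistics from a non-empty list of equal-length frames."""
    dims = len(frames[0])
    return CmvnStats(
        len(frames),
        [sum(f[d] for f in frames) for d in range(dims)],
        [sum(f[d] ** 2 for f in frames) for d in range(dims)],
    )
```

Because the sums are additive, the stored statistics can arrive after recognition has started and still be merged with the current utterance's statistics once both are available.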

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Hoffmeister, Bjorn; Secker-Walker, Hugh Evan; O'Neill, Jeffrey Cornelius
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Le, Thuykhanh
Application Number: US13/712,891
Publication Number: US 20140163977A1
Time in Patent Office: 1,070 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
