SPEECH MODEL RETRIEVAL IN DISTRIBUTED SPEECH RECOGNITION SYSTEMS

US 20140163977A1
Filed: 12/12/2012
Published: 06/12/2014
Est. Priority Date: 12/12/2012
Status: Active Grant

Abstract

Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that they may be used to update the models and data as they become available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
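The asynchronous flow summarized in the abstract can be illustrated with a minimal sketch. It assumes a thread pool for the background request; `fetch_model`, `first_pass`, `second_pass`, and the toy token-list "audio" are hypothetical stand-ins, not anything the patent specifies.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_model(user_id):
    # Hypothetical stand-in for a request to the network-accessible data store.
    return {"name": f"user-model-{user_id}", "bias": {"whether": "weather"}}

def first_pass(audio, base_model):
    # Hypothetical base-model recognition; the toy "audio" is already a token list.
    return list(audio)

def second_pass(first_results, model):
    # Hypothetical rescoring of the first-pass results with the additional model.
    return [model["bias"].get(token, token) for token in first_results]

def recognize(audio, user_id, base_model="general"):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The request is submitted before the first pass starts, so it is
        # initiated prior to completion of the first pass.
        pending = pool.submit(fetch_model, user_id)
        first = first_pass(audio, base_model)
        model = pending.result()  # block only once the first pass is done
    return second_pass(first, model)
```

In this sketch the second pass reuses the first-pass results rather than the audio; the abstract allows either.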

237 Citations

29 Claims

1. A system comprising:
- a computer-readable memory storing executable instructions; and
  
  one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to:
  
  receive, from a client device, audio data comprising a user utterance;
  
  determine that an additional speech recognition model is not available;
  
  perform first speech recognition processing on the audio data using a base speech recognition model to produce first speech recognition results;
  
  request the additional speech recognition model from a network-accessible data store, wherein the request is initiated prior to completion of the first speech recognition processing;
  
  receive the additional speech recognition model from the network-accessible data store;
  
  perform second speech recognition processing using the additional speech recognition model and using at least one of the audio data or the speech recognition results; and
  
  transmit a response to the client device based at least in part on the second speech recognition processing.
- - 2. The system of claim 1, wherein the base speech recognition model comprises at least one of a general acoustic model, a gender-specific acoustic model, or a general language model, and wherein the additional speech recognition model is selected based at least in part on a characteristic of a user associated with the user utterance.
  - 3. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to:
    - receive, from the client device, second audio data comprising a second user utterance;
      
      determine that the additional speech recognition model is available; and
      
      perform speech recognition processing on the second audio data using the additional speech recognition model.
  - 4. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to use multi-threaded processing to retrieve the additional speech recognition model in parallel with performance of the first speech recognition processing.
  - 5. The system of claim 1, wherein the one or more processors are further programmed by the executable instructions to cache the additional speech recognition model.
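Claims 3 and 5 together imply a cache lookup that decides whether an utterance needs the two-pass path at all. The following is a rough sketch under assumptions: `ModelCache`, `choose_path`, and the `fetch` callback are hypothetical names, and an in-process dictionary guarded by a lock stands in for whatever cache the system actually uses.

```python
import threading

class ModelCache:
    """Hypothetical in-memory cache of additional models, keyed by user."""

    def __init__(self):
        self._models = {}
        self._lock = threading.Lock()  # retrieval may run on another thread

    def get(self, user_id):
        with self._lock:
            return self._models.get(user_id)

    def put(self, user_id, model):
        with self._lock:
            self._models[user_id] = model

def choose_path(cache, user_id, fetch):
    # If the additional model is available, a single pass with it suffices.
    model = cache.get(user_id)
    if model is not None:
        return "single-pass", model
    # Otherwise fetch it (the base-model first pass would run meanwhile)
    # and cache it for subsequently received utterances.
    model = fetch(user_id)
    cache.put(user_id, model)
    return "two-pass", model
```

A first utterance from a user would take the two-pass route and populate the cache; a later utterance from the same user would find the model and skip the fetch.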

6. A computer-implemented method comprising:
- under control of one or more computing devices configured with specific computer executable instructions, performing first speech processing on audio data regarding an utterance of a user to produce speech processing results;
  
  requesting speech processing data from a network-accessible data store, wherein the request is initiated prior to completion of the first speech processing;
  
  receiving the speech processing data from the network-accessible data store; and
  
  performing second speech processing using the speech processing data and at least one of the audio data or the speech processing results.
- - 7. The computer-implemented method of claim 6, further comprising:
    - selecting speech processing data to request based at least in part on a characteristic of the user.
  - 8. The computer-implemented method of claim 7, wherein the characteristic of the user comprises a gender, age, regional accent, or identity of the user.
  - 9. The computer-implemented method of claim 6, wherein the speech processing data comprises at least one of an acoustic model, a language model, language model statistics, Constrained Maximum-Likelihood Linear Regression (“CMLLR”) transforms, Vocal Tract Length Normalization (“VTLN”) warping factors, Cepstral mean and variance data, an intent model, a named entity model, or a gazetteer.
  - 10. The computer-implemented method of claim 9, further comprising:
    - requesting statistics for updating the speech processing data, wherein the request for statistics is initiated prior to completion of the first speech processing.
  - 11. The computer-implemented method of claim 10, further comprising updating the speech processing data based at least in part on the statistics and results of the second speech processing.
  - 12. The computer-implemented method of claim 6, further comprising caching the speech processing data.
  - 13. The computer-implemented method of claim 12, further comprising:
    - receiving second audio data regarding a second utterance of the user;
      
      retrieving the speech processing data from a cache; and
      
      performing speech processing on the second audio data using the speech processing data.
  - 14. The computer-implemented method of claim 12, wherein caching the speech recognition data comprises storing the speech recognition data at a cache server separate from the network-accessible data store.
  - 15. The computer-implemented method of claim 12, wherein retrieving the speech recognition data comprises retrieving a cached copy of the speech recognition data.
  - 16. The computer-implemented method of claim 12, wherein caching the speech recognition data comprises:
    - determining a time that the user is likely to initiate a speech recognition session; and
      
      caching the speech recognition data at substantially the determined time.
  - 17. The computer-implemented method of claim 6, further comprising:
    - receiving the audio data from a client device operated by the user; and
      
      transmitting a response to the client device, the response based at least in part on the second speech recognition processing.
  - 18. The computer-implemented method of claim 6, further comprising:
    - performing an action based at least in part on the second speech recognition processing.
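Claim 16 leaves open how the likely session time is determined. One simple possibility, shown only as an assumption, is to take the hour of day at which the user has most often started sessions and schedule the pre-cache shortly before it. The function names, the hourly granularity, and the five-minute lead are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def likely_session_hour(history):
    # Most frequent hour of day among past session start times.
    counts = Counter(ts.hour for ts in history)
    return counts.most_common(1)[0][0]

def next_precache_time(history, now, lead=timedelta(minutes=5)):
    # Schedule the pre-cache `lead` before the next occurrence of that hour.
    hour = likely_session_hour(history)
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target - lead <= now:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return target - lead
```

For a user whose sessions mostly start during the 08:00 hour, a call made at 07:00 would schedule the pre-cache for 07:55 the same day, and a call made at 09:00 would schedule it for 07:55 the next day.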

19. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a computing device to perform a process comprising:
- performing first speech recognition processing on audio data regarding an utterance of a user to produce speech recognition results;
  
  requesting speech recognition data from a network-accessible data store, wherein the request is initiated prior to completion of the first speech recognition processing;
  
  receiving the speech recognition data from a network-accessible data store; and
  
  performing second speech recognition processing using the speech recognition data and at least one of the audio data or the speech recognition results.
- - 20. The non-transitory computer readable medium of claim 19, wherein the process further comprises:
    - selecting speech recognition data to request based at least in part on one of a date or time at which the audio data is received.
  - 21. The non-transitory computer readable medium of claim 19, wherein the process further comprises:
    - selecting speech recognition data to request based at least in part on a characteristic associated with the user.
  - 22. The non-transitory computer readable medium of claim 21, wherein the characteristic associated with the user comprises one of a gender, age, regional accent, identity of the user, or identity of a group with which the user is associated.
  - 23. The non-transitory computer readable medium of claim 19, wherein the speech recognition data comprises an acoustic model, a language model, language model statistics, Constrained Maximum-Likelihood Linear Regression (“CMLLR”) transforms, Vocal Tract Length Normalization (“VTLN”) warping factors, Cepstral mean and variance data, an intent model, a named entity model, or a gazetteer.
  - 24. The non-transitory computer readable medium of claim 23, wherein the process further comprises:
    - requesting statistics for updating the speech recognition data, wherein the request for statistics is initiated prior to completion of the first speech recognition processing.
  - 25. The non-transitory computer readable medium of claim 24, wherein the process further comprises:
    - updating the speech recognition data based at least in part on the statistics and results of the second speech recognition processing.
  - 26. The non-transitory computer readable medium of claim 19, wherein the process further comprises:
    - caching the speech recognition data.
  - 27. The non-transitory computer readable medium of claim 19, wherein retrieving the speech recognition data comprises retrieving a cached copy of the speech recognition data.
  - 28. The non-transitory computer readable medium of claim 19, wherein the process further comprises:
    - receiving the audio data from a client device operated by the user; and
      
      transmitting a response to the client device, the response based at least in part on the second speech recognition processing.
  - 29. The non-transitory computer readable medium of claim 19, wherein the process further comprises:
    - performing an action based at least in part on the second speech recognition processing.
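Claims 23 to 25 mention cepstral mean and variance data and statistics for updating it, without saying how those statistics are represented. As an illustration only, the sketch below assumes they are stored as per-dimension running sums (frame count, sum, and sum of squares), which lets frames from a new utterance be folded in without revisiting earlier audio. `update_stats` and `mean_and_variance` are hypothetical names.

```python
def update_stats(stats, frames):
    # stats is (frame_count, per-dimension sums, per-dimension sums of squares).
    n, s, ss = stats
    for frame in frames:
        n += 1
        s = [a + x for a, x in zip(s, frame)]
        ss = [a + x * x for a, x in zip(ss, frame)]
    return n, s, ss

def mean_and_variance(stats):
    # Population mean and variance per cepstral dimension.
    n, s, ss = stats
    mean = [a / n for a in s]
    var = [b / n - m * m for b, m in zip(ss, mean)]
    return mean, var
```

The retrieved statistics would be passed in as `stats`, updated with the frames of the current utterance, and written back; the resulting mean and variance could then normalize features for the second pass or for later utterances.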

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Hoffmeister, Bjorn; Secker-Walker, Hugh Evan; O'Neill, Jeffrey Cornelius

Granted Patent

US 9,190,057 B2
US Class Current

704/232
CPC Class Codes

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...
