Speech model retrieval in distributed speech recognition systems

US 10,152,973 B2
Filed: 11/16/2015
Issued: 12/11/2018
Est. Priority Date: 12/12/2012
Status: Active Grant

Abstract

Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that they may be used to update the models and data as they become available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
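
The asynchronous retrieval, caching, and pre-caching described in the abstract might look roughly like the following Python sketch. It is an illustration only, not the patented implementation: `ModelCache`, its methods, and the `fetch` callable (a stand-in for a call to a remote model store) are all hypothetical names.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterable


class ModelCache:
    """Caches models by key and fetches missing ones in the background."""

    def __init__(self, fetch: Callable[[str], object], workers: int = 2) -> None:
        self._fetch = fetch  # hypothetical call to the remote data store
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._futures: Dict[str, Future] = {}

    def request(self, key: str) -> Future:
        """Return a future for the model; start a fetch only on a cache miss."""
        with self._lock:
            future = self._futures.get(key)
            if future is None:
                future = self._pool.submit(self._fetch, key)
                # Simplification: a failed fetch stays cached as a failed future.
                self._futures[key] = future
            return future

    def prefetch(self, keys: Iterable[str]) -> None:
        """Start fetching models that a user is predicted to need soon."""
        for key in keys:
            self.request(key)

    def shutdown(self) -> None:
        """Wait for outstanding fetches and release the worker threads."""
        self._pool.shutdown(wait=True)
```

A caller can check `future.done()` to use a model only if it has already arrived, or call `future.result()` to wait for it.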

80 Citations

19 Claims

1. A system comprising:
- a computer-readable memory storing executable instructions; and
  
  one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least:
  
  receive audio data from a client computing device separate from the system, wherein the audio data comprises data regarding an utterance of a user;
  
  produce first speech processing results using a base speech processing model and the audio data, wherein the base speech processing model is stored at the system;
  
  obtain a specialized speech processing model from a network-accessible data store separate from the system and separate from the client computing device, wherein the obtaining is initiated by the system subsequent to receipt of the audio data and prior to completion of producing the first speech processing results;
  
  determine, based at least partly on a time at which the specialized speech processing model is obtained, that the system is to produce second speech processing results using the specialized speech processing model subsequent to initiating production of the first speech processing results; and
  
  produce the second speech processing results using the specialized speech processing model and at least one of the audio data or the first speech processing results.
  - 2. The system of claim 1, wherein the executable instructions further program the one or more processors to at least:
    - determine that the user is associated with a characteristic, wherein the base speech processing model comprises a speech processing model that is not associated with the characteristic;
      
      determine that a specialized speech processing model is associated with the characteristic; and
      
      determine to obtain the specialized speech processing model from the network-accessible data store.
  - 3. The system of claim 2, wherein the specialized speech processing model being associated with the characteristic comprises the specialized speech processing model being specialized for processing utterances of users who are associated with the characteristic.
  - 4. The system of claim 1, wherein the executable instructions further program the one or more processors to determine, using a first thread for managing retrieval of speech processing models, that the specialized speech processing model is not stored locally, wherein the first thread is different than a second thread for performing speech processing, and wherein obtaining the specialized speech processing model is initiated by the first thread.
  - 5. The system of claim 1, wherein the executable instructions to produce second speech processing results comprise executable instructions to re-score at least a subset of the first speech processing results, wherein the second speech processing results are produced from the subset of the first speech processing results.
  - 6. The system of claim 1, wherein the executable instructions to determine, based at least partly on the time at which the specialized speech processing model is obtained, that the system is to produce the second speech processing results using the specialized speech processing model comprise executable instructions to determine, based on the specialized speech processing model being obtained prior to sending speech processing results to the client computing device, that the system is to produce the second speech processing results.
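
The flow recited in claims 1, 4, and 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed system: `handle_utterance` and its four arguments are hypothetical stand-ins for the received audio data, the base recognizer, the remote model store, and a second pass that uses the specialized model.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_utterance(audio, base_decode, fetch_specialized, rescore):
    """Run a first pass while a specialized model is fetched in parallel.

    All arguments are hypothetical: `base_decode(audio)` produces the first
    results, `fetch_specialized()` retrieves the model from a remote store,
    and `rescore(model, audio, first)` produces the second results.
    """
    # A separate retrieval thread, distinct from the speech processing thread.
    retrieval = ThreadPoolExecutor(max_workers=1)
    try:
        # Retrieval starts after the audio arrives and before the first pass ends.
        pending = retrieval.submit(fetch_specialized)
        first = base_decode(audio)  # first speech processing results
        # Timing-based decision: use the specialized model only if it has
        # already arrived when the first results are ready to be sent.
        if pending.done() and pending.exception() is None:
            return rescore(pending.result(), audio, first)
        return first
    finally:
        # Do not block the response on a fetch that is still running.
        retrieval.shutdown(wait=False)
```

In this sketch a model that arrives too late is simply not used for the current utterance; a fuller version could hand it to a cache for later utterances.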

7. A computer-implemented method comprising:
- under control of a server system comprising one or more computing devices configured with specific computer executable instructions,
  
  receiving audio data from a client device separate from the server system, wherein the audio data comprises data regarding an utterance of a user;
  
  producing first speech processing results using a base speech processing model and the audio data, wherein the base speech processing model is stored at the server system;
  
  obtaining a specialized speech processing model from a network-accessible data store separate from the server system and separate from the client device, wherein the obtaining is initiated based at least partly on an attribute of the specialized speech processing model and prior to completion of producing the first speech processing results;
  
  determining, based at least partly on a time at which the specialized speech processing model is obtained, that the server system is to produce second speech processing results using the specialized speech processing model subsequent to initiating production of the first speech processing results; and
  
  producing the second speech processing results using the specialized speech processing model and at least one of the audio data or the first speech processing results.
  - 8. The computer-implemented method of claim 7, further comprising:
    - determining that the user is associated with a characteristic, wherein the base speech processing model comprises a speech processing model that is not associated with the characteristic;
      
      determining that the attribute of the specialized speech processing model is associated with the characteristic; and
      
      determining to obtain the specialized speech processing model from the network-accessible data store.
  - 9. The computer-implemented method of claim 8, wherein the characteristic comprises at least one of:
    - a gender, an age, an accent, a vocabulary, or a user identity.
  - 10. The computer-implemented method of claim 8, wherein the attribute of the specialized speech processing model being associated with the characteristic comprises the specialized speech processing model being specialized for processing utterances of users who are associated with the characteristic.
  - 11. The computer-implemented method of claim 7, wherein the specialized speech processing model comprises at least one of an acoustic model, a language model, an intent model, a named entity model, a Constrained Maximum-Likelihood Linear Regression (“CMLLR”) transform, a Vocal Tract Length Normalization (“VTLN”) warping factor, or Cepstral mean and variance data.
  - 12. The computer-implemented method of claim 7, further comprising determining, using a first thread for managing retrieval of speech processing models, that the specialized speech processing model is not locally stored on the one or more computing devices, wherein the first thread is different than a second thread for performing speech processing, and wherein obtaining the specialized speech processing model is initiated by the first thread.
  - 13. The computer-implemented method of claim 7, wherein producing second speech processing results comprises re-scoring a subset of the first speech processing results, wherein the second speech processing results are produced using the subset.
  - 14. The computer-implemented method of claim 7, further comprising determining that the utterance is expected to be associated with a subject, wherein the base speech processing model is not specialized for processing utterances expected to be associated with the subject, and wherein the attribute of the specialized speech processing model comprises the specialized speech processing model being specialized for processing utterances expected to be associated with the subject.
  - 15. The computer-implemented method of claim 7, wherein the determining, based at least partly on the time at which the specialized speech processing model is obtained, that the server system is to produce the second speech processing results using the specialized speech processing model comprises determining that producing the second speech processing results using the specialized speech processing model will cause a delay of less than a threshold amount of time to send speech processing results to the client device.
  - 16. The computer-implemented method of claim 7, further comprising:
    - obtaining a second specialized speech processing model from the network-accessible data store, wherein the obtaining the second specialized speech processing model is based at least partly on an attribute of the second specialized speech processing model and a corresponding attribute of the user, and wherein the obtaining the second specialized speech processing model is initiated prior to completion of producing the first speech processing results;
      
      wherein the determining, based at least partly on the time at which the specialized speech processing model is obtained, that the server system is to produce the second speech processing results using the specialized speech processing model comprises determining that the specialized speech processing model has been obtained prior to the second specialized speech processing model.
  - 17. The computer-implemented method of claim 7, wherein the determining that the server system is to produce the second speech processing results using the specialized speech processing model is performed prior to completion of producing the first speech processing results and is triggered by completion of obtaining the specialized speech processing model.
  - 18. The computer-implemented method of claim 7, wherein the determining that the server system is to produce the second speech processing results using the specialized speech processing model is performed subsequent to completion of producing the first speech processing results and is based at least partly on an expected time at which obtaining the specialized speech processing model is to be completed.
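
Two of the dependent claims above lend themselves to a short illustration: the delay threshold of claim 15 (together with the expected completion time of claim 18) and the re-scoring of a subset of first-pass results in claim 13. The Python sketch below is hypothetical throughout; the function names, the `(text, score)` hypothesis format, and the use of seconds as the time unit are assumptions made for the example.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (text, score); higher is better (assumed)


def within_delay_budget(model_ready_at: float, first_pass_done_at: float,
                        second_pass_cost: float, max_delay: float) -> bool:
    """True if waiting for the model plus re-processing adds less than max_delay.

    Times are in seconds; `model_ready_at` may be an expected arrival time.
    """
    wait = max(0.0, model_ready_at - first_pass_done_at)
    return wait + second_pass_cost < max_delay


def rescore_subset(first_pass: List[Hypothesis],
                   specialized_score: Callable[[str], float],
                   top_n: int = 3) -> List[Hypothesis]:
    """Re-score only the top_n first-pass hypotheses and re-rank them."""
    subset = sorted(first_pass, key=lambda h: h[1], reverse=True)[:top_n]
    rescored = [(text, specialized_score(text)) for text, _ in subset]
    return sorted(rescored, key=lambda h: h[1], reverse=True)
```

Re-scoring a short n-best list is typically much cheaper than decoding the audio again, which is one reason a second pass can fit inside a small delay budget.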

19. A non-transitory computer readable medium comprising executable code that, when executed by one or more processors of a server system, causes the server system to perform a process comprising:
- receiving audio data from a client computing device separate from the server system, wherein the audio data comprises data regarding an utterance of a user;
  
  producing first speech processing results using a base speech processing model and the audio data, wherein the base speech processing model is stored at the server system;
  
  obtaining a specialized speech processing model from a network-accessible data store separate from the server system and separate from the client computing device, wherein the obtaining is initiated subsequent to receipt of the audio data and prior to completion of producing the first speech processing results;
  
  determining, based at least partly on a time at which the specialized speech processing model is obtained, that the server system is to produce second speech processing results using the specialized speech processing model subsequent to initiating production of the first speech processing results; and
  
  producing the second speech processing results using the specialized speech processing model and at least one of the audio data or the first speech processing results.

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Hoffmeister, Bjorn; Secker-Walker, Hugh Evan; O'Neill, Jeffrey Cornelius
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Le, Thuykhanh
Application Number: US14/942,551
Publication Number: US 20160071519A1
Time in Patent Office: 1,121 Days
Field of Search: None
CPC Class Codes:
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
