Speech recognition power management

US 9,704,486 B2
Filed: 12/11/2012
Issued: 07/11/2017
Est. Priority Date: 12/11/2012
Status: Active Grant

Abstract

Power consumption for a computing device may be managed by one or more keywords. For example, if an audio input obtained by the computing device includes a keyword, a network interface module and/or an application processing module of the computing device may be activated. The audio input may then be transmitted via the network interface module to a remote computing device, such as a speech recognition server. Alternately, the computing device may be provided with a speech recognition engine configured to process the audio input for on-device speech recognition.

34 Claims

1. A system comprising:
- an audio input module;
  
  an audio detection module in communication with the audio input module;
  
  a speech detection module in communication with the audio detection module;
  
  a wakeword recognition module in communication with the speech detection module; and
  
  a network interface module in communication with the wakeword recognition module, wherein:
  
  the audio detection module is configured to:
  
  receive audio input from the audio input module;
  
  determine a volume of at least a portion of the audio input;
  
  cause the audio input module to increase a sampling rate of the audio input based at least in part on the volume exceeding a threshold; and
  
  cause activation of the speech detection module based at least in part on the volume exceeding the threshold;
  
  the speech detection module is configured to determine a first score indicating a likelihood that the audio input comprises speech and cause activation of the wakeword recognition module based at least in part on the score; and
  
  the wakeword recognition module is configured to:
  
  determine a second score indicating a likelihood that the audio input comprises a wakeword; and
  
  cause activation of a network interface module based on the second score by providing power to the network interface module; and
  
  the network interface module is configured to transmit at least a portion of the obtained audio input to a remote computing device.
- Dependent claims (2, 3, 4):
  - 2. The system of claim 1, wherein the audio input device comprises a microphone, the audio detection module comprises a first digital signal processor, the speech detection module comprises a second digital signal processor, and the wakeword recognition module comprises a microprocessor.
  - 3. The system of claim 1, wherein:
    - the speech detection module is further configured to determine the first score using at least one of a hidden Markov model, a Gaussian mixture model, energies in a plurality of spectral bands, or signal to noise ratios in a plurality of spectral bands; and
      
      the wakeword recognition module is further configured to determine the second score using at least one of an application processing module, a hidden Markov model, and a Gaussian mixture model.
  - 4. The system of claim 1, wherein:
    - the wakeword recognition module is further configured to cause deactivation of the audio detection module based at least in part on the first score; and
      
      the wakeword recognition module is further configured to cause deactivation of the speech detection module based at least in part on the second score.
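
The staged activation recited in claim 1 can be sketched as a simple cascade. This is a minimal illustration only: the thresholds, the sampling rates, the RMS volume measure, and the caller-supplied scoring functions are all assumptions made for the example and are not taken from the patent.

```python
import math
from dataclasses import dataclass, field

# Illustrative values only; the claims do not specify any of these numbers.
VOLUME_THRESHOLD = 0.1
SPEECH_THRESHOLD = 0.5
WAKEWORD_THRESHOLD = 0.8


def rms_volume(samples):
    """Root-mean-square amplitude of samples in [-1.0, 1.0] (one possible volume measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


@dataclass
class Cascade:
    """Hypothetical model of the module chain; names are not from the patent."""
    sampling_rate_hz: int = 8000                 # assumed low-power rate
    active: set = field(default_factory=set)     # modules currently activated
    transmitted: list = field(default_factory=list)

    def process(self, samples, speech_score_fn, wakeword_score_fn):
        # Stage 1: audio detection. Nothing else runs unless volume exceeds the threshold.
        if rms_volume(samples) <= VOLUME_THRESHOLD:
            return "idle"
        self.sampling_rate_hz = 16000            # assumed higher rate
        self.active.add("speech_detection")

        # Stage 2: speech detection produces the first score.
        if speech_score_fn(samples) < SPEECH_THRESHOLD:
            return "no_speech"
        self.active.add("wakeword_recognition")

        # Stage 3: wakeword recognition produces the second score.
        if wakeword_score_fn(samples) < WAKEWORD_THRESHOLD:
            return "no_wakeword"

        # Stage 4: power the network interface and send the audio onward.
        self.active.add("network_interface")
        self.transmitted.append(list(samples))
        return "transmitted"
```

Each stage is only reached if the cheaper stage before it passes, which is the power-saving idea the claim describes; real scoring functions (for example, HMM- or GMM-based, as in claim 3) would replace the callables passed in here.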

5. A computer-implemented method of operating a first computing device, the method comprising:
- receiving an audio input;
  
  determining one or more values from the audio input, wherein the one or more values comprise at least one of:
  
  a first value indicating an energy level of the audio input;
  
  or a second value indicating a likelihood that the audio input comprises speech;
  
  increasing a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values;
  
  activating a first module of the first computing device based at least in part on the one or more values;
  
  performing an operation, by the first module, wherein the operation comprises at least one of:
  
  determining that the audio input comprises a wakeword and causing activation of a network interface module in response to determining that the audio input comprises a wakeword, wherein causing activation of the network interface module comprises providing power to the network interface module;
  
  performing speech recognition on at least a portion of the audio input to obtain speech recognition results;
  
  or causing transmission of at least a portion of the audio input to a second computing device.
- Dependent claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19):
  - 6. The computer-implemented method of claim 5, wherein:
    - the first module comprises a processor that is switchable between a low-power state and a high-power state; and
      
      the processor only performs the operation when it is in the high-power state.
  - 7. The computer-implemented method of claim 6, wherein activating the first module comprises switching the processor from the low-power state to the high-power state.
  - 8. The computer-implemented method of claim 6 further comprising deactivating the first module, wherein deactivating the first module comprises switching the processor from the high-power state to the low-power state.
  - 9. The computer-implemented method of claim 6, wherein the processor comprises at least one of a digital signal processor or a microprocessor.
  - 10. The computer-implemented method of claim 5, wherein the first module comprises a software module configured to be executed by a microprocessor.
  - 11. The computer-implemented method of claim 10, wherein activating the first module comprises causing the microprocessor to execute the software module.
  - 12. The computer-implemented method of claim 5, wherein the operation further comprises receiving speech recognition results from the second computing device.
  - 13. The computer-implemented method of claim 12, wherein the speech recognition results comprise at least one of a transcription of at least a portion of the audio input and a response to an intelligent agent query included in at least a portion of the audio input.
  - 14. The computer-implemented method of claim 12 further comprising:
    - activating a second module of the first computing device based at least in part on the one or more values, wherein the second module is configured to implement a speech recognition application; and
      
      processing the speech recognition results with the speech recognition application.
  - 15. The computer-implemented method of claim 5, wherein providing power to the network interface module causes the network interface module to transition from a deactivated state to an activated state.
  - 16. The computer-implemented method of claim 15, wherein communications sent via the network interface module are prevented while the network interface module is in the deactivated state.
  - 17. The computer-implemented method of claim 15, wherein communications sent via the network interface module are enabled while the network interface module is in the activated state.
  - 18. The computer-implemented method of claim 5, wherein providing power to the network interface module comprises providing power to a processor of the first computing device.
  - 19. The computer-implemented method of claim 5, further comprising determining that the energy level of the audio input satisfies a threshold, wherein the increasing the sampling rate is performed in response to determining that the energy level of the audio input satisfies the threshold.
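
Claims 6 to 8 and 19 describe a processor that switches between a low-power and a high-power state, with the sampling rate raised once the energy level satisfies a threshold. The sketch below is one hedged reading of that behavior; the class and function names, the mean-square energy measure, the default threshold, and both sampling rates are hypothetical choices for illustration.

```python
from enum import Enum


class PowerState(Enum):
    LOW = "low"
    HIGH = "high"


class SwitchableProcessor:
    """Hypothetical first module: it performs work only in the high-power state."""

    def __init__(self):
        self.state = PowerState.LOW

    def activate(self):
        # Activation is modeled as a switch from low power to high power.
        self.state = PowerState.HIGH

    def deactivate(self):
        # Deactivation is the reverse switch.
        self.state = PowerState.LOW

    def perform(self, operation, audio):
        # The operation is skipped entirely while in the low-power state.
        if self.state is not PowerState.HIGH:
            return None
        return operation(audio)


def energy_level(samples):
    """Mean-square energy; an assumed stand-in for the claimed energy level."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def handle_audio(samples, processor, operation,
                 energy_threshold=0.01, low_rate_hz=8000, high_rate_hz=16000):
    """Return (sampling rate to use, operation result or None)."""
    if energy_level(samples) >= energy_threshold:
        processor.activate()
        return high_rate_hz, processor.perform(operation, samples)
    return low_rate_hz, None
```

Here `operation` stands in for whatever the first module does once awake, such as wakeword detection or on-device recognition.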

20. A device comprising:
- a first processor configured to:
  
  determine one or more values, wherein the one or more values comprise at least one of a first value indicating an energy level of an audio input or a second value indicating a likelihood that the audio input comprises speech; and
  
  cause an increase in a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values;
  
  cause activation of a second processor based at least in part on the one or more values;
  
  the second processor configured to perform an operation, wherein the operation comprises at least one of:
  
  determining that the audio input comprises a wakeword and causing activation of a network interface module in response to determining that the audio input comprises a wakeword, wherein causing activation of the network interface module comprises providing power to the network interface module;
  
  performing speech recognition on at least a portion of the audio input to obtain speech recognition results;
  
  or causing transmission of at least a portion of the audio input to a second device.
- Dependent claims (21, 22, 23, 24, 25):
  - 21. The device of claim 20, wherein the first processor comprises at least one of a digital signal processor or a microprocessor.
  - 22. The device of claim 20, wherein the second processor comprises at least one of a digital signal processor or a microprocessor.
  - 23. The device of claim 20 further comprising a memory buffer module configured to store the audio input.
  - 24. The device of claim 23, wherein the memory buffer module configured to store the audio input comprises a ring buffer.
  - 25. The device of claim 20 further comprising an audio input module in communication with the first processor, wherein the audio input module is configured to obtain the audio input.
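
Claim 24 recites a ring buffer for storing the audio input. A minimal sketch of such a buffer follows, using Python's `collections.deque` with a fixed `maxlen`; the class name, its methods, and the capacity are illustrative assumptions rather than details from the patent.

```python
from collections import deque


class AudioRingBuffer:
    """Fixed-capacity buffer that keeps only the most recent samples."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A deque with maxlen silently discards the oldest items when full.
        self._buf = deque(maxlen=capacity)

    def write(self, samples):
        """Append new samples, overwriting the oldest ones once capacity is reached."""
        self._buf.extend(samples)

    def snapshot(self):
        """Return the buffered samples, oldest first."""
        return list(self._buf)


buf = AudioRingBuffer(capacity=4)
buf.write([1, 2, 3])
buf.write([4, 5, 6])
print(buf.snapshot())  # prints [3, 4, 5, 6]
```

One plausible use, stated here as an assumption, is to retain the most recent audio while a later stage is still being activated, so that the start of an utterance is not lost.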

26. A system comprising:
- an audio input module configured to obtain an audio input;
  
  a first module in communication with the audio input module;
  
  a second module in communication with the first module; and
  
  a network interface module in communication with the first module;
  
  wherein the first module is configured to:
  
  determine one or more values based at least in part on the audio input, wherein the one or more values comprise at least one of:
  
  a first value indicating an energy level of the audio input;
  
  or a second value indicating a likelihood that the audio input comprises data representing speech;
  
  cause the audio input module to increase a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values;
  
  cause activation of the network interface module based on the one or more values by providing power to the network interface module; and
  
  cause activation of the second module based at least in part on the one or more values; and
  
  wherein the second module is configured to:
  
  determine that the audio input likely comprises data representing a wakeword; and
  
  cause speech recognition to be performed on at least a portion of the audio input.
- Dependent claims (27, 28, 29, 30, 31, 32, 33, 34):
  - 27. The system of claim 26, wherein the one or more values comprise a volume of the audio input.
  - 28. The system of claim 27, wherein the second module is only caused to be activated if the volume of the audio input satisfies a volume threshold for at least a threshold duration.
  - 29. The system of claim 27, wherein the first module is further configured to determine that the volume of the audio input satisfies a threshold, and wherein the first module being configured to cause the audio input module to increase the sampling rate comprises the first module being configured to cause the audio input module to increase the sampling rate in response to determining that the volume of the audio input satisfies the threshold.
  - 30. The system of claim 26, wherein the one or more values comprise a likelihood that the audio input comprises speech.
  - 31. The system of claim 26, wherein the one or more values comprise a score indicating a likelihood that the audio input comprises a wakeword.
  - 32. The system of claim 31, wherein the one or more values further comprise a score indicating a likelihood that the wakeword was spoken by a user associated with the wakeword.
  - 33. The system of claim 26, wherein the second module is configured to cause speech recognition to be performed on at least a portion of the audio input by generating speech recognition results for at least a portion of the audio input.
  - 34. The system of claim 26, wherein the second module is configured to cause speech recognition to be performed on at least a portion of the audio input by:
    - causing transmission of the audio input to a remote computing device; and
      
      receiving speech recognition results for at least a portion of the audio input from the remote computing device.
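
Claim 28 conditions activation of the second module on the volume satisfying a threshold for at least a threshold duration. The check below is a sketch under stated assumptions: volume is measured per fixed-length frame, and the duration must be met by consecutive frames. Neither assumption, nor the function and parameter names, comes from the patent.

```python
def sustained_volume(frame_volumes, volume_threshold, frame_ms, min_duration_ms):
    """Return True if consecutive frames at or above volume_threshold
    span at least min_duration_ms."""
    run_ms = 0
    for volume in frame_volumes:
        if volume >= volume_threshold:
            run_ms += frame_ms
            if run_ms >= min_duration_ms:
                return True
        else:
            # A quiet frame breaks the run (contiguity is an assumption here).
            run_ms = 0
    return False
```

For example, with 10 ms frames and a 30 ms requirement, three loud frames in a row satisfy the check, while loud frames separated by a quiet one do not. A gate of this kind would keep brief transients such as a door slam from waking the later stages.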

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Basye, Kenneth John; David, Tony; Kneser, Reinhard; Adams, Jeffrey Penrod; Salvador, Stan Weidner; Krishnamoorthy, Mahesh; Secker-Walker, Hugh Evan
Primary Examiner(s)
Dorvil, Richemond
Assistant Examiner(s)
Le, Thuykhanh

Application Number
US 13/711,510
Publication Number
US 20140163978A1
Time in Patent Office
1,673 Days
Field of Search
None
CPC Class Codes

G10L 15/28   Constructional details of s...

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...

G10L 2015/088   Word spotting

G10L 25/78   Detection of presence or ab...
