Speech recognition power management

US 10,325,598 B2
Filed: 07/10/2017
Issued: 06/18/2019
Est. Priority Date: 12/11/2012
Status: Active Grant

Abstract

Power consumption for a computing device may be managed by one or more keywords. For example, if an audio input obtained by the computing device includes a keyword, a network interface module and/or an application processing module of the computing device may be activated. The audio input may then be transmitted via the network interface module to a remote computing device, such as a speech recognition server. Alternately, the computing device may be provided with a speech recognition engine configured to process the audio input for on-device speech recognition.

20 Claims

1. A system comprising:
- a network interface component;
  
  an audio input component configured to receive an audio input; and
  
  one or more processors configured to:
  
  determine that an energy level of the audio input satisfies a threshold;
  
  determine, in response to determining that the energy level satisfies the threshold, that the audio input likely comprises data representing an utterance;
  
  determine, in response to determining that the audio input likely comprises data representing the utterance, that the audio input likely comprises data representing a wakeword indicative of device-directed speech; and
  
  cause transmission of the audio input by the network interface component in response to determining that the audio input likely comprises data representing the wakeword;
  
  wherein the network interface component is configured to:
  
  transmit the audio input to a remote computing system;
  
  receive speech recognition results from the remote computing system;
  
  receive confirmation data from the remote computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword;
  
  transmit a subsequent audio input to the remote computing system based at least partly on receiving the confirmation data; and
  
  receive subsequent speech recognition results from the remote computing system.
  - 2. The system of claim 1, wherein the one or more processors are further configured to determine the energy level of the audio input using at least one of: a volume of the audio input, an intensity of the audio input, or an amplitude of the audio input.
  - 3. The system of claim 1, wherein the one or more processors comprise a first digital signal processor and a second digital signal processor, wherein the first digital signal processor is configured to activate the second digital signal processor in response to determining that the energy level satisfies the threshold, and wherein the second digital signal processor determines that the audio input likely comprises data representing the utterance.
  - 4. The system of claim 1, wherein the one or more processors comprise a digital signal processor and a microprocessor, wherein the digital signal processor is configured to activate the microprocessor in response to determining that the audio input likely comprises data representing the utterance, and wherein the microprocessor determines that the audio input likely comprises data representing the wakeword.
  - 5. The system of claim 1, wherein the one or more processors are further configured to cause presentation of audio output using the speech recognition results, wherein the speech recognition results comprise audio data representing a response to the utterance.
  - 6. The system of claim 1, wherein the one or more processors are further configured to determine a response to the utterance using the speech recognition results, wherein the speech recognition results comprise a transcription of the utterance.
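
The staged determinations recited in claim 1 can be pictured as a short-circuiting cascade, in which each more expensive check runs only after the cheaper one passes. The following is a minimal sketch under stated assumptions, not the patented implementation: root-mean-square amplitude stands in for the energy level (claim 2 lists amplitude as one option), "satisfies a threshold" is read as "greater than or equal to", and `looks_like_speech` and `looks_like_wakeword` are hypothetical callables standing in for the utterance and wakeword detectors.

```python
import math
from typing import Callable, Sequence

Detector = Callable[[Sequence[float]], bool]


def rms_energy(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, used here as an assumed energy measure."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transmit(
    samples: Sequence[float],
    energy_threshold: float,
    looks_like_speech: Detector,    # hypothetical utterance detector
    looks_like_wakeword: Detector,  # hypothetical wakeword detector
) -> bool:
    """Return True only if all three stages pass, cheapest stage first."""
    # Stage 1: energy check (assumption: "satisfies" means >= threshold).
    if rms_energy(samples) < energy_threshold:
        return False
    # Stage 2: runs only after the energy check passes.
    if not looks_like_speech(samples):
        return False
    # Stage 3: runs only after the utterance check passes.
    return looks_like_wakeword(samples)
```

In this sketch a quiet input never reaches the later detectors, which mirrors the idea in claims 3 and 4 of activating a further processor only once the preceding stage has passed.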

7. A computer-implemented method comprising:
- under control of a computing system configured to execute specific computer-executable instructions, receiving an audio input;
  
  determining that an energy level of the audio input satisfies a threshold;
  
  in response to determining that the energy level satisfies the threshold, determining that the audio input likely comprises data representing an utterance;
  
  in response to determining that audio input likely comprises data representing the utterance, determining that the audio input likely comprises data representing a wakeword indicative of device-directed speech;
  
  in response to determining that the audio input likely comprises data representing the wakeword, transmitting the audio input to a remote computing system;
  
  receiving speech recognition results from the remote computing system;
  
  receiving confirmation data from the remote computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword;
  
  transmitting a subsequent audio input to the remote computing system based at least partly on receiving the confirmation data; and
  
  receiving subsequent speech recognition results from the remote computing system.
  - 8. The computer-implemented method of claim 7, further comprising determining the energy level of the audio input using at least one of: a volume of the audio input, an intensity of the audio input, or an amplitude of the audio input.
  - 9. The computer-implemented method of claim 7, wherein determining that the audio input likely comprises data representing a wakeword comprises performing speech recognition using the audio input and a wakeword model.
  - 10. The computer-implemented method of claim 7, further comprising activating a digital signal processor in response to determining that the energy level satisfies the threshold, wherein the determining that the audio input likely comprises data representing the utterance is performed using the digital signal processor.
  - 11. The computer-implemented method of claim 7, further comprising activating a microprocessor in response to determining that the audio input likely comprises data representing the utterance, wherein the determining that the audio input likely comprises data representing the wakeword is performed using the microprocessor.
  - 12. The computer-implemented method of claim 7, further comprising presenting audio output using the speech recognition results, wherein the speech recognition results comprise audio data representing a response to the utterance.
  - 13. The computer-implemented method of claim 7, further comprising determining a response to the utterance using the speech recognition results, wherein the speech recognition results comprise a transcription of the utterance.
  - 14. The computer-implemented method of claim 7, wherein the receiving the audio input comprises:
    - receiving a first portion of the audio input and storing the first portion of the audio input in a buffer; and
      
      receiving a second portion of the audio input and storing the second portion of the audio input in the buffer.
  - 15. The computer-implemented method of claim 14, further comprising accessing the first portion of the audio input from the buffer, wherein the determining that the audio input likely comprises data representing the wakeword comprises determining that the first portion of the audio input likely comprises data representing the wakeword; and wherein the transmitting the audio input comprises transmitting the first portion of the audio input and the second portion of the audio input.
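
The buffering described in claims 14 and 15 can be illustrated as follows: portions of the audio input are stored as they arrive, the wakeword check looks at the first portion, and on success every buffered portion is sent together. This is only a sketch under assumptions. The `AudioBuffer` class, the `max_portions` limit, and the `detect_wakeword` and `send` callables are hypothetical names introduced for illustration; the claims do not specify a buffer size or data layout.

```python
from collections import deque
from typing import Callable


class AudioBuffer:
    """Hypothetical bounded buffer holding portions of an audio input."""

    def __init__(self, max_portions: int = 8) -> None:
        # Assumed size limit; the oldest portion is dropped when full.
        self._portions: deque = deque(maxlen=max_portions)

    def __len__(self) -> int:
        return len(self._portions)

    def store(self, portion: bytes) -> None:
        self._portions.append(portion)

    def first(self) -> bytes:
        return self._portions[0]

    def drain(self) -> bytes:
        """Return all buffered portions in arrival order and empty the buffer."""
        data = b"".join(self._portions)
        self._portions.clear()
        return data


def transmit_if_wakeword(
    buffer: AudioBuffer,
    detect_wakeword: Callable[[bytes], bool],  # hypothetical detector
    send: Callable[[bytes], None],             # hypothetical network call
) -> bool:
    """Check the first portion; if it matches, send first and later portions."""
    if len(buffer) == 0:
        return False
    if detect_wakeword(buffer.first()):
        send(buffer.drain())
        return True
    return False
```

Keeping the later portion in the buffer while the first portion is being checked means the speech that follows the wakeword is not lost before transmission begins.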

16. A system comprising:
- an input component configured to receive audio input; and
  
  one or more processors configured to at least:
  
  determine a value representing at least one of an energy level of the audio input or a first likelihood that the audio input comprises data representing an utterance;
  
  determine, based at least partly on the value, a second likelihood that the utterance comprises data representing a wakeword;
  
  determine, based at least partly on the second likelihood, to cause transmission of the audio input to a computing system;
  
  cause transmission of the audio input to the computing system;
  
  receive speech recognition results generated by the computing system;
  
  receive confirmation data from the computing system, wherein the confirmation data indicates that the audio input likely comprises data representing the wakeword;
  
  transmit a subsequent audio input to the computing system based at least partly on receiving the confirmation data;
  
  receive subsequent speech recognition results generated by the computing system.
  - 17. The system of claim 16, wherein the one or more processors are configured to determine the first likelihood using a speech model and a value representing at least one of: a spectral slope between two frames of the audio input, an energy level of the audio input within a spectral band, or a signal-to-noise ratio of the audio input within a spectral band.
  - 18. The system of claim 16, wherein the one or more processors are configured to determine the second likelihood using the audio input and a wakeword model.
  - 19. The system of claim 16, wherein the speech recognition results comprise at least one of: a transcription of at least a portion of the utterance, a response to the utterance, or an executable command.
  - 20. The system of claim 16, wherein the one or more processors comprises a digital signal processor and a microprocessor, wherein the digital signal processor is configured to determine the value, and wherein the microprocessor is configured to determine the second likelihood.
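
Two of the quantities named in claim 17, the energy level within a spectral band and the signal-to-noise ratio within a spectral band, can be computed as in the sketch below. Everything here is an assumption made for illustration: the naive discrete Fourier transform, the decibel form of the ratio, and in particular `speech_likelihood`, a logistic curve with made-up `midpoint_db` and `steepness` parameters that merely stands in for the unspecified speech model.

```python
import cmath
import math
from typing import Sequence


def band_energy(frame: Sequence[float], sample_rate: float,
                low_hz: float, high_hz: float) -> float:
    """Sum of squared DFT magnitudes for bins between low_hz and high_hz."""
    n = len(frame)
    if n == 0:
        return 0.0
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            # Naive DFT coefficient for bin k (fine for short frames).
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(frame))
            total += abs(coeff) ** 2
    return total


def snr_db(signal_energy: float, noise_energy: float) -> float:
    """Signal-to-noise ratio in decibels, with a small floor to avoid log(0)."""
    floor = 1e-12
    return 10.0 * math.log10(max(signal_energy, floor) / max(noise_energy, floor))


def speech_likelihood(snr: float, midpoint_db: float = 6.0,
                      steepness: float = 0.5) -> float:
    """Hypothetical stand-in for a speech model: logistic function of SNR."""
    return 1.0 / (1.0 + math.exp(-steepness * (snr - midpoint_db)))
```

A caller might estimate `noise_energy` from frames captured during silence and compare the resulting likelihood against a threshold before running the wakeword stage; that choice is likewise an assumption rather than something the claims specify.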

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Basye, Kenneth John; Secker-Walker, Hugh Evan; David, Tony; Kneser, Reinhard; Adams, Jeffrey Penrod; Salvador, Stan Weidner; Krishnamoorthy, Mahesh
Primary Examiner(s): Le, Thuykhanh

Application Number: US15/645,918
Publication Number: US 20180096689A1
Time in Patent Office: 708 Days
Field of Search: None

CPC Class Codes

G10L 15/28   Constructional details of s...

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...

G10L 2015/088   Word spotting

G10L 25/78   Detection of presence or ab...
