PRE-WAKEWORD SPEECH PROCESSING

US 20190156818A1
Filed: 01/24/2019
Published: 05/23/2019
Est. Priority Date: 03/30/2015
Status: Active Grant

Assignments: 1
Petitions: 0

Abstract

A system for capturing and processing portions of a spoken utterance command that may occur before a wakeword. The system buffers incoming audio and indicates locations in the audio where the utterance changes, for example when a long pause is detected. When the system detects a wakeword within a particular utterance, the system determines the most recent utterance change location prior to the wakeword and sends the audio from that location to the end of the command utterance to a server for further speech processing.
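The buffering scheme in the abstract can be sketched as a bounded frame buffer that also records where utterances begin. This is a minimal Python sketch, not Amazon's implementation: the class name `PreWakewordBuffer`, the use of absolute frame indices, and treating the start of the stream as a boundary are all assumptions. The wakeword and endpoint locations are passed in rather than detected.

```python
from collections import deque
from typing import Deque, List, Tuple


class PreWakewordBuffer:
    """Hypothetical bounded audio buffer that also records utterance-boundary marks.

    Frames are tagged with absolute stream indices so that stored boundary
    locations stay meaningful after old frames fall out of the deque.
    """

    def __init__(self, max_frames: int) -> None:
        self._frames: Deque[Tuple[int, bytes]] = deque(maxlen=max_frames)
        self._next_index = 0
        self._boundaries: List[int] = [0]  # assumption: stream start counts as a boundary

    def append(self, frame: bytes) -> int:
        """Buffer one audio frame and return its absolute index."""
        index = self._next_index
        self._frames.append((index, frame))
        self._next_index += 1
        return index

    def mark_boundary(self, index: int) -> None:
        """Record that a new utterance begins at `index` (e.g. after a long pause)."""
        self._boundaries.append(index)

    def utterance_start(self, wakeword_index: int) -> int:
        """Most recent boundary at or before the wakeword location."""
        return max(b for b in self._boundaries if b <= wakeword_index)

    def portion(self, wakeword_index: int, endpoint_index: int) -> List[bytes]:
        """Frames from the utterance start through the endpoint (inclusive).

        If the start has already been evicted, only the frames still
        buffered are returned.
        """
        start = self.utterance_start(wakeword_index)
        return [f for i, f in self._frames if start <= i <= endpoint_index]


if __name__ == "__main__":
    buf = PreWakewordBuffer(max_frames=500)
    for n in range(10):
        buf.append(bytes([n]))
    buf.mark_boundary(4)       # a long pause ended just before frame 4
    print(buf.portion(6, 8))   # [b'\x04', b'\x05', b'\x06', b'\x07', b'\x08']
```

Because indices are absolute, a boundary older than the buffer does not produce wrong offsets; the sketch simply returns what is left. This reflects the practical limit that pre-wakeword audio can only be recovered while it is still buffered.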

20 Claims

1. A computer-implemented method for processing a spoken command when the wakeword does not begin the command, the method comprising:
   - receiving audio comprising speech;
   - buffering audio data representing the speech;
   - determining that the audio data includes a number of consecutive audio frames with an energy level below a threshold;
   - determining that a new utterance has occurred in the speech using a tone quality of the speech and the number of consecutive audio frames;
   - determining a first location in the speech corresponding to the new utterance;
   - storing the first location;
   - detecting a wakeword in the speech, wherein the wakeword corresponds to a second location in the speech, the second location being after the first location;
   - sending a portion of audio data to a remote server for speech processing, where a beginning of the portion of audio data corresponds to speech at the first location;
   - determining an end of the new utterance at a third location in the speech, the third location being after the second location;
   - concluding the sending of the portion of audio data to the remote server so that an end of the portion of audio data corresponds to speech at the third location;
   - receiving command data from the remote server; and
   - executing the command data.

2. The computer-implemented method of claim 1, further comprising:
   - receiving the audio using a plurality of microphones of a microphone array;
   - determining a direction of a source of the speech using the microphone array;
   - determining a first microphone of the microphone array that is closest to the source;
   - buffering the audio data received by the first microphone using a first buffer associated with the first microphone;
   - determining the tone quality using the buffered audio data; and
   - determining that the audio data includes the number of consecutive audio frames using the buffered audio data.

3. The computer-implemented method of claim 1, further comprising:
   - determining an identity of an individual that is a source of the speech;
   - determining a typical speech pause length based on a speech history associated with the identity;
   - determining a number threshold using the typical speech pause length; and
   - determining that the number of consecutive audio frames exceeds the number threshold.
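Claim 1 marks a new utterance once enough consecutive low-energy frames have been seen, claim 3 derives that frame count from the speaker's own pause history, and claim 2 buffers audio from the microphone closest to the talker. The Python sketch below illustrates those three pieces under stated assumptions: energy is mean-square frame energy in dB; the "typical pause" is the median of past pauses multiplied by a hypothetical factor `scale`; the closest microphone is chosen by projecting microphone positions onto an already-estimated far-field source azimuth (direction estimation itself is not shown); and the tone-quality test of claim 1 is not modeled. All function names are hypothetical.

```python
import math
from statistics import median
from typing import Iterable, List, Sequence, Tuple


def frame_energy_db(samples: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB, floored to avoid log10(0)."""
    mean_square = sum(s * s for s in samples) / max(len(samples), 1)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def pause_frame_threshold(pause_history_ms: Sequence[float], frame_ms: float,
                          scale: float = 1.5, default_ms: float = 500.0) -> int:
    """Claim 3 style: turn a speaker's typical pause length into a frame count.

    Hypothetical heuristic: the median past pause, scaled by `scale`; falls back
    to `default_ms` when there is no history for this speaker.
    """
    typical_ms = median(pause_history_ms) if pause_history_ms else default_ms
    return max(1, math.ceil(scale * typical_ms / frame_ms))


def utterance_starts(frames: Iterable[Sequence[float]], energy_threshold_db: float,
                     pause_threshold_frames: int) -> List[int]:
    """Claim 1 style: indices of the first speech frame after a long pause.

    A pause counts only if the run of consecutive low-energy frames exceeds
    `pause_threshold_frames`. The tone-quality check of claim 1 is omitted.
    """
    starts: List[int] = []
    silent_run = 0
    for index, frame in enumerate(frames):
        if frame_energy_db(frame) < energy_threshold_db:
            silent_run += 1
            continue
        if silent_run > pause_threshold_frames:
            starts.append(index)
        silent_run = 0
    return starts


def closest_microphone(mic_xy: Sequence[Tuple[float, float]], azimuth_rad: float) -> int:
    """Claim 2 style: index of the array microphone nearest a far-field source.

    For a distant source, and microphones roughly equidistant from the array
    center, the nearest one has the largest projection onto the source direction.
    """
    ux, uy = math.cos(azimuth_rad), math.sin(azimuth_rad)
    return max(range(len(mic_xy)), key=lambda i: mic_xy[i][0] * ux + mic_xy[i][1] * uy)
```

For example, with 10 ms frames and a speaker whose past pauses have a median of 300 ms, `pause_frame_threshold([200, 300, 400], 10)` returns 45, so only a silent run longer than 45 frames (450 ms) would start a new utterance for that speaker.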

4. A computer-implemented method comprising:
   - receiving audio comprising speech;
   - storing audio data representing the speech in a non-transitory memory;
   - determining a first location in the audio data associated with a change in a characteristic of the speech;
   - determining a wakeword at a second location in the audio data;
   - determining a speech endpoint at a third location in the audio data;
   - determining a first portion of audio data, wherein the first portion of audio data begins proximate to the first location and ends proximate to the third location; and
   - selecting the first portion of audio data for speech processing.

5. The computer-implemented method of claim 4, further comprising:
   - sending the first portion of audio data from a first device to a second device;
   - receiving command data associated with the wakeword; and
   - executing a command based at least in part on the command data.

6. The computer-implemented method of claim 4, further comprising:
   - receiving the audio using a first microphone; and
   - storing the audio data representing the speech in the first non-transitory memory associated with the first microphone.

7. The computer-implemented method of claim 4, further comprising determining a pause in the audio data, wherein the first location is associated with the pause.

8. The computer-implemented method of claim 7, further comprising determining a length of the pause exceeds a threshold length.

9. The computer-implemented method of claim 7, further comprising:
   - determining a language of the speech; and
   - configuring the threshold length based on the language.

10. The computer-implemented method of claim 7, further comprising:
   - determining an identity of a speaker detected in the audio; and
   - configuring the threshold length based on the identity.

11. The computer-implemented method of claim 4, further comprising determining a change in at least one of a tone, speed, pitch, source direction, frequency, volume, prosody, or energy of the speech, wherein the first location is associated with the change.

12. The computer-implemented method of claim 4, further comprising determining a confidence score associated with the first location.
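Claims 4 and 7 through 12 allow the first location to come from a pause or from a change in another speech characteristic (claim 11), optionally with a confidence score (claim 12). One simple way to illustrate this, an assumed technique rather than one the claims specify, is to compare the mean of a per-frame feature (pitch, volume, speaking rate, and so on) over adjacent windows, keep local peaks of the relative change as candidate locations, and then pick the latest sufficiently confident candidate at or before the wakeword. The names `change_points` and `select_first_location`, the window comparison, and the confidence mapping are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def change_points(feature: Sequence[float], window: int,
                  min_rel_change: float) -> List[Tuple[int, float]]:
    """Return (frame index, confidence) pairs where a per-frame feature shifts.

    The score at index i is the relative difference between the mean of the
    `window` frames before i and the mean of the `window` frames from i onward.
    Only local peaks at or above `min_rel_change` are kept; confidence maps the
    score linearly into [0, 1], saturating at twice the threshold.
    """
    scores: Dict[int, float] = {}
    for i in range(window, len(feature) - window + 1):
        before = mean(feature[i - window:i])
        after = mean(feature[i:i + window])
        scores[i] = abs(after - before) / max(abs(before), 1e-9)

    points: List[Tuple[int, float]] = []
    for i, score in scores.items():
        is_peak = score >= scores.get(i - 1, 0.0) and score > scores.get(i + 1, 0.0)
        if score >= min_rel_change and is_peak:
            points.append((i, min(1.0, score / (2.0 * min_rel_change))))
    return points


def select_first_location(points: Sequence[Tuple[int, float]], wakeword_index: int,
                          min_confidence: float = 0.5, default: int = 0) -> int:
    """Latest confident change point at or before the wakeword (else `default`)."""
    candidates = [i for i, conf in points if i <= wakeword_index and conf >= min_confidence]
    return max(candidates, default=default)
```

A caller might run `change_points` on several features and merge the results, or fall back to the pause-based location of claims 7 and 8 when no confident change is found; the `default=0` fallback here simply means "start of the buffer".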

13. A computing system comprising:
   - at least one processor;
   - a memory including instructions operable to be executed by the at least one processor to cause the system to perform a set of actions comprising:
     - receiving audio comprising speech;
     - storing audio data representing the speech in a non-transitory memory;
     - determining a first location in the audio data associated with a change in a characteristic of the speech;
     - determining a wakeword at a second location in the audio data;
     - determining a speech endpoint at a third location in the audio data;
     - determining a first portion of audio data, wherein the first portion of audio data begins proximate to the first location and ends proximate to the third location; and
     - selecting the first portion of audio data for speech processing.

14. The computing system of claim 13, the set of actions further comprising:
   - sending the first portion of audio data from a first device to a second device;
   - receiving command data associated with the wakeword; and
   - executing a command based at least in part on the command data.

15. The computing system of claim 13, the set of actions further comprising:
   - receiving the audio using a first microphone; and
   - storing the audio data representing the speech in the first non-transitory memory associated with the first microphone.

16. The computing system of claim 13, the set of actions further comprising determining a pause in the audio data, wherein the first location is associated with the pause.

17. The computing system of claim 16, the set of actions further comprising determining a length of the pause exceeds a threshold length.

18. The computing system of claim 16, the set of actions further comprising:
   - determining a language of the speech; and
   - configuring the threshold length based on the language.

19. The computing system of claim 16, the set of actions further comprising:
   - determining an identity of a speaker detected in the audio; and
   - configuring the threshold length based on the identity.

20. The computing system of claim 13, the set of actions further comprising determining a change in at least one of a tone, speed, pitch, prosody, or energy of the speech, wherein the first location is associated with the change.
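Claims 13 and 14 restate the method as a system in which the selected portion is sent from a first device to a second device. Because the portion begins before the wakeword but ends at an endpoint that has not yet been reached when the wakeword is detected, a device can start sending right away: first the buffered frames from the first location onward, then live frames until the endpoint. The Python sketch below illustrates that ordering only; `stream_portion`, the `is_endpoint` callback, and the fixed chunk size are assumptions, and the actual transport to the second device is left out.

```python
from typing import Callable, Iterable, Iterator, List, Sequence


def stream_portion(buffered: Sequence[bytes], start: int, live: Iterable[bytes],
                   is_endpoint: Callable[[bytes], bool],
                   chunk_frames: int = 4) -> Iterator[bytes]:
    """Yield fixed-size chunks of the selected portion for sending upstream.

    `buffered` holds frames already captured (including the wakeword), `start`
    is the first location, and `live` supplies frames captured afterwards.
    `is_endpoint` is a hypothetical endpoint detector; the frame that triggers
    it is the last frame sent.
    """

    def portion_frames() -> Iterator[bytes]:
        yield from buffered[start:]   # pre-wakeword and wakeword audio
        for frame in live:            # audio still arriving after the wakeword
            yield frame
            if is_endpoint(frame):
                return                # third location reached: stop sending

    chunk: List[bytes] = []
    for frame in portion_frames():
        chunk.append(frame)
        if len(chunk) == chunk_frames:
            yield b"".join(chunk)
            chunk = []
    if chunk:
        yield b"".join(chunk)         # flush a final partial chunk


if __name__ == "__main__":
    buffered = [b"a", b"b", b"c"]
    live = iter([b"d", b"e", b"X", b"f"])
    chunks = stream_portion(buffered, 1, live, lambda f: f == b"X", chunk_frames=2)
    print(list(chunks))  # [b'bc', b'de', b'X']
```

Sending chunk by chunk, rather than waiting for the endpoint, lets the receiving device begin processing the pre-wakeword audio while the user is still speaking.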

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Piersol, Kurt Wesley; Beddingfield, Gabriel

Granted Patent

US 10,643,606 B2
CPC Class Codes

G10L 15/08   Speech classification or se...

G10L 17/22   Interactive procedures; Man...

G10L 25/87   Detection of discrete point...
