Pre-wakeword speech processing

US 10,192,546 B1
Filed: 03/30/2015
Issued: 01/29/2019
Est. Priority Date: 03/30/2015
Status: Active Grant

Abstract

A system for capturing and processing portions of a spoken utterance command that may occur before a wakeword. The system buffers incoming audio and indicates locations in the audio where the utterance changes, for example when a long pause is detected. When the system detects a wakeword within a particular utterance, the system determines the most recent utterance change location prior to the wakeword and sends the audio from that location to the end of the command utterance to a server for further speech processing.
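The selection logic the abstract describes can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the buffered audio has already been split into fixed-length frames with a per-frame speech/non-speech label, and that a wakeword detector has reported the frame where the wakeword occurs. The names `find_pauses`, `select_portion`, and `min_pause_frames` are hypothetical, and the choice to start the portion where the preceding pause ends and stop where the following pause begins is one reasonable reading.

```python
from typing import List, Optional, Tuple


def find_pauses(is_speech: List[bool], min_pause_frames: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, for every run of
    non-speech frames that is at least min_pause_frames long."""
    pauses = []
    run_start = None
    for i, speech in enumerate(is_speech):
        if not speech:
            if run_start is None:
                run_start = i  # a non-speech run begins here
        else:
            if run_start is not None and i - run_start >= min_pause_frames:
                pauses.append((run_start, i))
            run_start = None
    # A run that reaches the end of the buffer also counts.
    if run_start is not None and len(is_speech) - run_start >= min_pause_frames:
        pauses.append((run_start, len(is_speech)))
    return pauses


def select_portion(
    is_speech: List[bool], wakeword_frame: int, min_pause_frames: int = 3
) -> Optional[Tuple[int, int]]:
    """Pick the frame range to send for speech processing.

    Starts at the end of the most recent long pause before the wakeword
    (or at frame 0 if there is none) and stops at the first long pause
    after the wakeword. Returns None while no trailing pause exists yet.
    """
    pauses = find_pauses(is_speech, min_pause_frames)
    before = [p for p in pauses if p[1] <= wakeword_frame]
    after = [p for p in pauses if p[0] > wakeword_frame]
    if not after:
        return None  # the command utterance has not ended yet
    start = before[-1][1] if before else 0
    end = after[0][0]
    return start, end
```

For example, with three silent frames, five speech frames whose third frame (index 5) holds the wakeword, and four silent frames, `select_portion` returns the range covering all five speech frames, including the speech that preceded the wakeword.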

223 Citations

17 Claims

1. A computer-implemented method, comprising:
- receiving audio;
  
  storing, in non-transitory memory, audio data representing the audio;
  
  determining a first location in the audio data that includes a first amount of non-speech audio data;
  
  determining a wakeword at a second location in the audio data, the audio data including non-wakeword speech between the first location and the second location;
  
  determining a third location in the audio data that includes a second amount of non-speech audio data, the third location being after the second location in the audio data; and
  
  selecting, for speech processing, a portion of the audio data starting with the first location and ending with the third location, the portion of the audio data comprising at least the non-wakeword speech.
  - 2. The computer-implemented method of claim 1, further comprising:
    - sending, to at least one remote device, the portion of the audio data;
      
      receiving, from the at least one remote device, output data; and
      
      presenting output content corresponding to the output data.
  - 3. The computer-implemented method of claim 1, wherein:
    - the audio is received using a microphone; and
      
      the non-transitory memory is associated with the microphone.
  - 4. The computer-implemented method of claim 1, wherein the first amount is configured based at least in part on a language of the non-wakeword speech.
  - 5. The computer-implemented method of claim 1, wherein the first amount is configured based at least in part on an identity of a user that spoke the non-wakeword speech.
  - 6. The computer-implemented method of claim 1, further comprising:
    - determining a change in at least one of a tone, speed, pitch, source direction, frequency, volume, prosody, or energy of the non-wakeword speech, wherein the first location is associated with the change.
  - 7. The computer-implemented method of claim 1, further comprising:
    - determining a confidence score associated with the first location.
  - 8. The computer-implemented method of claim 1, wherein the audio data includes second non-wakeword speech between the second location and the third location.
  - 9. The computer-implemented method of claim 8, wherein the portion of the audio data comprises the non-wakeword speech and the second non-wakeword speech.
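Claims 4 and 5 make the "first amount" of non-speech audio configurable by language or by speaker. One way this could look is a simple lookup with a fallback order, sketched below. Every name and number here is a hypothetical placeholder chosen for illustration; the claims do not specify any particular values, table layout, or precedence.

```python
from typing import Optional

# All values below are made-up placeholders, not figures from the patent.
DEFAULT_PAUSE_MS = 500
PAUSE_MS_BY_LANGUAGE = {"en-US": 500, "ja-JP": 650}
PAUSE_MS_BY_USER = {"user-123": 800}


def pause_threshold_ms(language: Optional[str] = None, user_id: Optional[str] = None) -> int:
    """Return the pause length treated as an utterance boundary.

    Assumed precedence: a per-user setting wins over a per-language
    setting, which wins over the global default.
    """
    if user_id in PAUSE_MS_BY_USER:
        return PAUSE_MS_BY_USER[user_id]
    if language in PAUSE_MS_BY_LANGUAGE:
        return PAUSE_MS_BY_LANGUAGE[language]
    return DEFAULT_PAUSE_MS


def to_frames(duration_ms: int, frame_ms: int = 10) -> int:
    """Convert a duration to a whole number of frames, rounding up."""
    return -(-duration_ms // frame_ms)
```

The resulting frame count could then serve as the minimum run of non-speech frames a boundary detector requires before marking a location.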

10. A computing device, comprising:
- at least one processor; and
  
  at least one memory including instructions that, when executed by the at least one processor, cause the computing device to:
  
  receive audio;
  
  store, in non-transitory memory, audio data representing at least some of the audio;
  
  determine a first location in the audio data that includes a first amount of non-speech audio data;
  
  determine a wakeword at a second location in the audio data, the audio data including non-wakeword speech between the first location and the second location;
  
  determine a third location in the audio data that includes a second amount of non-speech audio data, the third location being after the second location in the audio data; and
  
  determine, for speech processing, a portion of the audio data starting with the first location and ending with the third location, the portion of the audio data comprising at least the non-wakeword speech.
  - 11. The computing device of claim 10, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to:
    - send, to at least one remote device, the portion of the audio data;
      
      receive, from the at least one remote device, output data; and
      
      present output content corresponding to the output data.
  - 12. The computing device of claim 10, wherein:
    - the audio is received using a microphone; and
      
      the non-transitory memory is associated with the microphone.
  - 13. The computing device of claim 10, wherein the first amount is configured based at least in part on a language of the non-wakeword speech.
  - 14. The computing device of claim 10, wherein the first amount is configured based at least in part on an identity of a user that spoke the non-wakeword speech.
  - 15. The computing device of claim 10, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the computing device to:
    - determine a change in at least one of a tone, speed, pitch, prosody, or energy of the non-wakeword speech, wherein the first location is associated with the change.
  - 16. The computing device of claim 10, wherein the audio data includes second non-wakeword speech between the second location and the third location.
  - 17. The computing device of claim 16, wherein the portion of the audio data comprises the non-wakeword speech and the second non-wakeword speech.
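Claim 10 stores audio data representing "at least some of the audio", which is consistent with a bounded buffer that discards the oldest audio as new audio arrives. The sketch below shows one such buffer using Python's `collections.deque` with `maxlen`. The class `AudioRingBuffer` and its methods are hypothetical; the absolute frame counter is an assumption added so that recorded locations stay meaningful after old frames are evicted.

```python
from collections import deque
from typing import List


class AudioRingBuffer:
    """Fixed-capacity frame buffer addressed by absolute frame index."""

    def __init__(self, capacity_frames: int) -> None:
        # deque(maxlen=...) silently drops the oldest frame when full.
        self._frames = deque(maxlen=capacity_frames)
        self._next_index = 0  # absolute index the next appended frame gets

    def append(self, frame: bytes) -> int:
        """Store a frame and return its absolute index."""
        self._frames.append(frame)
        index = self._next_index
        self._next_index += 1
        return index

    @property
    def oldest_index(self) -> int:
        """Absolute index of the oldest frame still held."""
        return self._next_index - len(self._frames)

    def slice(self, start: int, end: int) -> List[bytes]:
        """Return frames in [start, end), clipped to what is still buffered."""
        start = max(start, self.oldest_index)
        end = min(end, self._next_index)
        offset = self.oldest_index
        return [self._frames[i - offset] for i in range(start, end)]
```

With a capacity of four frames and six frames appended, the two oldest frames are gone, so a request for frames 0 through 4 returns only frames 2, 3, and 4.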

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Piersol, Kurt Wesley; Beddingfield, Gabriel
Primary Examiner(s): Leland, III, Edwin S
Application Number: US14/672,277
Time in Patent Office: 1,401 Days
Field of Search: 704254
US Class Current:
CPC Class Codes:
G10L 15/08   Speech classification or se...
G10L 17/22   Interactive procedures; Man...
G10L 25/87   Detection of discrete point...
